--- id: "2023-12-18" aliases: - December 18, 2023 - Bias tags: - link-note - Data-Science - Machine-Learning - Bias-and-Variance --- # Bias ## Training Data (80-90%) vs. Test Data (10-20%) ## Complexity - Complexity increases from linear to non-linear models - Under-fitting: occurs when there is a lot of data - Over-fitting: occurs when there is little data ## Bias and Variance - Bias and variance are both types of error in an algorithm. - $\begin{align} MSE (\hat{\theta}) \equiv E_{\theta} ((\hat{\theta} - \theta)^2) & = E((\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta} - \theta)^2) \\ & = E((\hat{\theta} - E(\hat{\theta}))^2 + 2((\hat{\theta} - E(\hat{\theta}))(E(\hat{\theta}) - \theta)) + (E(\hat{\theta}) - \theta)^2) \\ & = E((\hat{\theta} - E(\hat{\theta}))^2) + 2(E(\hat{\theta}) - \theta)E(\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)^2 \\ & = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 \\ & = Var_{\theta}(\hat{\theta}) + Bias_{\theta}(\hat{\theta}, \theta)^2 \end{align}$ - $\bbox[teal,5px,border:2px solid red] { MSE (\hat{\theta}) = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 = Var_{\theta}(\hat{\theta}) + Bias_{\theta}(\hat{\theta}, \theta)^2 }$ - Bias: under-fitting - Variance: over-fitting ![[Pasted image 20231218005054.png]] ![[Pasted image 20231218005035.png]] ## Trade-off - Solution - Use validation data set - $\bbox[teal,5px,border:2px solid red]{\text{Train data (80\%)+ Valid data (10\%) + Test data (10\%)}}$ - Cannot directly participate in model training - Continuously evaluates in the learning base, and stores the best existing performance - K-fold cross validation - **Leave-One-Out Cross-Validation (LOOCV)** - a special case of k-fold cross-validation where **K** is equal to the number of data points in the dataset. - What if **K** becomes bigger? 1. train data $\uparrow$ 2. bias error $\downarrow$ and variance error $\uparrow$ 3. cost $\uparrow$ - [[Regularization]] loss function