Diffstat (limited to 'SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md')
| -rw-r--r-- | SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md | 45 |
1 file changed, 24 insertions, 21 deletions
diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md b/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md
index 1a59925..294f138 100644
--- a/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md
+++ b/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md
@@ -1,15 +1,18 @@
 ---
-id: 2023-12-18
-aliases: December 18, 2023
+id: "2023-12-18"
+aliases:
+  - December 18, 2023
+  - Bias
 tags:
-- link-note
-- Data-Science
-- Machine-Learning
-- Bias-and-Variance
+  - link-note
+  - Data-Science
+  - Machine-Learning
+  - Bias-and-Variance
 ---
+
 # Bias
 
-## Training Data (80~90%) vs. Test Data (10~20%)
+## Training Data (80-90%) vs. Test Data (10-20%)
 
 ## Complexity
 
@@ -24,21 +27,21 @@
 - $\bbox[teal,5px,border:2px solid red] { MSE (\hat{\theta}) = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 = Var_{\theta}(\hat{\theta}) + Bias_{\theta}(\hat{\theta}, \theta)^2 }$
 - Bias: under-fitting
 - Variance: over-fitting
-![[Pasted image 20231218005054.png]]
-![[Pasted image 20231218005035.png]]
+  ![[Pasted image 20231218005054.png]]
+  ![[Pasted image 20231218005035.png]]
 
 ## Trade-off
 - Solution
-	- Use validation data set
-	- $\bbox[teal,5px,border:2px solid red]{\text{Train data (80\%)+ Valid data (10\%) + Test data (10\%)}}$
-		- Cannot directly participate in model training
-		- Continuously evaluates in the learning base, and stores the best existing performance
-	- K-fold cross validation
-		- **Leave-One-Out Cross-Validation (LOOCV)**
-			- a special case of k-fold cross-validation where **K** is equal to the number of data points in the dataset.
-		- What if **K** becomes bigger?
-			1. train data $\uparrow$
-			2. bias error $\downarrow$ and variance error $\uparrow$
-			3. cost $\uparrow$
-	- [[Regularization]] loss function
+  - Use validation data set
+  - $\bbox[teal,5px,border:2px solid red]{\text{Train data (80\%)+ Valid data (10\%) + Test data (10\%)}}$
+    - Cannot directly participate in model training
+    - Continuously evaluates in the learning base, and stores the best existing performance
+  - K-fold cross validation
+    - **Leave-One-Out Cross-Validation (LOOCV)**
+      - a special case of k-fold cross-validation where **K** is equal to the number of data points in the dataset.
+    - What if **K** becomes bigger?
+      1. train data $\uparrow$
+      2. bias error $\downarrow$ and variance error $\uparrow$
+      3. cost $\uparrow$
+  - [[Regularization]] loss function
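The K-fold and LOOCV bullets in the note above can be sketched in plain Python; `k_fold_indices` is a hypothetical helper written for illustration (not part of the note or any library), showing that when K equals the dataset size each fold validates on exactly one point, i.e. LOOCV, and that a larger K leaves more points for training per fold:

```python
# Minimal sketch of K-fold cross-validation splitting (no ML library).
# k_fold_indices is a hypothetical helper for illustration only.

def k_fold_indices(n, k):
    """Yield (train_indices, valid_indices) for each of the k folds."""
    indices = list(range(n))
    # Distribute n points across k folds as evenly as possible.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size

n = 10
# K = 5: each fold trains on 8 points and validates on 2.
folds_5 = list(k_fold_indices(n, 5))
# K = n: LOOCV -- each fold trains on n - 1 points and validates on 1.
loocv = list(k_fold_indices(n, n))
print(len(folds_5), len(folds_5[0][0]))  # → 5 8
print(len(loocv), len(loocv[0][0]))      # → 10 9
```

This matches the note's "What if K becomes bigger?" list: more training data per fold, but also k model fits per run, so compute cost grows with K.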
