-rw-r--r--  SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md | 45
1 file changed, 24 insertions(+), 21 deletions(-)
diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md b/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md
index 1a59925..294f138 100644
--- a/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md
+++ b/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md
@@ -1,15 +1,18 @@
---
-id: 2023-12-18
-aliases: December 18, 2023
+id: "2023-12-18"
+aliases:
+ - December 18, 2023
+ - Bias
tags:
-- link-note
-- Data-Science
-- Machine-Learning
-- Bias-and-Variance
+ - link-note
+ - Data-Science
+ - Machine-Learning
+ - Bias-and-Variance
---
+
# Bias
-## Training Data (80~90%) vs. Test Data (10~20%)
+## Training Data (80-90%) vs. Test Data (10-20%)
## Complexity
@@ -24,21 +27,21 @@ tags:
- $\bbox[teal,5px,border:2px solid red] { MSE (\hat{\theta}) = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 = Var_{\theta}(\hat{\theta}) + Bias_{\theta}(\hat{\theta}, \theta)^2 }$
- Bias: under-fitting
- Variance: over-fitting
-![[Pasted image 20231218005054.png]]
-![[Pasted image 20231218005035.png]]
+ ![[Pasted image 20231218005054.png]]
+ ![[Pasted image 20231218005035.png]]
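As a quick check of the decomposition above, here is a minimal simulation sketch (not part of the original note; the Gaussian data, sample size, and shrinkage factor are illustrative assumptions). A deliberately biased shrinkage estimator of a mean is simulated many times, and the empirical MSE is compared against variance plus squared bias.

```python
# Minimal sketch: verify MSE(theta_hat) = Var(theta_hat) + Bias(theta_hat, theta)^2
# for a deliberately biased estimator of a Gaussian mean. All values are assumptions.
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0             # true parameter: the population mean (assumed)
n, trials = 20, 100_000 # samples per dataset, number of simulated datasets
shrink = 0.8            # shrinkage factor: adds bias but lowers variance

# theta_hat = shrink * sample mean, computed over many simulated datasets
samples = rng.normal(loc=theta, scale=1.0, size=(trials, n))
theta_hat = shrink * samples.mean(axis=1)

mse = np.mean((theta_hat - theta) ** 2)
var = np.var(theta_hat)
bias = np.mean(theta_hat) - theta

print(f"MSE          : {mse:.5f}")
print(f"Var + Bias^2 : {var + bias ** 2:.5f}")  # agrees with MSE up to simulation noise
```

Shrinking the estimate trades a little bias for lower variance, which is the same trade-off that regularization exploits.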
## Trade-off
- Solution
- - Use validation data set
- - $\bbox[teal,5px,border:2px solid red]{\text{Train data (80\%) + Valid data (10\%) + Test data (10\%)}}$
- - Cannot directly participate in model training
- - Continuously evaluated during training, keeping the model with the best performance seen so far
- - K-fold cross validation
- - **Leave-One-Out Cross-Validation (LOOCV)**
- - a special case of k-fold cross-validation where **K** is equal to the number of data points in the dataset.
- - What if **K** becomes bigger?
- 1. train data $\uparrow$
- 2. bias error $\downarrow$ and variance error $\uparrow$
- 3. cost $\uparrow$
- - [[Regularization]] loss function
+ - Use validation data set
+ - $\bbox[teal,5px,border:2px solid red]{\text{Train data (80\%) + Valid data (10\%) + Test data (10\%)}}$
+ - Cannot directly participate in model training
+ - Continuously evaluated during training, keeping the model with the best performance seen so far
+ - K-fold cross-validation (see the sketch after this list)
+ - **Leave-One-Out Cross-Validation (LOOCV)**
+ - A special case of K-fold cross-validation where **K** equals the number of data points in the dataset.
+ - What happens as **K** increases?
+ 1. train data $\uparrow$
+ 2. bias error $\downarrow$ and variance error $\uparrow$
+ 3. cost $\uparrow$
+ - [[Regularization]] loss function
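The split and cross-validation ideas in the list above can be sketched with scikit-learn. This is a hedged example, not from the note: the dataset (load_diabetes), the Ridge model, the hyperparameter grid, and the 80/10/10 ratios are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import (KFold, LeaveOneOut, cross_val_score,
                                     train_test_split)

X, y = load_diabetes(return_X_y=True)

# Train (80%) + Valid (10%) + Test (10%): split off 20% first, then halve it.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# The validation set never participates in fitting; it only selects the model
# (here: the ridge strength) and keeps the best score seen so far.
best_alpha, best_score = None, -np.inf
for alpha in (0.01, 0.1, 1.0, 10.0):
    score = Ridge(alpha=alpha).fit(X_train, y_train).score(X_valid, y_valid)
    if score > best_score:
        best_alpha, best_score = alpha, score
print("best alpha on the validation set:", best_alpha)

# K-fold cross-validation: average the score over K held-out folds.
cv5 = cross_val_score(Ridge(alpha=best_alpha), X_train, y_train,
                      cv=KFold(n_splits=5, shuffle=True, random_state=0))
print("5-fold CV mean R^2:", cv5.mean())

# LOOCV = K-fold with K equal to the number of samples: larger training folds
# (lower bias) but one fit per sample (much higher cost).
loo = cross_val_score(Ridge(alpha=best_alpha), X_train, y_train,
                      cv=LeaveOneOut(), scoring="neg_mean_squared_error")
print("LOOCV mean MSE:", -loo.mean())

# The untouched test set gives the final performance estimate.
final = Ridge(alpha=best_alpha).fit(X_train, y_train)
print("test R^2:", final.score(X_test, y_test))
```

LOOCV uses a mean-squared-error score here because a per-sample R^2 is undefined when each held-out fold contains a single point.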