diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md b/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md
new file mode 100644
index 0000000..938bf62
--- /dev/null
+++ b/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md
@@ -0,0 +1,45 @@
+---
+id: 2023-12-18
+aliases: December 18, 2023
+tags:
+- link-note
+- Data-Science
+- Machine-Learning
+- Bias-and-Variance
+---
+
+# Bias and Variance
+
+## Training Data (80–90%) vs. Test Data (10–20%)
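+
+A minimal sketch of this split, assuming scikit-learn is available (the toy arrays and the 80/20 ratio are illustrative choices at the common end of the range above):
+
+```python
+import numpy as np
+from sklearn.model_selection import train_test_split
+
+X = np.arange(100).reshape(50, 2)  # 50 samples, 2 features
+y = np.arange(50)
+
+# Hold out 20% for testing; the model must never see this portion during training
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
+print(len(X_train), len(X_test))  # 40 10
+```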
+
+## Complexity
+
+- Model complexity increases as we move from linear to non-linear models
+- Under-fitting: the model is too simple to capture the underlying pattern (high bias)
+- Over-fitting: the model is too complex relative to the available data and fits noise in the training set (high variance); it is more likely when data is scarce (see the sketch below)
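+
+A minimal sketch of both failure modes, assuming only NumPy (the cubic ground truth, noise level, and degrees are illustrative assumptions, not from this note): fit a too-simple and a too-flexible polynomial to the same small noisy sample.
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+
+# Small noisy sample of a cubic ground truth
+x = np.linspace(-1, 1, 15)
+y = x**3 - x + rng.normal(scale=0.1, size=x.shape)
+
+x_test = np.linspace(-1, 1, 100)
+y_test = x_test**3 - x_test
+
+for degree in (1, 12):
+    coeffs = np.polyfit(x, y, degree)  # fit on the small training sample
+    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
+    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
+    print(f"degree={degree:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
+```
+
+The degree-1 fit has high error on both sets (under-fitting), while the degree-12 fit drives training error toward zero yet its test error stays well above it (over-fitting).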
+
+## Bias and Variance
+
+- Bias and variance are two distinct sources of error in a learning algorithm's predictions.
+- $\begin{align} MSE (\hat{\theta}) \equiv E_{\theta} ((\hat{\theta} - \theta)^2) & = E(((\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta))^2) \\ & = E((\hat{\theta} - E(\hat{\theta}))^2 + 2(\hat{\theta} - E(\hat{\theta}))(E(\hat{\theta}) - \theta) + (E(\hat{\theta}) - \theta)^2) \\ & = E((\hat{\theta} - E(\hat{\theta}))^2) + 2(E(\hat{\theta}) - \theta)E(\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)^2 \\ & = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 \\ & = Var_{\theta}(\hat{\theta}) + Bias_{\theta}(\hat{\theta}, \theta)^2 \end{align}$
+- The cross term vanishes because $E(\hat{\theta} - E(\hat{\theta})) = E(\hat{\theta}) - E(\hat{\theta}) = 0$; a numerical check of the boxed identity follows the figures below.
+- $\bbox[teal,5px,border:2px solid red] { MSE (\hat{\theta}) = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 = Var_{\theta}(\hat{\theta}) + Bias_{\theta}(\hat{\theta}, \theta)^2 }$
+- High bias $\rightarrow$ under-fitting
+- High variance $\rightarrow$ over-fitting
+![[Pasted image 20231218005054.png]]
+![[Pasted image 20231218005035.png]]
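+
+A quick numerical check of the boxed identity, assuming only NumPy; the shrinkage estimator $\hat{\theta} = 0.8\bar{x}$ is a hypothetical choice made here purely to obtain a non-zero bias.
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(1)
+
+theta = 2.0            # true parameter (mean of the sampling distribution)
+n, trials = 20, 200_000
+
+# Deliberately biased estimator: shrink the sample mean toward zero
+samples = rng.normal(loc=theta, scale=1.0, size=(trials, n))
+theta_hat = 0.8 * samples.mean(axis=1)
+
+mse = np.mean((theta_hat - theta) ** 2)
+var = np.var(theta_hat)
+bias_sq = (np.mean(theta_hat) - theta) ** 2
+
+print(f"MSE          = {mse:.5f}")
+print(f"Var + Bias^2 = {var + bias_sq:.5f}")  # agrees with MSE up to Monte Carlo error
+```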
+
+## Trade-off
+
+- Solution
+	- Use a validation data set
+		- $\bbox[teal,5px,border:2px solid red]{\text{Train data (80\%) + Valid data (10\%) + Test data (10\%)}}$
+		- The validation set does not participate directly in model training
+		- The model is evaluated on the validation set throughout training, and the best-performing checkpoint so far is kept (see the first sketch after this list)
+ - K-fold cross validation
+ - **Leave-One-Out Cross-Validation (LOOCV)**
+			- A special case of k-fold cross-validation where **K** equals the number of data points in the dataset, so each fold holds out exactly one observation.
+		- What if **K** becomes bigger? (see the cross-validation sketch after this list)
+			1. training data per fold $\uparrow$
+			2. bias error $\downarrow$ and variance error $\uparrow$ (in the CV estimate)
+			3. computational cost $\uparrow$ (more models to train)
+	- [[Regularization]]: add a penalty term to the loss function to constrain model complexity (the L2 form is sketched below)
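+
+A minimal sketch of the validation workflow above, assuming only NumPy (the synthetic data, candidate degrees, and noise level are illustrative assumptions): fit on the train split, keep the model that scores best on the validation split, and touch the test split only once at the end.
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(2)
+
+# Synthetic data, then an 80/10/10 train/valid/test split
+x = rng.uniform(-1, 1, 200)
+y = np.sin(3 * x) + rng.normal(scale=0.2, size=x.shape)
+idx = rng.permutation(200)
+tr, va, te = idx[:160], idx[160:180], idx[180:]
+
+best_degree, best_valid_mse, best_coeffs = None, np.inf, None
+for degree in range(1, 10):
+    coeffs = np.polyfit(x[tr], y[tr], degree)  # train split only
+    valid_mse = np.mean((np.polyval(coeffs, x[va]) - y[va]) ** 2)
+    if valid_mse < best_valid_mse:             # keep the best model seen so far
+        best_degree, best_valid_mse, best_coeffs = degree, valid_mse, coeffs
+
+test_mse = np.mean((np.polyval(best_coeffs, x[te]) - y[te]) ** 2)
+print(f"best degree={best_degree}, valid MSE={best_valid_mse:.4f}, test MSE={test_mse:.4f}")
+```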
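+
+A sketch of K-fold cross-validation and LOOCV, assuming scikit-learn is available (the linear toy data is an illustrative assumption); LeaveOneOut is exactly K-fold with K equal to the number of samples.
+
+```python
+import numpy as np
+from sklearn.linear_model import LinearRegression
+from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
+
+rng = np.random.default_rng(3)
+X = rng.uniform(-1, 1, size=(50, 1))
+y = 2.0 * X[:, 0] + rng.normal(scale=0.3, size=50)
+
+model = LinearRegression()
+
+# 5-fold CV: every point is held out exactly once; each model trains on 80% of the data
+kfold = cross_val_score(model, X, y, scoring="neg_mean_squared_error",
+                        cv=KFold(n_splits=5, shuffle=True, random_state=0))
+
+# LOOCV: K = n = 50, so each of the 50 models trains on 49 points
+loo = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=LeaveOneOut())
+
+print(f"5-fold mean MSE: {-kfold.mean():.4f}")
+print(f"LOOCV  mean MSE: {-loo.mean():.4f}")
+```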
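+
+The linked [[Regularization]] note presumably covers the details; for orientation, the common L2 (ridge) form adds a weight penalty to the training loss:
+
+$L(w) = \sum_{i=1}^{n} (y_i - w^\top x_i)^2 + \lambda \lVert w \rVert_2^2$
+
+Larger $\lambda$ shrinks the weights, accepting a little extra bias in exchange for lower variance.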