From 4d53fa14ee0cd615444aca6f6ba176e0ccc1b5be Mon Sep 17 00:00:00 2001
From: TheSiahxyz <164138827+TheSiahxyz@users.noreply.github.com>
Date: Mon, 29 Apr 2024 22:06:12 -0400
Subject: init

---
 .../Machine Learning/Contents/Bias and Variance.md | 45 ++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md

diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md b/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md
new file mode 100644
index 0000000..938bf62
--- /dev/null
+++ b/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md
@@ -0,0 +1,45 @@
+---
+id: 2023-12-18
+aliases: December 18, 2023
+tags:
+- link-note
+- Data-Science
+- Machine-Learning
+- Bias-and-Variance
+---
+
+# Bias
+
+## Training Data (80~90%) vs. Test Data (10~20%)
+
+## Complexity
+
+- Model complexity increases as we move from linear to non-linear models
+- Under-fitting: the model is too simple relative to the data (high bias), even when there is a lot of data
+- Over-fitting: the model is too complex relative to the data (high variance), especially when there is little data
+
+## Bias and Variance
+
+- Bias and variance are the two components of an estimator's error; in the decomposition below, the cross term drops out because $E(\hat{\theta} - E(\hat{\theta})) = 0$.
+- $\begin{align} MSE(\hat{\theta}) \equiv E_{\theta}((\hat{\theta} - \theta)^2) & = E(((\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta))^2) \\ & = E((\hat{\theta} - E(\hat{\theta}))^2 + 2(\hat{\theta} - E(\hat{\theta}))(E(\hat{\theta}) - \theta) + (E(\hat{\theta}) - \theta)^2) \\ & = E((\hat{\theta} - E(\hat{\theta}))^2) + 2(E(\hat{\theta}) - \theta)E(\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)^2 \\ & = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 \\ & = Var_{\theta}(\hat{\theta}) + Bias_{\theta}(\hat{\theta}, \theta)^2 \end{align}$
+- $\bbox[teal,5px,border:2px solid red]{MSE(\hat{\theta}) = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 = Var_{\theta}(\hat{\theta}) + Bias_{\theta}(\hat{\theta}, \theta)^2}$
+- High bias: under-fitting
+- High variance: over-fitting
+![[Pasted image 20231218005054.png]]
+![[Pasted image 20231218005035.png]]
+
+## Trade-off
+
+- Solutions
+  - Use a validation data set
+    - $\bbox[teal,5px,border:2px solid red]{\text{Train data (80\%) + Valid data (10\%) + Test data (10\%)}}$
+    - The validation set does not directly participate in model training
+    - The model is evaluated on it continuously during training, and the best-performing model so far is kept
+  - K-fold cross-validation
+    - **Leave-One-Out Cross-Validation (LOOCV)**
+      - A special case of k-fold cross-validation where **K** equals the number of data points in the dataset
+    - What happens as **K** becomes bigger?
+      1. training data per fold $\uparrow$
+      2. bias error $\downarrow$ and variance error $\uparrow$
+      3. computational cost $\uparrow$
+  - Add a [[Regularization]] term to the loss function
\ No newline at end of file
--
cgit v1.2.3
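
The MSE decomposition in the note can be checked numerically. Below is a minimal Monte Carlo sketch (not part of the committed note), assuming NumPy; the shrunken-mean estimator $\hat{\theta} = 0.9\,\bar{x}$ is a hypothetical choice used only so that the bias term is nonzero.

```python
# Minimal Monte Carlo check of MSE = Var + Bias^2 (sketch; assumes NumPy).
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0            # true parameter
n, trials = 20, 100_000

# Hypothetical biased estimator: a shrunken sample mean, theta_hat = 0.9 * mean(x)
samples = rng.normal(loc=theta, scale=1.0, size=(trials, n))
theta_hat = 0.9 * samples.mean(axis=1)

mse = np.mean((theta_hat - theta) ** 2)        # E[(theta_hat - theta)^2]
var = np.var(theta_hat)                        # E[(theta_hat - E[theta_hat])^2], ddof=0
bias_sq = (np.mean(theta_hat) - theta) ** 2    # (E[theta_hat] - theta)^2

print(f"MSE          = {mse:.6f}")
print(f"Var + Bias^2 = {var + bias_sq:.6f}")
```

With `ddof=0`, the empirical variance plus squared empirical bias reproduces the empirical MSE exactly, so the two printed numbers should agree to floating-point precision.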
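
The 80/10/10 split and the "keep the best performance so far" idea from the Trade-off section could look roughly like the sketch below. It assumes scikit-learn; the diabetes dataset, the Ridge model, and the alpha grid are illustrative stand-ins, not choices made in the note.

```python
# Sketch of an 80/10/10 split with validation-based model selection (assumes scikit-learn).
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# 80% train, 10% validation, 10% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

best_alpha, best_valid_mse = None, float("inf")
for alpha in [0.01, 0.1, 1.0, 10.0]:           # candidate regularization strengths
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    valid_mse = mean_squared_error(y_valid, model.predict(X_valid))
    if valid_mse < best_valid_mse:             # keep the best result seen so far
        best_alpha, best_valid_mse = alpha, valid_mse

# The test set is touched only once, after model selection is finished.
final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
print("best alpha:", best_alpha,
      "test MSE:", mean_squared_error(y_test, final_model.predict(X_test)))
```

The validation set steers the choice of hyperparameter but never enters the `fit` calls, matching the note's point that it cannot directly participate in training.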
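
For the K-fold vs. LOOCV comparison, a small sketch (again assuming scikit-learn, with a placeholder model and dataset) might be:

```python
# Sketch comparing 5-fold CV with LOOCV (K = number of samples); assumes scikit-learn.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = load_diabetes(return_X_y=True)
model = Ridge(alpha=1.0)   # regularized loss, cf. the [[Regularization]] link in the note

# Ordinary K-fold: each fold trains on (K-1)/K of the data.
kfold_scores = cross_val_score(
    model, X, y,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",
)
print("5-fold mean MSE:", -kfold_scores.mean())

# LOOCV = K-fold with K = N: larger training sets (bias error down),
# but N model fits (cost up) and a noisier estimate (variance error up).
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
print("LOOCV  mean MSE:", -loo_scores.mean())
```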