---
id: "2023-12-18"
aliases:
  - December 18, 2023
  - Bias
tags:
  - link-note
  - Data-Science
  - Machine-Learning
  - Bias-and-Variance
---

# Bias

## Training Data (80-90%) vs. Test Data (10-20%)
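
A minimal sketch of the hold-out split, assuming scikit-learn and its bundled diabetes dataset purely for illustration:

```python
# Hold-out split: keep 10-20% of the samples aside as a test set the model never sees during training.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# test_size=0.2 gives an 80/20 train/test split; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))
```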

## Complexity

- Complexity increases as models move from linear to non-linear
- Under-fitting: the model is too simple to capture the underlying pattern, so it performs poorly even on the training data
- Over-fitting: the model is too complex relative to the amount of training data (more likely when data is scarce), so it fits the noise and fails to generalize to new data, as sketched below
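
A rough sketch of how complexity drives under- and over-fitting, assuming scikit-learn; the sine target, noise level, and polynomial degrees are illustrative choices:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 30)  # small, noisy, non-linear dataset

X_new = np.linspace(0, 1, 200).reshape(-1, 1)               # unseen points from the same curve
y_new = np.sin(2 * np.pi * X_new).ravel()

for degree in (1, 4, 15):  # too simple, about right, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    train_err = mean_squared_error(y, model.predict(X))
    test_err = mean_squared_error(y_new, model.predict(X_new))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
# degree 1 under-fits (both errors high); degree 15 over-fits (train error low, test error high)
```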

## Bias and Variance

- Bias and variance are the two components of an algorithm's expected prediction error.
- $\begin{align} MSE (\hat{\theta}) \equiv E_{\theta} ((\hat{\theta} - \theta)^2) & = E(((\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta))^2) \\ & = E((\hat{\theta} - E(\hat{\theta}))^2 + 2(\hat{\theta} - E(\hat{\theta}))(E(\hat{\theta}) - \theta) + (E(\hat{\theta}) - \theta)^2) \\ & = E((\hat{\theta} - E(\hat{\theta}))^2) + 2(E(\hat{\theta}) - \theta)E(\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)^2 \\ & = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 \qquad (\text{since } E(\hat{\theta} - E(\hat{\theta})) = 0) \\ & = Var_{\theta}(\hat{\theta}) + Bias_{\theta}(\hat{\theta}, \theta)^2  \end{align}$
- $\bbox[teal,5px,border:2px solid red] { MSE (\hat{\theta}) = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 = Var_{\theta}(\hat{\theta}) + Bias_{\theta}(\hat{\theta}, \theta)^2 }$
- Bias: under-fitting
- Variance: over-fitting
  ![[Pasted image 20231218005054.png]]
  ![[Pasted image 20231218005035.png]]
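
The boxed decomposition above can be checked numerically. A small Monte Carlo sketch in plain NumPy; the biased (divide-by-$n$) variance estimator, sample size, and number of trials are arbitrary choices made only because the estimator's bias is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0                # true parameter theta = sigma^2
n, trials = 10, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
theta_hat = samples.var(axis=1, ddof=0)     # biased MLE of the variance (divides by n)

mse = np.mean((theta_hat - sigma2) ** 2)
var = theta_hat.var()
bias = theta_hat.mean() - sigma2            # expected bias is -sigma^2 / n

print(f"MSE          ~ {mse:.4f}")
print(f"Var + Bias^2 ~ {var + bias ** 2:.4f}")  # agrees with MSE up to Monte Carlo error
```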

## Trade-off

- Solutions
  - Use a validation data set
    - $\bbox[teal,5px,border:2px solid red]{\text{Train data (80\%)+ Valid data (10\%) + Test data (10\%)}}$
    - Does not participate directly in model training
    - Evaluated continuously during training, and the best-performing model seen so far is saved
  - K-fold cross-validation
    - **Leave-One-Out Cross-Validation (LOOCV)**
      - A special case of k-fold cross-validation where **K** equals the number of data points in the dataset (see the sketch after this list)
    - What happens as **K** increases?
      1. training data per fold $\uparrow$
      2. bias error $\downarrow$ and variance error $\uparrow$
      3. computational cost $\uparrow$
  - Add a [[Regularization]] term to the loss function
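
A minimal sketch of the 80/10/10 split and of K-fold / LOOCV, assuming scikit-learn; Ridge and the diabetes dataset are illustrative stand-ins for whatever model and data are actually being tuned:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score, train_test_split

X, y = load_diabetes(return_X_y=True)

# 80% train, 10% validation, 10% test via two successive splits
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# 5-fold cross-validation on the training portion
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
print("5-fold R^2:", cross_val_score(Ridge(), X_train, y_train, cv=kfold).mean())

# LOOCV: K equals the number of training samples -- low bias, but expensive and high variance
loo_scores = cross_val_score(Ridge(), X_train, y_train, cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
print("LOOCV MSE:", -loo_scores.mean())
```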