---
id: 2023-12-18
aliases: December 18, 2023
tags:
- link-note 
- Data-Science
- Machine-Learning
- Regularization 
---

# Regularization

## Regularization Loss Function

- Number of model parameters $\uparrow$ $\Rightarrow$ model complexity $\uparrow$
- Model complexity $\uparrow$ $\Rightarrow$ risk of overfitting $\uparrow$
- Regularization strategy: define a model with high complexity, learn only the important parameters, and drive the unnecessary parameter values to **0**

## Regularization Types

### Ridge Regression (L2 Regularization)

- $L = \bbox[orange,3px] {\sum_{i=1}^{n} (y_i - (\beta_0 + \sum_{j=1}^{D} \beta_j x_{ij}))^{2}} + \bbox[blue,3px] {\lambda \sum_{j=1}^{D} \beta_j^2}$
	- $\bbox[orange,3px]{\text{MSE}}$
	- $\bbox[blue,3px]{\text{Ridge}}$
	- Growing the coefficients to reduce the MSE term inflates the penalty term, so minimizing $L$ trades data fit against coefficient size
	- $\lambda$ is a hyperparameter that controls the strength of the regularization
	- The penalty term is the sum of squared coefficients (the squared $L_2$ norm); ridge shrinks coefficients toward $0$ but rarely to exactly $0$
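As a sketch of the loss above: setting the gradient of $L$ to zero gives the closed-form minimizer $\beta = (X^{\top}X + \lambda I)^{-1} X^{\top} y$. The snippet below (the function name, synthetic data, and $\lambda$ values are illustrative assumptions, not from this note) shows the shrinkage effect of increasing $\lambda$:

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form minimizer of sum (y_i - x_i^T beta)^2 + lam * ||beta||^2:
    # beta = (X^T X + lam * I)^{-1} X^T y.
    # For simplicity this sketch omits the intercept beta_0, which is
    # conventionally left unpenalized.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data: y depends only on the first of three features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)

beta_small = ridge_fit(X, y, lam=0.01)
beta_large = ridge_fit(X, y, lam=100.0)
# A larger lambda shrinks the coefficient vector toward 0 (smaller norm),
# but the entries do not become exactly 0.
print(np.linalg.norm(beta_small), np.linalg.norm(beta_large))
```

Note that the coefficients shrink smoothly with $\lambda$; they approach $0$ but are never clipped to it, which is the key contrast with lasso below.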

### Lasso Regression (L1 Regularization)

- $L = \sum\limits_{i=1}^{n}(y_{i}- (\beta_{0}+ \sum\limits_{j=1}^{D} \beta_{j}x_{ij}))^{2}+ \lambda \sum\limits_{j=1}^{D} |\beta_j|$
	- As with ridge, minimizing $L$ balances the MSE term against the penalty term
	- $\lambda$ is a hyperparameter that controls the strength of the regularization
	- The penalty term is the sum of absolute values of the coefficients (the $L_1$ norm); unlike ridge, lasso can set coefficients exactly to $0$, producing sparse solutions
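The lasso loss has no closed form, but cyclic coordinate descent with soft-thresholding is a standard solver for it. The sketch below (the function, synthetic data, and $\lambda = 20$ are illustrative assumptions, not from this note) shows how the soft-threshold update makes coefficients of irrelevant features exactly $0$:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    # Cyclic coordinate descent for (1/2)||y - X b||^2 + lam * sum_j |b_j|
    # (no intercept, for simplicity). Each coordinate update is a
    # soft-threshold: it returns exactly 0 whenever |rho| <= lam.
    n, d = X.shape
    beta = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)  # per-feature X_j^T X_j
    for _ in range(n_iter):
        for j in range(d):
            # Partial residual excluding feature j's current contribution.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

# Synthetic data: only the first two of ten features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

beta = lasso_cd(X, y, lam=20.0)
print(np.sum(beta == 0))  # the eight noise features are driven exactly to 0
```

The two informative coefficients survive (slightly shrunk), while every noise coefficient is clipped to exactly $0$, which is the sparsity the next section asks about.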

![[Pasted image 20231218032332.png]]

## Question

- $\lambda \uparrow$ $\Rightarrow$ bias error $\uparrow$ and variance error $\downarrow$ (stronger regularization yields a simpler model)
- Sparsity: ridge regression $<$ lasso regression (lasso produces more exactly-zero coefficients)
- How can more parameters be driven to exactly $0$?
	1. Increase $\lambda$
	2. Decrease the exponent of the penalty norm (e.g. from $L_2$ to $L_1$)
	- Is more sparsity good or bad? There is no general answer; it depends on the data and the task
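The effect of increasing $\lambda$ can be seen directly in a simplified setting: when the design matrix is orthonormal, the lasso solution is just a soft-threshold of the OLS coefficients, so raising $\lambda$ zeroes out progressively more of them. A minimal sketch (the coefficient vector `b_ols` is made up for illustration):

```python
import numpy as np

def soft_threshold(b_ols, lam):
    # For an orthonormal design (X^T X = I), the lasso minimizer is
    # b_j = sign(b_ols_j) * max(|b_ols_j| - lam, 0) applied elementwise.
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)

b_ols = np.array([3.0, -2.0, 0.8, 0.3, -0.1])
for lam in [0.0, 0.5, 1.0, 2.5]:
    b = soft_threshold(b_ols, lam)
    # Zero counts grow with lam: 0, 2, 3, 4.
    print(lam, int(np.sum(b == 0)), b)
```

Every coefficient whose magnitude falls below $\lambda$ is clipped to exactly $0$, while the survivors are uniformly shrunk by $\lambda$; this is the mechanism behind point 1 above.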