| author | TheSiahxyz <164138827+TheSiahxyz@users.noreply.github.com> | 2024-04-29 22:06:12 -0400 |
|---|---|---|
| committer | TheSiahxyz <164138827+TheSiahxyz@users.noreply.github.com> | 2024-04-29 22:06:12 -0400 |
| commit | 4d53fa14ee0cd615444aca6f6ba176e0ccc1b5be (patch) | |
| tree | 4d9f0527d9e6db4f92736ead0aa9bb3f840a0f89 /SI/Resource/Data Science/Machine Learning/Contents | |
init
Diffstat (limited to 'SI/Resource/Data Science/Machine Learning/Contents')
11 files changed, 318 insertions(+), 0 deletions(-)
diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md b/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md
new file mode 100644
index 0000000..938bf62
--- /dev/null
+++ b/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md
@@ -0,0 +1,45 @@
---
id: 2023-12-18
aliases: December 18, 2023
tags:
- link-note
- Data-Science
- Machine-Learning
- Bias-and-Variance
---

# Bias and Variance

## Training Data (80~90%) vs. Test Data (10~20%)

## Complexity

- Complexity increases from linear to non-linear models
- Under-fitting: the model is too simple to capture the pattern in the data (high bias)
- Over-fitting: the model is too complex for the amount of data available, so it fits the noise (high variance)

## Bias and Variance

- Bias and variance are both types of error in an algorithm.
- $\begin{align} MSE(\hat{\theta}) \equiv E_{\theta}((\hat{\theta} - \theta)^2) & = E(((\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta))^2) \\ & = E((\hat{\theta} - E(\hat{\theta}))^2 + 2(\hat{\theta} - E(\hat{\theta}))(E(\hat{\theta}) - \theta) + (E(\hat{\theta}) - \theta)^2) \\ & = E((\hat{\theta} - E(\hat{\theta}))^2) + 2(E(\hat{\theta}) - \theta)E(\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)^2 \\ & = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 \\ & = Var_{\theta}(\hat{\theta}) + Bias_{\theta}(\hat{\theta}, \theta)^2 \end{align}$
- The cross term vanishes because $E(\hat{\theta} - E(\hat{\theta})) = 0$.
- $\bbox[teal,5px,border:2px solid red] { MSE (\hat{\theta}) = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 = Var_{\theta}(\hat{\theta}) + Bias_{\theta}(\hat{\theta}, \theta)^2 }$
- High bias: under-fitting
- High variance: over-fitting
![[Pasted image 20231218005054.png]]
![[Pasted image 20231218005035.png]]

## Trade-off

- Solution
  - Use a validation data set
  - $\bbox[teal,5px,border:2px solid red]{\text{Train data (80\%) + Valid data (10\%) + Test data (10\%)}}$
  - The validation data cannot directly participate in model training
  - The model is evaluated on it continuously during training, and the best performance seen so far is stored
  - K-fold cross validation (see the sketch after this note)
  - **Leave-One-Out Cross-Validation (LOOCV)**
    - A special case of k-fold cross-validation where **K** is equal to the number of data points in the dataset.
  - What if **K** becomes bigger?
    1. train data $\uparrow$
    2. bias error $\downarrow$ and variance error $\uparrow$
    3. cost $\uparrow$
  - [[Regularization]] loss function
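
A minimal sketch of the 80/10/10 split and the K-fold / LOOCV ideas above, assuming scikit-learn and a small synthetic dataset (the dataset and all variable names are illustrative, not from the original note):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, LeaveOneOut, train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.3, size=100)  # noisy quadratic data

# 80% train / 10% validation / 10% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# K-fold cross validation: average the validation error over K folds
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_errors = []
for train_idx, valid_idx in kfold.split(X_train):
    model = LinearRegression().fit(X_train[train_idx], y_train[train_idx])
    pred = model.predict(X_train[valid_idx])
    fold_errors.append(mean_squared_error(y_train[valid_idx], pred))
print("5-fold CV MSE:", np.mean(fold_errors))

# LOOCV is K-fold with K = number of training points (more training data per fold, higher cost)
print("LOOCV folds:", LeaveOneOut().get_n_splits(X_train))
```

In scikit-learn, `cross_val_score` wraps this loop in one call; the explicit version is kept here only because it mirrors the note's steps.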
diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Classification.md b/SI/Resource/Data Science/Machine Learning/Contents/Classification.md
new file mode 100644
index 0000000..12c125e
--- /dev/null
+++ b/SI/Resource/Data Science/Machine Learning/Contents/Classification.md
@@ -0,0 +1,17 @@
---
id: 2023-12-17
aliases: December 17, 2023
tags:
- link-note
- Data-Science
- Machine-Learning
- Classification
---

# Classification

Classification in the context of machine learning and statistics is a type of supervised learning approach where the output variable is a category, such as "spam" or "not spam", or "disease" or "no disease". In classification, an algorithm is trained on a dataset of labeled examples, learning to associate input data points with the corresponding category label. Once trained, the model can then categorize new, unseen data points.

1. Input: Continuous (float), Discrete (categorical), etc.
2. Output: Discrete (categorical)
3. Model types: Binary - [[Sigmoid]], multi-class - [[Softmax Regression]]
diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Gradient descent.md b/SI/Resource/Data Science/Machine Learning/Contents/Gradient descent.md
new file mode 100644
index 0000000..fdf8905
--- /dev/null
+++ b/SI/Resource/Data Science/Machine Learning/Contents/Gradient descent.md
@@ -0,0 +1,21 @@
---
id: 2023-12-17
aliases: December 17, 2023
tags:
- link-note
- Data-Science
- Machine-Learning
- Gradient-descent
---

# Gradient Descent

- Update the parameters so that the value of the loss function is minimized
- **At a minimum, the instantaneous rate of change (derivative) of the loss is _0_**
- Therefore, move the parameters toward the point where the derivative of the loss function equals 0

## Pseudo Code

1. Find the derivative of the loss function at the current parameters.
2. Update the parameters in the opposite direction of the derivative (scaled by the learning rate).
3. Repeat steps 1 and 2 for a set number of epochs (a hyperparameter), or until the derivative becomes (close to) 0. See the runnable sketch after this note.
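
A runnable NumPy sketch of the pseudo code above, assuming an MSE loss for simple linear regression; the learning rate and epoch count are the hyperparameters mentioned in the note, and the data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=50)  # y = 3x + 1 plus noise

w, b = 0.0, 0.0          # parameters to learn
lr, epochs = 0.1, 2000   # hyperparameters set before training

for _ in range(epochs):
    y_hat = w * x + b
    # Step 1: derivatives of the MSE loss L = mean((y - y_hat)^2) at the current parameters
    dw = -2.0 * np.mean((y - y_hat) * x)
    db = -2.0 * np.mean(y - y_hat)
    # Step 2: move opposite to the derivative, scaled by the learning rate
    w -= lr * dw
    b -= lr * db

print(f"w ~ {w:.3f}, b ~ {b:.3f}")  # should approach 3 and 1
```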
diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Hyperparameter.md b/SI/Resource/Data Science/Machine Learning/Contents/Hyperparameter.md
new file mode 100644
index 0000000..895783f
--- /dev/null
+++ b/SI/Resource/Data Science/Machine Learning/Contents/Hyperparameter.md
@@ -0,0 +1,15 @@
---
id: 2023-12-18
aliases: December 18, 2023
tags:
- link-note
- Data-Science
- Machine-Learning
- Hyperparameter
---

# Hyperparameter

- A hyperparameter is a parameter whose value is set before the learning process begins
- Unlike model parameters, which are learned during training, hyperparameters are not learned from the data
- A human has to set the hyperparameters (e.g., the learning rate, the number of epochs, K in K-fold cross validation, $\lambda$ in regularization)
diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Linear Regression.md b/SI/Resource/Data Science/Machine Learning/Contents/Linear Regression.md
new file mode 100644
index 0000000..fdd175c
--- /dev/null
+++ b/SI/Resource/Data Science/Machine Learning/Contents/Linear Regression.md
@@ -0,0 +1,26 @@
---
id: 2023-12-17
aliases: December 17, 2023
tags:
- link-note
- Data-Science
- Machine-Learning
- Linear-Regression
---

# Linear Regression

## Simple Linear Regression

- A model with one feature of the data
- $y = w_0 + w_1*x$

## Multiple Linear Regression

- A model with several features of the data
- $y = w_0 + w_1*x_1 + \dots + w_D*x_D$

## Polynomial Regression

- A model that adds higher-degree powers of a feature (see the sketch after this note)
- $y = w_0 + w_1*x + w_2*x^2 + \dots + w_m*x^m$
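
A short sketch contrasting simple and polynomial regression, assuming scikit-learn: `PolynomialFeatures` expands $x$ into $x, x^2, \dots, x^m$ and an ordinary linear model is then fit on the expanded features (the dataset and names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(80, 1))
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 0] ** 3 + rng.normal(scale=0.2, size=80)

# Simple linear regression: y = w0 + w1*x
simple = LinearRegression().fit(x, y)

# Polynomial regression: y = w0 + w1*x + w2*x^2 + w3*x^3
poly = make_pipeline(PolynomialFeatures(degree=3), LinearRegression()).fit(x, y)

print("simple R^2:  ", simple.score(x, y))
print("degree-3 R^2:", poly.score(x, y))
```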
diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Logistic Regression.md b/SI/Resource/Data Science/Machine Learning/Contents/Logistic Regression.md
new file mode 100644
index 0000000..24b714e
--- /dev/null
+++ b/SI/Resource/Data Science/Machine Learning/Contents/Logistic Regression.md
@@ -0,0 +1,47 @@
---
id: 2023-12-18
aliases: December 18, 2023
tags:
- link-note
- Data-Science
- Machine-Learning
- Logistic-Regression
---

# Logistic Regression

$$Y = \begin{cases} 1 & \text{if korean} \\ 2 & \text{if american} \\ 3 & \text{if japanese} \end{cases} \qquad\qquad\qquad Y = \begin{cases} 1 & \text{if american} \\ 2 & \text{if korean} \\ 3 & \text{if japanese} \end{cases}$$

- In general regression, the results vary depending on the order (size) of the labels
- A different loss function or model is needed
- The logistic regression model is a regression model in the form of a logistic function.
- The predicted class changes depending on the value of $w^TX$:
  1. If $w^{T}X > 0$: classified as 1.
  2. If $w^{T}X < 0$: classified as 0.
- How should the loss function be defined to find the optimal value of the parameter *w*?

## Odds

- The odds represent how many times higher the probability of success (y=1) is compared to the probability of failure (y=0)
- $odds = \dfrac{p(y=1|x)}{1-p(y=1|x)}$

## Logit

- The function obtained by taking the logarithm of the odds
- It maps a probability $p \in (0, 1)$ to $(-\infty, +\infty)$
- $logit(p) = log(odds) = log\dfrac{p(y=1|x)}{1-p(y=1|x)}$

## Logistic Function

- The inverse function of the logit transformation
- $logit(p) = log(odds) = log\dfrac{p(y=1|x)}{1-p(y=1|x)} = w_{0}+ w_1x_{1}+ \dots + w_Dx_{D}= w^TX$
- $p(y = 1|x) = \dfrac{e^{w^{T}X}}{1 + e^{w^{T}X}} = \dfrac{1}{1 + e^{-w^TX}}$
- Therefore, the logistic function is a combination of linear regression and the sigmoid function (see the sketch after this note)

## Bayes' Theorem

- $P(w|X) = \dfrac{P(X|w)P(w)}{P(X)} \propto P(X|w)P(w)$
- **[[Posterior]]** probability, $P(w|X)$: The probability distribution of a hypothesis given the data (how believable the hypothesis is after seeing the data).
- **Likelihood**, $P(X|w)$: The probability of the observed data given a hypothesis (parameter value).
- **Prior** probability, $P(w)$: The probability of a hypothesis before looking at the data.
- There are two methods to estimate the hypothesis (model parameters) using these probabilities: [[Maximum Likelihood Estimation]] (**MLE**) and [[Maximum A Posteriori]] (**MAP**)
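
A small NumPy sketch of the decision rule above: the sigmoid maps the linear score $w^TX$ to $p(y=1|x)$, and $w^TX > 0$ is equivalent to $p > 0.5$. The weights and inputs below are made up purely for illustration:

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic function: maps w^T x from (-inf, inf) back to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative weights w = [w0, w1] (bias + one feature) and a few inputs
w = np.array([-1.0, 2.0])
X = np.array([[1.0, 0.2], [1.0, 0.6], [1.0, 1.5]])  # first column is the constant 1

score = X @ w                    # w^T x for each row
p = sigmoid(score)               # p(y = 1 | x)
label = (score > 0).astype(int)  # w^T x > 0  <=>  p > 0.5  =>  class 1

print(np.column_stack([score, p, label]))
```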
diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Optimization.md b/SI/Resource/Data Science/Machine Learning/Contents/Optimization.md
new file mode 100644
index 0000000..2283442
--- /dev/null
+++ b/SI/Resource/Data Science/Machine Learning/Contents/Optimization.md
@@ -0,0 +1,53 @@
---
id: 2023-12-17
aliases: December 17, 2023
tags:
- link-note
- Data-Science
- Machine-Learning
- Optimization
---

# Optimization

## Math

### Partial Differentiation/Derivative

- Differentiate with respect to a specific variable
- Treat the other variables as constants
- $\dfrac{\partial y}{\partial x}$
- e.g., $f(x,y) = x^2 + xy + 3 \Rightarrow \dfrac{\partial f}{\partial x} = 2x + y, \quad \dfrac{\partial f}{\partial y} = x$

### Chain Rule

- $\dfrac{dy}{dx} = \dfrac{dy}{du}*\dfrac{du}{dx}$
- e.g., $y = ln(u),\ u = 2x + 4 \Rightarrow \dfrac{dy}{dx} = \dfrac{1}{u}*2 = \dfrac{2}{2x+4}$

## Loss Function

### Mean Squared Error (MSE)

- $L = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y_i})^2$

## Parameter Calculation

### Least Square Method (LSM)

- Find the parameters that minimize the squared error over the data
- a: slope (coefficient)
- b: intercept
- $L = \sum_{i=1}^{N} (y_i - (ax_i + b))^2$

#### Method 1.

- $0 = \dfrac{\partial L}{\partial a} = \sum_{i=1}^{N} 2(y_i - (ax_i + b))(-x_i) = 2(a\sum_{i=1}^{N} x_i^2 + b\sum_{i=1}^{N} x_i - \sum_{i=1}^{N} x_iy_i)$
- $0 = \dfrac{\partial L}{\partial b} = \sum_{i=1}^{N} 2(y_i - (ax_i + b))(-1) = 2(a\sum_{i=1}^{N} x_i + b\sum_{i=1}^{N}1 - \sum_{i=1}^{N} y_i)$
- $a^* = \dfrac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{N}(x_i-\bar{x})^2}$
- $b^* = \bar{y} - a^*\bar{x}$

#### Method 2.

- Differentiate $||Y - XW||^2$ with respect to the parameter matrix $W$ (see the sketch after this note)
- $-2X^T(Y-XW) = 0$
- $W = (X^TX)^{-1}X^TY$
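
A quick numerical check of Method 2 against Method 1, assuming NumPy and a synthetic dataset; in practice `np.linalg.lstsq` is preferred over forming the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.5 * x + 4.0 + rng.normal(scale=0.5, size=100)

# Design matrix with a column of ones, so W = [intercept, slope]
X = np.column_stack([np.ones_like(x), x])

# Method 2, normal equation: W = (X^T X)^{-1} X^T Y
W = np.linalg.inv(X.T @ X) @ X.T @ y
print("intercept, slope:", W)

# Method 1, closed-form expressions for a* and b*
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()
print("a*, b*:", a, b)
```

Both routes should print (essentially) the same slope and intercept.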
diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Regression.md b/SI/Resource/Data Science/Machine Learning/Contents/Regression.md
new file mode 100644
index 0000000..8e2a1df
--- /dev/null
+++ b/SI/Resource/Data Science/Machine Learning/Contents/Regression.md
@@ -0,0 +1,17 @@
---
id: 2023-12-17
aliases: December 17, 2023
tags:
- link-note
- Data-Science
- Machine-Learning
- Regression
---

# Regression

Regression is a statistical method used in data analysis that models the relationship between a dependent variable and one or more independent variables. The main goal of regression is to predict the value of the dependent variable based on the values of the independent variables. It's widely used in various fields like economics, finance, biology, engineering, and more, for forecasting, estimating, and identifying relationships among variables.

1. Input: Continuous (float), Discrete (categorical), etc.
2. Output: Continuous (float)
3. Model types: a function (e.g., $y = w_1x + w_0$)
diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Regularization.md b/SI/Resource/Data Science/Machine Learning/Contents/Regularization.md
new file mode 100644
index 0000000..7475105
--- /dev/null
+++ b/SI/Resource/Data Science/Machine Learning/Contents/Regularization.md
@@ -0,0 +1,46 @@
---
id: 2023-12-18
aliases: December 18, 2023
tags:
- link-note
- Data-Science
- Machine-Learning
- Regularization
---

# Regularization

## Regularization Loss Function

- The complexity of the model **$\uparrow$** == the number of model parameters **$\uparrow$**
- As the complexity of the model **$\uparrow$** == overfitting **$\uparrow$**
- Idea: define a model with high complexity, learn only the important parameters, and push the unnecessary parameter values to **0**

## Regularization Types

### Ridge Regression (L2 Regularization)

- $L = \bbox[orange,3px] {\sum_{i=1}^{n} (y_i - (\beta_0 + \sum_{j=1}^{D} \beta_j x_{ij}))^{2}} + \bbox[blue,3px] {\lambda \sum_{j=1}^{D} \beta_j^2}$
  - $\bbox[orange,3px]{\text{MSE}}$
  - $\bbox[blue,3px]{\text{Ridge}}$
  - Reducing the MSE loss by inflating the coefficients makes the penalty term larger, so the two are traded off
  - Lambda $\lambda$ is a hyperparameter that controls the impact of regularization
  - The penalty term is the sum of squared coefficients

### Lasso Regression (L1 Regularization)

- $L = \sum\limits_{i=1}^{n}(y_{i}- (\beta_{0}+ \sum\limits_{j=1}^{D} \beta_{j}x_{ij}))^{2}+ \lambda \sum\limits_{j=1}^{D} |\beta_j|$
  - Reducing the MSE loss by inflating the coefficients makes the penalty term larger, so the two are traded off
  - Lambda $\lambda$ is a hyperparameter that controls the impact of regularization
  - The penalty term is the sum of absolute values of the coefficients

![[Pasted image 20231218032332.png]]

## Question

- $\lambda \uparrow$ == Bias error $\uparrow$ and Variance error $\downarrow$
- Sparsity: Ridge regression $<$ Lasso regression (see the sketch after this note)
- How to make more parameters take the value 0?
  1. $\lambda \uparrow$
  2. Exponent of the penalty $\downarrow$ (e.g., from $L_2$ toward $L_1$)
  - Good or bad? Not obvious (it depends on the problem)
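
A minimal sparsity comparison, assuming scikit-learn (where $\lambda$ is called `alpha`); the data is synthetic with only 3 of 20 features relevant, so Lasso should zero out most coefficients while Ridge only shrinks them:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
# Only the first 3 of 20 features actually matter
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks coefficients, rarely exactly 0
lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: drives unimportant coefficients to exactly 0

print("ridge coefficients equal to 0:", np.sum(ridge.coef_ == 0))
print("lasso coefficients equal to 0:", np.sum(lasso.coef_ == 0))
```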
diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Sigmoid.md b/SI/Resource/Data Science/Machine Learning/Contents/Sigmoid.md
new file mode 100644
index 0000000..41ad25a
--- /dev/null
+++ b/SI/Resource/Data Science/Machine Learning/Contents/Sigmoid.md
@@ -0,0 +1,17 @@
---
id: 2023-12-17
aliases: December 17, 2023
tags:
- link-note
- Data-Science
- Machine-Learning
- Sigmoid
---

# Sigmoid

- Non-linear function for binary classification problems
- $y = \dfrac{1}{1+e^{-x}}$
- $0 < output < 1$, with the midpoint $y = 0.5$ at $x = 0$

![[Pasted image 20231218035418.png]]
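
A tiny numerical check of the properties above, assuming NumPy (purely illustrative):

```python
import numpy as np

def sigmoid(x):
    # y = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

# Output stays strictly between 0 and 1, with sigmoid(0) = 0.5
for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(x, sigmoid(x))
```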
diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Softmax Regression.md b/SI/Resource/Data Science/Machine Learning/Contents/Softmax Regression.md
new file mode 100644
index 0000000..c886fac
--- /dev/null
+++ b/SI/Resource/Data Science/Machine Learning/Contents/Softmax Regression.md
@@ -0,0 +1,14 @@
---
id: 2023-12-17
aliases: December 17, 2023
tags:
- link-note
- Data-Science
- Machine-Learning
- Softmax-Regression
---

# Softmax Regression

- Non-linear function for multi-class classification problems
- $y_i = \dfrac{e^{X_i}}{\sum_{k=1}^{K} e^{X_k}}$, $K$: number of classes (see the sketch after this note)
- The outputs are non-negative and sum to 1, so they can be read as class probabilities
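
A minimal NumPy sketch of the formula above; subtracting the maximum score is a standard numerical-stability trick added here, not part of the original note:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Subtracting max(x) does not change the result but avoids overflow in exp
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # one score per class (K = 3)
probs = softmax(scores)
print(probs, probs.sum())  # non-negative, sums to 1
```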
