Diffstat (limited to 'SI/Resource/Data Science')
18 files changed, 850 insertions, 0 deletions
diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md b/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md new file mode 100644 index 0000000..938bf62 --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md @@ -0,0 +1,45 @@ +--- +id: 2023-12-18 +aliases: December 18, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Bias-and-Variance +--- +

# Bias

## Training Data (80~90%) vs. Test Data (10~20%)

## Complexity

- Complexity increases from linear to non-linear models
- Under-fitting: the model is too simple (too little complexity) to represent the data
- Over-fitting: the model is too complex relative to the amount of data (more likely when there is little data)

## Bias and Variance

- Bias and variance are both types of error in an algorithm.
- $\begin{align} MSE (\hat{\theta}) \equiv E_{\theta} ((\hat{\theta} - \theta)^2) & = E((\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta)^2) \\ & = E((\hat{\theta} - E(\hat{\theta}))^2 + 2((\hat{\theta} - E(\hat{\theta}))(E(\hat{\theta}) - \theta)) + (E(\hat{\theta}) - \theta)^2) \\ & = E((\hat{\theta} - E(\hat{\theta}))^2) + 2(E(\hat{\theta}) - \theta)E(\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)^2 \\ & = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 \\ & = Var_{\theta}(\hat{\theta}) + Bias_{\theta}(\hat{\theta}, \theta)^2 \end{align}$
	- The cross term drops out because $E(\hat{\theta} - E(\hat{\theta})) = 0$
- $\bbox[teal,5px,border:2px solid red] { MSE (\hat{\theta}) = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 = Var_{\theta}(\hat{\theta}) + Bias_{\theta}(\hat{\theta}, \theta)^2 }$
- Bias: under-fitting
- Variance: over-fitting
![[Pasted image 20231218005054.png]]
![[Pasted image 20231218005035.png]]

## Trade-off

- Solution
	- Use a validation data set
		- $\bbox[teal,5px,border:2px solid red]{\text{Train data (80\%) + Valid data (10\%) + Test data (10\%)}}$
		- Does not directly participate in model training
		- The model is evaluated on it continuously during training, and the best-performing model so far is saved
	- K-fold cross validation (see the sketch below)
		- **Leave-One-Out Cross-Validation (LOOCV)**
			- a special case of k-fold cross-validation where **K** is equal to the number of data points in the dataset
		- What if **K** becomes bigger?
			1. train data $\uparrow$
			2. bias error $\downarrow$ and variance error $\uparrow$
			3. cost $\uparrow$
	- [[Regularization]] loss function
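A minimal K-fold cross-validation sketch in Python, assuming scikit-learn is available; the toy data, the `Ridge` model, and `K = 5` are illustrative choices, not values from the note:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Toy data: y = 3x + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 1))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)

K = 5  # hyperparameter: larger K -> more training data per fold, lower bias, higher variance and cost
kf = KFold(n_splits=K, shuffle=True, random_state=0)

fold_errors = []
for train_idx, valid_idx in kf.split(X):
    model = Ridge(alpha=1.0)                      # any regressor works here
    model.fit(X[train_idx], y[train_idx])         # train on K-1 folds
    pred = model.predict(X[valid_idx])            # validate on the held-out fold
    fold_errors.append(mean_squared_error(y[valid_idx], pred))

print(f"mean CV MSE over {K} folds:", np.mean(fold_errors))
```

Setting `n_splits` equal to the number of samples turns this loop into LOOCV, with more training data per fold but a higher cost, as listed above.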
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Classification.md b/SI/Resource/Data Science/Machine Learning/Contents/Classification.md new file mode 100644 index 0000000..12c125e --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Classification.md @@ -0,0 +1,17 @@ +--- +id: 2023-12-17 +aliases: December 17, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Classification +--- +

# Classification

Classification, in the context of machine learning and statistics, is a type of supervised learning in which the output variable is a category, such as "spam" or "not spam", or "disease" and "no disease". In classification, an algorithm is trained on a dataset of labeled examples, learning to associate input data points with the corresponding category label. Once trained, the model can then categorize new, unseen data points.

1. Input: Continuous (float), Discrete (categorical), etc.
2. Output: Discrete (categorical)
3. Model types: binary - [[Sigmoid]], multiclass (multinomial) - [[softmax]]
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Gradient descent.md b/SI/Resource/Data Science/Machine Learning/Contents/Gradient descent.md new file mode 100644 index 0000000..fdf8905 --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Gradient descent.md @@ -0,0 +1,21 @@ +--- +id: 2023-12-17 +aliases: December 17, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Gradient-descent +--- +

# Gradient Descent

- Update the parameters so that the value of the loss function is minimized
- **The instantaneous rate of change (derivative) at a minimum of the loss function is always _0_**
- Therefore, update the parameters toward the point where the derivative of the loss function equals 0

## Pseudo Code

1. Compute the derivative of the loss function at the current parameters.
2. Update the parameters in the opposite direction of the derivative (see the sketch below).
3. Repeat steps 1 and 2 for a set number of epochs (a hyperparameter) or until the derivative becomes 0.
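A minimal sketch of the pseudo code above for a one-dimensional quadratic loss; the loss function, learning rate, and epoch count are illustrative choices:

```python
def loss(w):
    return (w - 3.0) ** 2          # toy loss, minimized at w = 3

def grad(w):
    return 2.0 * (w - 3.0)         # derivative of the toy loss

w = 0.0          # initial parameter
lr = 0.1         # learning rate (hyperparameter)
epochs = 100     # number of updates (hyperparameter)

for _ in range(epochs):
    g = grad(w)                    # step 1: derivative at the current parameter
    w = w - lr * g                 # step 2: move against the derivative
    if abs(g) < 1e-8:              # step 3: stop once the derivative is ~0
        break

print(w)   # ~= 3.0
```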
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Hyperparameter.md b/SI/Resource/Data Science/Machine Learning/Contents/Hyperparameter.md new file mode 100644 index 0000000..895783f --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Hyperparameter.md @@ -0,0 +1,15 @@ +--- +id: 2023-12-18 +aliases: December 18, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Hyperparameter +--- +

# Hyperparameter

- A hyperparameter is a parameter whose value is set before the learning process begins
- Unlike model parameters, which are learned during training, hyperparameters are not learned from the data
- A human has to set the hyperparameters (e.g., the learning rate, the number of epochs, the K in K-fold cross validation, the regularization strength $\lambda$)
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Linear Regression.md b/SI/Resource/Data Science/Machine Learning/Contents/Linear Regression.md new file mode 100644 index 0000000..fdd175c --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Linear Regression.md @@ -0,0 +1,26 @@ +--- +id: 2023-12-17 +aliases: December 17, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Linear-Regression +--- +

# Linear Regression

## Simple Linear Regression

- A model with one feature of the data
- $y = w_0 + w_1*x$

## Multiple Linear Regression

- A model with several features of the data
- $y = w_0 + w_1*x_1 + \dots + w_D*x_D$

## Polynomial Regression

- A model that uses higher-degree powers of the features (see the sketch below)
- $y = w_0 + w_1*x + w_2*x^2 + \dots + w_m*x^m$
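A minimal sketch, assuming NumPy, that fits a degree-1 (simple linear) and a degree-2 (polynomial) model to the same toy data; the data and the degrees are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.05, size=x.size)

# Simple linear regression: y = w0 + w1*x  (degree-1 polynomial fit)
coefs_deg1 = np.polyfit(x, y, deg=1)

# Polynomial regression: y = w0 + w1*x + w2*x^2  (degree-2 fit)
coefs_deg2 = np.polyfit(x, y, deg=2)

print("degree 1:", coefs_deg1)   # [w1, w0], highest degree first
print("degree 2:", coefs_deg2)   # [w2, w1, w0]
```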
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Logistic Regression.md b/SI/Resource/Data Science/Machine Learning/Contents/Logistic Regression.md new file mode 100644 index 0000000..24b714e --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Logistic Regression.md @@ -0,0 +1,47 @@ +--- +id: 2023-12-18 +aliases: December 18, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Logistic-Regression +--- +

# Logistic Regression

$$Y = \begin{cases} 1 & \text{if korean} \\ 2 & \text{if american} \\ 3 & \text{if japanese} \end{cases} \qquad\qquad\qquad Y = \begin{cases} 1 & \text{if american} \\ 2 & \text{if korean} \\ 3 & \text{if japanese} \end{cases}$$

- If ordinary regression is applied to categorical labels, the result depends on the arbitrary order (size) assigned to the labels
- Therefore, a different loss function or model is needed for categorical outputs
- The logistic regression model is a regression model in the form of a logistic function.
- The predicted class changes depending on the value of $w^TX$:
	1. If $w^{T}X > 0$: classified as 1.
	2. If $w^{T}X < 0$: classified as 0.
- How should the loss function be defined to find the optimal value of the parameter *w*?

## Odds

- The odds represent how many times higher the probability of success (y=1) is compared to the probability of failure (y=0)
- $odds = \dfrac{p(y=1|x)}{1-p(y=1|x)}$

## Logit

- The logarithm of the odds
- When the input probability (p) ranges over [0,1], the output ranges over [$-\infty$, $+\infty$]
- $logit(p) = log(odds) = log\dfrac{p(y=1|x)}{1-p(y=1|x)}$

## Logistic Function

- The inverse function of the logit transformation
- $logit(p) = log(odds) = log\dfrac{p(y=1|x)}{1-p(y=1|x)} = w_{0}+ w_1x_{1}+ \dots + w_Dx_{D}= w^TX$
- $p(y = 1|x) = \dfrac{e^{w^{T}X}}{1 + e^{w^{T}X}} = \dfrac{1}{1 + e^{-w^TX}}$
- Therefore, the logistic regression model is a combination of linear regression and the sigmoid function (see the sketch at the end of this note)

## Bayes' Theorem

- $P(w|X) = \dfrac{P(X|w)P(w)}{P(X)} \propto P(X|w)P(w)$
- **[[Posterior]]** probability, $P(w|X)$: the probability distribution of the hypothesis (parameters) given the data; how much the hypothesis should be believed after seeing the data.
- **Likelihood**, $P(X|w)$: the probability of the observed data under an assumed (but unknown) hypothesis.
- **Prior** probability, $P(w)$: the probability of the hypothesis before looking at the data.
- There are two methods to estimate the hypothesis (model parameters) using these probabilities: [[Maximum Likelihood Estimation]] (**MLE**) and [[Maximum A Posteriori]] (**MAP**)
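A minimal sketch of the odds / logit / logistic relationships in Python; the weight vector and the sample point are made up for illustration:

```python
import numpy as np

def logistic(z):
    """Inverse of the logit: maps w^T x in (-inf, inf) to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Log-odds: maps a probability in (0, 1) to (-inf, inf)."""
    return np.log(p / (1.0 - p))

w = np.array([0.5, -1.2, 2.0])    # illustrative parameters (w[0] is the intercept)
x = np.array([1.0, 0.3, 0.8])     # x[0] = 1 so that w[0] acts as the intercept

z = w @ x                          # linear regression part: w^T x
p = logistic(z)                    # p(y = 1 | x)
print(p, "-> class", int(z > 0))   # classified as 1 when w^T x > 0

print(np.isclose(logit(p), z))     # True: the logit undoes the logistic function
```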
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Optimization.md b/SI/Resource/Data Science/Machine Learning/Contents/Optimization.md new file mode 100644 index 0000000..2283442 --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Optimization.md @@ -0,0 +1,53 @@ +--- +id: 2023-12-17 +aliases: December 17, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Optimization +--- + +# Optimization + +## Math + +### Partial Differentiation/Derivative + +- Differentiate about a specific variable +- Consider others as constants +- $\dfrac{\partial y}{\partial x}$ +- e.g., $f(x,y) = x^2 + xy + 3$ + +### Chain Rule + +- $\dfrac{dy}{dx} = \dfrac{dy}{du}*\dfrac{du}{dx}$ +- e.g., $y = ln(u), u = 2x + 4$ + +## Loss Function + +### Mean Squared Error (MSE) + +- $L = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y_i})^2$ + +## Parameter Calculation + +### Least Square Method (LSM) + +- Minimize error of data +- a: slope (coefficient) +- b: intercept +- $L = \sum_{i=1}^{N} (y_i - (ax_i + b))^2$ + +#### Method 1. + +- $0 = \dfrac{\partial L}{\partial a} = \sum_{i=1}^{N} 2(y_i - (ax_i + b))(-x_i) = 2(a\sum_{i=1}^{N} x_i^2 + b\sum_{i=1}^{N} x_i - \sum_{i=1}^{N} x_iy_i)$ +- $0 = \dfrac{\partial L}{\partial b} = \sum_{i=1}^{N} 2(y_i - (ax_i + b))(-1) = 2(a\sum_{i=1}^{N} x_i + b\sum_{i=1}^{N}1 - \sum_{i=1}^{N} y_i)$ +- $a^* = \dfrac{\sum_{i=1}^{N}(x-\bar{x})(y-\bar{y})}{\sum_{i=1}^{N}(x-\bar{x})^2}$ +- $b^* = \bar{y} - a^*\bar{x}$ + +#### Method 2. + +- Partial differentiation with respect to matrix $||Y - WX||^2$ +- $-2X^T(Y-WX) = 0$ +- $W = (X^TX)^{-1}X^TY$
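A minimal sketch, assuming NumPy, that checks Method 1 (the closed-form slope and intercept) against Method 2 (the normal equation) on toy data; in practice `np.linalg.lstsq` or a pseudo-inverse is preferred over an explicit matrix inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.5 * x + 1.0 + rng.normal(scale=0.3, size=50)   # true a = 2.5, b = 1.0

# Method 1: closed-form slope and intercept from the partial derivatives
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()

# Method 2: normal equation W = (X^T X)^{-1} X^T Y, with a bias column of ones
X = np.column_stack([np.ones_like(x), x])            # columns [1, x], so W = [b, a]
W = np.linalg.inv(X.T @ X) @ X.T @ y

print(a, b)   # ~= 2.5, 1.0
print(W)      # ~= [1.0, 2.5]
```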
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Regression.md b/SI/Resource/Data Science/Machine Learning/Contents/Regression.md new file mode 100644 index 0000000..8e2a1df --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Regression.md @@ -0,0 +1,17 @@ +--- +id: 2023-12-17 +aliases: December 17, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Regression +--- + +# Regression + +Regression is a statistical method used in data analysis that models the relationship between a dependent variable and one or more independent variables. The main goal of regression is to predict the value of the dependent variable based on the values of the independent variables. It's widely used in various fields like economics, finance, biology, engineering, and more, for forecasting, estimating, and identifying relationships among variables. + +1. Input: Continuous (float), Discrete (categorical), etc. +2. Output: Continuous (float) +3. Model types: a function (e.g., $y = w_1x + w_0$)
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Regularization.md b/SI/Resource/Data Science/Machine Learning/Contents/Regularization.md new file mode 100644 index 0000000..7475105 --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Regularization.md @@ -0,0 +1,46 @@ +--- +id: 2023-12-18 +aliases: December 18, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Regularization +--- +

# Regularization

## Regularization Loss Function

- The complexity of the model **$\uparrow$** == the number of model parameters **$\uparrow$**
- As the complexity of the model **$\uparrow$**, the risk of overfitting **$\uparrow$**
- Idea: define a model with high complexity, learn only the important parameters, and push unnecessary parameter values toward **0**

## Regularization Types

### Ridge Regression (L2 Regularization)

- $L = \bbox[orange,3px] {\sum_{i=1}^{n} (y_i - (\beta_0 + \sum_{j=1}^{D} \beta_j x_{ij}))^{2}} + \bbox[blue,3px] {\lambda \sum_{j=1}^{D} \beta_j^2}$
	- $\bbox[orange,3px]{\text{MSE}}$
	- $\bbox[blue,3px]{\text{Ridge}}$
	- Reducing the MSE term alone is not enough: large coefficients make the penalty term large
	- Lambda $\lambda$ is a hyperparameter that controls the impact of regularization
	- The penalty term is expressed as a sum of squares

### Lasso Regression (L1 Regularization)

- $L = \sum\limits_{i=1}^{n}(y_{i}- (\beta_{0}+ \sum\limits_{j=1}^{D} \beta_{j}x_{ij}))^{2}+ \lambda \sum\limits_{j=1}^{D} |\beta_j|$
	- Reducing the MSE term alone is not enough: large coefficients make the penalty term large
	- Lambda $\lambda$ is a hyperparameter that controls the impact of regularization
	- The penalty term is expressed as a sum of absolute values

![[Pasted image 20231218032332.png]]

## Question

- $\lambda \uparrow$ == bias error $\uparrow$ and variance error $\downarrow$
- Sparsity: Ridge regression $<$ Lasso regression
- How can more parameters be driven to exactly 0? (see the sketch below)
	1. $\lambda \uparrow$
	2. Exponent of the penalty $\downarrow$ (e.g., L1 instead of L2)
	- Good? or Bad?: it depends, there is no single answer
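A minimal sketch comparing the sparsity of the two penalties, assuming scikit-learn; the toy data and `alpha` (scikit-learn's name for $\lambda$) are illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
# Only the first 3 of the 20 features actually matter
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: shrinks and zeroes coefficients

print("ridge zero coefficients:", np.sum(ridge.coef_ == 0))   # typically 0
print("lasso zero coefficients:", np.sum(lasso.coef_ == 0))   # many exact zeros
```

The L1 penalty drives many coefficients to exactly 0, which is the sparsity difference noted above.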
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Sigmoid.md b/SI/Resource/Data Science/Machine Learning/Contents/Sigmoid.md new file mode 100644 index 0000000..41ad25a --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Sigmoid.md @@ -0,0 +1,17 @@ +--- +id: 2023-12-17 +aliases: December 17, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Sigmoid +--- +

# Sigmoid

- Non-linear function for binary classification problems
- $y = \dfrac{1}{1+e^{-x}}$
- $0 < output < 1$, and the output is $0.5$ at $x = 0$

![[Pasted image 20231218035418.png]]
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Softmax Regression.md b/SI/Resource/Data Science/Machine Learning/Contents/Softmax Regression.md new file mode 100644 index 0000000..c886fac --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Softmax Regression.md @@ -0,0 +1,14 @@ +--- +id: 2023-12-17 +aliases: December 17, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Softmax-Regression +--- +

# Softmax Regression

- Non-linear function for multiclass (multinomial) classification problems (see the sketch below)
- $y_i = \dfrac{e^{X_i}}{\sum_{k=1}^{K} e^{X_k}}$, where $K$ is the number of classes
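A minimal softmax sketch in Python; subtracting the maximum score is a standard numerical-stability trick and is not part of the formula above:

```python
import numpy as np

def softmax(x):
    """Map K raw scores to K probabilities that sum to 1."""
    z = x - np.max(x)      # shift by the max for numerical stability; the result is unchanged
    e = np.exp(z)
    return e / np.sum(e)

scores = np.array([2.0, 1.0, 0.1])   # illustrative scores for K = 3 classes
probs = softmax(scores)
print(probs, probs.sum())            # e.g. [0.66 0.24 0.10] 1.0
```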
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Machine Learning.md b/SI/Resource/Data Science/Machine Learning/Machine Learning.md new file mode 100644 index 0000000..0e1d5bc --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Machine Learning.md @@ -0,0 +1,99 @@ +--- +id: 2023-12-17 +aliases: December 17, 2023 +tags: +- Machine-Learning +- Data-Science +--- +
# Machine Learning

Machine Learning (ML) is a subset of artificial intelligence (AI) that focuses on the development of systems and algorithms that can learn from and make decisions or predictions based on data. The core idea is to enable a machine to make intelligent decisions or predictions without being explicitly programmed to perform the task.

## Machine

A machine is a model, i.e., a function, derived from data given by humans

## Learning

Learning is finding the model that best represents the data, which means optimizing its parameters

- Optimization of parameters: by a statistical method or by [[Gradient descent | gradient descent]]

## Goal

[[Optimization]]: Find optimal parameters

- Optimal: the best representation of the data
	- A model with the smallest difference between predictions $\hat{y}$ and actual values $y$
	- The model parameters that make the loss smallest

## Supervised Learning

1. [[Regression]]
	- [[Linear Regression]] and [[Nonlinear Regression]]
	- [[Gradient descent]]
	- [[Bias and Variance]] Trade-off
2. [[Classification]]
	- [[Sigmoid]]
	- [[Logistic Regression]] and [[Softmax Regression]]
	- [[Support Vector Machine]] ([[Support Vector Machine |SVM]])
	- [[Decision Tree]]
	- [[Linear Discriminant Analysis]] ([[Linear Discriminant Analysis |LDA]])
3. [[Ensemble]]
	- [[Bagging]]
	- [[Boosting]]

## Unsupervised Learning

1. [[Preprocessing]]
	- [[Principal Component Analysis]] ([[Principal Component Analysis |PCA]])
	- [[Singular Value Decomposition]] ([[Singular Value Decomposition |SVD]])
2. [[Clustering]]
	- [[K-Means]]
	- [[Mean Shift]]
	- [[Gaussian Mixture Model]]
	- [[DBSCAN]]

## Notations

- Data Properties
	- Features (= attributes, independent variables): X
		- characteristics of the data or items
		- N: # of data samples
		- D: # of features
	- Label (dependent variable): y
		- if there is a label, the task is supervised; otherwise, it is unsupervised
	- Parameters (= weights): learnable values that the model has; they are not given by the data
	- [[Hyperparameter]]: parameters that a human has to decide
- Input vs. Output
	- Input ($X$): values (some or all of the features) that are put into a model
	- Output ($\hat{y}$): prediction values derived from the model
- Linear vs. Nonlinear
	- Linear regression: a model that can be implemented by a linear function
		- Simple Linear Regression: Involves two variables — one independent variable and one dependent variable. The relationship between these variables is modeled as a straight line.
		- Multiple Linear Regression: Uses more than one independent variable to predict a dependent variable. The relationship is still linear in nature, meaning it assumes a straight-line relationship between each independent variable and the dependent variable. 
+ - ex) $y = w_0 + w_1*x_1 + w_2*x_2 + \dots + w_D*x_D, y = w_0 + w_1*x_1 + w_2*x^2$ + - Non-linear regression: a model can't be implemented by a linear function + - ex) $log(y) = w_0 + w_1*log(x), y = max(x, 0)$ + +## Basic Math for ML + +- Function + - Relationships or rules between two groups + - $y = f(x)$, $x$ = input, $y$ = output +- Linear function + - $y = a*x +b$, $(a \ne 0)$ + - a: coefficient (slope), b = intercept +- Instantaneous Rate of Change + - is **the change in the rate at a particular instant**, and it is same as the change in the derivative value at a specific point. + - Slope (coefficient) where a $x (=a)$ meets a graph +- Derivation + - finds the instantaneous rate of change + - $f'(x)$ or $\dfrac{d}{dx}f(x)$ +- Minimum of function + - **The instantaneous rate of change at the minimum of function is always *0*.** + - Using this property, can find optimal parameter value +- Exponential + - $e = \lim_{n \to \infty} (1 + \dfrac{1}{n})^n$ + - a function or growth pattern that increases at a rate proportional to its current value + - $\dfrac{d}{dx} e^x = e^x$ diff --git a/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218004948.png b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218004948.png Binary files differnew file mode 100644 index 0000000..1cec75a --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218004948.png diff --git a/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218005035.png b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218005035.png Binary files differnew file mode 100644 index 0000000..1bfb03b --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218005035.png diff --git a/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218005054.png b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218005054.png Binary files differnew file mode 100644 index 0000000..f8b24a0 --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218005054.png diff --git a/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218032332.png b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218032332.png Binary files differnew file mode 100644 index 0000000..95960cd --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218032332.png diff --git a/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218035418.png b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218035418.png Binary files differnew file mode 100644 index 0000000..17f6a0e --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218035418.png diff --git a/SI/Resource/Data Science/SQL/MySQL/MySQL.md b/SI/Resource/Data Science/SQL/MySQL/MySQL.md new file mode 100644 index 0000000..cc23029 --- /dev/null +++ b/SI/Resource/Data Science/SQL/MySQL/MySQL.md @@ -0,0 +1,433 @@ +--- +id: 2023-12-19 +aliases: + - December 19 + - "2023" +tags: + - link-note + - MySQL + - Data-Science + - SQL +--- +# [[MySQL]] +## Variables + +```sql +INT +DECIMAL(3,2) -- gpa 3.70 -> 3 digits and 2 decimal +VARCHAR(10) +BLOB -- Binary Large Object, stores large data, e.g., images +DATE -- YYYY-MM-DD +TIMESTAMP -- YYYY-MM-DD HH:MM:SS +``` + +## Basic +### Table +- Create tables + +```sql +CREATE TABLE table_name ( +-- attributes variables 
(KEY types) (options) + id INT PRIMARY KEY AUTO_INCREMENT, + name VARCHAR(46) NOT NULL, + major VARCHAR(10) DEFAULT value, + column_name VARCHAR(10) UNIQUE, +); +``` + +- Drop tables + +```sql +DROP TABLE table_name; +``` + +- Describe tables + +```sql +DESCRIBE table_name; + +DESC table_name; +``` + +- RENAME table + +```sql +ALTER TABLE table_name +RENAME TO new_table_name; +``` + +- UPDATE table + +```sql +UPDATE table_name +SET 1_column_name = new_value, (2_column_name = new_value); +(WHERE other_column_name = value;) +``` + +### Column +- ALTER columns + +```sql +ALTER TABLE table_name ADD column_name DECIMAL(3,2); + +ALTER TABLE table_name DROP COLUMN column_name; +``` + +- ADD column + +```sql +ALTER TABLE table_name +ADD COLUMN new_column_name DATA_TYPE; +``` + +- CHANGE column_name + +```sql +ALTER TABLE table_name +CHANGE COLUMN column_name (DATA_TYPE); +``` + +- DROP column_name + +```sql +ALTER TABLE table_name +DROP COLUMN column_name; +``` + +### Data +- INSERT INTO data + +```sql +INSERT INTO table_name VALUES(data); + +INSERT INTO table_name(column_name) VALUES(data); +``` + +- SELECT */data + +```sql +SELECT * +FROM table_name; + +SELECT column_name (AS col_name) +FROM table_name; +``` + +- DELETE FROM data + +```sql +DELETE FROM table_name; +WHERE a_column_name = value; +``` + +### Condition +- ORDER BY ... ASCending & DESCending + +```sql +SELECT * +FROM table_name; +ORDER BY column_name ASC / DESC; +``` + +- LIMIT int + +```sql +SELECT * +FROM table_name; +LIMIT int; +``` + +- condition (cmp: <, <=, >, >=, =, <>) + +```sql +SELECT * +FROM table_name; +WHERE column_name cmp value; +``` + +- OR, AND + +```sql +SELECT * +FROM table_name; +WHERE column_name + OR/AND + condition; +``` + +- (NOT) IN (condition_1, condition_2, ...) + +```sql +SELECT * +FROM table_name; +WHERE column_name IN (column_name_1, column_name_2, ...); +``` + +- BETWEEN ... AND ... + +```sql +SELECT * +FROM table_name; +WHERE column_name BETWEEN int AND int; +``` + +#### Filtering +- LIKE (target: char%, %char, %str%, _char%, char_%_%, char%char) + +```sql +SELECT * +FROM table_name; +WHERE column_name LIKE target; +``` + +- CASE WHEN condition(cmp) ELSE condition(cmp) END + +```sql +SELECT * + CASE WHEN condition THEN result + WHEN condition THEN result + ELSE result + END +FROM table_name; +``` + +## Functions +- Calculation (fn: SUM(), COUNT(), MIN(), MAX(), AVG()) + +```sql +SELECT fn(column_name) variable +FROM table_name; +``` + +- Digit (fn: TRUNCATE(digit, n), ROUND(digit, n), CEIL(), FLOOR(), ABS(), SIGN()) + +```sql +SELECT fn() +``` + +- String (fn: LENGTH(), TRIM(), UPPER(), LOWER(), LEFT(str, n), RIGHT(str, n), REPLACE(), LPAD(), RPAD(), SUBSTRING(), CONCAT(), CONCAT_WS()) + +```sql +SELECT column_name, LENGHT(column_name) +FROM table_name; + +SELECT * +FROM table_name; +WHERE column_name TRIM(str); + +SELECT UPPER/LOWER(column_name) +FROM table_name; + +SELECT LEFT/RIGHT(column_name, int) +FROM table_name; + +SELECT REPLACE(column_name, target, result) +FROM table_name; + +SELECT column_name, LPAD/RPAD(column_name, target, result) +FROM table_name; + +SELECT SUBSTRING(column_name, target, number_of_result) +FROM table_name; + +SELECT CONCAT(str_1, str_2, ...) 
+FROM table_name; +``` + +### Date +- Date & Time (fn: CURDATE(), CURTIME(), NOW(), YEAR(), MONTH(), DAY(), LAST_DAY(), WEEKDAY(), DAYNAME()) + +```sql +SELECT fn() +FROM table_name; +``` + +- fn: ADDDATE(target, int), SUBDATE(target, int), DATEDIFF(target_1, target_2), TIMEDIFF(target_1, target_2) + +```sql +SELECT column_name + fn(target, int) +FROM table_name; +``` + +#### Date Format +- DATE_FORMAT (fm: %Y, %y, %M, %b, %W, %a, %i, %T, %m, %c, %d, %e, %I, %H, %r, %S) + +```sql +SELECT column_name, DATE_FORMAT(column_name, fm_1, fm_2, ...) +FROM table_name; +``` + +## GROUP BY +- GROUP BY column_name_1, column_name_2, ..., Calculation_1, Calculation_2, ... + +```sql +SELECT column_name, fn() +FROM table_name; +GROUP BY column_name; +``` + +- HAVING condition (cmp) after GROUP BY + +```sql +SELECT column_name +FROM table_name; +GROUP BY column_name; +HAVING condition; +``` + +## Division & Analysis +- RANK()/DENSE_RANK() OVER (ORDER BY column_name) + +```sql +SELECT * + RANK()/DENSE_RANK() OVER(ORDER BY column_name (ASC/DESC)) +FROM table_name; +``` + +### Division +- PARTITION BY + +```sql +SELECT * + RANK()/DENSE_RANK()(PARTITION BY column_name ORDER BY column_name (ASC/DESC)))))))))) +FROM table_name +``` + +### Before/After +- LAG(column_name, int): before /LEAD(column_name, int): after + +```sql +SELECT * + LAG(column_name, int)/LEAD(column_name, int) OVER(PARTITION BY ORDER BY column_name) +FROM table_nam e +ORDER BY column_name +``` + +### Border +- ROWS (FRAME: CURRENT ROW, int PRECEDING, int FOLLOWING, UNBOUNDED PRECEDING, UNBOUNDED FOLLOWING) AND (FRAME) + +```sql +SELECT column_name_1, column_name_2, column_name_3 + AVG(column_name_3) OVER (PARTITION BY column_name_1 ORDER BY column_name_2 + ROWS BETWEEN int PRECEDING AND CURRENT ROW) new_column_name + AVG(column_name_3) OVER (PARTITION BY column_name_1 ORDER BY column_name_2 + ROWS BETWEEN int FOLLOWING AND CURRENT ROW) new_column_name +FROM table_name; +ORDER BY column_name_1, column_name_2; + +SELECT column_name_1, column_name_2, column_name_3 + AVG(column_name_3) OVER (PARTITION BY column_name_1 ORDER BY column_name_2 + ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) new_column_name + AVG(column_name_3) OVER (PARTITION BY column_name_1 ORDER BY column_name_2 + ROWS BETWEEN UNBOUNDED FOLLOWING AND CURRENT ROW) new_column_name +FROM table_name; +ORDER BY column_name_1, column_name_2; +``` + +### Merge +- (INNER) JOIN + +```sql +SELECT */(alias_1.)column_name_1, (alias_2.)column_name_2 +FROM table_name_1 (as alias_1) JOIN table_name_2 (as alias_2); +ON (alias_1/)table_name_1.column_name_1 = (alias_2/)table_name_2.column_name_2; +``` + +- LEFT/RIGHT JOIN + +```sql +SELECT * +FROM table_name_1 as tb1 LEFT/RIGHT JOIN table_name_2 as tb2; +ON tb1.column_name_1 = tb2.column_name_2; +ORDER BY column_name; +``` + +- CROSS JOIN + +```sql +SELECT * +FROM table_name_1 CROSS + JOIN table_name_2; + (JOIN table_name_...) + (...) 
+``` + +## Sub Query / Temporary Table +### Multi Columns +- () + +```sql +SELECT * +FROM +( + SELECT *, RANK() OVER(PARTITION BY column_name_1 ORDER BY column_name_2 DESC) + FROM table_name; +) alias; +(WHERE condition;) +``` + +### Uni Column & Uni Data +- IN + +```sql +SELECT * +FROM table_name_1; + (JOIN column_name_1 column_name_2 + ON column_name_1 = column_name_2;) +WHERE column_name_2 IN + (SELECT column_name_2 + FROM table_name_2; + WHERE condition;); + +SELECT * +FROM table_name_1; +WHERE condition =(/IN) + (SELECT column_name_1 + FROM table_name_2;) +``` + +### VIEW / WITH +- Create + +```sql +CREATE VIEW temp_column_name as alias_1 +SELECT * +FROM table_name_1; +``` + +- Drop + +```sql +DROP VIEW alias_1; +``` + +- WITH + +```sql +WITH temp_table_name as( + SELECT *, + RANK() OVER(PARITION BY column_name_1 ORDER BY column_name_2 DESC) new_column_name; + FROM table_name +) +SELECT * +FROM table_name +(WHERE condition) +``` + +## Custom Function +- Nominal Return Value + +```sql +DELIMITER $$ +CREATE FUNCTION fn_name( + num1 INT, + num2 INT, +) RETURN INT +BEGIN + DECLARE sum_ INT; + SET sum_ = num1 + num2; + RETURN sum_; +end $$ +DELIMITER ; +``` |
