---
id: 2023-12-18
aliases: December 18, 2023
tags:
  - link-note
  - Data-Science
  - Machine-Learning
  - Logistic-Regression
---

# Logistic Regression

$$Y = \begin{cases} 1 & \text{if korean} \\ 2 & \text{if american} \\ 3 & \text{if japanese} \end{cases} \qquad\qquad\qquad Y = \begin{cases} 1 & \text{if american} \\ 2 & \text{if korean} \\ 3 & \text{if japanese} \end{cases}$$

- In ordinary regression, the results vary depending on the order (size) of the numeric labels, even though that order is arbitrary for categories
- A different loss function or model is needed
- The logistic regression model is a regression model in the form of a logistic function.
- The predicted value changes depending on the value of $w^{T}X$.
    1. If $w^{T}X > 0$: classified as 1.
    2. If $w^{T}X < 0$: classified as 0.
- How should the loss function be defined to find the optimal value of the parameter *w*?

## Odds

- The odds represent how many times higher the probability of success (y=1) is than the probability of failure (y=0)
- $\text{odds} = \dfrac{p(y=1|x)}{1-p(y=1|x)}$

## Logit

- The function obtained by taking the logarithm of the odds
- When the input probability $p$ ranges over $(0, 1)$, the output ranges over $(-\infty, +\infty)$
- $\text{logit}(p) = \log(\text{odds}) = \log\dfrac{p(y=1|x)}{1-p(y=1|x)}$

## Logistic Function

- The inverse function of the logit transformation
- $\text{logit}(p) = \log(\text{odds}) = \log\dfrac{p(y=1|x)}{1-p(y=1|x)} = w_{0}+ w_1x_{1}+ \dots + w_Dx_{D}= w^TX$
- $p(y = 1|x) = \dfrac{e^{w^{T}X}}{1 + e^{w^{T}X}} = \dfrac{1}{1 + e^{-w^TX}}$
- Therefore, logistic regression is a combination of a linear model ($w^TX$) and the sigmoid function

## Bayes' Theorem

- $P(w|X) = \dfrac{P(X|w)P(w)}{P(X)} \propto P(X|w)P(w)$
- **[[Posterior]]** probability, $P(w|X)$: The probability distribution of a hypothesis given the data (reliability).
- **Likelihood** probability, $P(X|w)$: The distribution of the given data, assuming the hypothesis is known (even though in practice it is not).
- **Prior** probability, $P(w)$: The probability of a hypothesis known in general before looking at the data.
- There are two methods to estimate the hypothesis (model parameters) using these probabilities: [[Maximum Likelihood Estimation]] (**MLE**) and [[Maximum A Posteriori]] (**MAP**)
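
To make the odds, logit, and logistic function definitions from the earlier sections concrete, here is a minimal Python sketch. The function names, the weights `w`, and the input `x` are hypothetical values chosen for illustration, and the input vector is assumed to start with a constant 1 so that `w[0]` plays the role of $w_0$. Assigning the boundary case $w^TX = 0$ to class 0 is an arbitrary choice made here, since the note does not specify it.

```python
import math

def odds(p):
    """Odds of success: p(y=1|x) / (1 - p(y=1|x)), defined for 0 <= p < 1."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds: maps a probability in (0, 1) to the whole real line."""
    return math.log(odds(p))

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z}), the inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Return (p(y=1|x), predicted label) for weights w and features x.

    Assumption: x[0] == 1, so w[0] acts as the bias term w_0.
    """
    z = sum(wi * xi for wi, xi in zip(w, x))  # linear part w^T X
    p = sigmoid(z)
    label = 1 if z > 0 else 0  # z == 0 goes to class 0 (arbitrary choice)
    return p, label

w = [-1.0, 2.0]  # hypothetical weights (w_0, w_1)
x = [1.0, 1.5]   # leading 1 is the bias input, x_1 = 1.5
p, label = predict(w, x)  # here w^T X = -1 + 2 * 1.5 = 2
```

Since $w^TX = 2 > 0$ in this example, the predicted probability is above 0.5 and the label is 1. Applying `logit` to that probability recovers $w^TX$ up to floating-point error, which is the inverse relationship described in the Logistic Function section.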