---
id: 2023-12-18
aliases: December 18, 2023
tags:
- link-note 
- Data-Science
- Machine-Learning 
- Logistic-Regression 
---

# Logistic Regression

$$Y = \begin{cases} 1  & \text{if korean} \\ 2   & \text{if american} \\ 3   & \text{if japanese} \end{cases} \qquad\qquad\qquad Y = \begin{cases} 1  & \text{if american} \\ 2   & \text{if korean} \\ 3   & \text{if japanese} \end{cases}$$

- In ordinary regression, the result depends on the arbitrary order (size) assigned to the labels, as the two encodings above show
- A different loss function or model is therefore needed for classification
- The logistic regression model is a regression model in the form of a logistic function
- The predicted class changes depending on the value of $w^TX$:
	1. If $w^{T}X > 0$: classified as 1.
	2. If $w^{T} X< 0$: classified as 0.
- How should the loss function be defined to find the optimal value of the parameter *w*?
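The decision rule above can be sketched in a few lines of NumPy. The weights and inputs here are hypothetical values chosen only to illustrate the sign test on $w^TX$:

```python
import numpy as np

# Hypothetical weights; w[0] acts as the bias, paired with x[0] = 1
w = np.array([0.5, -1.0, 2.0])
X = np.array([
    [1.0, 2.0, 1.5],   # w^T x = 0.5 - 2.0 + 3.0 = 1.5 > 0  -> class 1
    [1.0, 3.0, 0.2],   # w^T x = 0.5 - 3.0 + 0.4 = -2.1 < 0 -> class 0
])

scores = X @ w                     # w^T x for each sample
preds = (scores > 0).astype(int)   # classify 1 if w^T x > 0, else 0
print(preds)                       # [1 0]
```

Note that the raw scores carry no probabilistic meaning on their own; the sections below turn them into probabilities via the logistic function.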

## Odds

- The odds represent how many times more likely success (y=1) is than failure (y=0)
- $odds = \dfrac{p(y=1|x)}{1-p(y=1|x)}$
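A quick numeric check of the definition, using an assumed example probability of 0.8:

```python
p = 0.8                 # assumed P(y=1|x), for illustration only
odds = p / (1 - p)      # success is four times as likely as failure
print(odds)             # approximately 4.0
```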

## Logit

- The logarithm of the odds
- Maps an input probability $p \in (0, 1)$ onto the full range ($-\infty$, $+\infty$)
- $logit(p) = log(odds) = log\dfrac{p(y=1|x)}{1-p(y=1|x)}$
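A minimal sketch of the logit transform, confirming that $p = 0.5$ (even odds) maps to 0 and that probabilities above 0.5 map to positive values:

```python
import math

def logit(p):
    # log-odds: maps (0, 1) onto (-inf, +inf)
    return math.log(p / (1 - p))

print(logit(0.5))   # 0.0 — even odds
print(logit(0.9))   # positive — success more likely than failure
```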

## Logistic Function

- The inverse function of the logit transformation
- $logit(p) = log(odds) = log\dfrac{p(y=1|x)}{1-p(y=1|x)} = w_{0}+ w_1x_{1}+ \dots + w_Dx_{D}= w^TX$
- $p(y = 1|x) = \dfrac{e^{w^{T}X}}{1 + e^{w^{T}X}} = \dfrac{1}{1 + e^{-w^TX}}$
- Therefore, the logistic function is a combination of linear regression and the sigmoid function
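The inverse relationship can be verified directly: applying the sigmoid to the logit of a probability recovers that probability. This is a small sketch with an assumed test value of $p = 0.7$:

```python
import math

def sigmoid(z):
    # logistic function: 1 / (1 + e^{-z})
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    # log-odds, the inverse of the sigmoid
    return math.log(p / (1 - p))

p = 0.7
print(sigmoid(logit(p)))   # recovers 0.7 (up to floating-point error)
```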

## Bayes' Theorem

- $P(w|X) = \dfrac{P(X|w)P(w)}{P(X)} \propto P(X|w)P(w)$
- **[[Posterior]]** probability, $P(w|X)$: the probability distribution of a hypothesis given the observed data
- **Likelihood**, $P(X|w)$: the probability of observing the given data assuming a particular hypothesis (model parameters)
- **Prior** probability, $P(w)$: the probability assigned to a hypothesis before looking at the data
- There are two methods to estimate the hypothesis (model parameters) using these probabilities: [[Maximum Likelihood Estimation]] (**MLE**) and [[Maximum A Posteriori]] (**MAP**)
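As a rough illustration of the MLE route (not a derivation from the note itself): maximizing the likelihood of logistic regression is equivalent to minimizing the negative log-likelihood, which can be done with plain gradient descent. The toy data and hyperparameters below are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D dataset: class 0 centered at -1, class 1 centered at +1
x = np.concatenate([rng.normal(-1, 1, 100), rng.normal(1, 1, 100)])
X = np.column_stack([np.ones(200), x])          # prepend a bias column
y = np.concatenate([np.zeros(100), np.ones(100)])

w = np.zeros(2)
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))            # sigmoid(w^T x)
    grad = X.T @ (p - y) / len(y)               # gradient of mean NLL
    w -= lr * grad

print(w)   # w[1] comes out positive: larger x pushes toward class 1
```

MAP estimation differs only in adding the log-prior to the objective; with a Gaussian prior on $w$ it reduces to L2-regularized logistic regression.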