diff options
Diffstat (limited to 'SI/Resource/Fundamentals of Data Mining/Content/Kernel K-Means.md')
| -rw-r--r-- | SI/Resource/Fundamentals of Data Mining/Content/Kernel K-Means.md | 39 |
1 files changed, 39 insertions, 0 deletions
diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Kernel K-Means.md b/SI/Resource/Fundamentals of Data Mining/Content/Kernel K-Means.md new file mode 100644 index 0000000..828a053 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Kernel K-Means.md @@ -0,0 +1,39 @@ +--- +id: Kernel K-Means +aliases: + - *Kernel K-Means Clustering* +tags: [] +--- + +## _Kernel K-Means Clustering_ + +- _Kernel K-Means_ can be used to detect non-convex clusters + - A region is **convex** if it contains all the line segments connecting any + pair of its points. Otherwise, it is **concave** + - _K-Means_ can only detect clusters that are **linearly** separable +- <u>Idea</u>: Project data onto the high-dimensional kernel space, and then + perform _K-Means_ clustering + - Map data points in the input space onto a high-demensional feature space + using the kernel function ![[CleanShot 2023-10-24 at 21.42.19@2x.png]] + - Perform _K-Means_ on the mapped feature space +- Computational complexity (time and space) is higher than K-Means + - Need to compute and store _n x n_ kernel matrix generated from the kernel + function on the original data, where _n_ is the number of points + +## Kernel Functions and Kernel K-Means Clustering + +- Typical kernel functions: + - Polynomial kernel of degree h: $K(X_i, X_j) = (X_i*X_j+1)^h$ + - <u>Gaussian radial basis function (RBF) kernel</u>: $K(X_i, X_j) = + e^{-||X_i - X_j||^2 / 2\sigma^2}$ + - Sigmoid kernel: $K(X_i, X_j) = tanh(KX_i*X_j - \delta)$ +- The formula for kernel matrix K for any two points $x_i, x_j \in C_k$ is + $K_{x_ix_j} = \phi(x_i)*\phi(x_j)$ +- The [[SSE]] criterion of _kernel K-means_: $$SSE(c) = + \sum_{k=1}^{K}\sum_{x_i\in{C_k}}||\phi(x_i) - c_k||^2$$ + - The formula for the cluster centroid: $$c_k = + \dfrac{\sum_{x_i\in{C_k}}\phi(x_i)}{|C_k|}$$ +- Clustering can be performed without the actual individual projections + $\Phi(x_i)$ and $\Phi(x_j)$ for the data points $x_i, x_j \in{C_k}$ (use + K(Xi,Xj) instead) ![[CleanShot 2023-10-24 at 22.06.43@2x.png]] ![[CleanShot +2023-10-24 at 22.07.05@2x.png]] |
