---
id: Kernel K-Means
aliases:
- *Kernel K-Means Clustering*
tags: []
---
## _Kernel K-Means Clustering_
- _Kernel K-Means_ can be used to detect non-convex clusters
- A region is **convex** if it contains all the line segments connecting any
pair of its points. Otherwise, it is **concave**
- _K-Means_ can only detect clusters that are **linearly** separable
- Idea: Project data onto the high-dimensional kernel space, and then
perform _K-Means_ clustering
- Map data points in the input space onto a high-demensional feature space
using the kernel function ![[CleanShot 2023-10-24 at 21.42.19@2x.png]]
- Perform _K-Means_ on the mapped feature space
- Computational complexity (time and space) is higher than K-Means
- Need to compute and store _n x n_ kernel matrix generated from the kernel
function on the original data, where _n_ is the number of points
## Kernel Functions and Kernel K-Means Clustering
- Typical kernel functions:
- Polynomial kernel of degree h: $K(X_i, X_j) = (X_i*X_j+1)^h$
- Gaussian radial basis function (RBF) kernel: $K(X_i, X_j) =
e^{-||X_i - X_j||^2 / 2\sigma^2}$
- Sigmoid kernel: $K(X_i, X_j) = tanh(KX_i*X_j - \delta)$
- The formula for kernel matrix K for any two points $x_i, x_j \in C_k$ is
$K_{x_ix_j} = \phi(x_i)*\phi(x_j)$
- The [[SSE]] criterion of _kernel K-means_: $$SSE(c) =
\sum_{k=1}^{K}\sum_{x_i\in{C_k}}||\phi(x_i) - c_k||^2$$
- The formula for the cluster centroid: $$c_k =
\dfrac{\sum_{x_i\in{C_k}}\phi(x_i)}{|C_k|}$$
- Clustering can be performed without the actual individual projections
$\Phi(x_i)$ and $\Phi(x_j)$ for the data points $x_i, x_j \in{C_k}$ (use
K(Xi,Xj) instead) ![[CleanShot 2023-10-24 at 22.06.43@2x.png]] ![[CleanShot
2023-10-24 at 22.07.05@2x.png]]