---
id: K-Means
aliases: []
tags:
  - Clustering-Algorithms
  - Compare-and-Contrast
---

- K-Means [(YouTube)](https://www.youtube.com/watch?v=KzJORp8bgqs)
  - Each cluster is represented by its center (centroid)
- Given _K_, the number of clusters, the _K-Means_ clustering algorithm is
  outlined as follows:
  - Select _**K**_ points as initial centroids
  - **Repeat**
    - Form _K_ clusters by assigning each point to its **closest** centroid
    - Re-compute the centroid (i.e., _**mean point**_) of each cluster
  - **Until** a convergence criterion is satisfied (**e.g., cluster membership
    no longer changes, a maximum number of iterations has been reached, or the
    [[SSE]] falls below a pre-defined threshold**)
- Different distance (or similarity) measures can be used
  - [[Manhattan distance]] ($L_1$ norm), [[Euclidean distance]] ($L_2$ norm),
    [[Cosine similarity]], [[Mahalanobis distance]]
    ![[CleanShot 2023-10-24 at 15.34.07@2x.png]]
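
The loop outlined above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the names `k_means`, `mean_point`, `sse`, `euclidean`, and `manhattan` are made up for this note, initial centroids are assumed to be a random sample of the data, and an empty cluster is assumed to keep its previous centroid. Only two of the stopping rules are wired in (no change of membership, and a maximum number of iterations); `sse` is provided so a threshold check could be added.

```python
import math
import random


def euclidean(p, q):
    """L2 distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """L1 distance between two equal-length points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its centroid."""
    return sum(euclidean(p, centroids[c]) ** 2 for p, c in zip(points, labels))


def k_means(points, k, distance=euclidean, max_iter=100, seed=0):
    """Return (centroids, labels) for the given points and cluster count k."""
    rng = random.Random(seed)
    # Select K points as initial centroids (assumption: random sample of data).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):  # stop after a fixed number of iterations ...
        # Form K clusters by assigning each point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # ... or when membership stops changing
            break
        labels = new_labels
        # Re-compute each centroid as the mean point of its cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = mean_point(members)
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
centroids, labels = k_means(points, k=2)
print(centroids, labels, sse(points, labels, centroids))
```

Passing `distance=manhattan` swaps the assignment step to the $L_1$ norm. Note that the mean point is the minimizer of squared Euclidean distance specifically; with other measures the centroid update is usually adapted as well (for example, the coordinate-wise median under $L_1$), which this sketch does not do.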