Diffstat (limited to 'SI/Resource/Fundamentals of Data Mining/Content/K-Means.md')
| -rw-r--r-- | SI/Resource/Fundamentals of Data Mining/Content/K-Means.md | 23 |
1 files changed, 23 insertions, 0 deletions
diff --git a/SI/Resource/Fundamentals of Data Mining/Content/K-Means.md b/SI/Resource/Fundamentals of Data Mining/Content/K-Means.md
new file mode 100644
index 0000000..d61d82b
--- /dev/null
+++ b/SI/Resource/Fundamentals of Data Mining/Content/K-Means.md
@@ -0,0 +1,23 @@
+---
+id: K-Means
+aliases: []
+tags:
+  - Clustering-Algorithms
+  - Compare-and-Contrast
+---
+
+- K-Means [(Youtube)](https://www.youtube.com/watch?v=KzJORp8bgqs)
+  - Each cluster is represented by the center/centroid of the cluster
+- Given K, the number of clusters, the _K-Means_ clustering algorithm is
+  outlined as follows
+  - Select _**K**_ points as initial centroids
+  - **Repeat**
+    - Form _K_ clusters by assigning each point to its **closest** centroid
+    - Re-compute the centroid (i.e., _**mean point**_) of each cluster
+  - **Until** convergence criterion is satisfied (**e.g., no change of cluster
+    membership, or a certain # of iterations have been reached, or, the [[SSE]]
+    is < a pre-defined threshold**)
+- Different kinds of distance measures can be used
+  - [[Manhattan distance]] ($L_1$ norm), [[Euclidean distance]] ($L_2$ norm),
+    [[Cosine similarity]], [[Mahalanobis distance]] ![[CleanShot 2023-10-24 at
+15.34.07@2x.png]]
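
The select / repeat / until outline in the added note can be sketched in plain Python as below. This is a minimal illustration, not code from the commit: the names `k_means`, `mean_point`, `sse`, `max_iters`, and `seed` are hypothetical, Euclidean distance is assumed, and initial centroids are drawn at random from the data. Two of the note's stopping criteria are implemented (no change of cluster membership, and an iteration cap); `sse` only shows how the third could be measured.

```python
import math
import random


def euclidean(p, q):
    """Euclidean (L2) distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100, seed=0):
    """Return (centroids, labels) after clustering `points` into `k` groups."""
    rng = random.Random(seed)
    # Select K points as initial centroids (assumption: random sample of the data).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):  # stop after a fixed number of iterations
        # Form K clusters by assigning each point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        # Convergence: no change of cluster membership.
        if new_labels == labels:
            break
        labels = new_labels
        # Re-compute the centroid (mean point) of each cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return centroids, labels


def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = k_means(data, k=2)
    print(centroids, labels, sse(data, centroids, labels))
```

Because the result depends on the initial centroids, a common practice is to run the procedure with several seeds and keep the run with the lowest SSE.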

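
The distance measures named in the note can be sketched as small Python functions. This is an illustrative sketch under stated assumptions: the function names are hypothetical, points are plain tuples of numbers, and `mahalanobis` expects the caller to supply an inverse covariance matrix `inv_cov` as a list of rows rather than estimating it from data.

```python
import math


def manhattan(p, q):
    """Manhattan distance (L1 norm of the difference)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    """Euclidean distance (L2 norm of the difference)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine_similarity(p, q):
    """Cosine of the angle between p and q (both must be non-zero vectors)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def mahalanobis(p, q, inv_cov):
    """Mahalanobis distance given an inverse covariance matrix (assumed input)."""
    diff = [a - b for a, b in zip(p, q)]
    # Compute inv_cov @ diff, then the quadratic form diff^T (inv_cov @ diff).
    weighted = [sum(w * d for w, d in zip(row, diff)) for row in inv_cov]
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))
```

Cosine similarity grows as points become more alike, so it is usually converted to a dissimilarity (for example `1 - cosine_similarity(p, q)`) before being used where a distance is expected. With the identity matrix as `inv_cov`, `mahalanobis` reduces to `euclidean`.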