From 4d53fa14ee0cd615444aca6f6ba176e0ccc1b5be Mon Sep 17 00:00:00 2001
From: TheSiahxyz <164138827+TheSiahxyz@users.noreply.github.com>
Date: Mon, 29 Apr 2024 22:06:12 -0400
Subject: init

---
 .../Content/K-Medoids.md                           | 45 ++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 SI/Resource/Fundamentals of Data Mining/Content/K-Medoids.md

(limited to 'SI/Resource/Fundamentals of Data Mining/Content/K-Medoids.md')
diff --git a/SI/Resource/Fundamentals of Data Mining/Content/K-Medoids.md b/SI/Resource/Fundamentals of Data Mining/Content/K-Medoids.md
new file mode 100644
index 0000000..6dc59fe
--- /dev/null
+++ b/SI/Resource/Fundamentals of Data Mining/Content/K-Medoids.md	
@@ -0,0 +1,45 @@
+---
+id: K-Medoids
+aliases:
+  - Handling Outliers: From _K-Means_ to _K-Medoids_ (Youtube)(Youtube-E.g.)
+tags:
+  - Compare-and-Contrast
+  - ""
+  - Complexity
+---
+
+## Handling Outliers: From _K-Means_ to _K-Medoids_ [(Youtube)](https://www.youtube.com/watch?v=OFELCn-6r2o) [(Youtube-E.g.)](https://www.youtube.com/watch?v=ChBxx4aR-bY&t=0s)
+
+- K-Medoids: Instead of taking the **mean** value of the objects in a cluster as
+  a reference point, **medoids** can be used, which is the **most centrally
+  located** object in a cluster
+
+- The _K-Medoids_ clustering algorithm:
+  - Select _K_ points as the initial ==representative== objects (i.e., as
+    initial _K medoids_)
+  - **Repeat**
+    - Assigning each point to the cluster with the closest medoid
+    - Randomly select a ==non-representative== object $o_i$
+    - Compute ==the total cost _S_== of swapping the medoid $m$ with
+      $o_i$(_==e.g.,useSSEtomeasure==_)
+    - If $S<0$ (_==e.g., new SSE < previous SSE==_), then swap $m$ with $o_i$ to
+      form the new set of medoids
+  - **Until** convergence criterion is satisfied
+
+### Discussion on _K-Medoids_ Clustering
+
+- _K-Medoids_ Clustering: Find _representative_ objects (<u>medoids</u>) in
+  clusters
+- _PAM_ (Partitioning Around Medoids)
+  - Starts from an initial set of medoids, and
+  - Iteratively replaces one of the medoids by one of the non-mdedoids if it
+    improves the total sum of the squared errors (SSE) of the resulting
+    clustering
+  - _PAM_ works effectively for small data sets but does not scale well for
+    large data sets (due to the computational complexity)
+  - Computational [[Complexity]]: PAM: $O(K(n - k)^2)$ (quite expensive!)
+- Efficiency improvements on PAM
+  - _**CLARA**_ (Kaufmann & Rousseeuw, 1990):
+    - PAM on samples; $O(Ks^2 + K(n - K))$, $s$ is the sample size
+  - _**CLARANS**_ (Ng & Han, 1994): ==Randomised re-sampling==, ensuring
+    efficiency + quality
-- 
cgit v1.2.3