Diffstat (limited to 'SI/Resource/Fundamentals of Data Mining/Content/Hierarchical Clustering.md')
-rw-r--r--  SI/Resource/Fundamentals of Data Mining/Content/Hierarchical Clustering.md  |  54
1 files changed, 54 insertions, 0 deletions
diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Hierarchical Clustering.md b/SI/Resource/Fundamentals of Data Mining/Content/Hierarchical Clustering.md
new file mode 100644
index 0000000..096fe2c
--- /dev/null
+++ b/SI/Resource/Fundamentals of Data Mining/Content/Hierarchical Clustering.md
@@ -0,0 +1,54 @@
+---
+id: Hierarchical Clustering
+aliases:
+ - Hierarchical Clustering: Basic Concepts
+tags: []
+---
+
+## Hierarchical Clustering: Basic Concepts
+
+- Hierarchical clustering
+ - Generate a clustering hierarchy (drawn as a **dendrogram**)
+ - Not required to specify _K_, the number of clusters
+  - More deterministic than partitioning methods such as _K_-means (no random
+    initialization)
+ - No iterative refinement
+- Two categories of algorithms:
+  - **[[#Agglomerative Clustering Algorithm|Agglomerative]]**: Start with
+    singleton clusters and repeatedly merge the two closest clusters to build a
+    **bottom-up** hierarchy of clusters (see the sketch after this list)
+  - **Divisive**: Start with one all-inclusive macro-cluster and recursively
+    split it into two groups, generating a **top-down** hierarchy of clusters
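+
+As a rough sketch of the bottom-up idea (the toy 2-D points and the choice of
+complete linkage are my own assumptions, not part of the lecture):
+
+```python
+# Minimal sketch: bottom-up (agglomerative) merging with SciPy.
+import numpy as np
+from scipy.cluster.hierarchy import linkage
+
+# Six toy 2-D points forming two loose groups (hypothetical example data).
+X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
+              [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]])
+
+# Each point starts as a singleton cluster; the two closest clusters are
+# merged repeatedly until only one cluster remains.
+Z = linkage(X, method="complete")
+print(Z)  # each row: (cluster i, cluster j, merge distance, new cluster size)
+```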
+
+## Dendrogram: Shows How Clusters are Merged
+
+![[CleanShot 2023-10-24 at 22.11.21@2x.png]]
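+
+A dendrogram like this can be drawn with SciPy and matplotlib; a minimal
+sketch, reusing the assumed toy data from above (illustrative only, not the
+embedded figure):
+
+```python
+# Draw a dendrogram for a small toy data set (assumed example, not the slide).
+import numpy as np
+import matplotlib.pyplot as plt
+from scipy.cluster.hierarchy import dendrogram, linkage
+
+X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
+              [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]])
+Z = linkage(X, method="complete")
+
+dendrogram(Z)                 # leaves are the original points
+plt.xlabel("data point index")
+plt.ylabel("merge distance")
+plt.show()
+```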
+
+## Strengths of Hierarchical Clustering
+
+- Do not have to assume any particular number of clusters
+  - Any desired number of clusters can be obtained by 'cutting' the dendrogram
+    at the proper level (sketch after this list)
+- They may correspond to meaningful taxonomies
+  - Examples in the biological sciences (e.g., the animal kingdom, phylogeny
+    reconstruction, ...)
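+
+A minimal sketch of 'cutting' the hierarchy at different levels with SciPy's
+`fcluster` (same assumed toy data as above):
+
+```python
+# Cut one hierarchy at different levels to obtain 2 or 3 flat clusters.
+import numpy as np
+from scipy.cluster.hierarchy import fcluster, linkage
+
+X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
+              [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]])
+Z = linkage(X, method="complete")
+
+for k in (2, 3):
+    labels = fcluster(Z, t=k, criterion="maxclust")  # at most k clusters
+    print(k, labels)
+```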
+
+## Agglomerative Clustering Algorithm
+
+![[CleanShot 2023-10-24 at 22.14.30@2x.png]]
+![[CleanShot 2023-10-24 at 22.14.45@2x.png]]
+![[CleanShot 2023-10-24 at 22.15.07@2x.png]]
+![[CleanShot 2023-10-24 at 22.15.23@2x.png]]
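+
+Since the slides above are embedded as images, here is a rough from-scratch
+sketch of the agglomerative procedure, assuming single linkage and a naive
+search over cluster pairs (not the lecture's exact pseudocode):
+
+```python
+# Naive agglomerative clustering sketch: repeatedly merge the closest pair of
+# clusters (single linkage) until k clusters remain.
+import numpy as np
+
+def agglomerative_single_linkage(X, k):
+    clusters = [[i] for i in range(len(X))]                # singleton clusters
+    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
+    while len(clusters) > k:
+        best = (None, None, np.inf)
+        for a in range(len(clusters)):
+            for b in range(a + 1, len(clusters)):
+                d = min(dist[i, j] for i in clusters[a] for j in clusters[b])
+                if d < best[2]:
+                    best = (a, b, d)
+        a, b, _ = best
+        clusters[a] = clusters[a] + clusters[b]             # merge b into a
+        del clusters[b]
+    return clusters
+
+X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
+              [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]])
+print(agglomerative_single_linkage(X, k=2))   # e.g. [[0, 1, 2], [3, 4, 5]]
+```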
+
+## Extensions to Hierarchical Clustering
+
+- Major weaknesses of hierarchical clustering methods
+  - Can never undo a merge or split once it has been made
+  - Do not scale well
+    - Time complexity of at least $O(n^2)$, where $n$ is the total number of
+      objects
+- Other hierarchical clustering algorithms
+  - BIRCH (1996): Use a CF-tree and incrementally adjust the quality of
+    sub-clusters (see the usage sketch after this list)
+ - CURE (1998): Represent a cluster using a set of well-scattered
+ representative points
+ - CHAMELEON (1999): Use graph partitioning methods on the K-nearest neighbor
+ graph of the data
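+
+Of these, BIRCH is available in scikit-learn as `sklearn.cluster.Birch`; a
+minimal usage sketch with made-up blob data and illustrative parameter values:
+
+```python
+# BIRCH builds a CF-tree incrementally, so it scales better than plain
+# agglomerative clustering (made-up blob data, assumed parameter values).
+import numpy as np
+from sklearn.cluster import Birch
+
+rng = np.random.default_rng(0)
+X = np.vstack([rng.normal(0.0, 0.3, (100, 2)),   # blob around (0, 0)
+               rng.normal(5.0, 0.3, (100, 2))])  # blob around (5, 5)
+
+birch = Birch(threshold=0.5, branching_factor=50, n_clusters=2)
+labels = birch.fit_predict(X)
+print(np.bincount(labels))                       # sizes of the recovered clusters
+```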