---
id: Hierarchical Clustering
aliases:
  - "Hierarchical Clustering: Basic Concepts"
tags: []
---

## Hierarchical Clustering: Basic Concepts

- Hierarchical clustering
    - Generates a clustering hierarchy (drawn as a **dendrogram**)
    - Not required to specify _K_, the number of clusters
    - More deterministic (no random initialization)
    - No iterative refinement
- Two categories of algorithms:
    - **[[#Agglomerative Clustering Algorithm|Agglomerative]]**: Start with singleton clusters and continuously merge two clusters at a time to build a **bottom-up** hierarchy of clusters
    - **Divisive**: Start with one huge macro-cluster and continuously split it into two groups, generating a **top-down** hierarchy of clusters

## Dendrogram: Shows How Clusters Are Merged

![[CleanShot 2023-10-24 at 22.11.21@2x.png]]

## Strengths of Hierarchical Clustering

- No need to assume any particular number of clusters
    - Any desired number of clusters can be obtained by 'cutting' the dendrogram at the proper level (see the SciPy example at the end of this note)
- The resulting clusters may correspond to meaningful taxonomies
    - Examples in the biological sciences (e.g., animal kingdom, phylogeny reconstruction, ...)

## Agglomerative Clustering Algorithm

A from-scratch sketch of the merging procedure appears at the end of this note.

![[CleanShot 2023-10-24 at 22.14.30@2x.png]]
![[CleanShot 2023-10-24 at 22.14.45@2x.png]]
![[CleanShot 2023-10-24 at 22.15.07@2x.png]]
![[CleanShot 2023-10-24 at 22.15.23@2x.png]]

## Extensions to Hierarchical Clustering

- Major weaknesses of hierarchical clustering methods
    - Can never undo what was done previously
    - Do not scale well: time complexity of at least $O(n^2)$, where $n$ is the total number of objects
- Other hierarchical clustering algorithms
    - BIRCH (1996): Uses a CF-tree and incrementally adjusts the quality of sub-clusters
    - CURE (1998): Represents a cluster using a set of well-scattered representative points
    - CHAMELEON (1999): Uses graph partitioning methods on the K-nearest-neighbor graph of the data
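
## Example: Agglomerative Merging from Scratch

To make the bottom-up merging concrete, here is a minimal from-scratch Python sketch of agglomerative clustering. The function name, the toy data, and the choice of single-link (minimum point-to-point) distance are illustrative assumptions, not from the slides above. The nested search for the closest cluster pair also illustrates why plain hierarchical clustering does not scale well.

```python
import numpy as np

def agglomerative_single_link(X, k):
    """Naive bottom-up (agglomerative) clustering with single-link distance.

    Starts from n singleton clusters and repeatedly merges the closest
    pair until k clusters remain. Roughly O(n^3) as written, which is
    why naive hierarchical clustering scales poorly.
    """
    # Pairwise Euclidean distances between all points
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(X))]  # start: one point per cluster

    while len(clusters) > k:
        # Find the pair of clusters with the smallest single-link
        # (minimum point-to-point) distance
        best = (np.inf, None, None)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i, j] for i in clusters[a] for j in clusters[b])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge cluster b into cluster a
        del clusters[b]                  # note: the merge is never undone
    return clusters

# Toy usage (illustrative data): two well-separated blobs
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
print(agglomerative_single_link(X, k=2))  # -> [[0, 1, 2], [3, 4, 5]]
```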
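
## Example: Cutting the Dendrogram

The "any desired number of clusters" strength can be demonstrated with SciPy, assuming it is installed (the synthetic blob data and the choice of average linkage are illustrative assumptions): `linkage` builds the full merge hierarchy once, and `fcluster` then cuts it at different levels without re-running the clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two synthetic, well-separated blobs (illustrative data)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(5.0, 0.3, size=(20, 2))])

# Build the full merge hierarchy once; Z encodes the dendrogram
Z = linkage(X, method='average')

# 'Cut' the same dendrogram at different levels to obtain any
# desired number of clusters, with no re-clustering needed
for k in (2, 3, 4):
    labels = fcluster(Z, t=k, criterion='maxclust')
    print(k, np.bincount(labels)[1:])  # cluster sizes for each cut
```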