diff options
Diffstat (limited to 'SI/Resource/Fundamentals of Data Mining/Content/Hierarchical Clustering.md')
| -rw-r--r-- | SI/Resource/Fundamentals of Data Mining/Content/Hierarchical Clustering.md | 54 |
1 files changed, 54 insertions, 0 deletions
diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Hierarchical Clustering.md b/SI/Resource/Fundamentals of Data Mining/Content/Hierarchical Clustering.md new file mode 100644 index 0000000..096fe2c --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Hierarchical Clustering.md @@ -0,0 +1,54 @@ +--- +id: Hierarchical Clustering +aliases: + - Hierarchical Clustering: Basic Concepts +tags: [] +--- + +## Hierarchical Clustering: Basic Concepts + +- Hierarchical clustering + - Generate a clustering hierarchy (drawn as a **dendrogram**) + - Not required to specify _K_, the number of clusters + - More deterministic + - No iterative refinement +- Two categories of algorithms: + - **[[#Agglomerative Clustering Algorithm|Agglomerative]]**: Start with + sigleton clusters, continuously merge two clusters at a time to build a + **bottom-up** hierarchy of clusters + - **Divisive**: Start with a huge macro-cluster, split it continuously into + two groups, generating a **top-down** hierarchy of clusters + +## Dendrogram: Shows How Clusters are Merged + +![[CleanShot 2023-10-24 at 22.11.21@2x.png]] + +## Strengths of Hierarchical Clustering + +- Do not have to assume any particular number of clusters + - Any desired number of clusters can be obtained by 'cutting' the dendrogram + at the proper level +- They may correspond to meaningful taxonomies + - Example in biological sciences (e.g., animal kingdom, phylogeny + reconstruction, ...) + +## Agglomerative Clustering Algorithm + +![[CleanShot 2023-10-24 at 22.14.30@2x.png]] ![[CleanShot 2023-10-24 at +22.14.45@2x.png]] ![[CleanShot 2023-10-24 at 22.15.07@2x.png]] ![[CleanShot +2023-10-24 at 22.15.23@2x.png]] + +## Extensions to Hierarchical Clustering + +- Major weaknesses of hierarchical clustering methods + - Can never undo what was done previously + - Do not scale well + - Time complexity of at least $O(n^2)$, where $n$ is the number of total + objects +- Other hierarchical clustering algorithms + - BIRCH (1996): Use CF-tree and incrementally adjust the quality of + sub-clusters + - CURE (1998): Represent a cluster using a set of well-scattered + representative points + - CHAMELEON (1999): Use graph partitioning methods on the K-nearest neighbor + graph of the data |
