---
id: external
aliases:
- Measuring Clustering Quality: ==External== Methods
tags: []
---
## Measuring Clustering Quality: ==External== Methods
- Given the **ground truth** _T_, _Q(C, T)_ is the **quality measure** for a
clustering _C_
- _Q(C, T)_ is good if it satisfies the following **four** essential criteria:
  - **Cluster homogeneity**
    - The purer each cluster, the better
  - **Cluster completeness**
    - Objects that belong to the same category in the ground truth should be assigned to the same cluster
  - **Rag bag better than alien**
    - Putting a heterogeneous object into a pure cluster should be penalized **more** than putting it into a _rag bag_ (i.e., a "miscellaneous" or "other" category)
  - **Small cluster preservation**
    - Splitting a small category into pieces is more harmful than splitting a large category into pieces
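
The first two criteria can be illustrated with a small sketch. It uses purity as a rough proxy for homogeneity and "inverse purity" (purity with the roles of clusters and categories swapped) as a rough proxy for completeness. Both proxies, the function names, and the toy labels are assumptions made for illustration; they are not the formal definitions of the criteria.

```python
from collections import Counter


def purity(clusters, truth):
    """Fraction of objects that fall in the majority category of their cluster."""
    total = 0
    for c in set(clusters):
        # Ground-truth categories of the objects assigned to cluster c.
        members = [t for k, t in zip(clusters, truth) if k == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(truth)


def inverse_purity(clusters, truth):
    """Fraction of objects that fall in the dominant cluster of their category."""
    return purity(truth, clusters)


# Toy ground truth: four objects of category "a", two of category "b".
truth = ["a", "a", "a", "a", "b", "b"]

split = [0, 0, 1, 1, 2, 2]    # category "a" is split: homogeneous, not complete
merged = [0, 0, 0, 0, 0, 0]   # everything in one cluster: complete, not homogeneous

print(purity(split, truth), inverse_purity(split, truth))
print(purity(merged, truth), inverse_purity(merged, truth))
```

Splitting category "a" keeps every cluster pure but lowers the completeness proxy, while merging everything does the opposite, which is why a good _Q(C, T)_ needs to account for both.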
## Commonly Used External Measures
- **Matching-based measures**
  - Purity, maximum matching, [[F-measure]]
- **Entropy-based measures**
  - Conditional entropy
  - <u>Normalized mutual information (NMI)</u>
  - Variation of information
- **Pairwise measures**
  - Four possibilities for each pair of objects: true positive (TP), false negative (FN), false positive (FP), true negative (TN)
  - Jaccard coefficient, Rand statistic, Fowlkes-Mallows measure
- **Correlation measures**
  - Discretized Hubert statistic, normalized discretized Hubert statistic
- Purity vs Maximum Matching ![[CleanShot 2023-10-25 at 15.57.30@2x.png]]
- [[F-measure]] ![[CleanShot 2023-10-25 at 15.57.51@2x.png]] ![[CleanShot 2023-10-25 at 15.58.04@2x.png]] ![[CleanShot 2023-10-25 at 15.58.19@2x.png]] ![[CleanShot 2023-10-25 at 15.58.40@2x.png]]
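
The pairwise measures listed above can be sketched in a few lines. The code below assumes the usual pair-counting convention: a pair is a true positive when both objects share a ground-truth category and a cluster, a false negative when they share a category only, a false positive when they share a cluster only, and a true negative otherwise. The function names and toy labels are hypothetical, and the sketch assumes at least one pair shares a category and at least one pair shares a cluster, so no denominator is zero.

```python
from itertools import combinations
from math import sqrt


def pair_counts(clusters, truth):
    """Count TP, FN, FP, TN over all unordered pairs of objects."""
    tp = fn = fp = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = truth[i] == truth[j]
        if same_class and same_cluster:
            tp += 1
        elif same_class:
            fn += 1
        elif same_cluster:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def jaccard(tp, fn, fp, tn):
    # Ignores true negatives: TP / (TP + FN + FP).
    return tp / (tp + fn + fp)


def rand_statistic(tp, fn, fp, tn):
    # Fraction of pairs on which clustering and ground truth agree.
    return (tp + tn) / (tp + fn + fp + tn)


def fowlkes_mallows(tp, fn, fp, tn):
    # Geometric mean of pairwise precision and pairwise recall.
    return tp / sqrt((tp + fp) * (tp + fn))


truth = ["a", "a", "a", "b", "b", "b"]
clusters = [0, 0, 1, 1, 1, 1]

counts = pair_counts(clusters, truth)
print(counts)
print(jaccard(*counts), rand_statistic(*counts), fowlkes_mallows(*counts))
```

Enumerating all pairs is quadratic in the number of objects, which is fine for a toy example; for larger data the same counts can be derived from the contingency table between clusters and categories.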