---
id: external
aliases:
  - "Measuring Clustering Quality: ==External== Methods"
tags: []
---

## Measuring Clustering Quality: ==External== Methods

- Given the **ground truth** _T_, _Q(C, T)_ is the **quality measure** for a
  clustering _C_
- _Q(C, T)_ is good if it satisfies the following **four** essential criteria
  - **Cluster homogeneity**
    - The purer the clusters, the better: each cluster should contain objects
      from a single ground-truth category
  - **Cluster completeness**
    - Objects belonging to the same ground-truth category should be assigned to
      the same cluster
  - **Rag bag better than alien**
    - Putting a heterogeneous object into a pure cluster should be penalized
      **more** than putting it into a _rag bag_ (i.e., a "miscellaneous" or
      "other" category)
  - **Small cluster preservation**
    - Splitting a small category into pieces is more harmful than splitting a
      large category into pieces
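
The first two criteria pull in opposite directions, which a toy labeling makes visible. The following is a minimal sketch, not one of the formal measures listed below. The helper `homogeneity_completeness` is hypothetical, and it assumes `clusters` and `truth` are equal-length sequences of hashable labels. Here, homogeneity is the share of each cluster taken by its majority category. Completeness is the share of each category that lands in its most common cluster.

```python
from collections import Counter, defaultdict

# Sketch only: informal per-group diagnostics, hypothetical helper name.


def homogeneity_completeness(clusters, truth):
    """Return (homogeneity per cluster, completeness per category).

    homogeneity[c]: fraction of cluster c in its majority ground-truth
    category (1.0 means the cluster is pure).
    completeness[t]: fraction of category t inside the single cluster that
    holds most of it (1.0 means the category was kept together).
    """
    members_by_cluster = defaultdict(list)
    members_by_class = defaultdict(list)
    for c, t in zip(clusters, truth):
        members_by_cluster[c].append(t)
        members_by_class[t].append(c)

    def majority_share(labels):
        return Counter(labels).most_common(1)[0][1] / len(labels)

    homogeneity = {c: majority_share(ts) for c, ts in members_by_cluster.items()}
    completeness = {t: majority_share(cs) for t, cs in members_by_class.items()}
    return homogeneity, completeness


truth = ["a", "a", "a", "b", "b", "c"]
# Over-splitting: every cluster is pure, but category "a" is broken up.
print(homogeneity_completeness([0, 0, 1, 2, 2, 3], truth))
# Over-merging: every category is intact, but cluster 0 mixes "a" and "b".
print(homogeneity_completeness([0, 0, 0, 0, 0, 1], truth))
```

Over-splitting keeps homogeneity at 1.0 everywhere, but completeness for "a" drops to 2/3. Over-merging keeps completeness at 1.0, but the homogeneity of cluster 0 drops to 0.6. A useful _Q(C, T)_ has to penalize both.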

## Commonly Used External Measures

- **Matching-based measures**
  - Purity, maximum matching, [[F-measure]]
- **Entropy-based measures**
  - Conditional entropy
  - <u>Normalized mutual information (NMI)</u>
  - Variation of information
- **Pairwise measures** (computed over every pair of points)
  - Four possibilities: true positive (TP), false negative (FN), false positive
    (FP), true negative (TN)
  - Jaccard coefficient, Rand statistic, Fowlkes-Mallows measure
- **Correlation measures**
  - Discretized Hubert statistic, normalized discretized Hubert statistic
- Purity vs. maximum matching ![[CleanShot 2023-10-25 at 15.57.30@2x.png]]
- [[F-measure]] ![[CleanShot 2023-10-25 at 15.57.51@2x.png]]
  ![[CleanShot 2023-10-25 at 15.58.04@2x.png]]
  ![[CleanShot 2023-10-25 at 15.58.19@2x.png]]
  ![[CleanShot 2023-10-25 at 15.58.40@2x.png]]
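
Matching-based measures start from the contingency table _n<sub>ij</sub>_, the number of points in cluster _i_ whose ground-truth category is _j_. This is a minimal sketch, assuming the usual textbook definitions:

- Purity credits each cluster with its majority category.
- Maximum matching lets each category be credited to at most one cluster.
- The F-measure averages, over clusters, the harmonic mean of precision and recall against each cluster's majority category.

The function names are illustrative. Labels are assumed to be sortable values of one type. The brute-force matching is only practical for a handful of clusters; a Hungarian-algorithm solver (e.g. SciPy's `linear_sum_assignment`) would replace it at scale.

```python
from collections import Counter
from itertools import permutations

# Sketch assuming the common textbook definitions; function names are hypothetical.


def contingency(clusters, truth):
    """n[i][j] = number of points in cluster i whose ground-truth category is j."""
    counts = Counter(zip(clusters, truth))
    categories = sorted(set(truth))
    return [[counts[(c, t)] for t in categories] for c in sorted(set(clusters))]


def purity(n):
    """(1/N) * sum over clusters of the size of the cluster's majority category."""
    return sum(max(row) for row in n) / sum(map(sum, n))


def maximum_matching(n):
    """Like purity, but each category may be credited to at most one cluster."""
    r, k = len(n), len(n[0])
    size = max(r, k)
    # Pad with zero rows/columns so every one-to-one pairing is a permutation.
    square = [row + [0] * (size - k) for row in n]
    square += [[0] * size for _ in range(size - r)]
    best = max(
        sum(square[i][p[i]] for i in range(size))
        for p in permutations(range(size))
    )
    return best / sum(map(sum, n))


def f_measure(n):
    """Mean over clusters of F_i = 2 * n_ij / (n_i + m_j), j = majority category of i."""
    category_sizes = [sum(col) for col in zip(*n)]
    scores = []
    for row in n:
        j = max(range(len(row)), key=row.__getitem__)
        scores.append(2 * row[j] / (sum(row) + category_sizes[j]))
    return sum(scores) / len(scores)


# Both clusters are dominated by category "a"; purity credits "a" twice.
n = contingency([0, 0, 0, 1, 1, 1], ["a", "a", "a", "a", "a", "b"])
print(n, purity(n), maximum_matching(n), f_measure(n))
```

Here the table is `[[3, 0], [2, 1]]`. Purity is 5/6 while maximum matching is 4/6: once "a" is matched to the first cluster, the second cluster can only be paired with "b". The F-measure for this labeling is 0.625.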
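
The entropy-based measures treat the cluster label and the ground-truth label of a random point as two random variables. This is a standard-library sketch assuming base-2 logarithms and the common definitions:

- Conditional entropy _H(T | C)_ is 0 when every cluster is pure.
- NMI divides the mutual information _I(C, T)_ by sqrt(_H(C) H(T)_).
- Variation of information is _H(T | C) + H(C | T)_, which is 0 only when the two partitions coincide.

The helper names are illustrative.

```python
from collections import Counter
from math import log2, sqrt

# Sketch assuming base-2 logs and common textbook definitions; names are hypothetical.


def entropy(labels):
    """H(X) in bits for a sequence of hashable labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def mutual_information(clusters, truth):
    """I(C, T) = sum_ij p_ij * log2(p_ij / (p_C(i) * p_T(j))), with p_ij = n_ij / n."""
    n = len(clusters)
    joint = Counter(zip(clusters, truth))
    cluster_sizes, category_sizes = Counter(clusters), Counter(truth)
    return sum(
        n_ij / n * log2(n_ij * n / (cluster_sizes[c] * category_sizes[t]))
        for (c, t), n_ij in joint.items()
    )


def conditional_entropy(clusters, truth):
    """H(T | C) = H(C, T) - H(C); 0 when every cluster is pure."""
    return entropy(list(zip(clusters, truth))) - entropy(clusters)


def nmi(clusters, truth):
    """I(C, T) / sqrt(H(C) * H(T)), which lies in [0, 1]."""
    h_c, h_t = entropy(clusters), entropy(truth)
    if h_c == 0 or h_t == 0:
        raise ValueError("NMI is undefined when a labeling has only one group")
    return mutual_information(clusters, truth) / sqrt(h_c * h_t)


def variation_of_information(clusters, truth):
    """H(T | C) + H(C | T) = H(C) + H(T) - 2 I(C, T); 0 iff the partitions match."""
    return entropy(clusters) + entropy(truth) - 2 * mutual_information(clusters, truth)


same = ([0, 0, 1, 1], ["x", "x", "y", "y"])         # identical up to relabeling
independent = ([0, 0, 1, 1], ["x", "y", "x", "y"])  # clusters say nothing about T
for clusters, truth in (same, independent):
    print(conditional_entropy(clusters, truth), nmi(clusters, truth),
          variation_of_information(clusters, truth))
```

Relabeling clusters does not matter. The first pair gives NMI 1 and VI 0. In the second pair, the clustering is independent of the truth, so NMI is 0 and VI reaches 2 bits.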
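
Pairwise measures classify every unordered pair of points into one of four cases:

- TP: same cluster and same category.
- FN: same category but different clusters.
- FP: same cluster but different categories.
- TN: different in both.

This is a sketch assuming the usual formulas. Jaccard = TP / (TP + FN + FP). Rand = (TP + TN) / N, with N = n(n - 1)/2 pairs. Fowlkes-Mallows = TP / sqrt((TP + FN)(TP + FP)), the geometric mean of pairwise precision and recall. The function names are hypothetical, and the pair loop is quadratic in the number of points, so it suits small examples only.

```python
from itertools import combinations
from math import sqrt

# Sketch assuming the usual pairwise definitions; function names are hypothetical.


def pair_counts(clusters, truth):
    """Return (TP, FN, FP, TN) over all unordered pairs of points."""
    tp = fn = fp = tn = 0
    for (c1, t1), (c2, t2) in combinations(zip(clusters, truth), 2):
        same_cluster, same_class = c1 == c2, t1 == t2
        if same_cluster and same_class:
            tp += 1
        elif same_class:  # same category, but split apart by the clustering
            fn += 1
        elif same_cluster:  # different categories, but put together
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def jaccard(tp, fn, fp, tn):
    return tp / (tp + fn + fp)


def rand(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)


def fowlkes_mallows(tp, fn, fp, tn):
    return tp / sqrt((tp + fn) * (tp + fp))


counts = pair_counts([0, 0, 0, 1, 1, 1], ["a", "a", "a", "a", "a", "b"])
print(counts, jaccard(*counts), rand(*counts), fowlkes_mallows(*counts))
```

For this labeling, the counts are (TP, FN, FP, TN) = (4, 6, 2, 3). That gives Jaccard 1/3, Rand 7/15, and Fowlkes-Mallows 4/sqrt(60). Jaccard and Fowlkes-Mallows ignore TN, which otherwise tends to dominate when there are many categories.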
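
The correlation measures describe each pair of points with two binary indicators. _x_ = 1 if the pair shares a cluster, and _y_ = 1 if it shares a ground-truth category. Under the usual definition, the discretized Hubert statistic is the mean of _x_ · _y_ over all N pairs, which equals TP / N. The normalized version is the Pearson correlation between _x_ and _y_. This is a minimal sketch; the function name is hypothetical.

```python
from itertools import combinations
from math import sqrt

# Sketch assuming the common definition via pair indicators; hypothetical name.


def hubert_statistics(clusters, truth):
    """Return (discretized Hubert statistic, normalized discretized Hubert statistic)."""
    pairs = list(combinations(zip(clusters, truth), 2))
    x = [int(p[0] == q[0]) for p, q in pairs]  # 1 if the pair shares a cluster
    y = [int(p[1] == q[1]) for p, q in pairs]  # 1 if the pair shares a category
    n_pairs = len(pairs)
    gamma = sum(xi * yi for xi, yi in zip(x, y)) / n_pairs  # = TP / N
    mu_x, mu_y = sum(x) / n_pairs, sum(y) / n_pairs
    # Pearson correlation of two 0/1 vectors: covariance over product of std devs.
    denom = sqrt(mu_x * mu_y * (1 - mu_x) * (1 - mu_y))
    if denom == 0:
        raise ValueError("undefined when all pairs agree on x or on y")
    return gamma, (gamma - mu_x * mu_y) / denom


print(hubert_statistics([0, 0, 1, 1], ["a", "a", "b", "b"]))
print(hubert_statistics([0, 0, 0, 1, 1, 1], ["a", "a", "a", "a", "a", "b"]))
```

A perfect clustering gives a normalized value of 1. In the second labeling, Γ = 4/15 exactly equals the product of the two pair rates (6/15 · 10/15). The normalized statistic is therefore essentially 0: cluster co-membership is uncorrelated with category co-membership.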