Diffstat (limited to 'SI/Resource/Fundamentals of Data Mining/Content/external.md')
-rw-r--r-- SI/Resource/Fundamentals of Data Mining/Content/external.md 42
1 file changed, 42 insertions, 0 deletions
diff --git a/SI/Resource/Fundamentals of Data Mining/Content/external.md b/SI/Resource/Fundamentals of Data Mining/Content/external.md
new file mode 100644
index 0000000..91faabe
--- /dev/null
+++ b/SI/Resource/Fundamentals of Data Mining/Content/external.md
@@ -0,0 +1,42 @@
+---
+id: external
+aliases:
+  - "Measuring Clustering Quality: ==External== Methods"
+tags: []
+---
+
+## Measuring Clustering Quality: ==External== Methods
+
+- Given the **ground truth** _T_, _Q(C, T)_ is the **quality measure** for a
+  clustering _C_
+- _Q(C, T)_ is good if it satisfies the following **four** essential criteria:
+  - **Cluster homogeneity**
+    - The purer, the better
+  - **Cluster completeness**
+    - Assign objects belonging to the same category in the ground truth to the
+      same cluster
+  - **Rag bag better than alien**
+    - Putting a heterogeneous object into a pure cluster should be penalized
+      **more** than putting it into a _rag bag_ (i.e., a "miscellaneous" or
+      "other" category)
+  - **Small cluster preservation**
+    - Splitting a small category into pieces is more harmful than splitting a
+      large category into pieces
+
+## Commonly Used External Measures
+
+- **Matching-based measures**
+  - Purity, maximum matching, [[F-measure]]
+- **Entropy-based measures**
+  - Conditional entropy
+  - <u>Normalized mutual information (NMI)</u>
+  - Variation of information
+- **Pairwise measures**
+  - Four possibilities per pair: TP, FN, FP, TN (true/false positive/negative)
+  - Jaccard coefficient, Rand statistic, Fowlkes-Mallows measure
+- **Correlation measures**
+  - Discretized Hubert statistic, normalized discretized Hubert statistic
+- Purity vs. maximum matching ![[CleanShot 2023-10-25 at 15.57.30@2x.png]]
+- [[F-measure]] ![[CleanShot 2023-10-25 at 15.57.51@2x.png]]
+  ![[CleanShot 2023-10-25 at 15.58.04@2x.png]] ![[CleanShot 2023-10-25 at 15.58.19@2x.png]]
+  ![[CleanShot 2023-10-25 at 15.58.40@2x.png]]
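A contingency table makes the matching-based measures concrete. Its entry n_ij counts the points of cluster i that belong to ground-truth category j. The following is a minimal Python sketch. It assumes the clustering and the ground truth are given as two equal-length label lists, and all function names are hypothetical.

- Purity credits each cluster with its majority category.
- Maximum matching enforces a one-to-one pairing of clusters and categories. The search here is brute force, so it only suits a handful of clusters.
- The F-measure follows one common textbook form: each cluster gets an F-score against its majority category, and these scores are averaged over clusters.

```python
from collections import Counter
from itertools import permutations


def contingency(clusters, truth):
    """n_ij: number of points in cluster i (rows) and ground-truth category j (columns)."""
    table = Counter(zip(clusters, truth))
    cluster_ids = sorted(set(clusters))
    class_ids = sorted(set(truth))
    return [[table[(c, t)] for t in class_ids] for c in cluster_ids]


def purity(clusters, truth):
    """Fraction of points that fall in their cluster's majority category."""
    return sum(max(row) for row in contingency(clusters, truth)) / len(clusters)


def maximum_matching(clusters, truth):
    """Best one-to-one pairing of clusters with categories, by brute force.

    Unlike purity, two clusters cannot both claim the same category.
    Exponential in the number of clusters: illustration only.
    """
    m = contingency(clusters, truth)
    r, k = len(m), len(m[0])
    best = 0
    if r <= k:
        # Each cluster i gets a distinct category cols[i].
        for cols in permutations(range(k), r):
            best = max(best, sum(m[i][j] for i, j in enumerate(cols)))
    else:
        # Each category j gets a distinct cluster rows[j].
        for rows in permutations(range(r), k):
            best = max(best, sum(m[i][j] for j, i in enumerate(rows)))
    return best / len(clusters)


def f_measure(clusters, truth):
    """Average per-cluster F-score against each cluster's majority category."""
    m = contingency(clusters, truth)
    class_sizes = [sum(col) for col in zip(*m)]
    scores = []
    for row in m:
        j = max(range(len(row)), key=row.__getitem__)  # majority category
        precision = row[j] / sum(row)
        recall = row[j] / class_sizes[j]
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores)
```

As a worked example, take clusters `[0, 0, 0, 1, 1, 1]` against ground truth `["x", "x", "x", "x", "x", "y"]`. Purity is 5/6, because both clusters are credited to `x`. Maximum matching has to pair cluster 1 with `y`, so it falls to 4/6. The F-measure averages 0.75 and 0.5 to 0.625. A practical implementation would normally replace the brute-force search with the Hungarian algorithm.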

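The entropy-based, pairwise, and correlation measures can be sketched the same way, starting from two equal-length label lists for a clustering _C_ and ground truth _T_. This is another minimal Python sketch with hypothetical names, and it rests on the following assumptions:

- Logarithms are natural logarithms.
- NMI is normalized by the geometric mean sqrt(H(C)·H(T)). Other normalizations are also in use.
- The discretized Hubert statistic is taken as TP/N over all N point pairs.
- Its normalized version is the correlation between the "same cluster" and "same category" pair indicators.

Degenerate inputs, such as a single cluster or a single category, make some denominators zero and are not handled.

```python
from collections import Counter
from itertools import combinations
from math import log, sqrt


def _entropy(counts, n):
    """Shannon entropy (natural log) of a distribution given as raw counts."""
    return -sum(c / n * log(c / n) for c in counts if c)


def entropy_measures(clusters, truth):
    """Return (H(T|C), NMI, VI) for clustering C against ground truth T."""
    n = len(clusters)
    h_c = _entropy(Counter(clusters).values(), n)
    h_t = _entropy(Counter(truth).values(), n)
    h_ct = _entropy(Counter(zip(clusters, truth)).values(), n)  # joint H(C, T)
    mutual_info = h_c + h_t - h_ct        # I(C; T)
    cond_t_given_c = h_ct - h_c           # H(T | C): 0 when every cluster is pure
    nmi = mutual_info / sqrt(h_c * h_t)   # assumed geometric-mean normalization
    vi = h_c + h_t - 2 * mutual_info      # equals H(T|C) + H(C|T)
    return cond_t_given_c, nmi, vi


def pair_counts(clusters, truth):
    """Classify every pair of points as TP, FN, FP or TN."""
    tp = fn = fp = tn = 0
    for a, b in combinations(range(len(clusters)), 2):
        same_cluster = clusters[a] == clusters[b]
        same_class = truth[a] == truth[b]
        if same_cluster and same_class:
            tp += 1
        elif same_class:
            fn += 1  # same category, split across clusters
        elif same_cluster:
            fp += 1  # different categories, merged into one cluster
        else:
            tn += 1
    return tp, fn, fp, tn


def pairwise_measures(clusters, truth):
    """Jaccard, Rand, Fowlkes-Mallows, discretized Hubert, normalized Hubert."""
    tp, fn, fp, tn = pair_counts(clusters, truth)
    n_pairs = tp + fn + fp + tn
    jaccard = tp / (tp + fn + fp)
    rand = (tp + tn) / n_pairs
    fowlkes_mallows = tp / sqrt((tp + fn) * (tp + fp))
    hubert = tp / n_pairs            # assumed form: TP / N
    mu_t = (tp + fn) / n_pairs       # fraction of pairs sharing a category
    mu_c = (tp + fp) / n_pairs       # fraction of pairs sharing a cluster
    hubert_norm = (hubert - mu_t * mu_c) / sqrt(
        mu_t * mu_c * (1 - mu_t) * (1 - mu_c))
    return jaccard, rand, fowlkes_mallows, hubert, hubert_norm
```

As a worked example, take clusters `[0, 0, 0, 1, 1, 1]` with ground truth `["x", "x", "x", "x", "x", "y"]`. The 15 point pairs split into 4 TP, 6 FN, 2 FP, and 3 TN, so the Jaccard coefficient is 1/3 and the Rand statistic is 7/15. The normalized Hubert statistic comes out essentially 0: in this example, sharing a cluster carries no linear information about sharing a category. The pairwise loop is O(n²) and is meant only for illustration.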