Diffstat (limited to 'SI/Resource/Fundamentals of Data Mining/Content/NMI.md')
-rw-r--r-- SI/Resource/Fundamentals of Data Mining/Content/NMI.md | 21
1 files changed, 21 insertions, 0 deletions
diff --git a/SI/Resource/Fundamentals of Data Mining/Content/NMI.md b/SI/Resource/Fundamentals of Data Mining/Content/NMI.md
new file mode 100644
index 0000000..d886599
--- /dev/null
+++ b/SI/Resource/Fundamentals of Data Mining/Content/NMI.md
@@ -0,0 +1,21 @@
+---
+id: NMI
+aliases:
+ - Normalized mutual information (NMI)
+tags: []
+---
+
+## Normalized mutual information (NMI)
+
+- Mutual information:
+ - Quantifies the amount of shared information between $C$ and $T$: $I(C,T) =
+ \sum_{i=1}^{r}\sum_{j=1}^{k} p_{ij}\log\dfrac{p_{ij}}{p_{C_i} p_{T_j}}$
+ - Measures the dependency between the observed joint probability $p_{ij}$ of
+ $C$ and $T$, and the expected joint probability $p_{C_i} \cdot p_{T_j}$ under the
+ independence assumption
+ - When $C$ and $T$ are independent, $p_{ij} = p_{C_i} \cdot p_{T_j}$, so $I(C, T) = 0$.
+ However, $I(C, T)$ has no fixed upper bound (it is at most $\min(H(C), H(T))$)
+- **Normalized mutual information (NMI)** $$NMI(C, T) =
+ \sqrt{\dfrac{I(C,T)}{H(C)} \cdot \dfrac{I(C, T)}{H(T)}} = \dfrac{I(C,
+ T)}{\sqrt{H(C) \cdot H(T)}}$$
+ - Value range of NMI: $[0, 1]$. A value close to 1 indicates a good clustering
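
Note on the added file: the formulas can be checked numerically with a small sketch like the one below. It is not part of the commit. It assumes that $C$ and $T$ are given as two equal-length label sequences, that the probabilities are estimated as count fractions ($p_{ij} = n_{ij}/n$), and that $H$ denotes Shannon entropy. The function names `entropy`, `mutual_information`, and `nmi` are hypothetical, and returning 0 when either entropy is zero is an assumed convention, since the formula is undefined in that case.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H of a label sequence (natural log)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(clusters, truth):
    """I(C, T) = sum_ij p_ij * log(p_ij / (p_Ci * p_Tj))."""
    n = len(clusters)
    joint = Counter(zip(clusters, truth))  # n_ij: size of each (cluster, class) cell
    n_c = Counter(clusters)                # cluster sizes
    n_t = Counter(truth)                   # class sizes
    mi = 0.0
    for (c, t), n_ij in joint.items():     # empty cells contribute 0 and are skipped
        p_ij = n_ij / n
        mi += p_ij * math.log(p_ij / ((n_c[c] / n) * (n_t[t] / n)))
    return mi


def nmi(clusters, truth):
    """NMI(C, T) = I(C, T) / sqrt(H(C) * H(T))."""
    h_c, h_t = entropy(clusters), entropy(truth)
    if h_c == 0.0 or h_t == 0.0:
        return 0.0  # assumed convention: a single-group labeling shares no information
    return mutual_information(clusters, truth) / math.sqrt(h_c * h_t)
```

For example, `nmi([0, 0, 1, 1], ["a", "a", "b", "b"])` should come out at approximately 1 (the two groupings coincide up to renaming), while `nmi([0, 0, 1, 1], ["a", "b", "a", "b"])` should come out at 0 (the groupings are independent). The logarithm base does not matter for NMI because it cancels between numerator and denominator.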