author TheSiahxyz <164138827+TheSiahxyz@users.noreply.github.com> 2024-04-29 22:06:12 -0400
committer TheSiahxyz <164138827+TheSiahxyz@users.noreply.github.com> 2024-04-29 22:06:12 -0400
commit 4d53fa14ee0cd615444aca6f6ba176e0ccc1b5be (patch)
tree 4d9f0527d9e6db4f92736ead0aa9bb3f840a0f89 /SI/Resource/Fundamentals of Data Mining/Content/SSE.md
init
Diffstat (limited to 'SI/Resource/Fundamentals of Data Mining/Content/SSE.md')
-rw-r--r-- SI/Resource/Fundamentals of Data Mining/Content/SSE.md 23
1 files changed, 23 insertions, 0 deletions
diff --git a/SI/Resource/Fundamentals of Data Mining/Content/SSE.md b/SI/Resource/Fundamentals of Data Mining/Content/SSE.md
new file mode 100644
index 0000000..007fcb7
--- /dev/null
+++ b/SI/Resource/Fundamentals of Data Mining/Content/SSE.md
@@ -0,0 +1,23 @@
+---
+id: SSE
+aliases:
+ - "Partitioning Algorithms: Basic Concepts"
+tags: []
+---
+
+## Partitioning Algorithms: Basic Concepts
+
+- <u>Partitioning method</u>: Discovering the groupings in the data by
+ optimizing a specific ==objective function== and ==iteratively== improving the
+ quality of partitions
+- _K-partitioning_ method: Partitioning a dataset _**D**_ of _**n**_ objects
+ into a set of _**K**_ clusters so that an objective function is optimized
+ (e.g., the sum of squared distances is minimized within each cluster, where
+ $c_k$ is the centroid or medoid of cluster $C_k$)
+ - A typical objective function: **Sum of Squared Errors (SSE)** $$ SSE(C) =
+ \sum_{k=1}^{K}\sum_{x_i \in C_k} \lVert x_i - c_k \rVert^2 $$
+- **Problem definition**: Given _K_, find a partition of _K clusters_ that
+ optimizes the chosen partitioning criterion
+ - Global optimum: Requires exhaustively enumerating all partitions
+ - Heuristic methods (i.e., greedy algorithms): _[[K-Means]], [[K-Medians]],
+ [[K-Medoids]], etc_
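
The SSE objective in the note above can be sketched in a few lines of Python. This is a minimal illustration, not part of the committed file: the helper names `centroid` and `sse` and the toy points are hypothetical, and each $c_k$ is assumed to be the cluster mean (centroid) rather than a medoid.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points (assumed c_k)."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def sse(clusters: List[List[Point]]) -> float:
    """Sum over clusters C_k of squared Euclidean distances ||x_i - c_k||^2."""
    total = 0.0
    for members in clusters:
        c_k = centroid(members)
        for x_i in members:
            total += sum((a - b) ** 2 for a, b in zip(x_i, c_k))
    return total


# Hypothetical toy data: the same four points partitioned two ways (K = 2).
compact = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (10.0, 4.0)]]
mixed = [[(0.0, 0.0), (10.0, 0.0)], [(2.0, 0.0), (10.0, 4.0)]]

print(sse(compact))  # 10.0
print(sse(mixed))    # 90.0
```

The partition that groups nearby points has the lower SSE, which is the quantity the heuristic methods listed above try to reduce at each iteration.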