From 4d53fa14ee0cd615444aca6f6ba176e0ccc1b5be Mon Sep 17 00:00:00 2001 From: TheSiahxyz <164138827+TheSiahxyz@users.noreply.github.com> Date: Mon, 29 Apr 2024 22:06:12 -0400 Subject: init --- .../Fundamentals of Data Mining/Content/SSE.md | 23 ++++++++++++++++++++++ 1 file changed, 23 insertions(+) create mode 100644 SI/Resource/Fundamentals of Data Mining/Content/SSE.md (limited to 'SI/Resource/Fundamentals of Data Mining/Content/SSE.md') diff --git a/SI/Resource/Fundamentals of Data Mining/Content/SSE.md b/SI/Resource/Fundamentals of Data Mining/Content/SSE.md new file mode 100644 index 0000000..007fcb7 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/SSE.md @@ -0,0 +1,23 @@ +--- +id: SSE +aliases: + - Partitioning Algorithms: Basic Concepts +tags: [] +--- + +## Partitioning Algorithms: Basic Concepts + +- Partitioning method: Discovering the groupings in the data by + optimizing a specific ==objective function== and ==iteratively== improving the + quality of partitions +- _K-partitioning_ method: Partitioning a dataset _**D**_ of _**n**_ objects + into a set of _**K**_ clusters so that an objective function is optimized + (e.g., the sum of squared distances is minimized within each cluster, where + $C_k$ is the centroid or medoid of cluster $C_k$) + - A typical objective function: **Sum of Squared Errors (SSE)** $$ SSE(C) = + \sum*{k=1}^{K}\sum*{x\_{i\in{C_k}}}||x_i - c_k||^2$$ +- **Problem definition**: Given _K_, find a partition of _K clusters_ that + optimizes the chosen partitioning criterion + - Global optimal: Needs to exhaustively enumerate all partitions + - Heuristic methods (i.e., greedy algorithms): _[[K-Means]], [[K-Medians]], + [[K-Medoids]], etc_ -- cgit v1.2.3