diff options
Diffstat (limited to 'SI/Resource/Fundamentals of Data Mining/Content/SSE.md')
| -rw-r--r-- | SI/Resource/Fundamentals of Data Mining/Content/SSE.md | 23 |
1 files changed, 23 insertions, 0 deletions
diff --git a/SI/Resource/Fundamentals of Data Mining/Content/SSE.md b/SI/Resource/Fundamentals of Data Mining/Content/SSE.md new file mode 100644 index 0000000..007fcb7 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/SSE.md @@ -0,0 +1,23 @@ +--- +id: SSE +aliases: + - Partitioning Algorithms: Basic Concepts +tags: [] +--- + +## Partitioning Algorithms: Basic Concepts + +- <u>Partitioning method</u>: Discovering the groupings in the data by + optimizing a specific ==objective function== and ==iteratively== improving the + quality of partitions +- _K-partitioning_ method: Partitioning a dataset _**D**_ of _**n**_ objects + into a set of _**K**_ clusters so that an objective function is optimized + (e.g., the sum of squared distances is minimized within each cluster, where + $C_k$ is the centroid or medoid of cluster $C_k$) + - A typical objective function: **Sum of Squared Errors (SSE)** $$ SSE(C) = + \sum*{k=1}^{K}\sum*{x\_{i\in{C_k}}}||x_i - c_k||^2$$ +- **Problem definition**: Given _K_, find a partition of _K clusters_ that + optimizes the chosen partitioning criterion + - Global optimal: Needs to exhaustively enumerate all partitions + - Heuristic methods (i.e., greedy algorithms): _[[K-Means]], [[K-Medians]], + [[K-Medoids]], etc_ |
