# Ch8 data clustering

## intro

the appropriate clustering of the same dataset can differ from task to task

different methods

- …
- …
- …

## various similarity metrics

… blabla …

## mixture model

encode domain-specific assumptions explicitly

Assumption: the data are generated from a mixture density
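In a standard form (notation assumed here, not taken from the notes): a mixture density with $K$ components, mixing weights $\pi_k$, and per-component parameters $\theta_k$:

```latex
p(x \mid \theta) = \sum_{k=1}^{K} \pi_k \, p(x \mid \theta_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```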

### MLE for parameters of density function

#### gaussian density, $\mu$ unknown

#### gaussian density, $\mu$, $\sigma$, and prior unknown
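As a reference point for the simpler fully-observed (non-mixture) case: the MLE of a Gaussian's mean is the sample mean, and the MLE of its variance is the $1/n$ sample variance. A quick numerical check (the data below are synthetic, made up for illustration):

```python
import numpy as np

# Synthetic data: 10_000 draws from N(mu=3, sigma=2) (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=10_000)

mu_hat = x.mean()                         # MLE of mu: the sample mean
sigma2_hat = ((x - mu_hat) ** 2).mean()   # MLE of sigma^2: divides by n, not n-1

# Both estimates should land close to the true values (3 and 4).
```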

### bayesian estimation

blabla …

## hierarchical clustering

- bottom-up
  - min
  - max
- top-down

**bottom-up:**

merge two clusters at a time: take the pair with the smallest distance and merge them into one cluster, then update the distance between the new cluster and the others using the min/max rule
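The merge loop above can be sketched as follows; this is a naive $O(n^3)$ illustration (function name and stopping rule of "k clusters remain" are my assumptions), with `linkage="min"`/`"max"` matching the min/max update rules:

```python
import numpy as np

def agglomerative(points, k, linkage="min"):
    """Naive bottom-up clustering sketch (illustrative, not optimized).

    Start with every point as its own cluster, repeatedly merge the two
    clusters with the smallest distance, updating cluster distances with
    the min (single-link) or max (complete-link) rule, until k remain.
    """
    points = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(points))]
    # pairwise point-to-point distances
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    def cluster_dist(a, b):
        sub = d[np.ix_(a, b)]
        return sub.min() if linkage == "min" else sub.max()

    while len(clusters) > k:
        # find the closest pair of clusters
        best, best_d = (0, 1), float("inf")
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = cluster_dist(clusters[i], clusters[j])
                if dist < best_d:
                    best_d, best = dist, (i, j)
        i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the pair
        del clusters[j]
    return clusters
```

For example, `agglomerative([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)` groups the two nearby pairs together.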

## spectral clustering

- PCA
- LDA (linear discriminant analysis)
- spectrum embedding (in manifold learning)
- …

adjacency matrix -> generalized to a (weighted) similarity matrix

- full connectivity
- local
  - k-nn
  - $\epsilon$-radius
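The three constructions above can be sketched in one helper; the Gaussian weight $\exp(-\|x_i - x_j\|^2 / 2\sigma^2)$ and the parameter names are my assumptions:

```python
import numpy as np

def similarity_graph(X, mode="full", sigma=1.0, k=2, eps=1.0):
    """Build a weighted similarity matrix W from data X (illustrative sketch).

    - "full": fully connected graph with Gaussian weights
    - "knn":  keep only edges to each point's k nearest neighbours (symmetrized)
    - "eps":  keep only edges with distance at most eps
    """
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    if mode == "knn":
        keep = np.zeros_like(W, dtype=bool)
        for i in range(len(X)):
            nn = np.argsort(d2[i])[1:k + 1]  # skip self at position 0
            keep[i, nn] = True
        W = W * (keep | keep.T)  # symmetrize: edge if either is a neighbour
    elif mode == "eps":
        W = W * (d2 <= eps ** 2)
        np.fill_diagonal(W, 0.0)
    return W
```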

goal: find a cut of the graph that optimizes some criterion.

### min binary cut

to minimize:
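A standard way to write the quantity being minimized (notation assumed: $w_{ij}$ are the similarity-matrix entries, $(A, B)$ a partition of the vertex set):

```latex
\mathrm{cut}(A, B) = \sum_{i \in A,\; j \in B} w_{ij}
```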

however, the result often just separates a single outlier point into its own cluster, with all the remaining points forming the other cluster

-> normalize the cut so that the two clusters have similar size

- use cardinality (the number of vertices in a set)
- use volume (the sum of the degrees of the vertices in a set)
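These two normalizations are commonly known as RatioCut and Ncut respectively (standard definitions, with $\mathrm{vol}(A) = \sum_{i \in A} d_i$):

```latex
\mathrm{RatioCut}(A, B) = \frac{\mathrm{cut}(A, B)}{|A|} + \frac{\mathrm{cut}(A, B)}{|B|},
\qquad
\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(A, B)}{\mathrm{vol}(B)}
```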

to be solved in the following sections

### laplacian matrix properties

- …
- …
- …
- …
- the multiplicity of the eigenvalue 0 of the laplacian matrix (i.e. the number of eigenvectors with eigenvalue 0) equals the number of connected components of the graph
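The last property can be checked numerically; the 4-node graph below (two disconnected pairs) is a made-up example:

```python
import numpy as np

# Two disconnected edges: {0,1} and {2,3} -> 2 connected components.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

D = np.diag(W.sum(axis=1))       # degree matrix
L = D - W                        # unnormalized graph laplacian

eigvals = np.linalg.eigvalsh(L)  # L is symmetric PSD: eigenvalues >= 0, ascending
n_zero = int(np.sum(np.isclose(eigvals, 0.0)))
# n_zero equals the number of connected components (here 2)
```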

#### explanation

- classical
- Ncut
- Ng’s algo.
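As a concrete instance of the classical relaxation, a minimal spectral bipartition via the Fiedler vector (the eigenvector of $L$ for the second-smallest eigenvalue), thresholded at zero; the example graph (two triangles joined by a weak bridge) and the threshold rule are my assumptions, and this is a sketch rather than a full Ncut or Ng-style solver:

```python
import numpy as np

def spectral_bipartition(W):
    """Split a graph into two clusters by the sign of the Fiedler vector.

    Uses the unnormalized laplacian L = D - W; the eigenvector for the
    second-smallest eigenvalue is thresholded at zero to get labels.
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))
    L = D - W
    vals, vecs = np.linalg.eigh(L)    # eigenvalues in ascending order
    fiedler = vecs[:, 1]              # second-smallest eigenvector
    return (fiedler >= 0).astype(int)

# Two triangles {0,1,2} and {3,4,5} joined by a weak bridge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1  # weak bridge between the two triangles

labels = spectral_bipartition(W)
# labels assigns each triangle to its own cluster (up to label flip)
```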