relation with previous chapters
initial discriminant function is $g_i(x) = p(x \vert w_i) P(w_i)$, where $w_i$ is the class label and $P(w_i)$ is the prior, assumed known
$p(x \vert w_i)$ is assumed to follow a distribution of known form with parameters $\mu$; MLE or Bayesian estimation gives $\hat\mu$.
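A minimal sketch of this parametric route, assuming the class-conditional is a 1-D Gaussian with known variance, so the MLE of $\mu$ is just the sample mean (the data below is synthetic):

```python
import numpy as np

# Parametric route (sketch): assume p(x|w_i) = N(mu, sigma^2) with sigma
# known; then the MLE of mu is the sample mean of the class-w_i data.
rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.0, size=500)  # synthetic class data

mu_hat = samples.mean()  # MLE: the sample mean (1/n) * sum(x_k)
print(mu_hat)  # close to the true mean 2.0
```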
Alternatively, we can estimate it from the data in a non-parametric way.
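A minimal non-parametric sketch, assuming a Parzen-window estimate with a Gaussian kernel; the bandwidth `h` is a hand-picked illustrative choice, not from the text:

```python
import numpy as np

# Non-parametric route (sketch): Parzen-window estimate of p(x|w_i) with a
# Gaussian kernel; no distributional form is assumed, only a bandwidth h.
def parzen_density(x, samples, h=0.5):
    # average of Gaussian kernels centered at the training samples
    u = (x - samples) / h
    return np.mean(np.exp(-0.5 * u ** 2) / (h * np.sqrt(2.0 * np.pi)))

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=1000)  # synthetic class-conditional samples
print(parzen_density(0.0, data))  # near the (smoothed) N(0,1) density at 0
```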
This chapter instead assumes a given form for the discriminant function itself and determines its parameters from samples.
- No need to know the generative model.
- Effectively non-parametric in the sense that no distributional form of $p(x \vert w_i)$ is assumed
- Given a discriminant function of known form with unknown parameters
- Train the parameters with samples
- Use the trained function to classify
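The steps above can be sketched as follows, assuming a linear form $g(x) = w^T x + w_0$ and a least-squares fit on toy data (both the data and the fitting method are illustrative assumptions):

```python
import numpy as np

# Sketch of the three steps: assume g(x) = w^T x + w0 (known form,
# unknown parameters), fit the parameters from labeled samples, classify.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([-1.0, -1.0, 1.0, 1.0])  # toy labels for two classes

# Augment with a constant 1 so w0 is learned jointly with w.
Y = np.hstack([np.ones((4, 1)), X])

# Determine parameters via least squares (pseudo-inverse).
a = np.linalg.pinv(Y) @ t

# Classify by the sign of g(x).
pred = np.sign(Y @ a)
print(pred.tolist())  # -> [-1.0, -1.0, 1.0, 1.0]
```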
linear decision boundary
- one-vs-all: separate each class from all remaining classes ($c$ two-class problems)
- one-vs-one: one classifier for every pair of classes ($c(c-1)/2$ two-class problems)
- both leave ambiguous regions where no class, or more than one, wins

Because ambiguous cases exist, change to a new decision rule (a linear machine): define $g_i(x)$ for every class and assign $x$ to $w_i$ if $g_i(x) > g_j(x)$ for all $j \ne i$.
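One such rule is a linear machine: keep one linear $g_i(x)$ per class and pick the largest, which (ties aside) leaves no ambiguous region. A sketch for $c = 3$ classes with hypothetical weights:

```python
import numpy as np

# Linear machine for c = 3 classes: one function g_i(x) = w_i^T x + b_i
# per class, decide the class with the largest g_i(x).
# The weights below are hypothetical, just to show the decision rule.
W = np.array([[ 1.0,  0.0],
              [-1.0,  0.0],
              [ 0.0,  1.0]])    # rows are w_i
b = np.array([0.0, 0.0, 0.0])

def classify(x):
    g = W @ x + b             # all c discriminant values at once
    return int(np.argmax(g))  # assign to the class with the max g_i

print(classify(np.array([2.0, 0.5])))  # g = [2, -2, 0.5] -> class 0
```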
generalized linear discriminant function
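The idea, in brief: replace $x$ by a feature vector $y(x)$, so $g(x) = a^T y(x)$ stays linear in the parameters $a$ even when the boundary in $x$ is nonlinear. A 1-D quadratic sketch with hand-picked weights (my illustration, not from the notes):

```python
import numpy as np

# Generalized linear discriminant: map x -> y(x) = (1, x, x^2) so that
# g(x) = a^T y(x) is linear in a but quadratic in x.  The weights below
# are hypothetical, chosen so the positive region is -1 < x < 1.
def y(x):
    return np.array([1.0, x, x * x])

a = np.array([1.0, 0.0, -1.0])  # g(x) = 1 - x^2

print([float(a @ y(x)) > 0 for x in (-2.0, 0.0, 2.0)])  # -> [False, True, False]
```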
assume the samples are linearly separable
- batch learning: each update uses all currently misclassified samples
- linear (L1) criterion: $J_p(a) = \sum_{y \in \mathcal{Y}} (-a^T y)$, summed over the misclassified $y$
- squared (L2) criterion: $J_q(a) = \sum_{y \in \mathcal{Y}} (a^T y)^2$, smoother, but samples with large error affect it too much
- relaxed criterion: $J_r(a) = \frac{1}{2} \sum_{y \in \mathcal{Y}} \frac{(a^T y - b)^2}{\lVert y \rVert^2}$, where the margin $b$ and the normalization limit the influence of large samples
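A batch-learning sketch with the linear (L1) criterion: samples are augmented with a leading 1 and second-class samples are negated, so correct classification means $a^T y > 0$; the toy data is my assumption:

```python
import numpy as np

# Batch perceptron with the linear criterion J_p(a) = sum(-a^T y) over
# misclassified y: each pass updates a with the sum of ALL currently
# misclassified samples (gradient descent on J_p).
Y = np.array([[ 1.0,  2.0], [ 1.0,  1.5],   # class 1, augmented (1, x)
              [-1.0,  0.5], [-1.0, -1.0]])  # class 2, augmented then negated

a = np.zeros(2)
eta = 0.5
for _ in range(100):          # batch learning: whole set per update
    mis = Y[Y @ a <= 0]       # currently misclassified samples
    if len(mis) == 0:
        break                 # linearly separable -> finite convergence
    a += eta * mis.sum(axis=0)

print((Y @ a > 0).all())  # -> True: every sample correctly classified
```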