# Ch5 linear discriminant functions

## relation with previous chapters

In previous chapters, the discriminant function is $g_i(x) = p(x \vert w_i) P(w_i)$, where $w_i$ is the class label and $P(w_i)$ is the (assumed known or estimated) prior.

$p(x \vert w_i)$ is assumed to follow a distribution of known form with parameters $\mu$; MLE or Bayesian estimation gives $\hat\mu$.

Alternatively, $p(x \vert w_i)$ can be estimated from the data in a non-parametric way.
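
A minimal sketch of the parametric route above, on hypothetical 1-D data: estimate Gaussian parameters by MLE, then evaluate $g_i(x) = p(x \vert w_i) P(w_i)$ and pick the larger one. All data values are made up for illustration.

```python
import numpy as np

# Hypothetical 1-D samples for two classes w1, w2
x1 = np.array([1.0, 1.2, 0.8, 1.1])   # samples from w1
x2 = np.array([3.0, 2.8, 3.2, 3.1])   # samples from w2

# MLE for a Gaussian: sample mean and (biased) sample variance
def mle_gaussian(x):
    return x.mean(), x.var()

mu1, var1 = mle_gaussian(x1)
mu2, var2 = mle_gaussian(x2)

# Priors estimated from sample counts
P1 = len(x1) / (len(x1) + len(x2))
P2 = len(x2) / (len(x1) + len(x2))

def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Discriminant g_i(x) = p(x | w_i) P(w_i); decide the class with the larger g
def classify(x):
    g1 = gaussian_pdf(x, mu1, var1) * P1
    g2 = gaussian_pdf(x, mu2, var2) * P2
    return 1 if g1 > g2 else 2

print(classify(1.0))  # near mu1 -> class 1
print(classify(3.0))  # near mu2 -> class 2
```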

## introduction

Assume the discriminant function has a given form, and determine its parameters from samples.

- No need to know the generative model $p(x \vert w_i)$.
- In that sense it is non-parametric: no parametric form is assumed for the class-conditional densities, only for the decision boundary.

Steps:

- Given discriminant function with known form but unknown parameters
- train with samples
- do classification
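
The three steps can be sketched end to end. This is a hypothetical example (made-up 2-D data) where the given form is $g(x) = w^\top x + b$ and training is done by a least-squares fit on $\pm 1$ labels; any other fitting rule (e.g. the perceptron) would fill the same "train" slot.

```python
import numpy as np

# Step 1: fixed form g(x) = w @ x + b (parameters unknown)
# Hypothetical 2-D training samples with labels +1 / -1
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 3.0], [4.0, 3.0]])
t = np.array([-1.0, -1.0, 1.0, 1.0])

# Step 2: train -- least-squares fit on augmented vectors y = (1, x)
Y = np.hstack([np.ones((len(X), 1)), X])
a, *_ = np.linalg.lstsq(Y, t, rcond=None)   # a = (b, w1, w2)

# Step 3: classify new points by the sign of g(x)
def classify(x):
    return 1 if a @ np.concatenate(([1.0], x)) > 0 else -1

print(classify([0.5, 0.2]), classify([3.5, 3.0]))  # -1 1
```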

## linear decision boundary

A linear discriminant function has the form $g(x) = w^\top x + w_0$, where $w$ is the weight vector and $w_0$ the bias (threshold). The decision boundary $g(x) = 0$ is a hyperplane; for two classes, decide $w_1$ if $g(x) > 0$ and $w_2$ if $g(x) < 0$.

### multi-class classifier

- one-vs-all: one class against all other classes ($c$ classifiers)
- one-vs-one: one classifier for every pair of classes ($c(c-1)/2$ classifiers)
- linear machine: one linear function $g_i(x)$ per class, compared jointly

The first two schemes leave ambiguous regions where no class (or more than one) wins, so we switch to the linear-machine decision rule: assign $x$ to $w_i$ if $g_i(x) > g_j(x)$ for all $j \ne i$.
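
A minimal sketch of the linear-machine (argmax) decision rule, with hypothetical weight vectors for three classes in augmented form $a_i = (b_i, w_i)$:

```python
import numpy as np

# Hypothetical augmented weight vectors a_i = (b_i, w_i) for three classes
A = np.array([[0.0, 1.0, 0.0],     # g_1 favors large x1
              [0.0, 0.0, 1.0],     # g_2 favors large x2
              [1.0, -1.0, -1.0]])  # g_3 favors points near the origin

# Linear machine: assign x to the class with the largest g_i(x)
def decide(x):
    y = np.concatenate(([1.0], x))  # augmented sample y = (1, x)
    return int(np.argmax(A @ y))    # index of the maximal g_i

print(decide([2.0, 0.0]))  # g_1 largest -> 0
print(decide([0.0, 2.0]))  # g_2 largest -> 1
print(decide([0.0, 0.0]))  # g_3 largest -> 2
```

There are no ambiguous regions: some $g_i$ is always maximal (ties on boundary sets aside).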

## generalized linear discriminant function

Map the sample through fixed functions $y = \varphi(x)$, so that $g(x) = a^\top y$ is linear in the parameters $a$ even when the boundary is nonlinear in $x$. The simplest case is the augmented vector $y = (1, x^\top)^\top$.
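
A generalized linear discriminant replaces $x$ by a feature vector $y = \varphi(x)$, keeping $g$ linear in the parameters. A minimal sketch with a hypothetical quadratic map in 1-D, which separates "inside vs outside an interval" (impossible for a plain linear boundary):

```python
import numpy as np

# Hypothetical feature map: quadratic phi for 1-D input x
def phi(x):
    return np.array([1.0, x, x * x])  # y = (1, x, x^2)

# g(x) = a @ phi(x) is linear in a but quadratic in x
a = np.array([-1.0, 0.0, 1.0])        # g(x) = x^2 - 1

def classify(x):
    return 1 if a @ phi(x) > 0 else -1

print(classify(0.5))   # inside (-1, 1)  -> -1
print(classify(2.0))   # outside        -> +1
```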

## perceptron

Assumes the samples are linearly separable.

- batch update: sum the corrections over all misclassified samples in each pass
- fixed-increment single-sample update: correct $a$ immediately after each misclassified sample
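
A sketch of the fixed-increment single-sample perceptron on hypothetical separable 2-D data. Samples of the negative class are sign-flipped ("normalized") so a solution satisfies $a^\top y > 0$ for every $y$:

```python
import numpy as np

# Hypothetical linearly separable 2-D data; labels +1 / -1
X = np.array([[2.0, 2.0], [2.5, 3.0], [-1.0, -1.0], [-2.0, -0.5]])
t = np.array([1, 1, -1, -1])

# Normalize: multiply augmented samples of class -1 by -1,
# so a correct a satisfies a @ y > 0 for every row y
Y = np.hstack([np.ones((len(X), 1)), X]) * t[:, None]

a = np.zeros(Y.shape[1])
for epoch in range(100):        # iteration cap for safety
    errors = 0
    for y in Y:
        if a @ y <= 0:          # misclassified sample
            a = a + y           # fixed-increment update (eta = 1)
            errors += 1
    if errors == 0:             # converged: a full pass with no errors
        break

print(all(a @ y > 0 for y in Y))  # True once converged
```

Convergence in finitely many updates is guaranteed by the perceptron convergence theorem, but only under the linear-separability assumption.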

## relaxation

Criterion functions over the misclassified samples:

- perceptron criterion: linear in the margin violation (L1-like)
- squared criterion: quadratic in the violation (L2-like), smoother gradient
- relaxation: in the squared criterion, samples with large error dominate the update; the relaxation criterion adds a margin $b$ and normalizes by $\Vert y \Vert^2$ to reduce this effect
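
As a sketch of the squared (L2) route, the MSE criterion $\Vert Ya - b \Vert^2$ can be minimized in closed form with the pseudoinverse; the data here are the same hypothetical samples as above. Because errors are squared, a few large-error samples can dominate the solution, which is what motivates the relaxation criterion.

```python
import numpy as np

# Hypothetical separable 2-D data; labels +1 / -1
X = np.array([[2.0, 2.0], [2.5, 3.0], [-1.0, -1.0], [-2.0, -0.5]])
t = np.array([1, 1, -1, -1])
Y = np.hstack([np.ones((len(X), 1)), X]) * t[:, None]  # normalized samples

# MSE criterion: minimize ||Y a - b||^2  ->  a = pinv(Y) @ b
b = np.ones(len(Y))             # target margins (all ones)
a = np.linalg.pinv(Y) @ b       # closed-form least-squares solution

print(all(a @ y > 0 for y in Y))  # True: a separates this data
```

Note the MSE solution minimizes squared error rather than classification error, so on non-separable or skewed data it need not separate even when a separating $a$ exists.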