relation with previous chapters

the initial discriminant function is $g_i(x) = p(x \vert w_i) P(w_i)$, where $w_i$ is the class label and $P(w_i)$ is known as the prior

$p(x \vert w_i)$ is assumed to have some known form of distribution with parameters $\mu$; use MLE or Bayesian estimation to obtain $\hat\mu$.

Alternatively, $p(x \vert w_i)$ can be estimated from the data in a non-parametric way.
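The parametric route above can be sketched as follows: a minimal toy example, assuming 1-D Gaussian class-conditional densities and equal priors (the data, means, and priors here are all made up for illustration).

```python
import numpy as np

# Hypothetical 1-D two-class data; assume p(x|w_i) is Gaussian.
rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, 100)   # samples from class w_1
x2 = rng.normal(3.0, 1.0, 100)   # samples from class w_2

# MLE for a Gaussian: sample mean and sample variance.
mu1, var1 = x1.mean(), x1.var()
mu2, var2 = x2.mean(), x2.var()

def gaussian(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Plug the estimates into g_i(x) = p(x|w_i) P(w_i); equal priors assumed.
def classify(x, P1=0.5, P2=0.5):
    return 1 if gaussian(x, mu1, var1) * P1 > gaussian(x, mu2, var2) * P2 else 2

print(classify(0.2), classify(2.9))
```

The point is that everything downstream of the density assumption is mechanical: estimate $\hat\mu$, substitute, compare.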


Assume a given form for the discriminant function, then determine its parameters from the samples.

  • No need to know the generative model $p(x \vert w_i)$.
  • In this sense non-parametric: no probabilistic model of the data is assumed.


  1. Assume a discriminant function with known form but unknown parameters
  2. Train the parameters with labeled samples
  3. Classify new samples with the trained function

linear decision boundary
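A linear discriminant has the form $g(x) = w^T x + w_0$, and its decision boundary $g(x) = 0$ is a hyperplane. A minimal sketch, with hand-picked (not learned) weights for illustration:

```python
import numpy as np

# Linear discriminant g(x) = w.x + w0; the boundary g(x) = 0 is a
# hyperplane. These weights are made up, not trained.
w = np.array([1.0, -1.0])
w0 = 0.5

def g(x):
    return w @ x + w0

# Decide class 1 if g(x) > 0, class 2 otherwise.
print(g(np.array([2.0, 1.0])))   # 1.5  -> class 1
print(g(np.array([0.0, 2.0])))   # -1.5 -> class 2
```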


multi-class classifier

  • one-vs-rest: separate one class from all the other classes ($c$ two-class problems)
  • one-vs-one: one classifier for every pair of classes ($c(c-1)/2$ two-class problems)
  • linear machine: one function $g_i(x)$ per class

the first two schemes leave ambiguous regions, so change to the linear machine decision rule: assign $x$ to the class with the largest $g_i(x)$.
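The argmax rule can be sketched as follows; the per-class weights here are illustrative, not trained:

```python
import numpy as np

# Linear machine: one linear function g_i(x) = W[i].x + b[i] per class;
# assign x to argmax_i g_i(x). Weights are made up for illustration.
W = np.array([[ 1.0,  0.0],   # weights for class 0
              [-1.0,  0.0],   # weights for class 1
              [ 0.0,  1.0]])  # weights for class 2
b = np.array([0.0, 0.0, 0.0])

def classify(x):
    scores = W @ x + b             # all g_i(x) at once
    return int(np.argmax(scores))  # highest score wins: no ambiguous region

print(classify(np.array([2.0, 0.5])))  # class 0
print(classify(np.array([0.0, 3.0])))  # class 2
```

Because every $x$ gets exactly one argmax winner (ties aside), the ambiguous regions of the pairwise schemes disappear.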


generalized linear discriminant function
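The idea is to map $x$ to $y = \phi(x)$ and use a function $g(x) = a^T \phi(x)$ that is linear in $a$ even when nonlinear in $x$. A toy 1-D sketch, with $\phi$ and $a$ chosen by hand (not learned) for illustration:

```python
import numpy as np

# Generalized linear discriminant: g(x) = a . phi(x).
# Toy case: class 1 iff |x| < 1 -- not separable by any linear g(x)
# in x, but separable after a quadratic feature map.
def phi(x):
    return np.array([1.0, x, x * x])

a = np.array([1.0, 0.0, -1.0])   # hand-picked: g(x) = 1 - x^2

def g(x):
    return a @ phi(x)

print(g(0.5))   # 0.75 > 0  -> class 1
print(g(2.0))   # -3.0 < 0  -> class 2
```

The training machinery stays linear (in $a$); only the feature map changes.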


assume the samples are linearly separable

  • batch learning: update using all misclassified samples at once
  • fixed-increment: update using one misclassified sample at a time
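The fixed-increment variant can be sketched as below, assuming linearly separable data (the toy samples are made up): samples are augmented with a leading 1 and class-2 samples negated, so the goal is $a^T y > 0$ for every normalized sample $y$.

```python
import numpy as np

# Fixed-increment single-sample perceptron on a separable toy set.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
labels = np.array([1, 1, 2, 2])          # separable by the first coordinate

Y = np.hstack([np.ones((4, 1)), X])      # augment with bias component
Y[labels == 2] *= -1                     # negate class-2 samples

a = np.zeros(3)
changed = True
while changed:                           # converges because data is separable
    changed = False
    for y in Y:
        if a @ y <= 0:                   # misclassified sample
            a = a + y                    # fixed-increment update
            changed = True

print(a)                                 # all a.y > 0 on exit
```

The separability assumption matters: on non-separable data this loop never terminates.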


  • linear (L1) criterion
  • squared (L2) criterion
  • relaxation criterion: with the plain squared criterion, large-error samples affect the solution too much; relaxation normalizes the update
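For the squared (L2) criterion, minimizing $\|Ya - b\|^2$ has the closed-form pseudoinverse solution $a = Y^+ b$. A sketch on the same kind of normalized toy set as above, with margins $b = 1$ chosen by hand:

```python
import numpy as np

# Squared-error criterion: minimize ||Y a - b||^2 via the pseudoinverse.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
labels = np.array([1, 1, 2, 2])

Y = np.hstack([np.ones((4, 1)), X])  # augmented samples
Y[labels == 2] *= -1                 # normalized: want a.y close to b
b = np.ones(4)                       # target margins, all set to 1

a = np.linalg.pinv(Y) @ b            # least-squares solution
print(a, Y @ a)                      # here every margin comes out as 1
```

Unlike the perceptron, this always yields a solution, separable or not, which is exactly why large-error samples can pull it around.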