relation with previous chapters

The discriminant function from the previous chapters is $g_i(x) = p(x \vert w_i) P(w_i)$, where $w_i$ is the class label and $P(w_i)$ is the prior probability, assumed known.

$p(x \vert w_i)$ is assumed to follow some parametric form of distribution with parameter $\mu$; use MLE or Bayesian estimation to obtain $\hat\mu$.

Alternatively, the density can be estimated from the data in a non-parametric way.
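For instance, if $p(x \vert w_i)$ is assumed Gaussian, the MLE of its parameters is just the sample mean and (biased) sample variance. A minimal sketch with synthetic data (the true parameters here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic 1-D samples from one class, true mean 2.0, true variance 1.0
x = rng.normal(loc=2.0, scale=1.0, size=1000)

# MLE for a Gaussian: sample mean and biased sample variance
mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()

print(mu_hat, sigma2_hat)  # close to 2.0 and 1.0
```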

introduction

Assume the discriminant function has a given form; determine its parameters from training samples.

  • No need to know the generative model $p(x \vert w_i)$.
  • In this sense it is actually a non-parametric method: no density model is assumed.

Steps:

  1. Start from a discriminant function of known form with unknown parameters.
  2. Train the parameters on labeled samples.
  3. Classify new samples with the trained function.
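The steps can be sketched with a linear $g(x) = w^T x + w_0$ trained by least squares — one possible training rule among many; the data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
# step 1: fixed form g(x) = w.x + w0, parameters unknown;
# synthetic two-class training samples
X1 = rng.normal([2.0, 2.0], 1.0, size=(50, 2))    # class +1
X2 = rng.normal([-2.0, -2.0], 1.0, size=(50, 2))  # class -1
X = np.vstack([X1, X2])
t = np.r_[np.ones(50), -np.ones(50)]

# step 2: train -- augment with a bias column and solve least squares
Y = np.hstack([X, np.ones((100, 1))])
a, *_ = np.linalg.lstsq(Y, t, rcond=None)

# step 3: classify new samples by the sign of g(x)
def classify(x):
    return 1 if np.r_[x, 1.0] @ a > 0 else -1

print(classify([3, 3]), classify([-3, -3]))  # 1 -1
```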

linear decision boundary

For two classes, $g(x) = w^T x + w_0$, where $w$ is the weight vector and $w_0$ the bias (threshold). Decide class 1 if $g(x) > 0$, class 2 if $g(x) < 0$; the decision boundary $g(x) = 0$ is a hyperplane.

multi-class classifier

  • one-vs-all: separate one class from all the other classes ($c$ binary classifiers)
  • one-vs-one: one binary classifier for each pair of classes ($c(c-1)/2$ classifiers)

Both schemes leave ambiguous regions where zero or several classes claim $x$, so we switch to the linear-machine decision rule: define one discriminant $g_i(x)$ per class and assign $x$ to the class with the largest $g_i(x)$.

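A linear machine assigns $x$ to the class whose discriminant $g_i(x)$ is largest; a minimal sketch (the weights and bias values are arbitrary placeholders):

```python
import numpy as np

# one linear discriminant g_i(x) = W[i] @ x + b[i] per class (3 classes, 2-D x)
W = np.array([[ 1.0,  0.0],
              [-1.0,  0.0],
              [ 0.0,  1.0]])
b = np.array([0.0, 0.0, 0.0])

def decide(x):
    # assign x to the class with the largest discriminant value
    g = W @ np.asarray(x) + b
    return int(np.argmax(g))

print(decide([2.0, 0.5]))  # class 0 wins: g = [2, -2, 0.5]
print(decide([0.0, 3.0]))  # class 2 wins: g = [0, 0, 3]
```

Because the class with the maximum $g_i$ always exists, no region of the input space is left ambiguous.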

generalized linear discriminant function

Map $x$ to a feature vector $y = (y_1(x), \dots, y_d(x))^T$ through fixed functions $y_i$, so that $g(x) = a^T y$ is linear in $y$ even when the decision boundary is nonlinear in $x$.
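As a sketch of a generalized linear discriminant: a quadratic boundary in scalar $x$ becomes linear in the mapped vector $y = (1, x, x^2)$. The weights below are made up for illustration:

```python
import numpy as np

def phi(x):
    # fixed nonlinear map: scalar x -> augmented feature vector y = (1, x, x^2)
    return np.array([1.0, x, x * x])

# g(x) = a^T y(x); with these made-up weights g(x) = x^2 - 1,
# so the boundary g(x) = 0 is the two points x = +/-1 (nonlinear in x)
a = np.array([-1.0, 0.0, 1.0])

def g(x):
    return a @ phi(x)

print(g(0.0), g(2.0))  # -1.0 3.0
```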

perceptron

Assumes the training samples are linearly separable.

  • batch learning: update $a$ with the sum of all currently misclassified samples
  • fixed-increment single-sample rule: update with one misclassified sample at a time
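The fixed-increment single-sample rule can be sketched as follows, on toy separable data with learning rate $\eta = 1$ (samples are augmented with a 1 and class-2 samples are negated, so correctness is simply $a^T y > 0$):

```python
import numpy as np

# toy linearly separable data: augmented samples y = (x1, x2, 1),
# with class-2 samples negated so the condition is simply a^T y > 0
Y = np.array([[2.0, 2.0,  1.0],   # class 1
              [1.0, 3.0,  1.0],   # class 1
              [1.0, 1.0, -1.0],   # class 2, negated
              [2.0, 0.0, -1.0]])  # class 2, negated

a = np.zeros(3)
changed = True
while changed:          # repeat until no sample is misclassified
    changed = False
    for y in Y:
        if a @ y <= 0:  # misclassified: fixed-increment update, eta = 1
            a = a + y
            changed = True

print(a)  # [2. 2. 1.]
```

By the perceptron convergence theorem, this loop terminates in a finite number of updates whenever the data is linearly separable.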

relaxation

  • linear ($L_1$) criterion: $J_p(a) = \sum_{\text{misclassified } y} (-a^T y)$
  • squared ($L_2$) criterion: $J_q(a) = \sum_{\text{misclassified } y} (a^T y)^2$; smoother, but samples with large error affect the sum much more
  • relaxed: normalize each term and add a margin $b$, e.g. $J_r(a) = \frac{1}{2} \sum_{a^T y \le b} \frac{(a^T y - b)^2}{\lVert y \rVert^2}$, so no single sample dominates
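Gradient descent on the squared criterion can be sketched as follows, reusing the normalized augmented samples from the perceptron setting (step size and starting point are made up):

```python
import numpy as np

# normalized augmented samples: a is correct when a^T y > 0 for every y
Y = np.array([[2.0, 2.0,  1.0],
              [1.0, 3.0,  1.0],
              [1.0, 1.0, -1.0],
              [2.0, 0.0, -1.0]])

def Jq(a):
    # squared (L2) criterion over currently misclassified samples;
    # each term (a^T y)^2 grows quadratically, so large-error samples dominate
    m = Y[Y @ a <= 0]
    return float(((m @ a) ** 2).sum()) if len(m) else 0.0

a = np.array([0.0, 0.0, -1.0])  # deliberately bad starting weights
eta = 0.05
for _ in range(200):
    m = Y[Y @ a <= 0]           # misclassified samples
    if len(m) == 0:
        break
    # gradient of Jq is 2 * sum (a^T y) y over misclassified samples
    a = a - eta * 2.0 * (m @ a) @ m

print(Jq(a), all(Y @ a > 0))    # criterion driven to 0, all samples correct
```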