course notes


goal: among all feasible models (separating hyperplanes) for linearly separable data, find the best one


linearly separable, hard margin

1. every sample is classified correctly, with no fixed scale:

for convenience, take c = 1
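
Written out in the usual notation, requirement 1 would read roughly as follows. The symbols w, b, x_i, y_i and n are assumptions here (a hyperplane w^T x + b = 0 and labels in {-1, +1}); only c comes from the notes.

```latex
% Assumed notation: samples x_i, labels y_i \in \{-1, +1\}, hyperplane w^\top x + b = 0.
% Correct classification with a free scale c > 0:
y_i \,(w^\top x_i + b) \ge c, \qquad i = 1, \dots, n
% Rescaling (w, b) by 1/c leaves the hyperplane unchanged, so one may take c = 1:
y_i \,(w^\top x_i + b) \ge 1, \qquad i = 1, \dots, n
```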

2. optimal, i.e. the distance (margin) is largest:

is equivalent to
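
A sketch of the two forms in the standard notation (w, b, x_i, y_i are assumed symbols, not defined in these notes): with the constraint scaled to c = 1, the distance between the two margin boundaries is 2/||w||, so maximizing it amounts to minimizing ||w||^2.

```latex
% Maximize the margin width ...
\max_{w,b} \; \frac{2}{\lVert w \rVert}
\quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1, \; i = 1, \dots, n
% ... which is equivalent to
\min_{w,b} \; \frac{1}{2} \lVert w \rVert^2
\quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1, \; i = 1, \dots, n
```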


not linearly separable, soft margin

the data are not linearly separable, or the margin is too small
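
For reference, the standard soft-margin formulation looks roughly like this. The slack variables \xi_i and the symbols w, b, x_i, y_i are assumptions, not taken from the notes. Each sample may violate the margin by \xi_i, and the constant C trades margin width against total violation.

```latex
\min_{w,b,\xi} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \; i = 1, \dots, n
```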

experiment:

  C      2^-5   ...   2^15
  1      ..     ..    ..
  2      ..     ..    ..
  ...    ..     ..    ..
  10     ..     ..    ..
  sum    ..     ..    ..

find the C that yields the smallest summed error
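
The selection step can be sketched as below. This is a minimal illustration, not the course's procedure: it assumes rows 1 to 10 are cross-validation folds (or repeated runs), that the grid steps through every integer exponent from -5 to 15, and that `fold_error` is a hypothetical callback returning the error of a model trained with a given C on a given fold. `toy_fold_error` is a made-up stand-in so the snippet runs without a real classifier.

```python
import math
from typing import Callable, Dict, Tuple


def select_c(fold_error: Callable[[float, int], float],
             exponents: range = range(-5, 16),
             n_folds: int = 10) -> Tuple[float, Dict[float, float]]:
    """Return the C with the smallest summed error and the whole 'sum' row."""
    summed: Dict[float, float] = {}
    for k in exponents:
        c = 2.0 ** k
        # One column of the table: errors of folds 1..n_folds, added up.
        summed[c] = sum(fold_error(c, fold) for fold in range(1, n_folds + 1))
    best_c = min(summed, key=summed.get)
    return best_c, summed


def toy_fold_error(c: float, fold: int) -> float:
    """Hypothetical error curve with its minimum at C = 2^3 (illustration only)."""
    return abs(math.log2(c) - 3.0) + 0.01 * fold


best_c, summed = select_c(toy_fold_error)
print("best C:", best_c)
```

In practice `fold_error` would train a soft-margin SVM on the remaining folds and report the error on the held-out fold.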

non-linear SVM

map every sample to a high-dimensional space

the mapping function is hard to find, but the dot product of two mapped samples is easier to find

the kernel
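
A small check of this idea, as a sketch: a kernel K(x, z) returns the dot product of the mapped samples without computing the mapping. The degree-2 polynomial kernel and the explicit map `phi` for 2-D inputs are chosen here purely for illustration; they are not named in the notes.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: (x . z)^2."""
    return dot(x, z) ** 2


x = (1.0, 2.0)
z = (3.0, 4.0)
explicit = dot(phi(x), phi(z))    # dot product computed in the mapped space
via_kernel = poly2_kernel(x, z)   # same value, computed in the input space
print(explicit, via_kernel)
```

Both numbers agree up to floating-point rounding, but the kernel never builds the 3-D vectors. For higher degrees or the RBF kernel the mapped space is far larger (or infinite-dimensional), which is why only the kernel is evaluated.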

the kernel matrix should be

  • symmetric
  • positive semidefinite
  • consistent with Mercer’s condition
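
The properties above can be spot-checked numerically on a Gram matrix. The sketch below is only a necessary-condition test: it verifies symmetry exactly and samples the quadratic form v^T K v with random vectors, rather than computing eigenvalues. The RBF kernel, `gamma`, and the sample points are illustrative assumptions.

```python
import math
import random


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel, used here only as an example of a valid kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(samples, kernel):
    """K[i][j] = kernel(x_i, x_j)."""
    return [[kernel(x, z) for z in samples] for x in samples]


def is_symmetric(K, tol=1e-12):
    n = len(K)
    return all(abs(K[i][j] - K[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))


def quadratic_form(K, v):
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))


def passes_psd_spot_check(K, trials=200, tol=1e-9, seed=0):
    """True if v^T K v >= 0 (up to tol) for every sampled v. Not a proof."""
    rng = random.Random(seed)
    n = len(K)
    return all(
        quadratic_form(K, [rng.uniform(-1.0, 1.0) for _ in range(n)]) >= -tol
        for _ in range(trials)
    )


samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.5, 1.5)]
K = gram_matrix(samples, rbf_kernel)
symmetric = is_symmetric(K)
psd_ok = passes_psd_spot_check(K)

# A symmetric matrix that is NOT positive semidefinite fails the check.
indefinite = [[0.0, 1.0], [1.0, 0.0]]
indefinite_ok = passes_psd_spot_check(indefinite)
print(symmetric, psd_ok, indefinite_ok)
```

A full test would inspect the smallest eigenvalue of K; random sampling is used here only to keep the example dependency-free.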

multiple kernel learning



multi-class problem: Error-Correcting Output Codes (ECOC)
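
A minimal sketch of the decoding side of this scheme: each class gets a binary codeword, one binary classifier (for example an SVM) is trained per bit, and a new sample is assigned to the class whose codeword is nearest in Hamming distance to the predicted bits. The code matrix below is a hypothetical 4-class, 7-bit example, and the per-bit classifiers are replaced by a hard-coded bit vector.

```python
# Hypothetical code matrix: class label -> 7-bit codeword.
# Any two codewords differ in 4 positions, so one wrong bit can be corrected.
CODE_MATRIX = {
    0: (1, 1, 1, 1, 1, 1, 1),
    1: (0, 0, 0, 0, 1, 1, 1),
    2: (0, 0, 1, 1, 0, 0, 1),
    3: (0, 1, 0, 1, 0, 1, 0),
}


def hamming(u, v):
    """Number of positions where two bit sequences differ."""
    return sum(a != b for a, b in zip(u, v))


def decode(bits, code_matrix=CODE_MATRIX):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(code_matrix, key=lambda label: hamming(bits, code_matrix[label]))


# Outputs of the 7 binary classifiers for one sample: the codeword of
# class 2 with its first bit flipped by a (simulated) classifier error.
noisy_bits = (1, 0, 1, 1, 0, 0, 1)
predicted = decode(noisy_bits)
print(predicted)
```

The redundancy in the codewords is what lets the multi-class decision survive a mistake by an individual binary classifier.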

reading notes


maximize margin

Lagrange multiplier

why the Lagrange multiplier method works

problem solving and duality

transforming to the dual problem

KKT conditions
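
For the hard-margin problem, the chain from Lagrangian to dual to KKT conditions can be summarized as below. This is the textbook derivation in assumed notation (w, b, x_i, y_i, multipliers \alpha_i), not a transcription of the reading.

```latex
% Lagrangian, with multipliers \alpha_i \ge 0
L(w, b, \alpha) = \frac{1}{2} \lVert w \rVert^2
  - \sum_{i=1}^{n} \alpha_i \left[ y_i (w^\top x_i + b) - 1 \right]

% Stationarity in w and b
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0

% Dual problem, obtained by substituting back into L
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i^\top x_j
\quad \text{s.t.} \quad \alpha_i \ge 0, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0

% KKT conditions (the last one is complementary slackness)
\alpha_i \ge 0, \qquad
y_i (w^\top x_i + b) - 1 \ge 0, \qquad
\alpha_i \left[ y_i (w^\top x_i + b) - 1 \right] = 0
```

Complementary slackness implies \alpha_i > 0 only for samples lying exactly on the margin, i.e. the support vectors.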

soft margin

add a slack (offset) term to the margin constraint

kernel methods

the final form contains the samples only through dot products
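
Because the samples enter only through dot products, each dot product can be swapped for a kernel evaluation. A sketch of the resulting decision function, in assumed notation (multipliers \alpha_i, labels y_i, bias b, feature map \phi):

```latex
% Replace x_i^\top x by K(x_i, x) = \phi(x_i)^\top \phi(x)
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i y_i \, K(x_i, x) + b \right),
\qquad K(x_i, x) = \phi(x_i)^\top \phi(x)
```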

Mercer’s condition

what makes a kernel a kernel
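
Stated in the usual form (symbols g and c are introduced here for illustration), a symmetric function K is a valid kernel when:

```latex
% Mercer's condition for a symmetric function K(x, z)
\iint K(x, z)\, g(x)\, g(z)\, dx\, dz \ge 0
\quad \text{for every square-integrable } g

% Finite-sample counterpart: for any samples x_1, \dots, x_n and any c \in \mathbb{R}^n
\sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j \, K(x_i, x_j) \ge 0
```

The finite-sample version says exactly that every kernel (Gram) matrix is positive semidefinite.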