Two-class classification problem.

By convention, $P$ denotes a discrete probability and $p$ a continuous density.

To compute the posterior (the class probability given observed data), use Bayes' formula:

$$P(w_i \mid x) = \frac{p(x \mid w_i)\, P(w_i)}{p(x)}, \qquad p(x) = \sum_j p(x \mid w_j)\, P(w_j)$$

Concepts: prior $P(w_i)$, likelihood $p(x \mid w_i)$, evidence $p(x)$, posterior $P(w_i \mid x)$.

Posterior with a uniform prior (no prior information) -> maximum likelihood: maximizing the posterior then reduces to maximizing $p(x \mid w_i)$.

A posterior can serve as the prior for the next posterior update (this is debated).

(The likelihoods summed over classes may exceed 1, but only for continuous distributions: densities, unlike probabilities, are not bounded by 1.)
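The Bayes update above can be sketched in a few lines. This is a minimal illustration with made-up priors and likelihoods for one observed $x$; the class names `w1`/`w2` and all numbers are assumptions for the example.

```python
# Sketch: posterior via Bayes' formula for a two-class problem.
# All numbers below are illustrative assumptions.

priors = {"w1": 0.6, "w2": 0.4}        # P(w_i): prior class probabilities
likelihood = {"w1": 0.2, "w2": 0.7}    # p(x | w_i) for one observed x

# Evidence p(x) = sum_j p(x | w_j) P(w_j)
evidence = sum(likelihood[w] * priors[w] for w in priors)

# Posterior P(w_i | x) = p(x | w_i) P(w_i) / p(x)
posterior = {w: likelihood[w] * priors[w] / evidence for w in priors}

print(posterior)           # the posteriors sum to 1

# The posterior can serve as the prior for the next observation:
priors = posterior
```

The last line is the sequential use mentioned above: feed today's posterior in as tomorrow's prior.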

Scheme 1: minimum error

Two-class error probability:

$$P(\text{error}) = \int P(\text{error} \mid x)\, p(x)\, dx$$

Minimum error: decide $w_1$ if $P(w_1 \mid x) > P(w_2 \mid x)$, so that $P(\text{error} \mid x) = \min[P(w_1 \mid x), P(w_2 \mid x)]$ at every $x$.

In practice, we often use a decision boundary rather than integrating a complex distribution.
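A minimal sketch of the minimum-error rule for two 1-D Gaussian class-conditionals; the means, sigmas, and priors are illustrative assumptions. Because $p(x)$ cancels when comparing posteriors, deciding by $p(x \mid w_i)\,P(w_i)$ is enough, and here the decision boundary is simply the midpoint between the means.

```python
import math

# Sketch: minimum-error decision for two 1-D Gaussian classes.
# Means, sigmas, and priors are illustrative assumptions.

def gauss(x, mu, sigma):
    # Univariate Gaussian density
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

P = {1: 0.5, 2: 0.5}                     # priors P(w_i)
params = {1: (0.0, 1.0), 2: (3.0, 1.0)}  # (mu, sigma) per class

def decide(x):
    # Decide the class with the larger posterior; p(x) cancels,
    # so comparing p(x | w_i) P(w_i) suffices.
    scores = {w: gauss(x, *params[w]) * P[w] for w in P}
    return max(scores, key=scores.get)

# With equal priors and equal sigmas the boundary is x = 1.5:
# no integral of the density ever needs to be computed.
print(decide(1.0), decide(2.0))
```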

Scheme 2: minimum risk

Sometimes the prior is too influential compared to the likelihood (the observation), so we introduce a risk for every decision:

Expected loss (conditional risk) of action $\alpha_i$:

$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid w_j)\, P(w_j \mid x)$$

Minimum risk: choose the action with the smallest $R(\alpha_i \mid x)$. The number of actions may be larger than the number of classes (e.g. a reject option).

The loss values $\lambda(\alpha_i \mid w_j)$ may be hyperparameters learned in other ways (e.g. via incremental learning).

Some loss functions (the zero-one loss) make minimum risk degrade to minimum error.

Learning the risk (section 2.4): typically via derivative-free search such as particle swarm or genetic algorithms, rather than gradient descent.
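The minimum-risk rule with a reject action can be sketched as follows. The loss matrix and the posterior values are illustrative assumptions; note the reject action makes the number of actions exceed the number of classes.

```python
# Sketch: minimum-risk decision with a reject action.
# The loss matrix lam[action][class] = lambda(alpha_i | w_j) and the
# posteriors are illustrative assumptions.

posterior = {"w1": 0.55, "w2": 0.45}      # P(w_j | x), e.g. from Bayes' formula

lam = {
    "choose_w1": {"w1": 0.0, "w2": 1.0},  # zero-one loss on the class actions
    "choose_w2": {"w1": 1.0, "w2": 0.0},
    "reject":    {"w1": 0.3, "w2": 0.3},  # constant rejection cost
}

def risk(action):
    # R(alpha_i | x) = sum_j lambda(alpha_i | w_j) P(w_j | x)
    return sum(lam[action][w] * posterior[w] for w in posterior)

best = min(lam, key=risk)
print(best, {a: risk(a) for a in lam})
```

With only the two zero-one-loss actions (no reject), minimizing the risk picks the class with the larger posterior, i.e. the rule degrades to minimum error; here the posteriors are close enough that rejecting is cheaper than either choice.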

Discriminant function, decision boundary

Generalized likelihood function: many different but equivalent forms of the decision rule exist (any monotonically increasing transform of the discriminant preserves the decision).

Decision boundary: in general it may not be a line; it can be a curve, a surface, etc.

Discriminant function: for two classes, define $g(x) \triangleq g_1(x) - g_2(x)$ and decide $w_1$ if $g(x) > 0$.

To determine the decision boundary or discriminant function under Gaussian distributions:

Gaussian (bell-shaped) density:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

and its statistics: mean, variance, entropy, etc.

Multivariate Gaussian:

$$p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu)^{T} \Sigma^{-1} (x-\mu)\right)$$

  1. $\mu$ and $\Sigma$ completely determine the distribution
  2. points of equal density form hyperellipsoids
  3. uncorrelated is equivalent to independent (for Gaussians)
  4. marginal and conditional distributions are also Gaussian
  5. linear transforms remain Gaussian
  6. linear combinations remain Gaussian

Whitening transform $A_w = \Phi \Lambda^{-1/2}$: makes the covariance the identity (ellipse -> round).
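The whitening transform can be checked numerically. This is a small sketch with an assumed 2-D covariance matrix: eigendecompose $\Sigma = \Phi \Lambda \Phi^T$, form $A_w = \Phi \Lambda^{-1/2}$, and verify $A_w^T \Sigma A_w = I$.

```python
import numpy as np

# Sketch: whitening transform A_w = Phi Lambda^{-1/2}.
# Sigma is an assumed positive-definite 2-D covariance.
Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])

vals, vecs = np.linalg.eigh(Sigma)   # Lambda (eigenvalues), Phi (eigenvectors)
A_w = vecs @ np.diag(vals ** -0.5)

# Whitened covariance: A_w^T Sigma A_w = I (ellipse -> circle)
white = A_w.T @ Sigma @ A_w
print(np.round(white, 6))
```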

Bayesian decision under Gaussian distributions

Discriminant: choose $$g_i(x) = \ln p(x \mid w_i) + \ln P(w_i)$$

Case 1 ($\Sigma_i = \sigma^2 I$): linear discriminant; the resulting two-class decision surface is a hyperplane.

Case 2 ($\Sigma_i = \Sigma$, shared covariance): still a linear discriminant and a hyperplane boundary.

Case 3 ($\Sigma_i$ arbitrary): the discriminant is quadratic in $x$; the two-class decision boundary is a hyperquadric surface.
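The general (case 3) discriminant can be sketched directly from $g_i(x) = \ln p(x \mid w_i) + \ln P(w_i)$ with the multivariate Gaussian density. All means, covariances, and priors below are illustrative assumptions.

```python
import numpy as np

# Sketch: Gaussian discriminant g_i(x) = ln p(x | w_i) + ln P(w_i)
# for arbitrary Sigma_i (case 3). Parameters are illustrative.
classes = {
    "w1": {"mu": np.array([0.0, 0.0]), "Sigma": np.eye(2), "P": 0.5},
    "w2": {"mu": np.array([2.0, 2.0]), "Sigma": np.array([[2.0, 0.3],
                                                          [0.3, 1.0]]), "P": 0.5},
}

def g(x, c):
    mu, Sigma, P = c["mu"], c["Sigma"], c["P"]
    d = x - mu
    # ln p(x|w) = -1/2 d^T Sigma^-1 d - 1/2 ln|Sigma| - (dim/2) ln(2 pi)
    return (-0.5 * d @ np.linalg.inv(Sigma) @ d
            - 0.5 * np.log(np.linalg.det(Sigma))
            - len(x) / 2 * np.log(2 * np.pi)
            + np.log(P))

def classify(x):
    return max(classes, key=lambda name: g(x, classes[name]))

# With unequal Sigma_i, the boundary g_1(x) = g_2(x) is a hyperquadric.
print(classify(np.array([0.2, 0.1])), classify(np.array([2.1, 1.9])))
```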

Classification error