lvwang@nlpr.ia.ac.cn

## intro

$P(\omega_1)$, $P(\omega_2)$: the prior probabilities of the two classes

by convention, $P$ denotes a (discrete) probability, $p$ a (continuous) density

to compute $P(\omega_1 \vert \mathbf{x})$ <- posterior (probability of the class once the data is known), use the Bayes formula: $P(\omega_i \vert \mathbf{x}) = \dfrac{p(\mathbf{x} \vert \omega_i) P(\omega_i)}{p(\mathbf{x})}$, where $p(\mathbf{x}) = \sum_j p(\mathbf{x} \vert \omega_j) P(\omega_j)$

concepts: …

posterior + uninformative (flat) prior -> maximum likelihood: with no prior information, maximizing the posterior is equivalent to maximizing the likelihood

A posterior could serve as the prior for the next round of inference (this point is debated)

(the likelihoods of multiple classes need not sum to 1; a sum greater than 1 can occur, but only for continuous densities)

## scheme 1 min error

two-class error: decide $\omega_1$ in region $\mathcal{R}_1$ and $\omega_2$ in region $\mathcal{R}_2$, so

$$P(e) = \int_{\mathcal{R}_2} p(x \vert \omega_1) P(\omega_1)\,dx + \int_{\mathcal{R}_1} p(x \vert \omega_2) P(\omega_2)\,dx$$

min error: decide $\omega_1$ if $P(\omega_1 \vert x) > P(\omega_2 \vert x)$, else $\omega_2$; this minimizes $P(e)$ pointwise

in practice, we may work with a decision boundary directly rather than integrating a complex distribution.
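The minimum-error rule above can be sketched numerically. A minimal example with assumed class-conditional densities ($p(x \vert \omega_1) = N(0,1)$, $p(x \vert \omega_2) = N(2,1)$) and assumed priors $0.6/0.4$ — all values are hypothetical, chosen only for illustration:

```python
from math import sqrt, pi, exp

def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density p(x | mu, sigma)."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

# Hypothetical two-class setup: p(x|w1)=N(0,1), p(x|w2)=N(2,1), priors 0.6/0.4.
MUS, SIGMAS, PRIORS = (0.0, 2.0), (1.0, 1.0), (0.6, 0.4)

def posterior(x):
    """Bayes formula: P(w_i|x) = p(x|w_i) P(w_i) / p(x)."""
    joint = [gauss_pdf(x, m, s) * P for m, s, P in zip(MUS, SIGMAS, PRIORS)]
    evidence = sum(joint)            # p(x) = sum_j p(x|w_j) P(w_j)
    return [j / evidence for j in joint]

def decide(x):
    """Minimum-error rule: pick the class with the larger posterior."""
    post = posterior(x)
    return 1 if post[0] >= post[1] else 2
```

For equal variances the decision boundary is the single point where the two posteriors cross; everything to its left is decided $\omega_1$, everything to its right $\omega_2$ — no integral needed at decision time.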

## scheme 2 min risk

sometimes, the prior is too influential compared to the likelihood (the observation),

so we introduce a risk for every decision: $\lambda_{ij} = \lambda(\alpha_i, \omega_j)$, the loss of taking action $\alpha_i$ when the true class is $\omega_j$

expected loss (conditional risk): $R(\alpha_i \vert x) = \sum_{j=1}^{c} \lambda(\alpha_i, \omega_j) P(\omega_j \vert x)$

min risk: $\min_\alpha R(\alpha \vert x)$; the number of actions may be larger than the number of classes $c$ (e.g. a reject option)

$\lambda$ could be hyperparameters to be learnt in other ways (e.g. via incremental learning)

some $\lambda$ make min-risk degrade to min-error: with the zero-one loss $\lambda_{ij} = 1 - \delta_{ij}$, minimizing risk is exactly maximizing the posterior
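A minimal numpy sketch of the min-risk rule, including a reject action; the posteriors and loss values are assumed for illustration:

```python
import numpy as np

# Hypothetical posteriors for one observation x over c=2 classes.
post = np.array([0.3, 0.7])          # P(w1|x), P(w2|x)

# Loss matrix lam[i, j] = lambda(a_i, w_j): loss of action a_i
# when the true class is w_j. Row 2 is a reject action with a
# constant loss, so the number of actions exceeds c.
lam = np.array([
    [0.0, 10.0],                     # decide w1: costly if truth is w2
    [1.0,  0.0],                     # decide w2
    [0.5,  0.5],                     # reject
])

# Conditional risk R(a_i|x) = sum_j lam[i, j] * P(w_j|x)
risk = lam @ post
best_action = int(np.argmin(risk))   # min-risk action

# With zero-one loss, min-risk reduces to min-error (argmax posterior):
zero_one = 1.0 - np.eye(2)
assert int(np.argmin(zero_one @ post)) == int(np.argmax(post))
```

Here the asymmetric loss makes deciding $\omega_2$ the min-risk action even though rejecting would also beat the risky $\omega_1$ decision.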

learning the risk (section 2.4): typically with derivative-free methods (particle swarm / genetic algorithms / …) rather than gradient descent

## discriminant func, decision boundary

decision boundary: (abstract: may not be a line; could be a curve, a hypersurface, or ….)

discriminant func: for two classes, define $g(x) = g_1(x) - g_2(x)$ and decide $\omega_1$ if $g(x) > 0$

## determining the decision boundary or discriminant func for gaussian distributions

gaussian (the bell curve): $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ and its statistics: mean $\mu$, variance $\sigma^2$, entropy $\frac{1}{2}\ln(2\pi e \sigma^2)$, …

and the multivariate gaussian: $p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} \vert\Sigma\vert^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)^T \Sigma^{-1} (\mathbf{x}-\boldsymbol\mu)\right)$

1. $\mu$ and $\Sigma$ fully determine the distribution
2. equal-density contours: points at constant Mahalanobis distance, $r^2 = (\mathbf{x}-\boldsymbol\mu)^T \Sigma^{-1} (\mathbf{x}-\boldsymbol\mu)$
3. uncorrelated is equivalent to independent (a property specific to Gaussians)
4. marginal and conditional distributions are Gaussian
5. linear transforms stay Gaussian: $\mathbf{y}=A^T\mathbf{x} \Rightarrow p(\mathbf{y}) \sim N(A^T\boldsymbol\mu, A^T\Sigma A)$
6. linear combinations stay Gaussian: $y=\mathbf{a}^T\mathbf{x} \Rightarrow p(y) \sim N(\mathbf{a}^T\boldsymbol\mu, \mathbf{a}^T\Sigma\mathbf{a})$
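Property 5 (linear transforms stay Gaussian) is easy to check by sampling. A sketch with an assumed $\mu$, $\Sigma$, and transform $A$ — all values hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D Gaussian parameters and transform, for illustration.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])

# Each row of x is one sample; y = A^T x applied row-wise is x @ A.
x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ A

# Property 5 predicts y ~ N(A^T mu, A^T Sigma A):
mu_y = A.T @ mu
Sigma_y = A.T @ Sigma @ A

# Sample statistics should match the predicted parameters closely.
assert np.allclose(y.mean(axis=0), mu_y, atol=0.05)
assert np.allclose(np.cov(y, rowvar=False), Sigma_y, atol=0.2)
```

Property 6 is the special case where $A$ is a single column vector $\mathbf{a}$, collapsing $\mathbf{x}$ to a scalar.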

### bayesian in gaussian

discriminant: choose $$g_i(x) = \ln p(x \vert \omega_i) + \ln P(\omega_i)$$

case 1: $\Sigma_i=\sigma^2 I,\ i=1,2,\dots,c$; the two-class decision surface is the hyperplane $w^T(x-x_0)=0$, with $w = \mu_1 - \mu_2$ and $x_0 = \frac{1}{2}(\mu_1+\mu_2) - \frac{\sigma^2}{\lVert \mu_1-\mu_2 \rVert^2} \ln\frac{P(\omega_1)}{P(\omega_2)}\,(\mu_1-\mu_2)$
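The case-1 hyperplane can be verified numerically: every point on $w^T(x-x_0)=0$ should give equal discriminants. A sketch with assumed means, variance, and priors:

```python
import numpy as np

# Hypothetical case-1 parameters: Sigma_i = sigma^2 I for both classes.
sigma2 = 0.5
mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 1.0])
P1, P2 = 0.7, 0.3

def g(x, mu, P):
    """Case-1 discriminant g_i(x) = ln p(x|w_i) + ln P(w_i),
    with class-independent constants dropped."""
    return -np.dot(x - mu, x - mu) / (2 * sigma2) + np.log(P)

# Standard case-1 boundary parameters: w^T (x - x0) = 0 with
# w = mu1 - mu2 and x0 shifted along w by the prior log-ratio.
w = mu1 - mu2
x0 = 0.5 * (mu1 + mu2) - sigma2 / np.dot(w, w) * np.log(P1 / P2) * w

# Any point on the hyperplane must satisfy g_1(x) = g_2(x):
t = np.array([-w[1], w[0]])          # a direction orthogonal to w
for s in (-1.0, 0.0, 2.0):
    x = x0 + s * t
    assert abs(g(x, mu1, P1) - g(x, mu2, P2)) < 1e-9
```

Note how unequal priors only slide $x_0$ along the line joining the means; the orientation $w$ depends on the means alone.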

case 2: $\Sigma_i=\Sigma,\ i=1,2,\dots,c$; the decision boundary is again a hyperplane, now in terms of Mahalanobis distance, and generally not orthogonal to $\mu_1 - \mu_2$

case 3: $\Sigma_i$ is arbitrary … discriminant form: $g_i(x) = -\frac{1}{2}(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i) - \frac{1}{2}\ln\vert\Sigma_i\vert + \ln P(\omega_i)$; the 2-class decision boundary is quadratic in $x$ (a hyperquadric surface)
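A minimal sketch of the case-3 quadratic discriminant, with assumed per-class means, covariances, and priors (all hypothetical):

```python
import numpy as np

# Hypothetical case-3 setup: each class has its own arbitrary Sigma_i.
mus = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
Sigmas = [np.array([[1.0, 0.0], [0.0, 1.0]]),
          np.array([[3.0, 0.8], [0.8, 0.5]])]
priors = [0.5, 0.5]

def g(x, mu, Sigma, P):
    """Quadratic discriminant (class-independent constant dropped):
    g_i(x) = -1/2 (x-mu)^T Sigma^-1 (x-mu) - 1/2 ln|Sigma| + ln P."""
    d = x - mu
    return (-0.5 * d @ np.linalg.inv(Sigma) @ d
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(P))

def classify(x):
    """Decide the class with the largest discriminant."""
    scores = [g(x, m, S, P) for m, S, P in zip(mus, Sigmas, priors)]
    return int(np.argmax(scores))

# Because Sigma_1 != Sigma_2, g_1 - g_2 keeps a quadratic term in x,
# so the boundary is a hyperquadric rather than a hyperplane.
```

With equal covariances the quadratic terms cancel in $g_1 - g_2$, which is exactly how cases 1 and 2 collapse back to linear boundaries.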