### 1. Starting point

$KL\left[q \parallel p\right] = \sum_x q(x) \log \frac{q(x)}{p(x)}$
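As a quick numerical sanity check of this definition, the discrete KL divergence can be computed directly; the toy distributions below are made up for illustration. Note that KL is asymmetric:

```python
import numpy as np

# Discrete KL divergence: KL[q || p] = sum_x q(x) * log(q(x) / p(x)).
# Two hypothetical distributions over three outcomes.
q = np.array([0.5, 0.3, 0.2])
p = np.array([0.4, 0.4, 0.2])

kl_qp = np.sum(q * np.log(q / p))  # KL[q || p]
kl_pq = np.sum(p * np.log(p / q))  # KL[p || q]

print(kl_qp, kl_pq)  # both positive, and not equal: KL is asymmetric
```

Both directions are non-negative and vanish only when the two distributions coincide, which is the property the derivation below relies on.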

### 2. Derivation

$$
\begin{aligned}
KL\left[q(z\mid X) \parallel p(z\mid X) \right] &= \sum_z q(z\mid X) \log \frac{q(z\mid X)}{p(z \mid X)} \\
&= \sum_z q(z\mid X) \left[ \log q(z\mid X) - \log p(z \mid X) \right] \\
&= \sum_z q(z\mid X) \left[ \log q(z\mid X) - \log p(z, X) + \log p(X) \right] \\
&= \sum_z q(z\mid X) \left[ \log q(z\mid X) - \log p(z, X) \right] + \sum_z q(z\mid X) \log p(X) \\
&= \sum_z q(z\mid X) \left[ \log q(z\mid X) - \log p(z, X) \right] + \log p(X)
\end{aligned}
$$

$$
\begin{aligned}
KL\left[q(z\mid X) \parallel p(z\mid X) \right] &= \sum_z q(z\mid X) \left[ \log q(z\mid X) - \log p(z, X) \right] + \log p(X) \\
&= \sum_z q(z\mid X) \left[ \log q(z\mid X) - \log p(z) - \log p(X \mid z) \right] + \log p(X) \\
&= \sum_z q(z\mid X) \left[ \log q(z\mid X) - \log p(z) \right] - \sum_z q(z\mid X) \log p(X \mid z) + \log p(X) \\
&= KL \left[ q(z\mid X) \parallel p(z) \right] - \mathbb{E}_{z \sim q(z\mid X)} \log p(X \mid z) + \log p(X)
\end{aligned}
$$

• The first term is the KL divergence between the encoder model q and the prior over the latent variable z; it pushes the encoder's output distribution toward the prior. In practice the prior is usually a multivariate Gaussian or a vMF distribution.
• The second term is the log-likelihood of reconstructing X with the decoder from a z sampled from the encoder; it pushes the decoder to recover the encoder's input X from the latent variable z as faithfully as possible.
• The third term is the marginal log-likelihood of X, the only observed variable.
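The decomposition above can be verified numerically on a toy discrete latent-variable model; all probability tables below are hypothetical, chosen only for illustration:

```python
import numpy as np

# Check the identity:
#   KL[q(z|X) || p(z|X)] = KL[q(z|X) || p(z)] - E_{z~q} log p(X|z) + log p(X)
p_z = np.array([0.6, 0.4])            # prior p(z) over two latent states
p_x_given_z = np.array([[0.7, 0.3],   # likelihood p(X|z): rows index z, cols index X
                        [0.2, 0.8]])

X = 0                                  # observe X = 0
p_joint = p_z * p_x_given_z[:, X]      # p(z, X) as a function of z
p_X = p_joint.sum()                    # marginal p(X)
p_post = p_joint / p_X                 # exact posterior p(z|X)

q = np.array([0.5, 0.5])               # an arbitrary approximate posterior q(z|X)

kl_q_post = np.sum(q * np.log(q / p_post))          # KL[q || p(z|X)]
kl_q_prior = np.sum(q * np.log(q / p_z))            # KL[q || p(z)]
recon = np.sum(q * np.log(p_x_given_z[:, X]))       # E_{z~q} log p(X|z)

lhs = kl_q_post
rhs = kl_q_prior - recon + np.log(p_X)
print(lhs, rhs)  # the two sides agree
```

Because the left-hand side is a KL divergence and hence non-negative, the same numbers also confirm that `-kl_q_prior + recon` lower-bounds `np.log(p_X)`.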

Since the KL divergence on the left-hand side is non-negative, rearranging yields the evidence lower bound (ELBO):

$\log p(X) \geq - KL \left[ q(z\mid X) \parallel p(z) \right] + \mathbb{E}_{z \sim q(z\mid X)} \log p(X \mid z)$
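When the prior is a standard Gaussian and the encoder outputs a diagonal Gaussian $q(z\mid X) = \mathcal{N}(\mu, \sigma^2)$, the KL term in this bound has the well-known closed form $\frac{1}{2}(\mu^2 + \sigma^2 - \log \sigma^2 - 1)$ per dimension. A sketch checking it against a Monte Carlo estimate, with toy values of $\mu$ and $\sigma$:

```python
import numpy as np

# Closed-form KL between q = N(mu, sigma^2) and the standard normal prior p = N(0, 1):
#   KL = 0.5 * (mu^2 + sigma^2 - log sigma^2 - 1)   (per dimension)
rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8  # hypothetical encoder outputs

kl_closed = 0.5 * (mu**2 + sigma**2 - np.log(sigma**2) - 1)

# Monte Carlo estimate: E_{z~q}[log q(z) - log p(z)]
z = rng.normal(mu, sigma, size=1_000_000)
log_q = -0.5 * np.log(2 * np.pi * sigma**2) - (z - mu) ** 2 / (2 * sigma**2)
log_p = -0.5 * np.log(2 * np.pi) - z**2 / 2
kl_mc = np.mean(log_q - log_p)

print(kl_closed, kl_mc)  # the two estimates agree to within Monte Carlo noise
```

The closed form is what makes this KL term cheap to optimize in practice, with only the reconstruction term requiring sampling.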
