
LDA Study Notes


1. What is LDA?

LDA (latent Dirichlet allocation) is a generative probabilistic model that infers the topic distribution of a given document. Note, however, that the model is not necessarily tied to text applications; for other applications, see the original paper [1]. For a simple overview of the algorithm, refer to [2].
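To make the "generative" part concrete, below is a minimal Python sketch of the generative process described in [1]: draw each topic's word distribution, then a topic mixture per document, then the words. All sizes and variable names here are illustrative, not from the paper.

```python
# A minimal sketch of the LDA generative process from [1];
# toy sizes and names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

K, V = 3, 10     # number of topics, vocabulary size
alpha = 0.5      # symmetric Dirichlet prior on per-document topic proportions
eta = 0.1        # symmetric Dirichlet prior on per-topic word distributions

# Draw each topic's word distribution beta_k ~ Dirichlet(eta).
beta = rng.dirichlet(np.full(V, eta), size=K)   # shape (K, V)

def generate_document(n_words):
    """theta ~ Dir(alpha); for each word, draw a topic z ~ Mult(theta),
    then a word w ~ Mult(beta_z)."""
    theta = rng.dirichlet(np.full(K, alpha))
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)
        w = rng.choice(V, p=beta[z])
        words.append(w)
    return words

print(generate_document(8))  # e.g. a list of 8 word ids
```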

 

2. Why does it outperform pLSI?

pLSI is another topic model in which a document is likewise a mixture of topics. However, pLSI lacks a generative process for the per-document topic proportions: it learns a separate topic distribution for every training document, so the number of parameters grows linearly with the corpus size, and the model consequently suffers from overfitting. LDA solves this by placing a Dirichlet prior over the per-document topic distribution, so the topic proportions become latent variables rather than parameters. Section 5 of blog [2] explains this point in detail; a rough parameter count is sketched below.
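To see why the parameter count matters, here is a back-of-the-envelope comparison (a hypothetical sketch with made-up sizes): pLSI fits k*V topic-word probabilities plus a k-dimensional mixture for each of the M training documents, while LDA fits the k*V topic-word probabilities plus only a k-dimensional Dirichlet hyperparameter.

```python
# Parameter counts for pLSI vs. LDA; sizes are illustrative.
def plsi_params(k, V, M):
    # k topic-word distributions (k*V) plus a k-dim topic mixture per document.
    return k * V + k * M

def lda_params(k, V):
    # k topic-word distributions plus the k-dim Dirichlet parameter alpha;
    # the per-document theta is a latent variable, not a parameter.
    return k * V + k

k, V = 100, 10_000
for M in (1_000, 100_000):
    print(M, plsi_params(k, V, M), lda_params(k, V))
# pLSI's count grows linearly with corpus size M; LDA's does not.
```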

 

3. Why use variational inference to approximate the posterior distribution?

The optimization method used in [1] is variational EM, which is somewhat less convenient than the Gibbs sampling approach. Recall the EM algorithm: we first form the Q function, the expectation of the complete-data log-likelihood with respect to the posterior distribution of the latent variables, and then update the model parameters. The main obstacle to applying EM here is computing that posterior, i.e. $p(\theta, z \mid w, \alpha, \beta)$. Because the normalizing constant $p(w \mid \alpha, \beta)$ is intractable (the integral couples $\theta$ and $\beta$), variational inference is left as an alternative for approximating the posterior.
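For concreteness, the per-document fixed-point updates derived in [1] are $\phi_{ni} \propto \beta_{i w_n} \exp(\Psi(\gamma_i))$ and $\gamma_i = \alpha_i + \sum_n \phi_{ni}$. Below is a minimal Python sketch of this inner loop; the variable names and toy inputs are illustrative, not the paper's reference implementation.

```python
# A minimal sketch of the per-document variational updates from [1]:
#   phi_{ni} ∝ beta_{i, w_n} * exp(digamma(gamma_i))
#   gamma_i  = alpha_i + sum_n phi_{ni}
import numpy as np
from scipy.special import digamma

def variational_inference(doc, alpha, beta, n_iters=50):
    """doc: list of word ids; alpha: (K,) Dirichlet prior; beta: (K, V) topics.
    Returns the variational parameters (gamma, phi) for this document."""
    K = beta.shape[0]
    N = len(doc)
    phi = np.full((N, K), 1.0 / K)   # q(z_n), initialized uniform
    gamma = alpha + N / K            # q(theta) = Dirichlet(gamma)
    for _ in range(n_iters):
        # phi update: the digamma(sum(gamma)) term is constant over topics
        # and cancels in the row-wise normalization below.
        log_phi = np.log(beta[:, doc].T) + digamma(gamma)   # shape (N, K)
        phi = np.exp(log_phi - log_phi.max(axis=1, keepdims=True))
        phi /= phi.sum(axis=1, keepdims=True)
        # gamma update: prior plus expected topic counts.
        gamma = alpha + phi.sum(axis=0)
    return gamma, phi

# Toy usage: 3 topics over a 10-word vocabulary, one 5-word document.
rng = np.random.default_rng(0)
beta = rng.dirichlet(np.full(10, 0.1), size=3)
gamma, phi = variational_inference([0, 2, 2, 7, 4],
                                   alpha=np.full(3, 0.5), beta=beta)
print(gamma)  # approximate posterior Dirichlet over this document's topics
```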

 

References:

[1] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003, 3: 993-1022.

[2] LDA概念解析 ("LDA Concept Analysis"), blog post.



Original post: http://www.cnblogs.com/wead-hsu/p/3713966.html
