
Language Model


## 20170801
## Notes on lec2-2.pdf, about language models

Evaluating a Language Model

Intuition about Perplexity

Evaluating N‐grams with Perplexity
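For reference under these headings, the usual textbook definition of the perplexity of a test set w_1 … w_N under a model P (a worked equation, not reproduced from the slides):

```latex
% Perplexity: the inverse probability of the test set, normalized by its length N
\mathrm{PP}(w_1,\dots,w_N)
  = P(w_1,\dots,w_N)^{-1/N}
  = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log P(w_i \mid w_1,\dots,w_{i-1})\right)
```

Lower perplexity means the model assigns higher probability to the held-out text; for an N-gram model each conditional truncates the history to the last N-1 words.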

Sparsity is Always a Problem
    Dealing with Sparsity
        General approach: modify observed counts to improve estimates
            – Discounting: allocate probability mass to unobserved
              events by discounting the counts of observed events
            – Interpolation: approximate the count of an N-gram using a
              combination of estimates from related, denser histories
            – Back-off: approximate the count of an unobserved N-gram based
              on the proportion of its back-off events (e.g., the (N-1)-gram);
              a minimal sketch follows this list
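A minimal back-off sketch for the bigram case, assuming a fixed absolute discount d and Counter-based count tables (the function and argument names are illustrative, not from the slides):

```python
from collections import Counter

def backoff_bigram_prob(w_prev, w, bigram_counts, unigram_counts, total_words, d=0.5):
    """Back-off sketch: use the discounted bigram estimate when the bigram was
    observed, otherwise fall back to a scaled unigram estimate.

    bigram_counts  -- Counter over (w_prev, w) pairs from the training data
    unigram_counts -- Counter over single words (w_prev is assumed to be observed)
    total_words    -- total number of training tokens
    d              -- absolute discount reserved for back-off mass (illustrative)
    """
    if bigram_counts[(w_prev, w)] > 0:
        return (bigram_counts[(w_prev, w)] - d) / unigram_counts[w_prev]
    # Mass freed by discounting the observed bigrams that start with w_prev.
    observed_continuations = sum(1 for (p, _) in bigram_counts if p == w_prev)
    alpha = d * observed_continuations / unigram_counts[w_prev]
    # Spread the freed mass according to the lower-order (unigram) distribution.
    return alpha * unigram_counts[w] / total_words
```

A full Katz-style back-off would renormalize the unigram term over only the unseen continuations of w_prev; that step is omitted here to keep the sketch short.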

            Add-One Smoothing
                • We have V words in the vocabulary; N is the number of words
                  in the training set
                • Smooth the observed counts by adding one to every count and
                  renormalizing
                – Unigram case: P(w_i) = (count(w_i) + 1) / (N + V)
                – Bigram case: P(w_i | w_{i-1}) = (count(w_{i-1}, w_i) + 1) / (count(w_{i-1}) + V)
                • More general case: add-α, where α is added instead of one
                  (a minimal sketch follows below)
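A minimal sketch of the bigram case with add-α smoothing (α = 1 recovers add-one; the toy corpus and names below are purely illustrative):

```python
from collections import Counter

def add_alpha_bigram_prob(w_prev, w, bigram_counts, unigram_counts, vocab_size, alpha=1.0):
    """P(w | w_prev) with add-alpha smoothing over a vocabulary of size V."""
    return (bigram_counts[(w_prev, w)] + alpha) / (unigram_counts[w_prev] + alpha * vocab_size)

# Toy example: counts from the corpus "the cat sat on the mat"
tokens = "the cat sat on the mat".split()
unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
V = len(set(tokens))

print(add_alpha_bigram_prob("the", "cat", bigram_counts, unigram_counts, V))  # seen bigram: 2/7
print(add_alpha_bigram_prob("the", "sat", bigram_counts, unigram_counts, V))  # unseen bigram: 1/7
```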

            Linear Interpolation
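Linear interpolation mixes maximum-likelihood estimates from histories of different lengths with weights λ_i that sum to one, e.g. P(w_i | w_{i-1}) = λ_1 · P_ML(w_i | w_{i-1}) + λ_2 · P_ML(w_i). A minimal bigram/unigram sketch with fixed, illustrative weights (choosing them is the subject of the next subsection):

```python
def interpolated_bigram_prob(w_prev, w, bigram_counts, unigram_counts, total_words,
                             lambdas=(0.7, 0.3)):
    """Mix the bigram and unigram MLE estimates: l1 * P(w | w_prev) + l2 * P(w)."""
    l1, l2 = lambdas
    p_bigram = (bigram_counts[(w_prev, w)] / unigram_counts[w_prev]
                if unigram_counts[w_prev] else 0.0)
    p_unigram = unigram_counts[w] / total_words
    return l1 * p_bigram + l2 * p_unigram
```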

            Tuning Hyperparameters
                • Both add-α and linear interpolation have hyperparameters.
                • The selection of their values is crucial for smoothing
                  performance.
                • Their values are tuned to maximize the likelihood of held-out
                  data (a minimal grid-search sketch follows this list).
                – For linear interpolation, we will use EM to find the optimal
                  parameters (in a few lectures).
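A minimal sketch of tuning α by grid search on held-out log-likelihood, as the bullets above describe (the grid values and helper names are illustrative; EM for the interpolation weights is a separate procedure covered later):

```python
import math

def heldout_log_likelihood(heldout_tokens, bigram_counts, unigram_counts, vocab_size, alpha):
    """Sum of log P(w_i | w_{i-1}) over held-out data under add-alpha smoothing.
    OOV handling is omitted: held-out words are assumed to be in the vocabulary."""
    ll = 0.0
    for w_prev, w in zip(heldout_tokens, heldout_tokens[1:]):
        p = (bigram_counts[(w_prev, w)] + alpha) / (unigram_counts[w_prev] + alpha * vocab_size)
        ll += math.log(p)
    return ll

def tune_alpha(heldout_tokens, bigram_counts, unigram_counts, vocab_size,
               grid=(0.01, 0.1, 0.5, 1.0, 2.0)):
    """Pick the alpha from the grid that maximizes held-out log-likelihood."""
    return max(grid, key=lambda a: heldout_log_likelihood(
        heldout_tokens, bigram_counts, unigram_counts, vocab_size, a))
```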


            Kneser-Ney Smoothing
                • Observed N-grams occur more often in the training data than in
                  new data
                • Absolute discounting: count*(x) = count(x) - d
                  P_AD(w_i | w_{i-1}) = (count(w_{i-1}, w_i) - d) / count(w_{i-1}) + α · P̃(w_i)
                • Distribute the remaining mass based on the skewness of the
                  distribution of the lower-order N-gram (i.e., the number of
                  words it can follow):
                  P̃(w_i) ∝ |{ w_{i-1} : count(w_{i-1}, w_i) > 0 }|
                • Kneser-Ney has repeatedly proven to be a very successful
                  estimator (a minimal sketch follows this list)
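A minimal sketch of bigram absolute discounting with a Kneser-Ney-style continuation distribution for the lower-order term, matching the two formulas above (the discount d = 0.75 and the helper names are illustrative; full Kneser-Ney also covers higher orders and further normalization details):

```python
from collections import Counter, defaultdict

def build_kn_tables(tokens):
    """Precompute the counts needed for bigram absolute discounting / Kneser-Ney."""
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    unigram_counts = Counter(tokens)
    # For each word, the set of distinct left contexts it was seen with:
    # |{ w_prev : count(w_prev, w) > 0 }| drives the continuation probability.
    continuation_contexts = defaultdict(set)
    for (w_prev, w) in bigram_counts:
        continuation_contexts[w].add(w_prev)
    total_bigram_types = len(bigram_counts)
    return bigram_counts, unigram_counts, continuation_contexts, total_bigram_types

def kn_bigram_prob(w_prev, w, tables, d=0.75):
    """P(w | w_prev) = max(count(w_prev, w) - d, 0) / count(w_prev)
                       + alpha(w_prev) * P_cont(w).
    Assumes w_prev occurs in the training data."""
    bigram_counts, unigram_counts, continuation_contexts, total_bigram_types = tables
    discounted = max(bigram_counts[(w_prev, w)] - d, 0.0) / unigram_counts[w_prev]
    # alpha(w_prev): mass freed by discounting each distinct observed continuation of w_prev.
    distinct_continuations = sum(1 for (p, _) in bigram_counts if p == w_prev)
    alpha = d * distinct_continuations / unigram_counts[w_prev]
    # Continuation probability: how many distinct contexts w can follow, over all bigram types.
    p_cont = len(continuation_contexts[w]) / total_bigram_types
    return discounted + alpha * p_cont
```

Distributing the freed mass via P_cont rather than the raw unigram frequency is exactly the idea in the slide: a word's back-off weight depends on how many different contexts it can follow, not on how often it occurs overall.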

           


Original: http://www.cnblogs.com/emmazha/p/7267741.html
