
Language Model


## 20170801
## Notes on lec2-2.pdf, about language models

Evaluating a Language Model

Intuition about Perplexity

Evaluating N‐grams with Perplexity
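For reference under these headings, the usual textbook definition of the perplexity of a test set w_1 … w_N under a model P (a worked equation, not reproduced from the slides):

```latex
% Perplexity: the inverse probability of the test set, normalized by its length N
\mathrm{PP}(w_1,\dots,w_N)
  = P(w_1,\dots,w_N)^{-1/N}
  = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log P(w_i \mid w_1,\dots,w_{i-1})\right)
```

Lower perplexity means the model assigns higher probability to the held-out text; for an N-gram model each conditional truncates the history to the last N-1 words.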

Sparsity is Always a Problem
    Dealing with Sparsity
        General approach: modify observed counts to improve estimates
            – Discounting: allocate probability mass to unobserved
              events by discounting the counts of observed events
            – Interpolation: approximate the count of an N-gram using a
              combination of estimates from related, denser histories
            – Back-off: approximate the count of an unobserved N-gram based
              on the proportion of its back-off events (e.g., the (N-1)-gram);
              a minimal sketch follows this list
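A minimal back-off sketch for the bigram case, assuming a fixed absolute discount d and Counter-based count tables (the function and argument names are illustrative, not from the slides):

```python
from collections import Counter

def backoff_bigram_prob(w_prev, w, bigram_counts, unigram_counts, total_words, d=0.5):
    """Back-off sketch: use the discounted bigram estimate when the bigram was
    observed, otherwise fall back to a scaled unigram estimate.

    bigram_counts  -- Counter over (w_prev, w) pairs from the training data
    unigram_counts -- Counter over single words (w_prev is assumed to be observed)
    total_words    -- total number of training tokens
    d              -- absolute discount reserved for back-off mass (illustrative)
    """
    if bigram_counts[(w_prev, w)] > 0:
        return (bigram_counts[(w_prev, w)] - d) / unigram_counts[w_prev]
    # Mass freed by discounting the observed bigrams that start with w_prev.
    observed_continuations = sum(1 for (p, _) in bigram_counts if p == w_prev)
    alpha = d * observed_continuations / unigram_counts[w_prev]
    # Spread the freed mass according to the lower-order (unigram) distribution.
    return alpha * unigram_counts[w] / total_words
```

A full Katz-style back-off would renormalize the unigram term over only the unseen continuations of w_prev; that step is omitted here to keep the sketch short.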

            Add-One Smoothing
                • We have V words in the vocabulary; N is the number of words
                  in the training set
                • Smooth the observed counts by adding one to every count and
                  renormalizing
                – Unigram case: P(w_i) = (count(w_i) + 1) / (N + V)
                – Bigram case: P(w_i | w_{i-1}) = (count(w_{i-1}, w_i) + 1) / (count(w_{i-1}) + V)
                • More general case: add-α, where α is added instead of one
                  (a minimal sketch follows below)
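A minimal sketch of the bigram case with add-α smoothing (α = 1 recovers add-one; the toy corpus and names below are purely illustrative):

```python
from collections import Counter

def add_alpha_bigram_prob(w_prev, w, bigram_counts, unigram_counts, vocab_size, alpha=1.0):
    """P(w | w_prev) with add-alpha smoothing over a vocabulary of size V."""
    return (bigram_counts[(w_prev, w)] + alpha) / (unigram_counts[w_prev] + alpha * vocab_size)

# Toy example: counts from the corpus "the cat sat on the mat"
tokens = "the cat sat on the mat".split()
unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
V = len(set(tokens))

print(add_alpha_bigram_prob("the", "cat", bigram_counts, unigram_counts, V))  # seen bigram: 2/7
print(add_alpha_bigram_prob("the", "sat", bigram_counts, unigram_counts, V))  # unseen bigram: 1/7
```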

            Linear Interpolation
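Linear interpolation mixes maximum-likelihood estimates from histories of different lengths with weights λ_i that sum to one, e.g. P(w_i | w_{i-1}) = λ_1 · P_ML(w_i | w_{i-1}) + λ_2 · P_ML(w_i). A minimal bigram/unigram sketch with fixed, illustrative weights (choosing them is the subject of the next subsection):

```python
def interpolated_bigram_prob(w_prev, w, bigram_counts, unigram_counts, total_words,
                             lambdas=(0.7, 0.3)):
    """Mix the bigram and unigram MLE estimates: l1 * P(w | w_prev) + l2 * P(w)."""
    l1, l2 = lambdas
    p_bigram = (bigram_counts[(w_prev, w)] / unigram_counts[w_prev]
                if unigram_counts[w_prev] else 0.0)
    p_unigram = unigram_counts[w] / total_words
    return l1 * p_bigram + l2 * p_unigram
```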

            Tuning Hyperparameters
                • Both add-α and linear interpolation have hyperparameters.
                • The selection of their values is crucial for smoothing
                  performance.
                • Their values are tuned to maximize the likelihood of held-out
                  data (a minimal grid-search sketch follows this list).
                – For linear interpolation, we will use EM to find the optimal
                  parameters (in a few lectures).
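A minimal sketch of tuning α by grid search on held-out log-likelihood, as the bullets above describe (the grid values and helper names are illustrative; EM for the interpolation weights is a separate procedure covered later):

```python
import math

def heldout_log_likelihood(heldout_tokens, bigram_counts, unigram_counts, vocab_size, alpha):
    """Sum of log P(w_i | w_{i-1}) over held-out data under add-alpha smoothing.
    OOV handling is omitted: held-out words are assumed to be in the vocabulary."""
    ll = 0.0
    for w_prev, w in zip(heldout_tokens, heldout_tokens[1:]):
        p = (bigram_counts[(w_prev, w)] + alpha) / (unigram_counts[w_prev] + alpha * vocab_size)
        ll += math.log(p)
    return ll

def tune_alpha(heldout_tokens, bigram_counts, unigram_counts, vocab_size,
               grid=(0.01, 0.1, 0.5, 1.0, 2.0)):
    """Pick the alpha from the grid that maximizes held-out log-likelihood."""
    return max(grid, key=lambda a: heldout_log_likelihood(
        heldout_tokens, bigram_counts, unigram_counts, vocab_size, a))
```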


            Kneser-Ney Smoothing
                • Observed N-grams occur more often in the training data than in
                  new data
                • Absolute discounting: count*(x) = count(x) - d
                  P_AD(w_i | w_{i-1}) = (count(w_{i-1}, w_i) - d) / count(w_{i-1}) + α · P̃(w_i)
                • Distribute the remaining mass based on the skewness of the
                  distribution of the lower-order N-gram (i.e., the number of
                  words it can follow):
                  P̃(w_i) ∝ |{ w_{i-1} : count(w_{i-1}, w_i) > 0 }|
                • Kneser-Ney has repeatedly proven to be a very successful
                  estimator (a minimal sketch follows this list)
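A minimal sketch of bigram absolute discounting with a Kneser-Ney-style continuation distribution for the lower-order term, matching the two formulas above (the discount d = 0.75 and the helper names are illustrative; full Kneser-Ney also covers higher orders and further normalization details):

```python
from collections import Counter, defaultdict

def build_kn_tables(tokens):
    """Precompute the counts needed for bigram absolute discounting / Kneser-Ney."""
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    unigram_counts = Counter(tokens)
    # For each word, the set of distinct left contexts it was seen with:
    # |{ w_prev : count(w_prev, w) > 0 }| drives the continuation probability.
    continuation_contexts = defaultdict(set)
    for (w_prev, w) in bigram_counts:
        continuation_contexts[w].add(w_prev)
    total_bigram_types = len(bigram_counts)
    return bigram_counts, unigram_counts, continuation_contexts, total_bigram_types

def kn_bigram_prob(w_prev, w, tables, d=0.75):
    """P(w | w_prev) = max(count(w_prev, w) - d, 0) / count(w_prev)
                       + alpha(w_prev) * P_cont(w).
    Assumes w_prev occurs in the training data."""
    bigram_counts, unigram_counts, continuation_contexts, total_bigram_types = tables
    discounted = max(bigram_counts[(w_prev, w)] - d, 0.0) / unigram_counts[w_prev]
    # alpha(w_prev): mass freed by discounting each distinct observed continuation of w_prev.
    distinct_continuations = sum(1 for (p, _) in bigram_counts if p == w_prev)
    alpha = d * distinct_continuations / unigram_counts[w_prev]
    # Continuation probability: how many distinct contexts w can follow, over all bigram types.
    p_cont = len(continuation_contexts[w]) / total_bigram_types
    return discounted + alpha * p_cont
```

Distributing the freed mass via P_cont rather than the raw unigram frequency is exactly the idea in the slide: a word's back-off weight depends on how many different contexts it can follow, not on how often it occurs overall.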

           


Original: http://www.cnblogs.com/emmazha/p/7267741.html
