
Sequence Model - Sequence Models & Attention Mechanism


Various Sequence To Sequence Architectures

Basic Models

Sequence to sequence model


Image captioning

Use a CNN (e.g. AlexNet) first to get a 4096-dimensional feature vector, then feed it to an RNN that generates the caption.
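As a rough sketch of this pipeline (the layer sizes, names, and the choice of initializing the RNN state from the CNN features are assumptions, not the course's exact setup):

```python
import torch
import torch.nn as nn

# Sketch: a 4096-d CNN feature vector initializes an RNN language model that
# emits the caption word by word. All sizes/names here are illustrative assumptions.
class CaptionRNN(nn.Module):
    def __init__(self, vocab_size, feat_dim=4096, emb=256, hidden=512):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden)   # map CNN features to the initial hidden state
        self.embed = nn.Embedding(vocab_size, emb)
        self.rnn = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, cnn_features, caption_tokens):
        h0 = torch.tanh(self.init_h(cnn_features)).unsqueeze(0)  # (1, batch, hidden)
        x = self.embed(caption_tokens)                           # (batch, T, emb)
        outputs, _ = self.rnn(x, h0)
        return self.out(outputs)                                 # scores over the vocabulary per step
```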

Picking the Most Likely Sentence

Translate a French sentence \(x\) into the most likely English sentence \(y\).

The goal is to find

\[\argmax_{y^{<1>}, \dots, y^{<T_y>}} P(y^{<1>}, \dots, y^{<T_y>} | x) \]

  • Why not a greedy search?

    Picking the most likely word one at a time does not maximize the joint probability of the whole sentence; it tends to produce verbose, less optimal translations.

  • set the beam width \(B = 3\), and keep the \(3\) most likely first words of the English output

  • for each of the \(B\) candidates, consider every possible second word and keep the \(B\) most likely two-word prefixes overall

  • repeat until \(<EOS>\) is generated (a minimal beam-step sketch follows below)

If \(B = 1\), it's just greedy search.
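A minimal Python sketch of this search, assuming a hypothetical `next_word_log_probs(prefix)` function that stands in for the decoder RNN and returns `{word: log P(word | x, prefix)}`:

```python
# Beam search sketch: keep the B most likely prefixes at every step.
def beam_search(next_word_log_probs, B=3, max_len=20, eos="<EOS>"):
    beams = [(0.0, [])]          # (sum of log probabilities, word list)
    completed = []
    for _ in range(max_len):
        candidates = []
        for logp, words in beams:
            for w, lp in next_word_log_probs(words).items():
                candidates.append((logp + lp, words + [w]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for logp, words in candidates[:B]:   # keep only the B best prefixes
            (completed if words[-1] == eos else beams).append((logp, words))
        if not beams:
            break
    return max(completed + beams, key=lambda c: c[0])
```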

Length normalization

\[\argmax_{y} \prod_{t = 1}^{T_y} P(y^{<t>}|x, y^{<1>}, \dots, y^{<t - 1>}) \]

Each \(P\) is much less than \(1\), so the product is close to \(0\) (numerical underflow); take \(\log\):

\[\argmax_{y} \sum_{t = 1}^{T_y} \log P(y^{<t>}|x, y^{<1>}, \dots, y^{<t - 1>}) \]

This objective tends to favor short sentences, since every extra word adds a negative \(\log P\) term.

So you can normalize it by the sentence length (\(\alpha\) is a hyperparameter):

\[\argmax_{y} \frac 1 {T_y^{\alpha}} \sum_{t = 1}^{T_y} \log P(y^{<t>}|x, y^{<1>}, \dots, y^{<t - 1>}) \]
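A small sketch of this normalized score, assuming `word_log_probs` holds the per-word log probabilities of one finished candidate and taking \(\alpha = 0.7\) only as an illustrative value:

```python
# Length-normalized objective from the formula above.
def normalized_score(word_log_probs, alpha=0.7):
    Ty = len(word_log_probs)
    return sum(word_log_probs) / (Ty ** alpha)
```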

Beam search discussion

  • large \(B\) : better result, slower
  • small \(B\) : worse result, faster

Let \(y^*\) be the human high-quality translation and \(\hat y\) the algorithm's output; a coded version of this check follows the list below.

  • \(P(y^* | x) > P(\hat y | x)\) : Beam search is at fault
  • \(P(y^* | x) \le P(\hat y | x)\) : RNN model is at fault
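As a sketch, the decision rule can be written directly; `log_prob(y, x)` is a hypothetical scorer returning \(\log P(y|x)\) under the trained model:

```python
# Error analysis: attribute a bad translation to beam search or to the RNN model.
def blame(log_prob, x, y_star, y_hat):
    if log_prob(y_star, x) > log_prob(y_hat, x):
        return "beam search is at fault"   # the model prefers y*, but search missed it
    return "RNN model is at fault"         # the model itself scores y_hat higher
```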

BLEU (Bilingual Evaluation Understudy) Score

If you have some good reference translations, you can evaluate the machine output against them.

\[p_n = \frac{\sum_{\text{n-grams} \in \hat y} \text{Count}_{\text{clip}}(\text{n-grams})} {\sum_{\text{n-grams} \in \hat y} \text{Count}(\text{n-grams})} \]

BLEU details

Combine the modified precisions as \(BP \cdot \exp(\frac{1}{4} \sum_{n = 1}^4 \log p_n)\), where \(BP\) is the brevity penalty:

BP = brevity penalty

\[BP = \begin{cases} 1 & \text{if MT\_output\_length} > \text{reference\_output\_length}\\ \exp(1 - \text{reference\_output\_length} / \text{MT\_output\_length}) & \text{otherwise} \end{cases} \]

The penalty is there because we don't want translations that are too short.
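A minimal sketch of the clipped n-gram precision \(p_n\) and the combined score; `candidate` and `references` are token lists, and using the shortest reference length for the brevity penalty is a simplifying assumption:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def p_n(candidate, references, n):
    cand = Counter(ngrams(candidate, n))
    max_ref = Counter()
    for ref in references:
        for g, c in Counter(ngrams(ref, n)).items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in cand.items())   # Count_clip
    return clipped / max(1, sum(cand.values()))

def bleu(candidate, references):
    precisions = [p_n(candidate, references, n) for n in range(1, 5)]
    if min(precisions) == 0:
        return 0.0
    ref_len = min(len(r) for r in references)                    # simplifying assumption
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)
```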

Attention Model Intuition

It's hard for the network to memorize a whole long sentence in a single fixed-length vector.


Compute attention weights so that each output word is predicted from a weighted context over the input.


Attention Model

Use a BiRNN or BiLSTM.

\[\begin{aligned} a^{<t'>} &= (\overrightarrow a^{<t'>}, \overleftarrow a^{<t'>})\\ \sum_{t'} \alpha^{<i, t'>} &= 1\\ c^{<i>} &= \sum_{t'} \alpha^{<i, t'>} a^{<t'>} \end{aligned} \]


Computing attention

\[\begin{aligned} \alpha^{<t, t'>} &= \text{amount of "attention" } y^{<t>} \text{ should pay to } a^{<t'>}\\ &= \frac{\exp(e^{<t, t'>})}{\sum_{t' = 1}^{T_x} \exp(e^{<t, t'>})} \end{aligned} \]

Train a very small network on the previous decoder state \(s^{<t-1>}\) and \(a^{<t'>}\) to learn \(e^{<t, t'>}\), since we don't know what the function should be.

The complexity is \(\mathcal O(T_x T_y)\), i.e. a quadratic cost, which is expensive for long sequences.
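A minimal numpy sketch of turning the scores \(e^{<t, t'>}\) into attention weights and a context vector (the shapes are assumptions):

```python
import numpy as np

# Attention step for one decoder time t: `e` holds e^{<t, t'>} with shape (Tx,),
# `a` holds the encoder activations with shape (Tx, d).
def attention_context(e, a):
    alpha = np.exp(e - e.max())
    alpha = alpha / alpha.sum()            # softmax: weights sum to 1
    c = (alpha[:, None] * a).sum(axis=0)   # c^{<t>} = sum_{t'} alpha^{<t, t'>} a^{<t'>}
    return c, alpha
```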


Speech Recognition - Audio Data

Speech recognition

\(x(\text{audio clip}) \to y(\text{transcript})\)

Attention model for speech recognition

Generate the transcript character by character.

CTC cost for speech recognition

CTC (Connectionist Temporal Classification)

"ttt_h_eee___ ____qqq\(\dots\)" \(\rightarrow\) "the quick brown fox"

Basic rule: collapse repeated characters not separated by "blank"
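The collapse rule as a short sketch; `"_"` is used for the blank symbol here, which is an assumption about the notation in the example above:

```python
# Collapse repeated characters not separated by blank, then drop blanks.
def ctc_collapse(chars, blank="_"):
    out, prev = [], None
    for c in chars:
        if c != prev and c != blank:
            out.append(c)
        prev = c
    return "".join(out)

# e.g. ctc_collapse("ttt_h_eee___ ___qqq") == "the q"
```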

Trigger Word Detection

Label the audio: set the target output to \(1\) for a short window of time steps right after the trigger word is said, and \(0\) elsewhere.
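A sketch of building such labels, assuming `Ty` output steps and `trigger_ends` holding the step index where each trigger word finishes; the 50-step window of \(1\)s is an illustrative assumption:

```python
import numpy as np

def make_labels(Ty, trigger_ends, ones_span=50):
    y = np.zeros(Ty)
    for t_end in trigger_ends:
        y[t_end + 1 : t_end + 1 + ones_span] = 1.0   # 1s right after the trigger word
    return y
```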


Original: https://www.cnblogs.com/zjp-shadow/p/15178221.html
