Use a CNN (AlexNet) first to get a 4096-dimensional feature vector, then feed it to an RNN (image captioning).
Translate a French sentence \(x\) into the most likely English sentence \(y\).
That is, find \(\arg\max\limits_{y} P(y \mid x)\).
Why not a greedy search (picking the most likely word one at a time)? Because maximizing each word separately does not maximize the probability of the whole sentence: greedy tends to favor common words and gives verbose, longer translations.
Set \(B = 3\) (beam width): keep the \(3\) most likely choices for the first English word.
For each of them, consider every possible second word, then keep the \(B\) most likely two-word prefixes overall.
Repeat until \(<EOS>\).
If \(B = 1\), it's just greedy search.
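The procedure above can be sketched in Python; the toy `next_word_probs` table below stands in for the decoder RNN's softmax output and is entirely made up:

```python
import math

def beam_search(next_word_probs, B=3, max_len=10):
    """Keep the B most likely partial sentences at each step.

    next_word_probs(prefix) -> dict mapping next word to P(word | prefix);
    it must eventually give probability to "<EOS>".
    """
    beams = [([], 0.0)]  # (words, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for words, logp in beams:
            for w, p in next_word_probs(tuple(words)).items():
                candidates.append((words + [w], logp + math.log(p)))
        # keep the B best expansions overall, not B per beam
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for words, logp in candidates[:B]:
            (finished if words[-1] == "<EOS>" else beams).append((words, logp))
        if not beams:
            break
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])

# Made-up conditional distribution over a tiny vocabulary.
TABLE = {
    (): {"in": 0.4, "jane": 0.6},
    ("jane",): {"visits": 0.3, "is": 0.2},
    ("in",): {"september": 0.9},
    ("jane", "visits"): {"<EOS>": 0.9},
    ("jane", "is"): {"<EOS>": 0.5},
    ("in", "september"): {"<EOS>": 0.9},
}

def toy_probs(prefix):
    return TABLE.get(prefix, {"<EOS>": 1.0})

best, score = beam_search(toy_probs, B=3)
# greedy (B = 1) would commit to "jane" first and end with a lower-probability sentence
```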
\(P\) is a product of many probabilities, each less than \(1\), so it is close to \(0\); take \(\log\) to avoid numerical underflow.
Even so, it tends to prefer short sentences (each extra word adds another negative \(\log\) term).
So normalize by the length: maximize \(\frac{1}{T_y^{\alpha}} \sum_{t=1}^{T_y} \log P(y^{\langle t \rangle} \mid x, y^{\langle 1 \rangle}, \dots, y^{\langle t-1 \rangle})\) (\(\alpha\) is a hyperparameter, e.g. \(\alpha \approx 0.7\)).
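A sketch of the effect of length normalization on two illustrative candidates (the per-word probabilities and \(\alpha\) here are made up):

```python
import math

def normalized_score(word_probs, alpha=0.7):
    """Length-normalized log-likelihood: (1 / T_y**alpha) * sum of log P."""
    Ty = len(word_probs)
    return sum(math.log(p) for p in word_probs) / (Ty ** alpha)

# Made-up per-word probabilities for two candidate translations.
short = [0.3]                    # one mediocre word
long_ = [0.6, 0.6, 0.6, 0.6]     # four better words

raw_short = sum(math.log(p) for p in short)   # about -1.20
raw_long = sum(math.log(p) for p in long_)    # about -2.04
# The raw log-probability prefers the short sentence, but after dividing
# by T_y**alpha the longer (better per-word) sentence wins.
```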
Error analysis: let \(y^*\) be a high-quality human translation and \(\hat y\) the algorithm's output. If \(P(y^* \mid x) > P(\hat y \mid x)\), beam search is at fault (try a larger \(B\)); otherwise the RNN model is at fault.
BLEU score: useful if you have some good reference translations to evaluate against.
Compute the modified \(n\)-gram precisions \(p_n\) and combine them as \(\text{BP} \cdot \exp\left(\frac{1}{4} \sum_{n = 1}^{4} \log p_n\right)\).
BP = brevity penalty:
without it, the score would favor translations that are too short.
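A minimal sketch of the score (modified n-gram precision with clipping, plus the brevity penalty; taking the shortest reference length is one common convention, not necessarily the course's):

```python
import math
from collections import Counter

def ngram_counts(words, n):
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def modified_precision(candidate, references, n):
    """Clip each candidate n-gram count by its max count in any reference."""
    cand = ngram_counts(candidate, n)
    clipped = sum(min(c, max(ngram_counts(r, n)[g] for r in references))
                  for g, c in cand.items())
    total = sum(cand.values())
    return clipped / total if total else 0.0

def bleu(candidate, references, N=4):
    c = len(candidate)
    r = min(len(ref) for ref in references)       # shortest reference length
    bp = 1.0 if c > r else math.exp(1 - r / c)    # brevity penalty
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, N + 1)]
    if min(precisions) == 0:
        return 0.0
    return bp * math.exp(sum(math.log(p) for p in precisions) / N)
```

A perfect match scores 1.0; a correct but truncated candidate is pulled down by BP.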
Attention: for long sentences, it's hard for an encoder network to memorize the whole sentence in one fixed vector.
Compute attention weights \(\alpha^{\langle t, t' \rangle}\): how much attention the \(t\)-th output word pays to the input activation at time \(t'\); the context is the weighted sum of the encoder activations.
Use a BiRNN or BiLSTM as the encoder.
Train a very small network to learn the scoring function \(e^{\langle t, t' \rangle}\) from \(s^{\langle t-1 \rangle}\) and \(a^{\langle t' \rangle}\); then \(\alpha^{\langle t, t' \rangle} = \mathrm{softmax}(e^{\langle t, t' \rangle})\) over \(t'\).
The complexity is \(\mathcal O(T_x T_y)\), which is large (quadratic cost).
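One decoder step can be sketched as follows; the single linear layer is an illustrative stand-in for the small scoring network, and all shapes and values are made up:

```python
import math
import random

random.seed(0)

def attention_weights(scores):
    """Softmax over the alignment scores e<t, t'>: positive, sum to 1."""
    m = max(scores)
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    return [x / total for x in exps]

def attention_context(a, s_prev, w):
    """One decoder step.

    a:      list of T_x encoder activations a<t'> (each n_a floats)
    s_prev: previous decoder state s<t-1> (n_s floats)
    w:      weights of the "very small network" (a single linear layer
            here, standing in for the one-hidden-layer net in the notes)
    """
    # e<t, t'> = small_net([s<t-1>; a<t'>])
    scores = [sum(wi * xi for wi, xi in zip(w, s_prev + a_t)) for a_t in a]
    alpha = attention_weights(scores)
    # context = sum over t' of alpha<t, t'> * a<t'>
    n_a = len(a[0])
    context = [sum(alpha[t] * a[t][j] for t in range(len(a)))
               for j in range(n_a)]
    return context, alpha

T_x, n_a, n_s = 5, 4, 3
a = [[random.gauss(0, 1) for _ in range(n_a)] for _ in range(T_x)]
s_prev = [random.gauss(0, 1) for _ in range(n_s)]
w = [random.gauss(0, 1) for _ in range(n_s + n_a)]
context, alpha = attention_context(a, s_prev, w)
```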
\(x(\text{audio clip}) \to y(\text{transcript})\)
Generate the transcript character by character.
CTC (Connectionist Temporal Classification)
"ttt_h_eee___ ____qqq\(\dots\)" \(\rightarrow\) "the quick brown fox"
Basic rule: collapse repeated characters that are not separated by a "blank", then remove the blanks.
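The collapse rule can be sketched as:

```python
def ctc_collapse(raw, blank="_"):
    """Collapse repeats, then drop blanks: "ttt_h_eee" -> "the".

    Repeated characters separated by a blank survive as doubles, which
    is how CTC can output words with double letters like "hello".
    """
    out = []
    prev = None
    for ch in raw:
        if ch != prev:          # drop repeats not separated by anything
            if ch != blank:     # blanks are removed after collapsing
                out.append(ch)
        prev = ch
    return "".join(out)
```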
Trigger word detection: label the training data so the output is \(1\) for a few timesteps right after the trigger word ends, and \(0\) otherwise.
Sequence Model - Sequence Models & Attention Mechanism
Original: https://www.cnblogs.com/zjp-shadow/p/15178221.html