(3) Attention : New neural technique
Seq2Seq
(Prob) : Bottleneck problem
the single encoder vector must capture all information about the source sentence
-> How can the decoder access more of the source information during translation ?
(Sol) Attention
- Core idea
on each step of the decoder,
use a direct connection to the encoder
to focus on a particular part of the source sequence
attention scores -> attention distribution -> attention output
Why do we need both an encoder RNN & a decoder RNN ? -> this question leads to "self-attention"
- Attention : in equations
encoder hidden states : h_1, ..., h_N
decoder hidden state at step t : s_t
=> attention scores (dot product) : e^t = [s_t · h_1, ..., s_t · h_N]
=> attention distribution : α^t = softmax(e^t)
=> attention output (weighted sum) : a_t = Σ_i α_i^t h_i
then concatenate [a_t ; s_t] and proceed as in the non-attention seq2seq decoder
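A minimal NumPy sketch of one decoder step with dot-product attention, following the equations above; the sizes, random vectors, and variable names are illustrative assumptions, not the lecture's actual implementation.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

# toy sizes (assumed): N source tokens, hidden size d
N, d = 5, 8
rng = np.random.default_rng(0)

h = rng.normal(size=(N, d))    # encoder hidden states h_1 .. h_N
s_t = rng.normal(size=(d,))    # decoder hidden state s_t at step t

e = h @ s_t                    # attention scores: e_i = s_t . h_i, shape (N,)
alpha = softmax(e)             # attention distribution, sums to 1
a_t = alpha @ h                # attention output: weighted sum of encoder states, shape (d,)

dec_in = np.concatenate([a_t, s_t])   # [a_t ; s_t], fed onward as in plain seq2seq
print(alpha.round(3), dec_in.shape)
```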
- Advantages
- significantly improves NMT performance
- more human-like : the model can look back at the source sentence, as human translators do
- solves the bottleneck problem : the decoder can look directly at the source instead of relying on a single vector
- helps with the vanishing gradient problem : gives a shortcut to faraway encoder states
- provides some interpretability
we just built a seq2seq model & gave it data -> the network learns "alignment" by itself
- Attention variants
- values : h_1, ..., h_N
- query : s
- attention scores e : can be computed in several ways (basic dot-product, multiplicative, additive); see the sketch after this list
- attention distribution : α = softmax(e)
- attention output a : weighted sum of the values
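A small sketch of the three common score functions for a single value h_i and query s; the weight matrices W, W1, W2 and vector v stand in for learned parameters (random here), and the dimensions are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8                               # hidden size (query and values share it here)
h_i = rng.normal(size=(d,))         # one value vector (encoder state h_i)
s = rng.normal(size=(d,))           # the query vector (decoder state s)

# 1) basic dot-product attention: e_i = s . h_i  (needs equal dimensions)
e_dot = s @ h_i

# 2) multiplicative (bilinear) attention: e_i = s^T W h_i
W = rng.normal(size=(d, d))         # learned weight matrix (random stand-in)
e_mult = s @ W @ h_i

# 3) additive (MLP) attention: e_i = v^T tanh(W1 h_i + W2 s)
d_a = 4                             # attention dimensionality, a hyperparameter
W1 = rng.normal(size=(d_a, d))
W2 = rng.normal(size=(d_a, d))
v = rng.normal(size=(d_a,))
e_add = v @ np.tanh(W1 @ h_i + W2 @ s)

print(e_dot, e_mult, e_add)         # each is a scalar score for h_i
```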
- Attention
: "general" DL technique, not just for MT
attention = a selective summary of the information in the values, where the query determines which values to focus on
-> gives a fixed-size representation of an arbitrary set of vectors
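To illustrate attention as a general technique beyond MT, a tiny sketch that summarizes an arbitrary set of vectors into one fixed-size vector given a query; the shapes and the `attend` helper are hypothetical.

```python
import numpy as np

def attend(values, query):
    # fixed-size summary of a set of vectors, conditioned on a query
    scores = values @ query                    # dot-product attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax -> attention distribution
    return weights @ values                    # weighted sum -> attention output

rng = np.random.default_rng(2)
values = rng.normal(size=(7, 16))   # any set of vectors: word states, image regions, ...
query = rng.normal(size=(16,))      # whatever conditions the summary
summary = attend(values, query)     # shape (16,), no matter how many values there are
```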