(3) Attention : New neural technique
Seq2Seq
(Prob) : Bottleneck problem
the single encoder vector must capture all information about the source sentence
-> How can the decoder access more of the source information during translation ?
(Sol) Attention
- Core idea
on each step of the decoder,
use a direct connection to the encoder
to focus on a particular part of the source sequence
attention scores -> attention distribution -> attention output
Why do we need both an encoder RNN & a decoder RNN ? -> this question leads to "self-attention"
- Attention : in equations
encoder hidden states : h_1, ..., h_N
decoder hidden state at step t : s_t
=> attention scores (dot product) : e^t = [s_t · h_1, ..., s_t · h_N]
=> attention distribution : α^t = softmax(e^t)
=> attention output (weighted sum) : a_t = Σ_i α_i^t h_i
then concatenate [a_t ; s_t] and proceed as in the non-attention seq2seq decoder
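A minimal NumPy sketch of one decoder step with dot-product attention, following the equations above; the sizes, random vectors, and variable names are illustrative assumptions, not the lecture's actual implementation.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

# toy sizes (assumed): N source tokens, hidden size d
N, d = 5, 8
rng = np.random.default_rng(0)

h = rng.normal(size=(N, d))    # encoder hidden states h_1 .. h_N
s_t = rng.normal(size=(d,))    # decoder hidden state s_t at step t

e = h @ s_t                    # attention scores: e_i = s_t . h_i, shape (N,)
alpha = softmax(e)             # attention distribution, sums to 1
a_t = alpha @ h                # attention output: weighted sum of encoder states, shape (d,)

dec_in = np.concatenate([a_t, s_t])   # [a_t ; s_t], fed onward as in plain seq2seq
print(alpha.round(3), dec_in.shape)
```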
- Advantages
- significantly improves NMT performance
- more human-like : the model can look back at the source sentence, as human translators do
- solves the bottleneck problem : the decoder can look directly at the source instead of relying on a single vector
- helps with the vanishing gradient problem : gives a shortcut to faraway encoder states
- provides some interpretability
we just built a seq2seq model & gave it data -> the network learns "alignment" by itself
- Attention variants
- values : h_1, ..., h_N
- query : s
- attention scores e : can be computed in several ways (basic dot-product, multiplicative, additive); see the sketch after this list
- attention distribution : α = softmax(e)
- attention output a : weighted sum of the values
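A small sketch of the three common score functions for a single value h_i and query s; the weight matrices W, W1, W2 and vector v stand in for learned parameters (random here), and the dimensions are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8                               # hidden size (query and values share it here)
h_i = rng.normal(size=(d,))         # one value vector (encoder state h_i)
s = rng.normal(size=(d,))           # the query vector (decoder state s)

# 1) basic dot-product attention: e_i = s . h_i  (needs equal dimensions)
e_dot = s @ h_i

# 2) multiplicative (bilinear) attention: e_i = s^T W h_i
W = rng.normal(size=(d, d))         # learned weight matrix (random stand-in)
e_mult = s @ W @ h_i

# 3) additive (MLP) attention: e_i = v^T tanh(W1 h_i + W2 s)
d_a = 4                             # attention dimensionality, a hyperparameter
W1 = rng.normal(size=(d_a, d))
W2 = rng.normal(size=(d_a, d))
v = rng.normal(size=(d_a,))
e_add = v @ np.tanh(W1 @ h_i + W2 @ s)

print(e_dot, e_mult, e_add)         # each is a scalar score for h_i
```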
- Attention
: "general" DL technique, not just for MT
attention = a selective summary of the information in the values, where the query determines which values to focus on
-> gives a fixed-size representation of an arbitrary set of vectors
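To illustrate attention as a general technique beyond MT, a tiny sketch that summarizes an arbitrary set of vectors into one fixed-size vector given a query; the shapes and the `attend` helper are hypothetical.

```python
import numpy as np

def attend(values, query):
    # fixed-size summary of a set of vectors, conditioned on a query
    scores = values @ query                    # dot-product attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax -> attention distribution
    return weights @ values                    # weighted sum -> attention output

rng = np.random.default_rng(2)
values = rng.normal(size=(7, 16))   # any set of vectors: word states, image regions, ...
query = rng.normal(size=(16,))      # whatever conditions the summary
summary = attend(values, query)     # shape (16,), no matter how many values there are
```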