
[CS224n] Lecture 7(3)-8 Attention


(3) Attention : New neural technique

 

Seq2Seq

(Prob) Bottleneck problem : the single last encoder hidden state has to capture the entire source sentence

-> How can the decoder get more information about the source during translation?

(Sol) Attention

 

 

 

  • Core idea

on each step of the decoder,

use a direct connection to the encoder

to focus on a particular part of the source sequence

 

seq2seq with attention

attention scores -> attention distribution -> attention output (weighted sum of encoder states; see the sketch after the equations below)

 

Why do we need both an encoder RNN and a decoder RNN? -> "self-attention"

 

 

 

  • Attention : in equations

decoder hidden state (at step t) : s_t
encoder hidden states : h_1, ..., h_N      -> dot product

=> attention scores  e_t = [ s_t · h_1 , ... , s_t · h_N ]

=> attention distribution  α_t = softmax(e_t)

=> attention output  a_t = Σ_i α_t,i · h_i   (weighted sum of the encoder hidden states)

finally, [a_t ; s_t] (attention output concatenated with the decoder state) is used as in the vanilla seq2seq decoder
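
To make the equations concrete, here is a minimal NumPy sketch of one decoder step of dot-product attention (the function and variable names are my own, not from the lecture slides):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    x = x - np.max(x)
    exp = np.exp(x)
    return exp / exp.sum()

def dot_product_attention(s_t, H):
    """One decoder step of dot-product attention.

    s_t : (d,)   decoder hidden state at step t
    H   : (N, d) encoder hidden states h_1 ... h_N
    """
    e_t = H @ s_t              # attention scores e_t, shape (N,)
    alpha_t = softmax(e_t)     # attention distribution, sums to 1
    a_t = alpha_t @ H          # attention output: weighted sum of encoder states, shape (d,)
    return np.concatenate([a_t, s_t]), alpha_t  # [a_t ; s_t] goes to the rest of the decoder

# toy usage: 5 source positions, hidden size 4
H = np.random.randn(5, 4)
s_t = np.random.randn(4)
context, alpha = dot_product_attention(s_t, H)
print(alpha.sum())  # ~1.0
```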

 

 

 

  •  Advantages

- improves NMT performance

- human-like : humans also look back at the source sentence while translating

- solves the bottleneck problem : the decoder can attend directly to the source instead of relying on one fixed vector

- helps with the vanishing gradient problem : attention provides a shortcut to far-away encoder states

 

- interpretability

we just built a seq2seq model & gave it data -> the network learns (soft) "alignment" by itself, without ever being trained on explicit alignments
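
One simple way to look at this alignment (my own sketch, not from the lecture): stack the attention distributions from every decoder step into a matrix and plot it; rows index target words, columns index source words, and bright cells show which source words the model attended to.

```python
import numpy as np
import matplotlib.pyplot as plt

# alphas[i, j] = attention weight on source word j while generating target word i
# (filled with random rows that sum to 1, just to illustrate the plot)
alphas = np.random.dirichlet(np.ones(6), size=5)   # 5 target words x 6 source words

plt.imshow(alphas, cmap="gray")
plt.xlabel("source position")
plt.ylabel("target position")
plt.title("attention distribution = soft alignment")
plt.show()
```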

 

 

 

  • Attention variants

- values : h_1, ..., h_N (here, the encoder hidden states)

- query : s (here, the decoder hidden state)

- attention scores e

 several ways to compute attention scores : basic dot product, multiplicative (bilinear), additive attention

- attention distribution α = softmax(e)

- attention output a = weighted sum of the values
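
A small sketch of these score variants (basic dot product, multiplicative / bilinear, additive); the matrices W, W1, W2 and the vector v would be learned parameters in a real model, here they are just random placeholders of my own choosing:

```python
import numpy as np

d1, d2, d3 = 4, 4, 3          # decoder dim, encoder dim, additive-attention dim
s = np.random.randn(d1)       # query: decoder hidden state
h = np.random.randn(d2)       # one value: encoder hidden state

# 1) basic dot-product attention (requires d1 == d2)
e_dot = s @ h

# 2) multiplicative (bilinear) attention: e = s^T W h
W = np.random.randn(d1, d2)
e_mult = s @ W @ h

# 3) additive attention: e = v^T tanh(W1 h + W2 s)
W1 = np.random.randn(d3, d2)
W2 = np.random.randn(d3, d1)
v = np.random.randn(d3)
e_add = v @ np.tanh(W1 @ h + W2 @ s)

print(e_dot, e_mult, e_add)   # each is a scalar score for this (query, value) pair
```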

 

 

 

  • Attention

: a "general" DL technique, not just for MT

attention = a selective summary of the information in the values, where the query determines which values to focus on


 
