
CS224N

[CS224n] Lecture 7 - Translation, Seq2Seq

 


Machine Translation : new task

Seq2Seq : new neural architecture



(1) Machine Translation : New task

SMT ( Statistical Machine Translation )

 

  • Bayes Rule

Translation model P(x|y) (how to translate) & language model P(y) (how to write fluently)
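For reference, the Bayes-rule decomposition behind SMT can be written as below. The symbols are assumed here (x = source sentence, y = target sentence), following the usual lecture notation:

```latex
\hat{y} = \arg\max_{y} P(y \mid x)
        = \arg\max_{y} \underbrace{P(x \mid y)}_{\text{translation model}} \; \underbrace{P(y)}_{\text{language model}}
```

The translation model is learned from parallel data, while the language model needs only monolingual target-language text.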

 

  • Learning alignment

Needs a large amount of parallel data!

Alignment : correspondence between particular words in the source and target sentences

(Prob) alignments can be one-to-many or many-to-many

 

Alignment a = latent variable

(not explicitly specified in the data)
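Because the alignment a is latent, the translation model has to sum over it. A sketch in the same assumed notation (x = source sentence, y = target sentence):

```latex
P(x \mid y) = \sum_{a} P(x, a \mid y)
```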

 

  • Decoding

 

(Q) How to compute this argmax?

(A) Impose strong independence assumptions so the search becomes tractable; this search is called decoding

 

 


(2) Seq2Seq : New neural architecture

NMT (Neural Machine Translation)

  • Seq2Seq

a single end-to-end neural network = Seq2Seq (involves two models : encoder & decoder)

encoder : encode source sentence

decoder : generate target sentence

 

-> also used for summarization / dialogue / parsing / code generation

 

P ( y | x )

NMT directly calculates the conditional probability

| of the target-language sentence y

| given the source-language sentence x

 

(Q) How to train?

(A) big parallel corpus -> loss J compares the predicted distribution with the actual target word at each step (cross-entropy)
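A minimal sketch of that loss, assuming the decoder outputs one probability distribution per target position. The name `sequence_loss` and the toy numbers are hypothetical, not from the lecture:

```python
import math

def sequence_loss(step_distributions, target_tokens):
    """Average negative log-likelihood of the actual target tokens.

    step_distributions: one dict per step, mapping token -> predicted probability.
    target_tokens: the actual (gold) target sentence, one token per step.
    """
    total = 0.0
    for dist, gold in zip(step_distributions, target_tokens):
        # Cross-entropy against a one-hot target reduces to -log P(gold token).
        total += -math.log(dist[gold])
    return total / len(target_tokens)

# Toy predictions for a two-word target sentence (made-up numbers).
predicted = [{"he": 0.5, "she": 0.5}, {"hit": 0.25, "ran": 0.75}]
loss = sequence_loss(predicted, ["he", "hit"])
```

In a real system the loss is backpropagated through both the decoder and the encoder, which is what makes the model end-to-end.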

 

 

  • Multi-layer RNNs <- more complex 

Multi-layer deep encoder-decoder machine translation net

 

2-4 layers -> best for encoder RNN

4 layers -> best for decoder RNN

(improvement : going from 1 to 2 layers >> going from 2 to 3 layers)

 

  • Decoding

Greedy Decoding : step by step

(Prob)  ex. He hit a ___ ?   ->  no way to go back and undo an earlier choice
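Greedy decoding can be sketched as below. `TOY_MODEL` is a hypothetical stand-in for the decoder: it only looks at the previous token, and its probabilities are made up so that the locally best word ("a") leads to a worse full sentence.

```python
import math

# Hypothetical toy decoder: previous token -> distribution over the next token.
TOY_MODEL = {
    "<START>": {"he": 1.0},
    "he": {"hit": 1.0},
    "hit": {"a": 0.55, "me": 0.45},
    "a": {"pie": 0.6, "tart": 0.4},
    "me": {"<END>": 1.0},
    "pie": {"<END>": 1.0},
    "tart": {"<END>": 1.0},
}

def greedy_decode(model, max_len=10):
    """Pick the single most probable next token at every step."""
    tokens = ["<START>"]
    log_prob = 0.0
    for _ in range(max_len):
        dist = model[tokens[-1]]
        best = max(dist, key=dist.get)  # argmax over the next-token distribution
        log_prob += math.log(dist[best])
        if best == "<END>":
            break
        tokens.append(best)
    return tokens[1:], log_prob

greedy_tokens, greedy_log_prob = greedy_decode(TOY_MODEL)
```

Here greedy decoding returns "he hit a pie" with probability 0.33, even though "he hit me" has probability 0.45: once "a" is chosen there is no way to go back.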

 

Exhaustive search decoding

(Prob) computing all possible y costs O(V^T) for vocabulary size V and length T -> EXPENSIVE

 

 

Beam Search Decoding **

On each step -> keep the k most probable partial translations (hypotheses)

k : beam size

(ex) what are the two most likely things ? ->  k=2

compare the scores of all hypotheses -> keep the top k

 

1. top-scoring

 

2. backtrack

 

(prob) longer hypotheses get lower scores

(fix) Normalize by length

use the normalized score to select the top one
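The beam search steps above, including length normalization, can be sketched as follows. This is a simplified version under stated assumptions: `TOY_MODEL` is a hypothetical decoder that only looks at the previous token, the score is the sum of log-probabilities, and a hypothesis is set aside as finished once it emits `<END>`.

```python
import math

# Hypothetical toy decoder: previous token -> distribution over the next token.
TOY_MODEL = {
    "<START>": {"he": 1.0},
    "he": {"hit": 1.0},
    "hit": {"a": 0.55, "me": 0.45},
    "a": {"pie": 0.6, "tart": 0.4},
    "me": {"<END>": 1.0},
    "pie": {"<END>": 1.0},
    "tart": {"<END>": 1.0},
}

def beam_search(model, k=2, max_len=10):
    """Keep the k best partial hypotheses per step; pick the winner by normalized score."""
    beams = [(["<START>"], 0.0)]  # (tokens, sum of log-probabilities)
    finished = []
    for _ in range(max_len):
        # Expand every hypothesis on the beam by every possible next token.
        candidates = [
            (tokens + [tok], score + math.log(p))
            for tokens, score in beams
            for tok, p in model[tokens[-1]].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for cand in candidates[:k]:  # keep only the top k
            if cand[0][-1] == "<END>":
                finished.append(cand)  # complete hypothesis: set it aside
            else:
                beams.append(cand)
        if not beams:
            break

    def normalized(cand):
        tokens, score = cand
        return score / (len(tokens) - 1)  # per-token score, <START> not counted

    best = max(finished or beams, key=normalized)
    words = best[0][1:]
    if words and words[-1] == "<END>":
        words = words[:-1]
    return words, normalized(best)

beam_tokens, beam_score = beam_search(TOY_MODEL, k=2)
```

With k=2 the toy model yields "he hit me", the sentence a purely greedy choice at "hit" would miss. Beam search is still not guaranteed to find the optimal translation; it only explores more than one path.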

 

  • Disadvantage 

 

  • Evaluate 

BLEU : compares machine-written vs human-written translations

| useful but imperfect
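A simplified sketch of how a BLEU-style score is computed: clipped n-gram precision combined with a brevity penalty for translations that are too short. This is not the official implementation; the function name `simple_bleu`, the single reference, and the default of n up to 2 (standard BLEU uses up to 4) are assumptions made for illustration.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0  # no smoothing in this sketch
        precisions.append(overlap / total)
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * geo_mean

reference = "the cat sat on the mat".split()
perfect = simple_bleu(reference, reference)
too_short = simple_bleu("the cat sat".split(), reference)
```

The short candidate matches every n-gram it contains, yet it is still penalized for length. A good translation that uses different words from the reference would score low, which is one reason BLEU is useful but imperfect.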

 

  • Is Machine Translation solved? NOPE!

+ common sense ex. paper jam

+ biases in training data ex. gender 

 

 


 

 

 

 

 

 
