[CS224n] Lecture 7 - Translation, Seq2Seq


Machine Translation : new task

Seq2Seq : new neural architecture

(1) Machine Translation : New task

SMT ( Statistical Machine Translation )


  • Bayes Rule

Translation model (how to translate) & language model (how to write fluently)
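For reference, the Bayes-rule split can be written as below. The notes give no formula here, so the notation is an assumption: x = source sentence, y = target sentence, P(x | y) = translation model, P(y) = language model.

```latex
\hat{y} = \arg\max_{y} P(y \mid x) = \arg\max_{y} P(x \mid y)\, P(y)
```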


  • Learning alignment

need large amount of parallel data!

Alignment : correspondence between particular words 

(Prob) one-to-many, many-to-many


Alignment a = latent variable

not explicitly specified in the data
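Treating the alignment as a latent variable means summing it out of the translation model. A sketch in assumed notation (x = source sentence, y = target sentence, a = alignment):

```latex
P(x \mid y) = \sum_{a} P(x, a \mid y)
```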


  • Decoding


(Q) How to compute this argmax?

(A) Impose strong independence assumptions == Decoding



(2) Seq2Seq : New neural architecture

NMT (Neural Machine Translation)

  • Seq2Seq

a single end-to-end neural network = Seq2Seq (involves two models : encoder & decoder)

encoder : encode source sentence

decoder : generate target sentence


-> Summarization / dialogue / parsing / code generation


P ( y | x )

NMT directly calculates the conditional probability

| of the target language sentence

| given the source language sentence
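Written out, this conditional probability factorizes one target word at a time. The notation is assumed here (T = target length, y_t = t-th target word, x = source sentence):

```latex
P(y \mid x) = \prod_{t=1}^{T} P(y_t \mid y_1, \dots, y_{t-1}, x)
```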


(Q) How to train?

(A) big parallel corpus -> loss J = predicted vs actual next word (trained end-to-end)
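A minimal sketch of what "predicted vs actual" could look like, assuming J is the average negative log-probability the decoder gives to the actual target words. The function name `sequence_loss` and the toy numbers are hypothetical, for illustration only.

```python
import math

def sequence_loss(step_probs, target_ids):
    """Average negative log-likelihood of the actual target words.

    step_probs[t] : predicted distribution over the vocabulary at step t
    target_ids[t] : vocabulary index of the actual word at step t
    """
    losses = [-math.log(probs[tid]) for probs, tid in zip(step_probs, target_ids)]
    return sum(losses) / len(losses)

# Toy example (made-up numbers): vocabulary of 3 words, 2 decoder steps.
step_probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
target_ids = [0, 2]
loss = sequence_loss(step_probs, target_ids)
```

The loss shrinks toward 0 as the predicted probability of each actual word approaches 1.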



  • Multi-layer RNNs <- more complex 

Multi-layer deep encoder-decoder machine translation net


2-4 layer -> encoder RNN

4 layer -> decoder RNN

(improvement : 1 -> 2 layers >> 2 -> 3 layers)


  • Decoding

Greedy Decoding : step by step

(Prob)  ex. He hit a ___ ..?   ->  no way to go back
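A small sketch of greedy decoding and its "no way to go back" problem. The toy probability table `TOY_MODEL` and the helper `toy_step` are hypothetical stand-ins for a real decoder; the numbers are made up.

```python
# Hypothetical next-word distributions, keyed by the words generated so far.
TOY_MODEL = {
    (): {"he": 1.0},
    ("he",): {"hit": 1.0},
    ("he", "hit"): {"a": 0.6, "me": 0.4},
    ("he", "hit", "a"): {"ball": 0.4, "pie": 0.3, "<END>": 0.3},
    ("he", "hit", "me"): {"with": 1.0},
    ("he", "hit", "me", "with"): {"a": 1.0},
    ("he", "hit", "me", "with", "a"): {"pie": 1.0},
}

def toy_step(prefix):
    # Any prefix not in the table simply ends the sentence.
    return TOY_MODEL.get(tuple(prefix), {"<END>": 1.0})

def greedy_decode(step_fn, max_len=10, end_token="<END>"):
    """Pick the single most probable word at every step; never revisit a choice."""
    prefix = []
    for _ in range(max_len):
        probs = step_fn(prefix)
        word = max(probs, key=probs.get)
        if word == end_token:
            break
        prefix.append(word)
    return prefix

result = greedy_decode(toy_step)
# -> ['he', 'hit', 'a', 'ball']  (sentence probability 0.6 * 0.4 = 0.24)
```

In this toy table "he hit me with a pie" has probability 0.4, but greedy commits to "a" at step 3 and cannot recover.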


Exhaustive search decoding

(Prob) computing all possible y -> EXPENSIVE
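To make "EXPENSIVE" concrete, with assumed symbols V = vocabulary size and T = number of decoder steps: at step t there are V^t possible partial translations to track, so the total cost grows as

```latex
O\!\left(V^{T}\right)
```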



Beam Search Decoding **

On each step -> keep the k most probable partial translations (hypotheses)

k : beam size

(ex) what are the two most likely things ? ->  k=2

compare score for each hypothesis -> keep 


1. top-scoring


2. backtrack


(prob) longer hypothesis -> lower score

(fix) Normalize by length 

use this to select top one
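The steps above can be sketched roughly as follows. This is one possible simplified variant, not a definitive implementation. It assumes the score of a hypothesis is the sum of its log-probabilities and that length normalization divides by the number of generated tokens (including the end token). `TOY_MODEL` and `toy_step` are hypothetical stand-ins for a real decoder.

```python
import math

# Hypothetical next-word distributions, keyed by the words generated so far.
TOY_MODEL = {
    (): {"he": 1.0},
    ("he",): {"hit": 1.0},
    ("he", "hit"): {"a": 0.6, "me": 0.4},
    ("he", "hit", "a"): {"ball": 0.4, "pie": 0.3, "<END>": 0.3},
    ("he", "hit", "me"): {"with": 1.0},
    ("he", "hit", "me", "with"): {"a": 1.0},
    ("he", "hit", "me", "with", "a"): {"pie": 1.0},
}

def toy_step(prefix):
    # Any prefix not in the table simply ends the sentence.
    return TOY_MODEL.get(tuple(prefix), {"<END>": 1.0})

def beam_search(step_fn, k=2, max_len=10, end_token="<END>"):
    """Keep the k best partial hypotheses per step; pick the final winner
    by length-normalized log-probability."""
    beams = [((), 0.0)]   # (words so far, sum of log-probs)
    finished = []         # (words, sum of log-probs, number of scored tokens)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for word, p in step_fn(prefix).items():
                if p > 0:
                    candidates.append((prefix + (word,), score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for hyp, score in candidates[:k]:
            if hyp[-1] == end_token:
                finished.append((hyp[:-1], score, len(hyp)))  # set aside
            else:
                beams.append((hyp, score))
        if not beams:
            break
    # Hypotheses cut off at max_len are treated as finished too.
    finished.extend((hyp, score, len(hyp)) for hyp, score in beams)
    if not finished:
        return []
    best = max(finished, key=lambda h: h[1] / h[2])  # normalize by length
    return list(best[0])

best_k2 = beam_search(toy_step, k=2)
# -> ['he', 'hit', 'me', 'with', 'a', 'pie']
best_k1 = beam_search(toy_step, k=1)
# -> ['he', 'hit', 'a', 'ball']  (k=1 behaves like greedy decoding)
```

With k=2 the lower-scoring "me" branch survives step 3, which lets the search reach the more probable full sentence.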


  • Disadvantage 


  • Evaluate 

BLEU : compare machine-written vs human-written translation

| useful but imperfect
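A simplified BLEU-style score, to show the idea only. It assumes a single reference, clipped n-gram precision for n = 1..max_n, a geometric mean, and a brevity penalty. Real BLEU implementations differ in details such as multiple references, corpus-level counts, and smoothing. The function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU-style score between one candidate and one reference."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0   # no smoothing in this sketch
        precisions.append(overlap / total)
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * geo_mean

ref = "the cat sat on the mat".split()
same = simple_bleu(ref, ref)                       # identical -> 1.0
short = simple_bleu("the cat sat".split(), ref)    # perfect precision, but penalized for brevity
none = simple_bleu("dogs bark loudly".split(), ref)  # no overlap -> 0.0
```

The "imperfect" part shows up directly: a good translation that uses different wording from the reference gets low n-gram overlap and therefore a low score.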


  • Is Machine Translation solved? NOPE!

+ common sense ex. paper jam

+ biases in training data ex. gender 









