Machine Translation : new task
Seq2Seq : new neural architecture
(1) Machine Translation : New task
SMT ( Statistical Machine Translation )
- Bayes Rule
- Learning alignment
needs a large amount of parallel data!
↓
Alignment : correspondence between particular words in the source and target sentences
(Prob) alignments can be one-to-many, many-to-many
Alignment a = latent variable
-> not explicitly specified in the data!
- Decoding
(Q) How to compute this argmax?
(A) Impose strong independence assumptions == Decoding
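For reference, the standard SMT equations behind the bullets above (the usual Bayes-rule formulation; notation mine, not copied from the slides):

```latex
% SMT objective: decompose P(y|x) via Bayes' rule
\hat{y} = \operatorname*{argmax}_y P(y \mid x)
        = \operatorname*{argmax}_y P(x \mid y)\,P(y)
% P(x|y) : translation model, learned from parallel data
% P(y)   : language model, learned from monolingual data
% the alignment a is a latent variable inside the translation model:
P(x \mid y) = \sum_a P(x, a \mid y)
```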
(2) Seq2Seq : New neural architecture
NMT (Neural Machine Translation)
- Seq2Seq
a single end-to-end neural network = Seq2Seq (involves two models : encoder & decoder)
encoder : encode source sentence
decoder : generate target sentence
-> Summarization / dialogue / parsing / code generation
P(y|x)
NMT directly calculates the conditional probability
| of the target language sentence y
| given the source language sentence x
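Spelled out, this is the standard chain-rule factorization over target words (stated here for reference; y_1 .. y_T are the target words):

```latex
P(y \mid x) = \prod_{t=1}^{T} P(y_t \mid y_1, \dots, y_{t-1}, x)
```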
(Q) How to train?
(A) train on a big parallel corpus -> loss J compares predicted words vs. actual words
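One common way to write this objective as per-step cross-entropy (notation mine; y_t^* denotes the actual next word at step t):

```latex
J(\theta) = \frac{1}{T} \sum_{t=1}^{T} J_t
          = -\frac{1}{T} \sum_{t=1}^{T} \log P\big(y_t^* \mid y_1^*, \dots, y_{t-1}^*, x\big)
```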
- Multi-layer RNNs <- compute more complex representations
Multi-layer deep encoder-decoder machine translation net
2-4 layers -> best for the encoder RNN
4 layers -> best for the decoder RNN
(improvement : 1 to 2 layers >> 2 to 3 layers)
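A minimal sketch of such a multi-layer encoder-decoder in PyTorch, using a 2-layer encoder and 4-layer decoder to match the note above; the class, names, and hyperparameters are my own illustration, not the lecture's code:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        # 2-layer encoder RNN / 4-layer decoder RNN (within the 2-4 / 4 ranges)
        self.encoder = nn.LSTM(emb_dim, hidden, num_layers=2, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden, num_layers=4, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)  # hidden state -> vocab logits

    def forward(self, src, tgt):
        _, (h, c) = self.encoder(self.src_emb(src))  # encode source sentence
        # The 2-layer encoder yields 2 (h, c) states but the 4-layer decoder
        # expects 4; repeating them is a simple bridging assumption made here
        # (real systems use a learned bridge and attention).
        dec_out, _ = self.decoder(self.tgt_emb(tgt),
                                  (h.repeat(2, 1, 1), c.repeat(2, 1, 1)))
        return self.out(dec_out)                     # logits for every step

model = Seq2Seq(src_vocab=8000, tgt_vocab=8000)
logits = model(torch.randint(0, 8000, (3, 10)),      # toy batch: 3 source sents
               torch.randint(0, 8000, (3, 12)))      # 3 target sents
print(logits.shape)                                  # torch.Size([3, 12, 8000])
```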
- Decoding
Greedy Decoding : take the single most probable word at each step
(Prob) ex. He hit a ___ ..? -> once a wrong word is picked, no way to go back
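A toy greedy decoder, assuming a trained model like the Seq2Seq sketch above (`sos_id`/`eos_id` are hypothetical special-token ids, not from the lecture):

```python
import torch

@torch.no_grad()
def greedy_decode(model, src, sos_id, eos_id, max_len=50):
    ys = [sos_id]
    for _ in range(max_len):
        logits = model(src, torch.tensor([ys]))  # re-score the current prefix
        next_id = logits[0, -1].argmax().item()  # single most probable word
        ys.append(next_id)                       # committed: no way to go back
        if next_id == eos_id:
            break
    return ys
```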
Exhaustive search decoding
(Prob) computing all possible y -> far too EXPENSIVE (O(V^T) possible sequences)
Beam Search Decoding **
On each step -> keep the k most probable partial translations (hypotheses)
k : beam size
(ex) what are the two most likely continuations? -> k=2
compare scores for each hypothesis -> keep the top k
(Prob) longer hypotheses get lower scores (each step adds a negative log-probability)
(fix) Normalize by length
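A hedged beam-search sketch including the length-normalization fix, reusing the hypothetical model setup above (k=2 as in the example; again an illustration, not the lecture's code):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def beam_decode(model, src, sos_id, eos_id, k=2, max_len=50):
    beams = [([sos_id], 0.0)]            # (hypothesis, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for ys, score in beams:
            logits = model(src, torch.tensor([ys]))
            log_p = F.log_softmax(logits[0, -1], dim=-1)
            top_p, top_ids = log_p.topk(k)               # k best continuations
            for p, i in zip(top_p.tolist(), top_ids.tolist()):
                candidates.append((ys + [i], score + p))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for ys, score in candidates[:k]:                 # keep k best hypotheses
            (finished if ys[-1] == eos_id else beams).append((ys, score))
        if not beams:
            break
    pool = finished or beams
    # normalize by length so longer hypotheses aren't unfairly penalized
    return max(pool, key=lambda c: c[1] / len(c[0]))[0]
```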
- Disadvantages : less interpretable (hard to debug), difficult to control
- Evaluation
BLEU : compares a machine-written translation to human-written reference translation(s)
| useful but imperfect
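For reference, the standard BLEU definition (a geometric mean of modified n-gram precisions p_n times a brevity penalty BP; the common formula, not reproduced from the slides):

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\Big(\sum_{n=1}^{4} \tfrac{1}{4}\log p_n\Big),
\qquad
\mathrm{BP} = \min\!\big(1,\; e^{\,1 - r/c}\big)
% r = reference length, c = candidate (machine) translation length
```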
- Is Machine Translation solved? NOPE!
+ common sense failures, ex. 'paper jam'
+ biases picked up from training data, ex. gender bias