1. E2E OCR
1.1 Donut
Donut: Document Understanding Transformer without OCR
[Paper] https://arxiv.org/pdf/2111.15664v1.pdf
[Review] https://yhkim4504.tistory.com/15?category=843360
[Code] N/A
1.2 LayoutLMv2
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
[Paper] https://arxiv.org/abs/2012.14740
[Review] https://www.youtube.com/watch?v=BI2Mx5cdc60&feature=youtu.be
[Code] https://github.com/microsoft/unilm/tree/master/layoutlmv2
1.3 E2E-MLT
E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text (2018)
[Paper] https://arxiv.org/ftp/arxiv/papers/1801/1801.09919.pdf
[Code] https://github.com/MichalBusta/E2E-MLT
1.4 FOTS
FOTS: Fast Oriented Text Spotting with a Unified Network (2018)
[Paper] https://arxiv.org/abs/1801.01671
[Code] https://github.com/jiangxiluning/FOTS.PyTorch
1.5 PGNet
PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network, 2021(**)
[Paper] https://arxiv.org/pdf/2104.05458v1.pdf
[Code] https://github.com/PaddlePaddle/PaddleOCR-
1.6 STN-OCR
STN-OCR: A single Neural Network for Text Detection and Text Recognition, 2017(**)
[Paper] https://arxiv.org/pdf/1707.08831v1.pdf
[Code] https://github.com/Bartzi/stn-ocr
1.7 TransVTSpotter
A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text Spotter with Transformer (TransVTSpotter)
[Paper] https://arxiv.org/pdf/2112.04888.pdf
[Code] https://github.com/weijiawu/transvtspotter
1.8 Cost-effective End-to-end Information Extraction for Semi-structured Document Images
[Paper] https://arxiv.org/pdf/2104.08041.pdf
[Paper review] N/A
[Code] N/A
2. Transformer-based OCR
2.1 SATRN (Self-Attention Text Recognition Network)
On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention
[Paper] https://openaccess.thecvf.com/content_CVPRW_2020/papers/w34/Lee_On_Recognizing_Texts_of_Arbitrary_Shapes_With_2D_Self-Attention_CVPRW_2020_paper.pdf
[Paper review] https://oranz.tistory.com/5
[Code] https://github.com/clovaai/satrn
2.2 TrOCR
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models (2021)
[Paper] https://arxiv.org/pdf/2109.10282.pdf
[Code] https://github.com/microsoft/unilm/tree/master/trocr
2.3 Transformer-Based Text Detection in the Wild
[Paper] https://openaccess.thecvf.com/content/CVPR2021W/VOCVALC/papers/Raisi_Transformer-Based_Text_Detection_in_the_Wild_CVPRW_2021_paper.pdf
2.4 ViTSTR
Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)
[Paper] https://arxiv.org/pdf/2105.08582.pdf
[Code] https://github.com/roatienza/deep-text-recognition-benchmark