

[Search] OCR

1. E2E OCR

 

1.1 Donut

Donut: Document Understanding Transformer without OCR
[Paper] https://arxiv.org/pdf/2111.15664v1.pdf
[Review] https://yhkim4504.tistory.com/15?category=843360
[Code] X

 

1.2 LayoutLMv2

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

[Paper] https://arxiv.org/abs/2012.14740
[Review] https://www.youtube.com/watch?v=BI2Mx5cdc60&feature=youtu.be
[Code] https://github.com/microsoft/unilm/tree/master/layoutlmv2

 

 

1.3 E2E-MLT

E2E-MLT: an Unconstrained End-to-End Method for Multi-Language Scene Text (2018)

[Paper] https://arxiv.org/ftp/arxiv/papers/1801/1801.09919.pdf

[Code] https://github.com/MichalBusta/E2E-MLT

 

 

1.4 FOTS

FOTS: Fast Oriented Text Spotting with a Unified Network (2018)

[Paper] https://arxiv.org/abs/1801.01671
[Code] https://github.com/jiangxiluning/FOTS.PyTorch

 

 

1.5 PGNet

PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network (2021) (**)

[Paper] https://arxiv.org/pdf/2104.05458v1.pdf
[Code] https://github.com/PaddlePaddle/PaddleOCR-

 

 

1.6 STN-OCR

STN-OCR: A single Neural Network for Text Detection and Text Recognition (2017) (**)
[Paper] https://arxiv.org/pdf/1707.08831v1.pdf

[Code] https://github.com/Bartzi/stn-ocr

 

 

1.7 TransVTSpotter

A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text Spotter with Transformer (TransVTSpotter)
[Paper] https://arxiv.org/pdf/2112.04888.pdf
[Code] https://github.com/weijiawu/transvtspotter

 

 

1.8 Cost-effective End-to-end Information Extraction for Semi-structured Document Images

[Paper] https://arxiv.org/pdf/2104.08041.pdf
[Paper review] X
[Code] X

 

 

 

 

2. Transformer-based OCR

 

2.1 SATRN (Self-Attention Text Recognition Network)

On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention

[Paper] https://openaccess.thecvf.com/content_CVPRW_2020/papers/w34/Lee_On_Recognizing_Texts_of_Arbitrary_Shapes_With_2D_Self-Attention_CVPRW_2020_paper.pdf
[Paper review] https://oranz.tistory.com/5
[Code] https://github.com/clovaai/satrn

 

 

2.2 TrOCR

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models (2021)

[Paper] https://arxiv.org/pdf/2109.10282.pdf
[Code] https://github.com/microsoft/unilm/tree/master/trocr

 

 

2.3 Transformer-Based Text Detection in the Wild
[Paper] https://openaccess.thecvf.com/content/CVPR2021W/VOCVALC/papers/Raisi_Transformer-Based_Text_Detection_in_the_Wild_CVPRW_2021_paper.pdf

 

 

2.4 ViTSTR

Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)
[Paper] https://arxiv.org/pdf/2105.08582.pdf
[Code] https://github.com/roatienza/deep-text-recognition-benchmark