
Paper/OCR (5)
[OCR] ViTSTR: Vision Transformer for Fast and Efficient Scene Text Recognition
https://arxiv.org/pdf/2105.08582.pdf
https://github.com/roatienza/deep-text-recognition-benchmark (PyTorch code of the ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition, ViTSTR)
[OCR] FOTS: Fast Oriented Text Spotting with a Unified Network
https://arxiv.org/pdf/1801.01671.pdf
https://github.com/jiangxiluning/FOTS.PyTorch (FOTS PyTorch implementation)
0. Abstract FOTS: detection & recognition --> simultaneous & complementary (the two tasks share computational & visual information; see the sketch below) 1. Introd..
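Purely as an illustration of the idea named in that abstract (one shared backbone feeding both a detection head and a recognition head, so text is recognized from cropped features rather than re-encoded image crops), here is a hedged PyTorch sketch. It is not the paper's code: the module names and layer sizes are invented, and the paper's RoIRotate is replaced by torchvision's axis-aligned roi_align for brevity.

```python
# Conceptual sketch of FOTS-style feature sharing (NOT the official code).
# Assumptions: toy backbone, toy heads, roi_align standing in for RoIRotate.
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class SharedBackboneSpotter(nn.Module):
    def __init__(self, num_classes=37):  # e.g. 26 letters + 10 digits + CTC blank (assumption)
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in for the shared conv trunk
            nn.Conv2d(3, 64, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        self.det_head = nn.Conv2d(128, 6, 1)      # per-pixel text score + box geometry
        self.rec_head = nn.GRU(128 * 8, 256, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, images, rois):
        feats = self.backbone(images)             # computed once, used by BOTH heads
        det_maps = self.det_head(feats)
        # Crop each text region from the shared feature map; spatial_scale=0.25
        # maps image coordinates to the stride-4 feature grid above.
        crops = roi_align(feats, rois, output_size=(8, 32), spatial_scale=0.25)
        k = crops.size(0)
        seq = crops.permute(0, 3, 1, 2).reshape(k, 32, -1)  # width as the time axis
        rec_feats, _ = self.rec_head(seq)
        char_logits = self.classifier(rec_feats)  # per-step logits for CTC decoding
        return det_maps, char_logits
```

The point of the sketch is only the data flow: because `feats` is computed once, the recognition branch costs little on top of detection, which is the "computational & visual information sharing" the abstract refers to.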
[OCR] Donut: Document Understanding Transformer without OCR
https://arxiv.org/pdf/2111.15664v1.pdf
0. Abstract An E2E model that moves away from the OCR framework; a synthetic document image generator lowers the dependence on large-scale real data. 1. Introduction Semi-structured documents. Conventional VDU (Visual Document Understanding) is usually OCR-based and built from three separate modules: text detection, text recognition, and parsing. Problem 1: OCR is expensive and is not always available. Problem 2: OCR errors negatively influence subsequent..
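The preview describes Donut as an end-to-end model with no OCR stage. As an illustration only (none of this appears in the post), a minimal inference sketch assuming the Hugging Face transformers integration of Donut and the public naver-clova-ix/donut-base-finetuned-cord-v2 receipt-parsing checkpoint; the input file name is hypothetical.

```python
# Minimal Donut inference sketch, assuming the Hugging Face `transformers`
# integration; checkpoint choice and input path are assumptions.
import re
import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

ckpt = "naver-clova-ix/donut-base-finetuned-cord-v2"
processor = DonutProcessor.from_pretrained(ckpt)
model = VisionEncoderDecoderModel.from_pretrained(ckpt)
model.eval()

image = Image.open("receipt.png").convert("RGB")  # hypothetical input image
pixel_values = processor(image, return_tensors="pt").pixel_values

# Donut is prompted with a task start token; generation then emits the
# parse as a token sequence, with no separate detection/recognition step.
task_prompt = "<s_cord-v2>"
decoder_input_ids = processor.tokenizer(
    task_prompt, add_special_tokens=False, return_tensors="pt"
).input_ids

with torch.no_grad():
    outputs = model.generate(
        pixel_values, decoder_input_ids=decoder_input_ids, max_length=512
    )

sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "")
sequence = sequence.replace(processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # drop the task token
print(processor.token2json(sequence))  # structured output, no OCR in the loop
```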
[OCR] TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
https://arxiv.org/pdf/2109.10282.pdf
https://github.com/microsoft/unilm/tree/master/trocr (GitHub - microsoft/unilm: Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities)
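For a quick sense of how a pre-trained TrOCR checkpoint is used in practice, a minimal sketch assuming the Hugging Face transformers integration (TrOCRProcessor plus VisionEncoderDecoderModel) and the public microsoft/trocr-base-handwritten checkpoint; the image path is a placeholder.

```python
# Minimal TrOCR inference sketch, assuming the Hugging Face `transformers`
# integration; checkpoint choice and input path are assumptions.
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

ckpt = "microsoft/trocr-base-handwritten"
processor = TrOCRProcessor.from_pretrained(ckpt)
model = VisionEncoderDecoderModel.from_pretrained(ckpt)
model.eval()

# TrOCR is a recognizer: it expects a cropped text-line image (a detector,
# e.g. a FOTS-style model, would normally supply these crops).
image = Image.open("text_line.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    generated_ids = model.generate(pixel_values)

text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```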
[Search] OCR
1. E2E OCR
1.1 Donut: Document Understanding Transformer without OCR
[Paper] https://arxiv.org/pdf/2111.15664v1.pdf
[Review] https://yhkim4504.tistory.com/15?category=843360
[Code] X
1.2 LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
[Paper] https://arxiv.org/abs/2012.14740
[Review] https://www.youtube.com/watch?v=BI2Mx5cdc60&feature=youtu.be
[Code] https://github...