All posts (25)

[CODE] SynthText - Korean ver. adaptation
https://arxiv.org/pdf/1604.06646.pdf
https://github.com/ankush-me/SynthText
Expected to be straightforward to tweak.
1. Some renders are hard to make out even by eye (too transparent, heavily clipped, etc.). Opacity is sampled in colorize3_poisson.py: op = 0.50 + 0.1*np.random.randn() (a clamped variant is sketched below, after the DACON entry)
2. Output spanning multiple lines -> join into one string with \n at once (handle the bbox together the same way). text_utils.py: def get_nline_nchar, nline = 1
3. Glyphs come out clipped -> resolved by switching to pygame ver 1.9.6
4. Fonts are limited -> add more to fontlist.txt
5. Output style: based on the current project..

[OCR] ViTSTR: Vision Transformer for Fast and Efficient Scene Text Recognition
https://arxiv.org/pdf/2105.08582.pdf
https://github.com/roatienza/deep-text-recognition-benchmark

[OCR] FOTS: Fast Oriented Text Spotting with a Unified Network
https://arxiv.org/pdf/1801.01671.pdf
https://github.com/jiangxiluning/FOTS.PyTorch
0. Abstract
FOTS: detection & recognition --> simultaneous & complementary (they share computational & visual information)
1. Introd..

[DACON] Job Care Recommendation Algorithm, Part 2 [Model]
1. HGB 2. XGB
4. Model
4.1 HGBoost
The dataset mixes ordinal and nominal variables.
Q. How do we convey this fact to the model?
A1. One-hot encoding? Nominal variables far outnumber ordinal ones in this dataset, and features such as H_속성 run past 1000 categories -> dimensionality blow-up. Besides, in tree-based models the numeric distance between codes does not affect the model (i.e., label encoding is already enough).
A2. categorical_features =
https://scikit-learn.org/stable/modules/ensemble.html#categorical-support-gbdt
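A minimal sketch of option A2 above: scikit-learn's HistGradientBoostingClassifier takes a categorical_features argument (documented at the link just cited), so nominal columns can be flagged directly instead of one-hot encoded. The toy DataFrame and column names are illustrative assumptions, not the competition data; assumes scikit-learn >= 1.0.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier  # scikit-learn >= 1.0

rng = np.random.default_rng(0)
# toy stand-in for the mixed dataset: one ordinal column, one label-encoded nominal column
X = pd.DataFrame({
    "ordinal_feat": rng.integers(0, 5, size=200),
    "nominal_feat": rng.integers(0, 10, size=200),
})
y = ((X["ordinal_feat"] + X["nominal_feat"] % 3) > 3).astype(int)

# boolean mask per column: the model then splits on category membership for
# nominal_feat instead of on the meaningless numeric order of its codes
clf = HistGradientBoostingClassifier(categorical_features=[False, True], random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```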
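And a hedged sketch of fix 1 from the SynthText entry at the top of this page: that entry sets the opacity sampling in colorize3_poisson.py to the quoted Gaussian draw, and clamping the draw is one further way to avoid near-transparent, illegible renders. The clip bounds and function name are assumptions, not values from the repo.

```python
import numpy as np

def sample_opacity():
    # the sampling line quoted in the SynthText entry (colorize3_poisson.py)
    op = 0.50 + 0.1 * np.random.randn()
    # assumed clamp: keep the text opaque enough to stay legible
    return float(np.clip(op, 0.35, 1.0))
```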
[OCR] Donut: Document Understanding Transformer without OCR
https://arxiv.org/pdf/2111.15664v1.pdf
0. Abstract
An E2E model that breaks away from the OCR framework; a synthetic document image generator lowers the dependence on large-scale data.
1. Introduction
Semi-structured documents. Conventional VDU (Visual Document Understanding) is usually OCR-based, composed of three separate modules: text detection, text recognition, and parsing.
Problem 1: OCR is expensive and is not always available.
Problem 2: OCR errors negatively influence subsequent..

[OCR] TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
https://arxiv.org/pdf/2109.10282.pdf
https://github.com/microsoft/unilm/tree/master/trocr

[Search] OCR
1. E2E OCR
1.1 Donut
Donut: Document Understanding Transformer without OCR
[Paper] https://arxiv.org/pdf/2111.15664v1.pdf
[Review] https://yhkim4504.tistory.com/15?category=843360
[Code] X
1.2 LayoutLMv2
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
[Paper] https://arxiv.org/abs/2012.14740
[Review] https://www.youtube.com/watch?v=BI2Mx5cdc60&feature=youtu.be
[Code] https://github...

[CS224n] Lecture 7(3)-8 Attention
(3) Attention: New neural technique
Seq2Seq (Prob): bottleneck problem -> how can we get more information through during translation?
(Sol) Attention. Core idea: on each step of the decoder, use a direct connection to the encoder to focus on a particular part of the source sequence. attention score -> attention distribution. Why do we need both an encoder RNN and a decoder RNN? -> "..
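A minimal numpy sketch of the attention score -> attention distribution flow from the entry above, for a single decoder step with dot-product scoring; all shapes and variable names are illustrative assumptions, not CS224n code.

```python
import numpy as np

rng = np.random.default_rng(0)
T, h = 5, 8                               # source length, hidden size
enc_states = rng.normal(size=(T, h))      # encoder hidden states, one per source token
dec_state = rng.normal(size=h)            # current decoder hidden state

scores = enc_states @ dec_state           # attention scores, shape (T,)
weights = np.exp(scores - scores.max())
weights /= weights.sum()                  # softmax -> attention distribution over the source
context = weights @ enc_states            # attention output: weighted sum of encoder states
print(weights.round(3), context.shape)
```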