

[OCR] FOTS: Fast Oriented Text Spotting with a Unified Network

https://arxiv.org/pdf/1801.01671.pdf

https://github.com/jiangxiluning/FOTS.PyTorch

 


0. Abstract

  • FOTS : detection & recognition --> simultaneous & complementary (they share computational & visual information)

1. Introduction

  • Common Method
    • CNN : extract feature map
    • separate decoder : decodes the regions     ---> heavy time cost & ignores the correlation between detection and recognition [Background]
  • FOTS 
    • learns more generic features -> shared between 'detection' & 'recognition'
    •  single detection network  --->  computational cost ↓
  • HOW? RoIRotate
    • grabs the proper features from the shared feature map according to the oriented detection bounding box
    • text detection branch : predicts the detection bounding boxes
    • text proposal features (flow sketched below)
      1. RoIRotate : extracts text proposal features according to the detection results
      2. RNN encoder
      3. CTC(Connectionist Temporal Classification) decoder
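
A minimal sketch of this end-to-end flow; every module name here (backbone, detector, roi_rotate, rnn_encoder, ctc_decoder) is a hypothetical placeholder, not an identifier from the official code:

```python
# Hedged sketch of the FOTS flow; all module names are hypothetical.
def spot_text(image, backbone, detector, roi_rotate, rnn_encoder, ctc_decoder):
    shared = backbone(image)                  # shared CNN feature map
    boxes = detector(shared)                  # oriented text detection results
    proposals = roi_rotate(shared, boxes)     # fixed-height, axis-aligned features
    encoded = rnn_encoder(proposals)          # sequence encoding
    return boxes, ctc_decoder(encoded)        # decoded label sequences
```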

2. Related Work

2.1. Text Detection

  • character based methods (treat text as a combination of characters)
    • localize characters ----> group them into words or text lines
    • Sliding-window-based methods
    • connected-components based methods
  • Recently
    • Aim : directly detect words in images
    • . . . 

2.2. Text Recognition

  • Aim : decode a sequence of labels from regularly cropped but variable-length text images
  • Recently
    • word classification based
    • sequence-to-label decode based
    • sequence-to-sequence model based

2.3. Text Spotting

  • separate detection / recognition model
  • Recently <Towards end-to-end text spotting with convolutional recurrent neural networks>
    • detection : text proposal network inspired by RPN
    • recognition : LSTM with attention mechanism
  • ---> FOTS : can handle more complex cases beyond vertical rotation + faster

3. Methodology

3.1. Overall Architecture

  • backbone : ResNet-50
  • concatenate -> low-level feature maps & high-level semantic feature maps (merge sketched below)
  • text detection branch 
  • RoIRotate
    • follows the output of the text detection branch = 'oriented text region proposals'
    • converts them into fixed-height representations
  • text recognition branch
    • CNN and LSTM : encode text sequence information
    • CTC : decode
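
A rough sketch of the shared-feature merge, assuming c2–c5 are the ResNet-50 stage outputs at 1/4, 1/8, 1/16, 1/32 scale (the exact conv layers after each concat are simplified away):

```python
import torch
import torch.nn.functional as F

# Hedged sketch of the shared-feature merge (EAST-style top-down path);
# c2..c5 are assumed ResNet-50 stage outputs at 1/4, 1/8, 1/16, 1/32 scale.
def merge_features(c2, c3, c4, c5):
    p = c5
    for low in (c4, c3, c2):
        # upsample high-level semantics, then concatenate low-level detail
        p = F.interpolate(p, size=low.shape[2:], mode="bilinear", align_corners=False)
        p = torch.cat([p, low], dim=1)
        # (the real network follows each concat with 1x1 and 3x3 convs
        #  to fuse channels; omitted here for brevity)
    return p  # shared 1/4-scale feature map for both branches
```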

 

3.2. Text Detection Branch 

  • text detector : fully convolutional network
  • upscale the feature maps from 1/32 to 1/4 size
  • first channel : the probability of each pixel being a positive sample
  • for each positive sample :
    • 4 channels --> predict its distances to the top, bottom, left, right sides of the bounding box
    • final output = bounding boxes after applying thresholding and NMS
  • loss : L_detect = L_cls + λ_reg · L_reg (sketched below)
    • L_cls : text classification term
    • L_reg : bounding box regression term
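
A sketch of the two terms, assuming an EAST-style formulation (per-pixel cross-entropy for classification; IoU loss plus an angle term for regression); tensor shapes and the lambda values are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

# Hedged sketch of the detection loss, assuming an EAST-style formulation.
# score: sigmoid probabilities (N,1,H,W); dist: distances to the 4 box sides
# (N,4,H,W); theta: orientation (N,1,H,W); *_gt are ground truth.
def detection_loss(score, score_gt, dist, dist_gt, theta, theta_gt,
                   lambda_reg=1.0, lambda_theta=10.0, eps=1e-6):
    # text classification term: per-pixel binary cross-entropy on the score map
    l_cls = F.binary_cross_entropy(score, score_gt)

    # bounding box regression term: IoU loss on side distances + angle loss,
    # averaged over positive pixels only
    top, bottom, left, right = dist.split(1, dim=1)
    top_g, bottom_g, left_g, right_g = dist_gt.split(1, dim=1)
    inter = (torch.min(top, top_g) + torch.min(bottom, bottom_g)) * \
            (torch.min(left, left_g) + torch.min(right, right_g))
    union = (top + bottom) * (left + right) + \
            (top_g + bottom_g) * (left_g + right_g) - inter
    l_iou = -torch.log((inter + eps) / (union + eps))
    l_angle = 1.0 - torch.cos(theta - theta_gt)
    pos = score_gt  # positive-pixel mask
    l_reg = ((l_iou + lambda_theta * l_angle) * pos).sum() / (pos.sum() + eps)

    return l_cls + lambda_reg * l_reg
```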

3.3. RoIRotate

  • oriented feature regions  --->  axis-aligned feature maps 
  • applied to the feature map, not to the original image!
  • (straightens tilted text to the correct angle)
  • difference from prior ops (RoI Pooling, RoI Align)
    • uses bilinear interpolation ---> output length can vary (sketched below)
    • *bilinear interpolation : linear interpolation applied twice, once along the x-axis and once along the y-axis
  • Importance
    • text recognition is very sensitive to noise
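
Since the operation is an affine transformation followed by bilinear sampling, a minimal single-box sketch can lean on PyTorch's affine_grid/grid_sample; cx, cy, w, h, angle (box center, size, rotation on the feature map, as plain floats) and out_height=8 are illustrative assumptions:

```python
import math
import torch
import torch.nn.functional as F

# Hedged single-box sketch of RoIRotate via an affine transform plus
# bilinear interpolation; parameter names are illustrative.
def roi_rotate(feature_map, cx, cy, w, h, angle, out_height=8):
    _, C, H, W = feature_map.shape
    # keep the aspect ratio: output width varies with the box (variable length)
    out_width = max(1, round(w * out_height / h))

    # affine matrix mapping normalized output coords back into the rotated box
    cos, sin = math.cos(angle), math.sin(angle)
    theta = torch.tensor([[
        [cos * w / W, -sin * h / W, 2 * cx / W - 1],
        [sin * w / H,  cos * h / H, 2 * cy / H - 1],
    ]], dtype=feature_map.dtype, device=feature_map.device)

    grid = F.affine_grid(theta, [1, C, out_height, out_width], align_corners=False)
    # bilinear interpolation produces the axis-aligned, fixed-height strip
    return F.grid_sample(feature_map, grid, mode="bilinear", align_corners=False)
```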

 

3.4. Text Recognition Branch

  • Aim : predict text labels ( using the region features obtained from RoIRotate )

 

 

  1. VGG-like sequential convolutions (steps 1–4 sketched below)
  2. poolings with reduction along the height axis only
    • output : higher-level feature maps
  3. encoder : bi-directional LSTM
  4. one fully-connected layer
  5. decoder : CTC ( frame-wise classification scores ---[transform]---> label sequence )
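
A minimal module sketch of steps 1–4, assuming illustrative channel sizes and an input height of 8 from RoIRotate:

```python
import torch.nn as nn

# Hedged sketch of the recognition branch; channel sizes are illustrative,
# input height 8 is the fixed height produced by RoIRotate.
class RecognitionBranch(nn.Module):
    def __init__(self, in_channels=32, hidden=256, num_classes=63):
        super().__init__()
        self.convs = nn.Sequential(                       # 1. VGG-like convs
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                         # 2. pool height only
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                         # height 8 -> 1
        )
        self.encoder = nn.LSTM(256, hidden, bidirectional=True,
                               batch_first=True)          # 3. bi-directional LSTM
        self.fc = nn.Linear(2 * hidden, num_classes)      # 4. fully-connected

    def forward(self, x):                    # x: (N, C, 8, W) from RoIRotate
        f = self.convs(x)                    # (N, 256, 1, W)
        f = f.squeeze(2).permute(0, 2, 1)    # (N, W, 256): width becomes time
        f, _ = self.encoder(f)               # (N, W, 2*hidden)
        return self.fc(f)                    # frame-wise scores for the CTC decoder
```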

 

  • Loss : CTC loss ( negative log-likelihood of the ground-truth label sequence; usage sketched below )
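
A minimal usage sketch with PyTorch's nn.CTCLoss; the batch size, sequence lengths, and the 63-class alphabet (index 0 = CTC blank) are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Hedged usage sketch of the recognition (CTC) loss.
logits = torch.randn(40, 4, 63)                 # (T, N, classes) frame-wise scores
log_probs = logits.log_softmax(2)
targets = torch.randint(1, 63, (4, 10))         # ground-truth label sequences
input_lengths = torch.full((4,), 40, dtype=torch.long)
target_lengths = torch.full((4,), 10, dtype=torch.long)

recognition_loss = nn.CTCLoss(blank=0)(log_probs, targets,
                                       input_lengths, target_lengths)
```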