1. Basics

1-1. Object detection vs. text detection

Object detection
- localization, multiple classes
- Low density, low aspect ratio, general box shapes, size variance $\downarrow$
Text detection
- localization, only one class (text)
- High density, high aspect ratio, unique box shapes, size variance $\uparrow$

RBOX (Rotated Box)
- $(x_1, y_1, \text{width}, \text{height}, \theta)$ or $(x_1, y_1, x_2, y_2, \theta)$
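To make the first RBOX form concrete, here is a minimal sketch that expands $(x_1, y_1, \text{width}, \text{height}, \theta)$ into four corner points. The notes do not say what $(x_1, y_1)$ refers to or how $\theta$ is measured, so the sketch assumes that $(x_1, y_1)$ is the top-left corner before rotation, that $\theta$ is in radians, and that the box rotates about its center. The function name `rbox_to_corners` is hypothetical.

```python
import math


def rbox_to_corners(x1, y1, width, height, theta):
    """Return the four corners of a rotated box as (x, y) tuples.

    Assumptions (not stated in the notes): (x1, y1) is the top-left corner
    of the box before rotation, theta is in radians, and the rotation is
    applied about the box center.
    """
    cx, cy = x1 + width / 2.0, y1 + height / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Offsets from the center: top-left, top-right, bottom-right, bottom-left.
    offsets = [(-width / 2.0, -height / 2.0), (width / 2.0, -height / 2.0),
               (width / 2.0, height / 2.0), (-width / 2.0, height / 2.0)]
    # Standard 2D rotation of each offset, then shift back to the center.
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in offsets]


if __name__ == "__main__":
    # With theta = 0 the result is the ordinary axis-aligned box.
    print(rbox_to_corners(0, 0, 4, 2, 0.0))
```

Other datasets and libraries use different conventions (degrees, center-based coordinates, opposite rotation direction), so the convention should be checked before reusing this.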

Regression-Based
- input image → MODEL → representation
- [-] weak on arbitrary-shaped text
- [-] weak on extreme aspect ratios
- TextBoxes ‘18
Segmentation-Based
- input image → MODEL → pixel information → process → representation
- [-] complex post-processing
- [-] interference between instances
- PixelLink ‘18
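The "pixel information → process → representation" step can be sketched as follows. This is only an illustration of the general idea (threshold a per-pixel text score, group connected pixels, take a box per group), not PixelLink's actual post-processing. The function name `boxes_from_score_map`, the 0.5 threshold, and the use of 4-connectivity are assumptions made for the example.

```python
from collections import deque


def boxes_from_score_map(score_map, threshold=0.5):
    """Turn a 2D text-score map (list of rows) into axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max) in pixel indices, one per
    4-connected group of pixels whose score exceeds the threshold.
    """
    h, w = len(score_map), len(score_map[0])
    mask = [[score_map[r][c] > threshold for c in range(w)] for r in range(h)]
    visited = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or visited[r][c]:
                continue
            # Breadth-first flood fill over one connected text region.
            queue = deque([(r, c)])
            visited[r][c] = True
            y_min = y_max = r
            x_min = x_max = c
            while queue:
                y, x = queue.popleft()
                y_min, y_max = min(y_min, y), max(y_max, y)
                x_min, x_max = min(x_min, x), max(x_max, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not visited[ny][nx]):
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


if __name__ == "__main__":
    score = [[0.0] * 8 for _ in range(5)]
    for r in (1, 2):
        for c in (1, 2, 3):
            score[r][c] = 0.9   # first text region
    for r in (3, 4):
        for c in (5, 6, 7):
            score[r][c] = 0.8   # second text region
    print(boxes_from_score_map(score))
```

If two text instances touch in the thresholded mask, the flood fill merges them into a single box, which is one way the interference between instances listed above shows up.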
Hybrid
- input image → Regression-based MODEL → coarse representation → Segmentation-based MODEL → pixel information → post-processing → output representation
- Mask TextSpotter ‘18

Character-Based Method
- Character region map + Character affinity map → detection result
- CRAFT ‘19
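How a character region map and an affinity map combine into word-level detections can be illustrated on a single scanline. This is a toy 1D sketch, not the CRAFT implementation, which works on 2D maps: the thresholds, the sample scores, and the function names `count_runs` and `link_characters` are all assumptions made for the example. Pixels above the region threshold form character blobs, and pixels above the affinity threshold bridge neighbouring characters of the same word.

```python
def count_runs(flags):
    """Count maximal runs of consecutive True values in a 1D sequence."""
    runs, prev = 0, False
    for flag in flags:
        if flag and not prev:
            runs += 1
        prev = flag
    return runs


def link_characters(region, affinity, region_thr=0.5, affinity_thr=0.5):
    """Return (number of character blobs, number of linked word blobs)."""
    chars = [r > region_thr for r in region]
    # A pixel belongs to a word if it is character-like OR affinity-like.
    linked = [r > region_thr or a > affinity_thr
              for r, a in zip(region, affinity)]
    return count_runs(chars), count_runs(linked)


if __name__ == "__main__":
    # Three characters; affinity is high only between the first two.
    region = [0.0, 0.9, 0.9, 0.0, 0.9, 0.9, 0.0, 0.0, 0.0, 0.9, 0.9, 0.0]
    affinity = [0.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(link_characters(region, affinity))  # (3, 2)
```

In the sample data the first two characters are joined into one word by the affinity score between them, while the third character stays a separate detection.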
Word-Based Method