1. Basics
1-1. Object detection VS Text detection
- Object detection
  - Localization, multiple classes
  - Low density, low aspect ratio, general box shapes, size variance $\downarrow$
- Text detection
  - Localization, only one class (text)
  - High density, high aspect ratio, unique box shapes, size variance $\uparrow$



1-2. Text box representation
- RECT (Rectangle)
  - $(x_1, y_1, width, height)$ or $(x_1, y_1, x_2, y_2)$

- RBOX (Rotated Box)
  - $(x_1, y_1, width, height, \theta)$ or $(x_1, y_1, x_2, y_2, \theta)$

- QUAD (Quadrilateral)
  - $(x_1, y_1, \dots, x_4, y_4)$

- Polygon
  - $(x_1, y_1, \dots, x_N, y_N)$
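
The four representations above can be converted into one another. The following is a minimal sketch, not a reference implementation: the function names are hypothetical, and the RBOX convention (rotation by $\theta$ radians about the top-left corner $(x_1, y_1)$) is an assumption, since datasets and libraries differ on this point.

```python
import math


def rect_xywh_to_xyxy(x1, y1, width, height):
    """RECT (x1, y1, width, height) -> RECT (x1, y1, x2, y2)."""
    return (x1, y1, x1 + width, y1 + height)


def rect_to_quad(x1, y1, x2, y2):
    """RECT (x1, y1, x2, y2) -> QUAD, corners listed from the top-left corner."""
    return [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]


def rbox_to_quad(x1, y1, width, height, theta):
    """RBOX (x1, y1, width, height, theta) -> QUAD.

    Assumption: the box is rotated by theta (radians) about (x1, y1) using the
    standard 2-D rotation matrix. In image coordinates (y pointing down),
    a positive theta therefore appears clockwise on screen.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets of the unrotated box relative to (x1, y1).
    offsets = [(0, 0), (width, 0), (width, height), (0, height)]
    return [
        (x1 + dx * cos_t - dy * sin_t, y1 + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]


def polygon_to_rect(points):
    """Polygon or QUAD -> tightest axis-aligned RECT (x1, y1, x2, y2)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))
```

Going from RECT to QUAD to Polygon loses nothing, while `polygon_to_rect` discards shape information. This is why arbitrary-shaped text needs the richer representations.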

2. Taxonomy
2-1. Regression-based VS Segmentation-based
- Regression-based
  - Input image → MODEL → representation
  - [-] Weak on arbitrary-shaped text
  - [-] Weak on extreme aspect ratios
  - Example: TextBoxes ’18
- Segmentation-based
  - Input image → MODEL → pixel information → post-processing → representation
  - [-] Complex post-processing
  - [-] Interference between adjacent instances
  - Example: PixelLink ’18
- Hybrid
  - Input image → regression-based MODEL → coarse representation → segmentation-based MODEL → pixel information → post-processing → output representation
  - Example: Mask TextSpotter ’18
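
To make the "pixel information → post-processing → representation" step of the segmentation-based pipeline concrete, here is a minimal sketch that thresholds a per-pixel text score map and returns one RECT per connected region. It is a generic illustration, not the post-processing of PixelLink or any specific model. The function name, the 0.5 threshold, and the use of 4-connectivity are all assumptions.

```python
from collections import deque


def boxes_from_score_map(score_map, threshold=0.5):
    """Return one RECT (x1, y1, x2, y2) per 4-connected region of text pixels.

    score_map is a list of rows of per-pixel text scores in [0, 1].
    Coordinates are inclusive pixel indices.
    """
    height = len(score_map)
    width = len(score_map[0]) if height else 0
    visited = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if visited[y][x] or score_map[y][x] < threshold:
                continue
            # Breadth-first search over the component that contains (x, y).
            queue = deque([(x, y)])
            visited[y][x] = True
            x_min = x_max = x
            y_min = y_max = y
            while queue:
                cx, cy = queue.popleft()
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not visited[ny][nx]
                            and score_map[ny][nx] >= threshold):
                        visited[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


score_map = [
    [0.9, 0.8, 0.1, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.7, 0.9],
    [0.0, 0.0, 0.0, 0.6, 0.2],
]
print(boxes_from_score_map(score_map))  # [(0, 0, 1, 0), (3, 1, 4, 2)]
```

If two text instances touch, their pixels fall into one component and come out as a single box. This is the "interference between instances" drawback listed above.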
2-2. Character-based VS Word-based
- Character-based method
  - Character region map + character affinity map → detection result
  - Example: CRAFT ’19
- Word-based method
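
For the character-based method, the step "character region map + character affinity map → detection result" can be sketched as follows. A pixel is kept if either map exceeds its threshold, and each connected region of kept pixels becomes one word box. This is a simplified illustration loosely modeled on CRAFT-style post-processing. The function name, both thresholds, and the axis-aligned RECT output are assumptions, not the original implementation.

```python
def words_from_char_maps(region, affinity, region_thr=0.5, affinity_thr=0.5):
    """Merge character regions linked by affinity into word-level RECTs.

    region and affinity are same-sized lists of rows of scores in [0, 1].
    Returns RECTs (x1, y1, x2, y2) with inclusive pixel indices.
    """
    h = len(region)
    w = len(region[0]) if h else 0
    # A pixel belongs to a word if it is inside a character (region map)
    # or lies between two adjacent characters (affinity map).
    mask = [
        [region[y][x] > region_thr or affinity[y][x] > affinity_thr for x in range(w)]
        for y in range(h)
    ]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # Depth-first search; clearing mask entries marks them as visited.
            stack = [(x, y)]
            mask[y][x] = False
            xs, ys = [], []
            while stack:
                cx, cy = stack.pop()
                xs.append(cx)
                ys.append(cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx]:
                        mask[ny][nx] = False
                        stack.append((nx, ny))
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


# One image row: characters at x=0, x=2, and x=5; affinity links the first two.
region = [[0.9, 0.1, 0.9, 0.0, 0.0, 0.8]]
affinity = [[0.0, 0.8, 0.0, 0.0, 0.0, 0.0]]
print(words_from_char_maps(region, affinity))  # [(0, 0, 2, 0), (5, 0, 5, 0)]
```

Without the affinity map, the same input yields three separate single-character boxes. The affinity map is what groups characters into words.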
3. Baseline Model - EAST
- EAST: An Efficient and Accurate Scene Text Detector