[CVPR 2020] Bridging the Gap Between Anchor-based and Anchor-free Detection via ATSS: Review

Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection [Paper]

This paper analyzes the differences between anchor-based and anchor-free architectures and asks whether anchors are really essential to object detection. It argues that the fundamental performance gap between the two families comes from how positive and negative training samples are defined. With or without anchors, in other words whether regression starts from a box or from a point, there is no meaningful performance difference as long as positive and negative training samples are defined in the same way. How positive and negative training samples are defined is therefore critical in object detection (OD).

1. Introduction

1.1 Object Detection

= Image classification (find the label of the image) + Object localization (find where the object is in the image, using a bounding box, Bbox)

In other words, given an input image, OD outputs both pieces of information: the Bbox and the class label.

1.2 Anchor-based Detector

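Earlier detectors used a sliding-window approach. It is inefficient because regions that contain no object must also be examined, and its accuracy is not high either.

- Two-stage detector (R-CNN family)

runs region proposal and classification sequentially
region proposal: finds regions that are likely to contain an object
slow, but relatively accurate

- One-stage detector

runs region proposal and classification in a single pass
fast, but relatively less accurate
Speed matters in many applications, so research on one-stage detectors has continued.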

- RetinaNet

one-stage anchor-based detector
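proposes focal loss to address the class imbalance problem
class imbalance: the problem caused by the heavy imbalance between foreground (positive) and background (negative) samples
Focal loss:
- Including every easy negative sample in training is inefficient because these samples contribute little to learning, and training on a large number of them risks pushing the model toward a degenerate solution. Focal loss is proposed so that training is not overwhelmed by easy negatives.
- It rebalances the loss by giving easy samples a small weight and hard samples (those that are difficult to classify) a large weight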
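
The weighting described above can be sketched in a few lines. This is a minimal NumPy illustration, not the RetinaNet implementation: the function name `focal_loss` is hypothetical, and the defaults `alpha=0.25` and `gamma=2.0` are assumed here as commonly used values rather than taken from this review.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Per-sample binary focal loss (illustrative sketch).

    p: predicted foreground probability in [0, 1]
    y: 1 for foreground (positive), 0 for background (negative)
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y)
    # p_t is the probability the model assigns to the true class.
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) samples.
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, eps, 1.0))

# An easy negative (background predicted with p=0.01) versus a hard negative
# (background predicted as foreground with p=0.90).
easy_negative = focal_loss(0.01, 0)
hard_negative = focal_loss(0.90, 0)
print(float(easy_negative), float(hard_negative))
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is one way to see that `gamma` alone controls how strongly easy samples are down-weighted.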

1.3 Anchor-free Detector

Methods that find objects without preset anchors

- Keypoint-based method

Localizes keypoints that are either pre-defined or self-learned
Keypoints are detected first and then grouped in a second step, so these methods are slow, but they are relatively accurate

- Center-based method

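Predicts the center point of each object; the peaks of a heatmap over the image give one center point per object. This eases the computation cost and the hyperparameter sensitiveness to some degree.
FCOS (Fully Convolutional One-Stage Object Detection): center-based anchor-free detector
- Developed to address two problems of earlier models: sensitivity to hyper-parameter settings and the severe imbalance between positive and negative samples
- Regresses the bounding box by predicting the left, top, right, and bottom distances from a point to the sides of the box.
- Computes a center-ness value from the distances between a point and the edges of the ground-truth box, which measures how close the point is to the box center. Predictions from points far from the center are suppressed as much as possible, so mostly those near the center remain.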
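
The regression targets and the center-ness value can be illustrated as follows. This is a small sketch with hypothetical helper names (`fcos_targets`, `centerness`), assuming boxes in `(x1, y1, x2, y2)` format and a point strictly inside the box; the center-ness expression follows the form commonly given for FCOS.

```python
import math

def fcos_targets(px, py, box):
    """Distances from point (px, py) to the left, top, right, bottom sides of box."""
    x1, y1, x2, y2 = box
    return px - x1, py - y1, x2 - px, y2 - py

def centerness(l, t, r, b):
    """1.0 at the box center, decreasing toward 0 as the point nears an edge.

    Assumes the point is strictly inside the box, so all distances are > 0.
    """
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

box = (0.0, 0.0, 100.0, 50.0)
center_score = centerness(*fcos_targets(50.0, 25.0, box))    # point at the box center
off_center_score = centerness(*fcos_targets(10.0, 25.0, box))  # point near the left edge
print(center_score, off_center_score)
```

A point near an edge gets a low score, which is what lets the detector down-weight boxes predicted from poorly placed locations.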

2. Difference Analysis of Anchor-based and Anchor-free Detector

Comparison of RetinaNet and FCOS

The number of anchors tiled per location
- (RetinaNet) tiles several anchor boxes per location
- (FCOS) tiles a single anchor point per location
The definition of positive and negative samples
- (RetinaNet) computes the IoU between each anchor and the ground truth; anchors above a positive threshold are positive and anchors below a negative threshold are negative (anchors in between are ignored during training)
- (FCOS) selects samples with spatial and scale constraints. A location is a candidate positive when it falls inside an object's bounding box and is negative otherwise (spatial constraint). In addition, FCOS assigns annotations of different sizes to different feature pyramid levels, so on every pyramid level whose predefined scale range does not match the size of the annotation, the location is negative (scale constraint).
The regression starting status
- (RetinaNet) regresses the bounding box starting from a preset anchor box
- (FCOS) uses no preset box; starting from an anchor point, it regresses the distances from that point to the four sides of the object's box
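
The two sample definitions compared above can be put side by side in code. This is a sketch under stated assumptions: the function names are hypothetical, the thresholds `0.5` and `0.4` and the use of the largest regression distance for the scale check are assumed from the commonly described RetinaNet and FCOS settings, and only a single ground-truth box is handled.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def retinanet_label(anchor, gt, pos_thr=0.5, neg_thr=0.4):
    """IoU-based definition: label an anchor box against one ground-truth box."""
    v = iou(anchor, gt)
    if v >= pos_thr:
        return "positive"
    if v < neg_thr:
        return "negative"
    return "ignored"

def fcos_label(point, gt, scale_range):
    """Spatial + scale definition: label an anchor point on one pyramid level."""
    px, py = point
    l, t, r, b = px - gt[0], py - gt[1], gt[2] - px, gt[3] - py
    inside = min(l, t, r, b) > 0             # spatial constraint
    lo, hi = scale_range
    in_scale = lo <= max(l, t, r, b) <= hi   # scale constraint for this level
    return "positive" if inside and in_scale else "negative"

gt = (0.0, 0.0, 100.0, 100.0)
print(retinanet_label((20.0, 0.0, 120.0, 100.0), gt))   # IoU = 8000 / 12000
print(fcos_label((50.0, 50.0), gt, (0.0, 64.0)))        # inside box, in scale range
```

Note how the same object can yield different positives under the two rules: one depends on box overlap, the other on where a point sits and which pyramid level it belongs to.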

3. Adaptive Training Sample Selection

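How positive and negative samples are defined is what matters. The paper therefore introduces a method that selects positive samples automatically. The method has only one hyperparameter, k, which saves most of the tuning effort.

Existing sample selection strategies
- very sensitive to hyper-parameter tuning
Proposed ATSS method
- with k as the only hyper-parameter, it separates positive and negative samples automatically from the statistical characteristics of each object, with almost no tuning
  1. Find candidate positive samples: for each ground-truth box, take the k anchors on each pyramid level whose centers are closest to the center of the box
  2. Compute the IoU between the candidates and the ground truth, then the mean and standard deviation of those IoU values
  3. Select the final positive samples: candidates whose IoU is at least mean + standard deviation and whose center lies inside the ground-truth box; the remaining anchors are negative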
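
The three steps can be sketched for a single ground-truth box as follows. This is an illustrative NumPy version, not the authors' code: `box_iou` and `atss_assign` are hypothetical names, anchors are assumed to be `(x1, y1, x2, y2)` boxes grouped per pyramid level, `k=9` is assumed as a default, NumPy's population standard deviation is used, and resolving an anchor that matches several ground-truth boxes is left out.

```python
import numpy as np

def box_iou(anchors, gt):
    """IoU between anchors of shape (N, 4) and one ground-truth box of shape (4,)."""
    iw = np.clip(np.minimum(anchors[:, 2], gt[2]) - np.maximum(anchors[:, 0], gt[0]), 0, None)
    ih = np.clip(np.minimum(anchors[:, 3], gt[3]) - np.maximum(anchors[:, 1], gt[1]), 0, None)
    inter = iw * ih
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_a + area_g - inter)

def atss_assign(anchors_per_level, gt, k=9):
    """Return ([(level, anchor_index), ...] positives, IoU threshold) for one box."""
    gt = np.asarray(gt, dtype=float)
    gt_center = np.array([(gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2])
    candidates = []  # (level, index, iou, center_inside_gt)
    for level, anchors in enumerate(anchors_per_level):
        anchors = np.asarray(anchors, dtype=float)
        centers = np.stack([(anchors[:, 0] + anchors[:, 2]) / 2,
                            (anchors[:, 1] + anchors[:, 3]) / 2], axis=1)
        # Step 1: the k anchors on this level closest to the ground-truth center.
        dist = np.linalg.norm(centers - gt_center, axis=1)
        topk = np.argsort(dist)[:k]
        ious = box_iou(anchors[topk], gt)
        for i, v in zip(topk, ious):
            cx, cy = centers[i]
            inside = bool(gt[0] < cx < gt[2] and gt[1] < cy < gt[3])
            candidates.append((level, int(i), float(v), inside))
    # Step 2: statistics of the candidate IoUs across all levels.
    cand_ious = np.array([c[2] for c in candidates])
    threshold = float(cand_ious.mean() + cand_ious.std())
    # Step 3: keep candidates at or above the threshold whose center is inside the box.
    positives = [(lvl, i) for lvl, i, v, inside in candidates if v >= threshold and inside]
    return positives, threshold

levels = [
    [(0, 0, 10, 10), (2, 0, 12, 10), (100, 100, 110, 110)],  # level 0 anchors
    [(0, 0, 20, 20), (200, 200, 220, 220)],                  # level 1 anchors
]
positives, threshold = atss_assign(levels, gt=(0, 0, 10, 10), k=2)
print(positives, threshold)
```

Because the threshold is computed from each object's own candidates, an object whose candidates all overlap poorly still receives a correspondingly lower threshold instead of being held to one fixed IoU cut-off.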
