[SEAM] Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

(23.02.14)

WSL (weakly supervised learning) paper review, part 3

The method proposed in this paper pulls in a lot of related concepts. It does improve performance, but it does not look like a simple mechanism...!

- Paper title: Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation (CVPR 2020)

- https://arxiv.org/pdf/2004.04581v1.pdf

Key points

- Proposes SEAM and PCM to narrow the gap between full and weak supervision.

- Implemented as a siamese network with an ECR (equivariant cross regularization) loss, to reduce over-activated and under-activated regions.

Problem & Solution

Problem

- Weakly supervised semantic segmentation with image-level labels is an emerging computer vision problem, and CAM brought a major change to it. However, because of the gap between full supervision and weak supervision, CAM struggles to serve as an object mask.

Solution

- Proposes the self-supervised equivariant attention mechanism (SEAM)

  - Applies consistency regularization to the CAMs obtained from variously transformed images.

  - The goal is to obtain additional supervision and narrow the gap.

  - Based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation.

  - Consistency regularization is applied to provide self-supervision.

- Proposes the pixel correlation module (PCM)

  - Uses context appearance information.

  - Refines the prediction at the current pixel using its similar neighbors.

Method

Equivariant Regularization

The network is built as a shared-weight siamese structure.

One branch takes the original image as input and the other takes an affine-transformed copy of it. The two resulting activation maps are regularized against each other to keep the CAMs consistent.
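To make the consistency idea concrete, here is a minimal NumPy sketch of comparing A(F(I)) with F(A(I)) for a shared-weight network F. Everything in it is an assumption for illustration: `transform` (a flip plus 2x striding) stands in for the affine transform, and `pointwise_cam` / `context_cam` are toy stand-ins for the CAM network, not the model from the paper.

```python
import numpy as np

def transform(x):
    """Hypothetical affine transform A: horizontal flip, then 2x downscale by striding."""
    flipped = x[..., ::-1]            # flip along the width axis
    return flipped[..., ::2, ::2]     # keep every second row and column

def box_blur(x):
    """3x3 mean filter with edge padding, applied to every channel of a (D, H, W) array."""
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    h, w = x.shape[1:]
    return sum(padded[:, i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def pointwise_cam(image, weight):
    """Toy CAM network: 1x1 convolution + ReLU. Purely per-pixel, so it commutes with A."""
    return np.maximum(np.einsum("cd,dhw->chw", weight, image), 0.0)

def context_cam(image, weight):
    """Toy CAM network that mixes neighbors first, so it no longer commutes with A."""
    return pointwise_cam(box_blur(image), weight)

def er_loss(cam_fn, image):
    """Mean L1 gap between A(F(I)) and F(A(I)) for a shared-weight network F = cam_fn."""
    cam_o = cam_fn(image)              # branch 1: original image
    cam_t = cam_fn(transform(image))   # branch 2: transformed image, same weights
    return float(np.abs(transform(cam_o) - cam_t).mean())

rng = np.random.default_rng(0)
image = rng.normal(size=(3, 8, 8))     # (channels, H, W)
weight = rng.normal(size=(4, 3))       # 4 hypothetical classes
print("pointwise:", er_loss(lambda x: pointwise_cam(x, weight), image))
print("context:  ", er_loss(lambda x: context_cam(x, weight), image))
```

The per-pixel network is equivariant by construction, so its gap is essentially zero, while the network that uses spatial context is not. That non-zero gap is the kind of inconsistency the regularization is meant to penalize.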

Pixel Correlation Module

Equation 4 is derived by combining Equations 1 and 2.

In the end, the final CAM at each pixel is the sum of the original CAM values, weighted by normalized similarities.
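The weighted sum can be sketched in NumPy as follows. This is only my reading of the idea, under assumptions: `feats` is taken to be an already-embedded per-pixel feature map, similarity is cosine similarity with negative values clipped by a ReLU, and the weights are normalized so each pixel's weights sum to about 1. The function name `pcm` and the `eps` parameter are hypothetical.

```python
import numpy as np

def pcm(cam, feats, eps=1e-5):
    """Refine a CAM with pixel-to-pixel similarity.

    cam:   (C, H, W) original class activation maps
    feats: (D, H, W) per-pixel embeddings (assumed to be given)
    """
    c, h, w = cam.shape
    f = feats.reshape(feats.shape[0], h * w)                   # (D, N) with N = H * W
    f = f / (np.linalg.norm(f, axis=0, keepdims=True) + eps)   # unit-length embeddings
    sim = np.maximum(f.T @ f, 0.0)                             # (N, N) cosine similarity, negatives clipped
    sim = sim / (sim.sum(axis=1, keepdims=True) + eps)         # normalize each pixel's weights
    refined = cam.reshape(c, h * w) @ sim.T                    # refined[:, i] = sum_j sim[i, j] * cam[:, j]
    return refined.reshape(c, h, w)

# Toy example: left half and right half of a 2x4 image have orthogonal embeddings.
feats = np.zeros((2, 2, 4))
feats[0, :, :2] = 1.0
feats[1, :, 2:] = 1.0
cam = np.zeros((1, 2, 4))
cam[0, 0, 0] = 1.0                                             # only one pixel is activated
print(pcm(cam, feats)[0])
```

In the toy example the single activated pixel gets spread over the pixels that look like it (the left half) and nothing leaks into the dissimilar right half, which is the "refine by similar neighbors" behavior described above.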

Loss function of SEAM

equivariant regularization loss on original CAM

To summarize, the losses are as follows.

- classification loss: used to localize objects.

- ER loss: used to narrow the gap between pixel-level and image-level supervision.

- ECR loss: used to integrate PCM with the network so that predictions stay consistent across various affine transformations.
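For reference, the three terms can be written out roughly as below. This is reconstructed from my reading of the paper, so treat the exact form as an assumption: $z^{o}, z^{t}$ are the classification outputs of the original and transformed branches, $l$ is the image-level label, $A(\cdot)$ is the affine transform, $\hat{y}^{o}, \hat{y}^{t}$ are the original CAMs, and $y^{o}, y^{t}$ are the CAMs revised by PCM.

```latex
\begin{align}
\mathcal{L}_{cls} &= \tfrac{1}{2}\left(\ell_{cls}(z^{o}, l) + \ell_{cls}(z^{t}, l)\right) \\
\mathcal{L}_{ER}  &= \lVert A(\hat{y}^{o}) - \hat{y}^{t} \rVert_{1} \\
\mathcal{L}_{ECR} &= \lVert A(y^{o}) - \hat{y}^{t} \rVert_{1} + \lVert A(\hat{y}^{o}) - y^{t} \rVert_{1} \\
\mathcal{L}       &= \mathcal{L}_{cls} + \mathcal{L}_{ER} + \mathcal{L}_{ECR}
\end{align}
```

Note the cross pairing in the ECR term: the revised CAM of one branch is compared with the original CAM of the other branch, rather than revised against revised.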

------

Detailed explanation

WSSS approaches share parameters between classification and segmentation (w_s = w_c). So this method first trains a classification network, then removes the pooling function of that network and uses it for the segmentation task.
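A small NumPy sketch of what "removing the pooling function" means, assuming the classifier is a 1x1 convolution followed by global average pooling. The function names and shapes here are hypothetical and chosen only for illustration.

```python
import numpy as np

def class_maps(feats, weight):
    """Shared weights with pooling removed: per-pixel class scores, i.e. the CAM.

    feats:  (D, H, W) backbone features
    weight: (C, D) classifier weights (a 1x1 convolution)
    """
    return np.einsum("cd,dhw->chw", weight, feats)       # (C, H, W)

def classify(feats, weight):
    """Training-time head: the same class maps, followed by global average pooling."""
    return class_maps(feats, weight).mean(axis=(1, 2))   # (C,) image-level logits

def normalize_cam(cam, eps=1e-5):
    """Clip negatives and scale each class map by its maximum (roughly into [0, 1])."""
    cam = np.maximum(cam, 0.0)
    return cam / (cam.max(axis=(1, 2), keepdims=True) + eps)

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 4, 6))
weight = rng.normal(size=(3, 5))
print(classify(feats, weight).shape)                      # image-level output
print(normalize_cam(class_maps(feats, weight)).shape)     # pixel-level output
```

Because the classifier is linear, pooling the per-pixel scores gives the same logits as classifying the pooled features. That is why the weights learned from image-level labels can be reused directly for per-pixel prediction.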

PCM: captures context appearance information for each pixel and revises the original CAMs by learning affinity attention maps. -> This is meant to improve the consistency of the predictions.

SEAM is implemented as a siamese network and uses the ECR (equivariant cross regularization) loss.

The ECR loss regularizes the original CAMs of one branch against the revised CAMs of the other branch.

Experiment

- dataset: PASCAL VOC 2012 dataset with 21 class annotations (i.e., 20 foreground object classes plus background)
