[Paper] Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture

(23.05.02)

Depth Estimation paper summary, part 2

- Paper title: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture (ICCV 2015)

- https://arxiv.org/pdf/1411.4734v4.pdf

(Based on how the survey at https://arxiv.org/pdf/2003.06620.pdf describes it)

This paper proposes a general multi-scale framework that handles depth estimation, surface normal estimation, and semantic label prediction from a single image.

1. Summary

The overall method is almost identical to Depth Map Prediction from a Single Image using a Multi-Scale Deep Network (NeurIPS 2014) (https://hey-stranger.tistory.com/306).

The only differences are that more convolutional layers are added, a third scale is added, and multichannel feature maps are passed between scales.

It also extends the approach to three tasks, one of which is depth estimation.

There does not seem to be an official code release.

2. Abstract

This paper tackles three computer vision tasks with a single multi-scale convolutional network architecture:

1) depth prediction, 2) surface normal estimation, 3) semantic labeling

Each task is posed as regressing from the input image to an output map, so the network adapts naturally to each one with only small modifications.

The proposed method progressively refines its predictions through a sequence of scales and captures image details without relying on superpixels or low-level segmentation.

As a result, it achieved state-of-the-art results on all three tasks.
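Because all three tasks share the same "image in, map out" formulation, the task-specific part can be reduced to the number of output channels plus a per-pixel post-processing step. The plain-Python sketch below illustrates that idea under stated assumptions: 1 channel for depth, 3 channels for normals rescaled to unit length, and one score per class turned into probabilities with a softmax for semantic labels. The names `make_heads` and `postprocess_pixel` and the default class count are hypothetical, not taken from the paper.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def identity(v: Vector) -> Vector:
    # Depth (assumed): one channel holding the predicted log depth, used as-is.
    return list(v)


def normalize(v: Vector) -> Vector:
    # Surface normals (assumed): 3 channels rescaled to a unit-length vector.
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def softmax(v: Vector) -> Vector:
    # Semantic labels (assumed): per-class scores turned into probabilities.
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def make_heads(num_classes: int) -> Dict[str, Tuple[int, Callable[[Vector], Vector]]]:
    # Hypothetical table: task -> (number of output channels, post-processing).
    return {
        "depth": (1, identity),
        "normals": (3, normalize),
        "semantic": (num_classes, softmax),
    }


def postprocess_pixel(task: str, raw: Vector, num_classes: int = 13) -> Vector:
    # num_classes is a hypothetical placeholder; only the semantic head uses it.
    channels, fn = make_heads(num_classes)[task]
    if len(raw) != channels:
        raise ValueError(f"{task} expects {channels} channels, got {len(raw)}")
    return fn(raw)


if __name__ == "__main__":
    print(postprocess_pixel("depth", [0.7]))
    print(postprocess_pixel("normals", [0.0, 3.0, 4.0]))  # -> [0.0, 0.6, 0.8]
    print(postprocess_pixel("semantic", [1.0, 2.0, 3.0], num_classes=3))
```

Only the head changes between tasks here; everything upstream of the output map would be shared, which is the point the abstract makes.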

3. Model Architecture

The model proposed in this paper is a multi-scale deep network.

It first predicts a coarse global output over the entire image area, then refines it with finer-scale local networks. The architecture builds on the one proposed in "Depth Map Prediction from a Single Image using a Multi-Scale Deep Network (NeurIPS 2014)", with the following improvements:

- 1) make the model deeper: more convolutional layers are added

- 2) add a third scale at higher resolution

- 3) pass multichannel feature maps from scale 1 to scale 2 instead of output predictions
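Improvement 3) boils down to channel-wise concatenation: the coarse scale's multichannel feature map is upsampled to the finer scale's spatial size and stacked with the finer scale's own features. Below is a minimal plain-Python sketch of that idea, with a feature map stored as a list of channels (each an H x W grid). Nearest-neighbor upsampling is a simplification chosen for brevity, the helper names are hypothetical, and the channel counts and sizes in the usage are toy values rather than the paper's.

```python
from typing import List, Tuple

Channel = List[List[float]]  # H x W grid
FeatureMap = List[Channel]   # C channels, each H x W


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    # Repeat every pixel `factor` times along both spatial axes.
    out: FeatureMap = []
    for ch in fmap:
        rows: Channel = []
        for row in ch:
            wide = [v for v in row for _ in range(factor)]
            rows.extend([list(wide) for _ in range(factor)])
        out.append(rows)
    return out


def spatial_size(fmap: FeatureMap) -> Tuple[int, int]:
    return (len(fmap[0]), len(fmap[0][0]))


def concat_channels(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    # Stack two feature maps along the channel axis; spatial sizes must match.
    if spatial_size(a) != spatial_size(b):
        raise ValueError(f"size mismatch: {spatial_size(a)} vs {spatial_size(b)}")
    return a + b


def constant_map(channels: int, h: int, w: int, value: float) -> FeatureMap:
    return [[[value] * w for _ in range(h)] for _ in range(channels)]


if __name__ == "__main__":
    coarse = constant_map(channels=4, h=2, w=3, value=1.0)  # toy "scale 1" features
    fine = constant_map(channels=2, h=8, w=12, value=0.5)   # toy "scale 2" features
    merged = concat_channels(upsample_nearest(coarse, 4), fine)
    print(len(merged), spatial_size(merged))  # -> 6 (8, 12)
```

Passing all coarse channels, rather than a single predicted map, lets the next scale decide which global cues to use.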

scale 1: Full-Image View

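- The first scale of the network predicts a coarse output from a full-image view. Scale 1 was trained with two different backbones, AlexNet and VGGNet, and their performance varied depending on the task.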

scale 2: Predictions

- The role of the second scale is to produce predictions at a mid-level resolution.

scale 3: Higher Resolution

- The last scale refines the predictions to a higher resolution. It concatenates the scale 2 output with feature maps computed from the input image at a finer stride. The final output resolution is half that of the network input.
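To keep track of how resolution grows from scale to scale, a small helper can compute the spatial size at each stage. This is a hedged sketch: only the final factor of 1/2 comes from the description above. The 1/16 factor (scale 1 features before upsampling) and the 1/4 factor (scale 2 output) are assumptions for illustration, and the 320x240 input is a hypothetical example size.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    name: str
    factor: float  # output size relative to the network input (assumed values)


# Hypothetical coarse-to-fine schedule; only the last factor is stated in the text.
STAGES: List[Stage] = [
    Stage("scale 1 (full-image view, before upsampling)", 1 / 16),
    Stage("scale 1 features upsampled for scale 2", 1 / 4),
    Stage("scale 2 (mid-level predictions)", 1 / 4),
    Stage("scale 3 (higher resolution)", 1 / 2),  # half the network input
]


def stage_sizes(width: int, height: int) -> List[Tuple[str, int, int]]:
    # Round down to whole pixels. This is a simplification: real layer
    # arithmetic also depends on kernel size, stride, and padding.
    return [(s.name, int(width * s.factor), int(height * s.factor)) for s in STAGES]


if __name__ == "__main__":
    for name, w, h in stage_sizes(320, 240):
        print(f"{name}: {w}x{h}")
```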

4. Loss function

(Of the three tasks, only depth estimation is covered here.)

The loss function compares the predicted and ground-truth log depth maps.
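A common way to compare log depth maps, introduced in the NeurIPS 2014 predecessor, is the scale-invariant error on d_i = log y_i - log y_i*: (1/n) Σ d_i² - (λ/n²)(Σ d_i)². The plain-Python sketch below implements that form with λ = 0.5 and an optional validity mask for pixels that lack ground truth. Treat it as an approximation of the paper's objective, not a reproduction: the ICCV 2015 depth loss also adds a gradient-matching term on d, which is omitted here, and the function name is hypothetical.

```python
import math
from typing import List, Optional

Grid = List[List[float]]


def scale_invariant_log_loss(pred: Grid, target: Grid, lam: float = 0.5,
                             mask: Optional[List[List[bool]]] = None) -> float:
    """Scale-invariant error between two positive depth maps of the same H x W."""
    diffs = []
    for i, (prow, trow) in enumerate(zip(pred, target)):
        for j, (p, t) in enumerate(zip(prow, trow)):
            if mask is not None and not mask[i][j]:
                continue  # skip pixels without valid ground truth
            diffs.append(math.log(p) - math.log(t))  # d_i in log space
    n = len(diffs)
    if n == 0:
        raise ValueError("no valid pixels")
    s = sum(diffs)
    # Mean squared log difference minus lam times the squared mean difference.
    return sum(d * d for d in diffs) / n - lam * s * s / (n * n)


if __name__ == "__main__":
    gt = [[1.0, 2.0], [4.0, 8.0]]
    scaled = [[2 * v for v in row] for row in gt]  # same scene, twice the depth
    print(scale_invariant_log_loss(gt, gt))               # 0.0
    print(scale_invariant_log_loss(scaled, gt, lam=1.0))  # ~0: global scale ignored
    print(scale_invariant_log_loss(scaled, gt))           # 0.5 * log(2)**2 with lam=0.5
```

With lam=1.0 a global rescaling of the prediction costs nothing; lam=0.0 reduces to a plain squared error in log space, and 0.5 sits in between, keeping some pressure on absolute scale.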

5. Experiments
