[DLHacks Implementation] Perceptual Adversarial Networks for Image-to-Image Transformation

October 19, 2017

Slide overview

Deep Learning JP:
http://deeplearning.jp/hacks/

Text of each slide
1.

Image-to-Image Translation with Conditional Adversarial Nets (Pix2Pix) & Perceptual Adversarial Networks for Image-to-Image Transformation (PAN) 2017/10/2 DLHacks Otsubo

2.

Topic : image-to-image “translation”

3.

Info
Pix2Pix [CVPR 2017]
• Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
  - iGAN [ECCV 2016], interactive-deep-colorization [SIGGRAPH 2017], Context-Encoder [CVPR 2016], Image Quilting [SIGGRAPH 2001], Texture Synthesis by Non-parametric Sampling [ICCV 1999]
• University of California
• 178 citations
PAN [arXiv 2017]
• Chaoyue Wang, Chang Xu, Chaohui Wang, Dacheng Tao
• University of Technology Sydney, The University of Sydney, Université Paris-Est

4.

Background
• Many tasks can be regarded as “translation” from an input image to an output image
  - diverse, task-specific methods exist for them
→ Is there a single framework that achieves all of them?

5.

Overview
Pix2Pix
• a general-purpose solution to image-to-image translation with a single framework
  - the single framework: conditional GAN (cGAN)
PAN
• Pix2Pix - (per-pixel loss) + (perceptual adversarial loss)

6.

Naive Implementation : U-Net (①) ① per-pixel loss (L1/L2)
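A minimal PyTorch sketch of this baseline, assuming a `unet` generator and an `optimizer` already exist (L1 shown; L2 would use `F.mse_loss`):

```python
import torch.nn.functional as F

# Naive image-to-image training step: regress the output image directly
# with a per-pixel loss, no discriminator involved.
def naive_step(unet, optimizer, x, y):
    optimizer.zero_grad()
    pred = unet(x)
    loss = F.l1_loss(pred, y)   # per-pixel L1 (use F.mse_loss for L2)
    loss.backward()
    optimizer.step()
    return loss.item()
```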

7.

Pix2Pix (①+②) ② adversarial loss

8.

Pix2Pix’s loss (①+②):

$G^* = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)$

$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x}[\log(1 - D(x, G(x)))]$ …②

$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y}[\lVert y - G(x) \rVert_1]$ …①
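A minimal PyTorch sketch of the generator side of this objective (`G`, `D`, and the batch `(x, y)` are placeholders; λ = 100 follows the pix2pix paper):

```python
import torch
import torch.nn.functional as F

# Pix2Pix generator loss: adversarial term (2) plus weighted per-pixel L1 (1).
def g_loss_pix2pix(D, G, x, y, lam=100.0):
    fake = G(x)
    pred = D(torch.cat([x, fake], dim=1))  # conditional D sees input and output together
    adv = F.binary_cross_entropy_with_logits(pred, torch.ones_like(pred))
    l1 = F.l1_loss(fake, y)                # per-pixel L1 term (1)
    return adv + lam * l1
```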

9.

PAN (②+③) ③ perceptual adversarial loss

10.

PAN’s loss (②+③), where $F_i$ denotes the $i$-th hidden layer of the discriminator and $T$ the transformation network:

$\mathcal{L}_P(y, T(x)) = \sum_i \lambda_i \, \lVert F_i(y) - F_i(T(x)) \rVert_1$ …③

Generator: $\mathcal{L}_T = \log(1 - D(T(x))) + \mathcal{L}_P(y, T(x))$

Discriminator: $\mathcal{L}_D = -\log D(y) - \log(1 - D(T(x))) + \max(0, \, m - \mathcal{L}_P(y, T(x)))$

m : a constant positive margin; the norm is L1
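A minimal PyTorch sketch of the perceptual adversarial term; `disc_features` (a discriminator that returns a list of hidden-layer feature maps) and the `lambdas` weights are illustrative assumptions, not the paper's reference code:

```python
import torch
import torch.nn.functional as F

# Perceptual adversarial loss (3): L1 distances between the discriminator's
# hidden-layer responses on the ground truth y and the generated image T(x).
def perceptual_adversarial_loss(disc_features, fake, real, lambdas=(1.0, 1.0, 1.0)):
    feats_fake = disc_features(fake)
    feats_real = disc_features(real)
    return sum(lam * F.l1_loss(ff, fr)
               for lam, ff, fr in zip(lambdas, feats_fake, feats_real))

# Discriminator side: hinge that keeps the perceptual discrepancy above
# a constant positive margin m, i.e. max(0, m - L_P).
def d_perceptual_term(loss_p, m=2.0):
    return torch.clamp(m - loss_p, min=0.0)
```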

11.

Example1 : Image De-Raining
• Removing rain from single images via a deep detail network [Fu, CVPR2017]
• ID-GAN (cGAN) [Zhang, arXiv2017]
  - per-pixel loss
  - adversarial loss
  - pre-trained VGG perceptual loss
(figure: input / output / ground truth)

12.

Example1 : Image De-Raining (cont.)
• Removing rain from single images via a deep detail network [Fu, CVPR2017]
• ID-GAN (cGAN) [Zhang, arXiv2017]
  - per-pixel loss
  - adversarial loss
  - pre-trained VGG perceptual loss (cf. PAN uses the discriminator's perceptual loss; see the sketch below)
(figure: input / output / ground truth)
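For contrast, a minimal sketch of the fixed pre-trained VGG perceptual loss used by ID-GAN-style methods (the cut point `[:16]`, i.e. up to relu3_3, is an illustrative choice):

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen, pre-trained VGG-16 features; unlike PAN, whose perceptual features
# come from the discriminator and keep evolving during training.
class VGGPerceptualLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = vgg16(pretrained=True).features[:16].eval()  # up to relu3_3
        for p in self.features.parameters():
            p.requires_grad = False  # VGG stays fixed

    def forward(self, fake, real):
        return F.l1_loss(self.features(fake), self.features(real))
```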

13.

Example2 : Image Inpainting
• Globally and Locally Consistent Image Completion [Iizuka, SIGGRAPH2017]
• Context Encoders (cGAN) [Pathak, CVPR2016]
  - per-pixel loss
  - adversarial loss
(figure: input / output / ground truth)

14.

Example3 : Semantic Segmentation
Cityscape / Pascal VOC
• DeepLabv3 [Chen, arXiv2017]
• PSPNet [Zhao, CVPR2017]
  http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?cls=mean&challengeid=11&compid=6
Cell Tracking / CREMI
• Learned Watershed [Wolf, ICCV2017]
• U-Net [Ronneberger, MICCAI2015]
  http://www.codesolorzano.com/Challenges/CTC/Welcome.html
(figure: input / output / ground truth)

15.

Result1 : Image De-Raining (the baseline marked “≒pix2pix” in the figure)

16.

Result2 : Image Inpainting

17.

Result3 : Semantic Segmentation

18.

Discussion
Why is the perceptual adversarial loss so effective?
vs. no perceptual loss (Pix2Pix)
- the perceptual loss lets D detect more discrepancies between true/false images
vs. pre-trained VGG perceptual loss (ID-GAN)
- VGG features tend to focus on content
- PAN features tend to focus on discrepancies
- PAN's loss may also help avoid adversarial examples [Goodfellow, ICLR2015] (?)

19.

Minor Difference
• Pix2Pix uses PatchGAN
  - a small (70×70) patch discriminator
  - the final output of D is the average of the patch discriminator's responses (applied convolutionally); see the sketch below
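A minimal sketch of such a patch discriminator (layer widths follow the 70×70 PatchGAN described in the pix2pix paper; the class itself is illustrative, not the reference implementation):

```python
import torch
import torch.nn as nn

# 70x70 PatchGAN sketch: each output logit sees a 70x70 receptive field,
# and the final score is the average over all patch responses.
class PatchDiscriminator(nn.Module):
    def __init__(self, in_channels=6):  # input and target images concatenated
        super().__init__()
        def block(c_in, c_out, stride, norm=True):
            layers = [nn.Conv2d(c_in, c_out, 4, stride, 1)]
            if norm:
                layers.append(nn.BatchNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers
        self.net = nn.Sequential(
            *block(in_channels, 64, 2, norm=False),
            *block(64, 128, 2),
            *block(128, 256, 2),
            *block(256, 512, 1),
            nn.Conv2d(512, 1, 4, 1, 1),  # one logit per patch
        )

    def forward(self, x):
        patch_logits = self.net(x)           # N x 1 x H' x W' patch responses
        return patch_logits.mean((1, 2, 3))  # averaged (convolutionally applied D)
```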

20.

To Do
• Implement:
  1. Pix2Pix (patch discriminator)
  2. PAN (patch discriminator)
  3. PAN (normal discriminator)
• Wang et al. may have compared 1 with 3, rather than like with like.

22.

Implementation 2017/10/17 DLHacks Otsubo

23.

My Implementation
• https://github.com/DLHacks/pix2pix_PAN
• pix2pix
  - https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix
• PAN
  - per-pixel loss → perceptual adversarial loss
  - not the same as the paper's original architecture
  - the number of parameters is the same as in pix2pix

24.

My Experiments
• Facade (label → picture)
• Map (picture → Google map)
• Cityscape (picture → label)

25.

Result (Facade pix2pix)

26.

Result (Facade PAN)

27.

Result (Map pix2pix)

28.

Result (Map PAN)

29.

Result (Cityscape pix2pix)

30.

Result (Cityscape PAN)

31.

Result (PSNR [dB])
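PSNR here is the standard peak signal-to-noise ratio; a minimal sketch of how it is computed for 8-bit images:

```python
import numpy as np

# PSNR in dB: 10 * log10(MAX^2 / MSE), with MAX = 255 for 8-bit images.
def psnr(a, b, max_val=255.0):
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```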

32.

Discussion – Why pix2pix > PAN?
• is the per-pixel loss needed after all?
• is the patch discriminator unsuited to PAN?
• the choice of the positive margin m?
• (a bad pix2pix implementation in PAN's paper…?)