[DL Reading Group] NeRF-VAE: A Geometry Aware 3D Scene Generative Model


April 19, 21

Slide summary

2021/04/16
Deep Learning JP:
http://deeplearning.jp/seminar-2/


Text of each page
1.

NeRF-VAE: A Geometry Aware 3D Scene Generative Model 
 Shohei Taniguchi, Matsuo Lab

2.

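Overview
A NeRF that can reconstruct and generate unseen scenes
• Authors: Adam R. Kosiorek, Heiko Strathmann, Daniel Zoran, Pol Moreno, Rosalia Schneider, Soňa Mokrá, Danilo J. Rezende
• DeepMind
• A model that uses NeRF as the generator of GQN
• The last author, Rezende, is one of the authors who proposed GQN
• The paper is in ICML format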

3.

Outline
1. Background
 • Neural Radiance Fields (NeRF)
 • Generative Query Networks (GQN)
2. Method: NeRF-VAE
3. Experiments
4. Summary

4.

Background

5.

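NeRF [Mildenhall et al., ECCV2020]
• A neural network (the scene function) that takes a 3D coordinate (x) and a viewing direction (d) as input and outputs radiance (r, g, b) and density σ
 Fθ : (x, d) ↦ ((r, g, b), σ)
• Trained on photos taken from various angles
 ➡ can generate photos taken from other angles (novel view synthesis)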

6.

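NeRF [Mildenhall et al., ECCV2020]
• Represents a scene as a function from a 3D coordinate and viewing direction to radiance and density
• Once this function is known, images from arbitrary viewpoints can be generated with volume rendering (see Doi-san's slides [1, 2] for details)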
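As a rough illustration of how volume rendering turns the scene function's outputs into a pixel, the sketch below composites samples along a single ray using the discretised quadrature from the NeRF paper. The function name `render_ray` and the list-based inputs are assumptions made for this example; the network and the choice of sample points are left out.

```python
import math


def render_ray(colors, densities, deltas):
    """Composite per-sample colours along one ray into a single pixel colour.

    colors:    list of (r, g, b) tuples, one per sample point
    densities: list of non-negative densities sigma, one per sample point
    deltas:    list of distances between adjacent sample points
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light that has not been absorbed so far
    for (r, g, b), sigma, delta in zip(colors, densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution of this sample
        pixel[0] += weight * r
        pixel[1] += weight * g
        pixel[2] += weight * b
        transmittance *= 1.0 - alpha
    return tuple(pixel)
```

A segment with σ = 0 contributes nothing to the pixel, while a very dense segment absorbs whatever is left of the ray, so samples behind it are hidden.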

7.

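NeRF [Mildenhall et al., ECCV2020]
• Training minimizes the squared error between the rendered images and the true images
• Volume rendering is differentiable, so the model can be trained end-to-end
• Various tricks are used, for example in how the sample points for rendering are chosen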

8.

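NeRF [Mildenhall et al., ECCV2020]
Pros
• A groundbreaking representation for 3D scenes
• Earlier representations, such as point clouds and meshes, were discrete and costly
• An implicit representation based on a neural network captures complex scenes in fine detail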

9.

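NeRF [Mildenhall et al., ECCV2020]
Cons
• The model has to be optimized separately for each scene
• Whenever an unseen scene is obtained, a model must be trained for it from scratch
• Many images must be prepared for each scene
• Training takes 1 to 2 days per scene
• (Naturally) it cannot generate new scenes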

10.

GQN [Eslami et al., 2018]
• A VAE that performs 3D scene reconstruction
• Uses an encoder to reconstruct new scenes quickly
• The model is convolution-based
• See Suzuki-san's slides [3] for details

11.

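GQN [Eslami et al., 2018]
• Let I be the image seen from viewpoint c, and represent the scene with a latent variable z
• As in a VAE, training maximizes the variational lower bound

$$
\log p\left(\{I_k\}_{k=1}^{N} \mid \{c_k\}_{k=1}^{N}\right)
= \log \int p(z) \prod_{k=1}^{N} p(I_k \mid c_k, z)\, dz
\ge \mathbb{E}_{q(z \mid \{I_k, c_k\}_{k=1}^{N})}\left[\sum_{k=1}^{N} \log p(I_k \mid c_k, z)\right] - D_{\mathrm{KL}}(q \,\|\, p)
$$

[Figure: diagram with nodes c, I, z]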
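To make the two terms of the lower bound concrete, here is a minimal sketch of a single-sample estimate. It assumes a diagonal Gaussian posterior q and a standard normal prior p(z), which is a simplification chosen for illustration (the models discussed in the slides may use richer distributions). The function names are hypothetical, and the per-view log-likelihoods are taken as given.

```python
import math


def kl_diag_gaussian_to_standard_normal(mean, log_var):
    """Closed-form KL( N(mean, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mean, log_var)
    )


def elbo(per_view_log_likelihoods, mean, log_var):
    """Single-sample estimate: sum_k log p(I_k | c_k, z) - KL(q || p).

    per_view_log_likelihoods: log p(I_k | c_k, z) for each of the N views,
                              evaluated at one sample z drawn from q
    mean, log_var:            parameters of the (assumed) diagonal Gaussian q
    """
    reconstruction = sum(per_view_log_likelihoods)
    regularizer = kl_diag_gaussian_to_standard_normal(mean, log_var)
    return reconstruction - regularizer
```

The first term rewards latents that explain every observed view; the second keeps the posterior close to the prior, which is what later allows sampling z from p(z).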

12.

GQN [Eslami et al., 2018]
Reconstruction of a new scene can be done quickly using the encoder (q)

$$
p\left(I \mid c, \{I_k, c_k\}_{k=1}^{M}\right) \approx \mathbb{E}_{q(z \mid \{I_k, c_k\}_{k=1}^{M})}\left[p(I \mid c, z)\right]
$$
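The expectation over the encoder's posterior is usually approximated by sampling. The sketch below is one way to do that under the assumption that q is a diagonal Gaussian with known mean and standard deviation; `likelihood` is a hypothetical callable standing in for p(I | c, z) with the query image and viewpoint already fixed.

```python
import random


def sample_diag_gaussian(mean, std, rng):
    """Draw one z from N(mean, diag(std^2))."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def predictive_density(likelihood, mean, std, num_samples=16, seed=0):
    """Monte Carlo estimate of E_q[ p(I | c, z) ].

    likelihood: callable z -> p(I | c, z) (hypothetical stand-in for the decoder)
    mean, std:  parameters of the (assumed) diagonal Gaussian posterior q
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = sample_diag_gaussian(mean, std, rng)
        total += likelihood(z)
    return total / num_samples
```

No per-scene optimization is involved: one encoder pass yields `mean` and `std`, and the rest is sampling and decoding, which is why reconstruction is fast.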

13.

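GQN [Eslami et al., 2018]
Pros
• The encoder reconstructs unseen scenes quickly (amortized inference)
• Training does not take that long either
Cons
• It uses no geometric information, so the reconstructed images are not consistent
• It cannot generate images as cleanly as NeRF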

14.

Method

15.

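NeRF-VAE
• Extends NeRF with a latent variable and trains it like a VAE, so that unseen scenes can be reconstructed
• The latent variable is also added to the input of the scene function
 Gθ( ⋅ , z) : (x, d) ↦ ((r, g, b), σ)
• The scene-function parameters θ learn the structure shared across all scenes, while the latent variable z comes to capture the features of each individual scene
• Sampling from the prior p(z) also makes it possible to generate new scenes
[Figure: diagram with nodes c, I, z]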
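The split between shared parameters θ and a per-scene latent z can be sketched with a toy scene function. Everything here is an assumption for illustration: a single linear layer stands in for the real network, the prior is taken to be a standard normal, and the names `make_theta` and `scene_function` are hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softplus(v):
    return math.log1p(math.exp(v))


def make_theta(latent_dim, rng):
    """Random shared weights: 4 outputs (r, g, b, sigma) x (3 + 3 + latent_dim) inputs."""
    n_in = 3 + 3 + latent_dim
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(4)]


def scene_function(theta, x, d, z):
    """Toy G_theta(., z): (x, d) -> ((r, g, b), sigma) using one linear layer."""
    features = list(x) + list(d) + list(z)  # z is simply appended to the input
    out = [sum(w * f for w, f in zip(row, features)) for row in theta]
    rgb = tuple(sigmoid(v) for v in out[:3])  # colours constrained to (0, 1)
    sigma = softplus(out[3])                  # density must be non-negative
    return rgb, sigma


rng = random.Random(0)
theta = make_theta(latent_dim=2, rng=rng)      # shared across all scenes
z_a = [rng.gauss(0.0, 1.0) for _ in range(2)]  # z ~ N(0, I): one sampled scene
z_b = [rng.gauss(0.0, 1.0) for _ in range(2)]  # another sampled scene

point, direction = (0.1, 0.2, 0.3), (0.0, 0.0, 1.0)
rgb_a, sigma_a = scene_function(theta, point, direction, z_a)
rgb_b, sigma_b = scene_function(theta, point, direction, z_b)
```

The same `theta` is queried at the same point and direction for both latents; only z changes, so any difference between the two outputs comes from the scene code.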

16.

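NeRF-VAE: optimization
• Let the rendering result from viewpoint c be $\hat{I} = \mathrm{render}(G_\theta(\cdot, z), c)$; the likelihood function is then

$$
p_\theta(I \mid z, c) = \prod_{i,j} \mathcal{N}\left(I(i,j) \mid \hat{I}(i,j), \sigma_{\mathrm{lik}}^{2}\right)
$$

• As in GQN, training maximizes the variational lower bound

$$
\mathbb{E}_{q(z \mid \{I_k, c_k\}_{k=1}^{N})}\left[\sum_{k=1}^{N} \log p(I_k \mid c_k, z)\right] - D_{\mathrm{KL}}(q \,\|\, p)
$$

[Figure: diagram with nodes c, I, z]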
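The per-pixel Gaussian likelihood can be written out directly. The sketch below assumes single-channel images stored as nested lists and a hypothetical function name; it is meant only to show that, for fixed σ_lik, the log-likelihood equals a constant minus the squared error scaled by 1 / (2 σ_lik²).

```python
import math


def gaussian_log_likelihood(image, rendered, sigma_lik):
    """log p(I | z, c) = sum_{i,j} log N(I(i,j) | I_hat(i,j), sigma_lik^2).

    image, rendered: nested lists of equal shape (rows of pixel intensities)
    sigma_lik:       standard deviation of the pixel noise model
    """
    var = sigma_lik ** 2
    log_norm = -0.5 * math.log(2.0 * math.pi * var)  # per-pixel constant
    total = 0.0
    for row, row_hat in zip(image, rendered):
        for pixel, pixel_hat in zip(row, row_hat):
            total += log_norm - (pixel - pixel_hat) ** 2 / (2.0 * var)
    return total
```

Maximizing this term therefore plays the same role as the squared-error objective used to train NeRF, with σ_lik setting how strongly reconstruction is weighted against the KL term.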

17.

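NeRF-VAE: implementation details
1. The encoder (q) embeds each image with a ResNet, averages the resulting features, and converts the average into the parameters of a normal distribution
2. Iterative amortized inference is used when running inference with the encoder
3. An attention-based architecture is used for the scene function Gθ( ⋅ , z)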
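Item 1 can be sketched as mean pooling over per-view features. The ResNet embedding is assumed to have been computed already, and splitting the pooled vector into mean and log-variance halves is a hypothetical stand-in for whatever mapping the model actually uses.

```python
def pool_context(features_per_view):
    """Average the per-view feature vectors into one scene-level vector."""
    n = len(features_per_view)
    dim = len(features_per_view[0])
    return [sum(f[i] for f in features_per_view) / n for i in range(dim)]


def to_gaussian_params(pooled):
    """Hypothetical head: first half is the mean, second half the log-variance."""
    half = len(pooled) // 2
    return pooled[:half], pooled[half:]


# Two context views, each already embedded into a 4-dimensional feature.
features = [[1.0, 2.0, 0.0, -1.0], [3.0, 4.0, 0.0, -3.0]]
mean, log_var = to_gaussian_params(pool_context(features))
```

Averaging makes the posterior independent of the order of the context views and lets the same encoder handle any number of them.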

18.

Experiments

19.

Comparison with NeRF
• Works well with fewer viewpoints than NeRF
• When there are enough viewpoints, NeRF gives cleaner results (as expected)

20.

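Comparison with GQN (CONV-AR-VAE)
Rendering consistency
• GQN is not consistent (objects appear and disappear)
• The proposed method carries a geometric prior through NeRF, so its renderings are always consistent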

21.

Comparison with GQN (CONV-AR-VAE)
Out-of-distribution generalization
• GQN cannot render well from viewpoints it did not see during training
• The proposed method generalizes well

22.

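Generating new scenes
• New scenes can also be generated by sampling from the prior
• In principle GQN should be able to do this too, but it probably cannot generate scenes this cleanly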

23.

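Summary & impressions
• Proposes NeRF-VAE, a model that combines NeRF and VAE so that unseen scenes can be reconstructed and generated
• Makes consistent rendering and the generation of new scenes possible
Impressions
• A straightforward extension that looks promising, but the experimental results did not feel that strong
• If this scaled to scenes as complex as those NeRF handles, it would be quite impressive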

24.

References
[1] [DL Reading Group] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (https://www.slideshare.net/DeepLearningJP2016/dlnerf-representingscenes-as-neural-radiance-fields-for-view-synthesis)
[2] [DL Reading Group] Survey of follow-up work on Neural Radiance Field (NeRF) (https://www.slideshare.net/DeepLearningJP2016/dlneural-radiance-field-nerf?ref=https://deeplearning.jp/)
[3] [DL Reading Group] GQN and related work, and its relation to world models (https://www.slideshare.net/DeepLearningJP2016/dlgqn-111725780)