[DL Reading Group] Recent Advances in Energy-Based Models

January 24, 20

Slide overview

2020/01/24
Deep Learning JP:
http://deeplearning.jp/seminar-2/


Text of each slide
1.

Recent Advances in Energy-Based Models
Shohei Taniguchi, Matsuo Lab (M1)

2.

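Background
• Energy-based models (EBMs) have recently started to attract attention again (?)
• This talk mainly covers the following two papers:
  - Flow Contrastive Estimation of Energy-Based Models
    ‣ Trains two generative models, a flow and an EBM, simultaneously
  - Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
    ‣ Uses an EBM to train a generative model and a classifier simultaneously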

3.

Outline
1. Prerequisites: Energy Based Model (EBM)
  - Main ways of training an EBM
    ‣ Contrastive Divergence Learning (CD)
    ‣ Noise Contrastive Estimation (NCE)
2. History of EBMs
  - Restricted Boltzmann Machine (RBM) and what came after
3. Flow Contrastive Estimation of Energy-Based Models
4. Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One

4.

Prerequisites: Energy Based Model

5.

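What is an EBM?
• The probability density $p_\theta(x)$ of data $x$ is defined through an energy function $E_\theta(x)$, which takes $x$ as input and returns a scalar:
  $p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z(\theta)}, \quad Z(\theta) = \int \exp(-E_\theta(x))\,dx$
  - $Z(\theta)$ is called the partition function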
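The definition above can be checked numerically in one dimension, where the partition function is just an integral over a grid. This is a minimal sketch: the double-well energy, the grid range, and the function names are assumptions made for illustration and do not come from the slides.

```python
import numpy as np

def energy(x, theta=1.0):
    # Hypothetical double-well energy, chosen only for illustration.
    return theta * (x**2 - 1.0) ** 2

def normalized_density(energy_fn, grid):
    # Approximate Z(theta) = integral of exp(-E(x)) dx with a Riemann sum
    # on a uniform grid, then normalize the unnormalized density by it.
    dx = grid[1] - grid[0]
    unnorm = np.exp(-energy_fn(grid))
    Z = np.sum(unnorm) * dx
    return unnorm / Z, Z

grid = np.linspace(-4.0, 4.0, 4001)
p, Z = normalized_density(energy, grid)
```

This brute-force integration only works in very low dimensions, which is why the training methods discussed later avoid computing $Z(\theta)$ directly.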

6.

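Where EBMs are useful
• Density (ratio) estimation
  - The energy function equals the negative log-density up to an additive constant ($\log Z(\theta)$), so the densities of different data points can be compared
    ‣ However, the density itself cannot be computed because of the partition function
    ‣ With NCE (described later), an estimate of the density can be obtained
  - Useful for anomaly detection and similar tasks (?)
• Sampling data
  - If the energy function is differentiable, data can be sampled with HMC
    ‣ However, MCMC converges slowly in high dimensions, so this is hard in practice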
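To illustrate gradient-based sampling from a differentiable energy, here is a minimal sketch. It uses unadjusted Langevin dynamics, a simpler relative of HMC, rather than HMC itself. The quadratic energy $E(x) = x^2/2$ (a standard normal target), the step size, and the chain counts are all assumptions made for this example.

```python
import numpy as np

def grad_energy(x):
    # Gradient of the hypothetical energy E(x) = x^2 / 2 (standard normal target).
    return x

def langevin_sample(grad_fn, n_chains=2000, n_steps=1000, step=0.01, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3.0, 3.0, size=n_chains)  # arbitrary initialization
    for _ in range(n_steps):
        noise = rng.standard_normal(n_chains)
        # Langevin update: x <- x - (step / 2) * dE/dx + sqrt(step) * noise
        x = x - 0.5 * step * grad_fn(x) + np.sqrt(step) * noise
    return x

samples = langevin_sample(grad_energy)
```

Even in this one-dimensional toy case the chains need many steps to forget their initialization, which hints at why MCMC becomes costly in high dimensions.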

7.

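Training an EBM
• Basically, the model is trained to maximize the log-likelihood $\log p_\theta(x)$ of the training data (maximum likelihood estimation)
  - In most cases, however, the partition function $Z(\theta)$ involves an integral and cannot be computed
    ➡ The likelihood cannot be computed either
  - Training an EBM therefore requires some device for avoiding the computation of the partition function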

8.

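Contrastive Divergence Learning (CD)
• For gradient-based training such as SGD, the log-likelihood itself is not needed; its gradient with respect to the parameters, $\frac{\partial \log p_\theta(x)}{\partial \theta}$, is enough
  - Averaged over the data, it can be computed as follows:
    $\mathbb{E}_{p_{data}(x)}\left[\frac{\partial \log p_\theta(x)}{\partial \theta}\right] = \mathbb{E}_{p_\theta(x)}\left[\frac{\partial E_\theta(x)}{\partial \theta}\right] - \mathbb{E}_{p_{data}(x)}\left[\frac{\partial E_\theta(x)}{\partial \theta}\right]$
    ‣ In short, compute the gradient of the energy function on samples from the model and on samples from the data, and take the difference
    ‣ Samples from the model $p_\theta(x)$ are obtained the hard way, with MCMC or similar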
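The gradient formula follows from the EBM definition in a few steps. The derivation below is a sketch that assumes differentiation and integration can be exchanged:

```latex
\begin{align*}
\log p_\theta(x) &= -E_\theta(x) - \log Z(\theta) \\
\frac{\partial \log Z(\theta)}{\partial \theta}
  &= \frac{1}{Z(\theta)} \int \frac{\partial}{\partial \theta} \exp\left(-E_\theta(x)\right) dx
   = -\int p_\theta(x) \frac{\partial E_\theta(x)}{\partial \theta}\, dx
   = -\mathbb{E}_{p_\theta(x)}\left[\frac{\partial E_\theta(x)}{\partial \theta}\right] \\
\mathbb{E}_{p_{data}(x)}\left[\frac{\partial \log p_\theta(x)}{\partial \theta}\right]
  &= \mathbb{E}_{p_\theta(x)}\left[\frac{\partial E_\theta(x)}{\partial \theta}\right]
   - \mathbb{E}_{p_{data}(x)}\left[\frac{\partial E_\theta(x)}{\partial \theta}\right]
\end{align*}
```

The first expectation is the only term that requires samples from the model, which is where MCMC enters.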

9.

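Issues with CD
• Drawing samples from the model $p_\theta(x)$ is a hassle
  - MCMC takes a long time to converge when the data is high-dimensional
  - Running MCMC to draw samples at every parameter update takes far too long
  ➡ We would like a method that does not need samples from the model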

10.

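Noise Contrastive Estimation (NCE)
• First, let a separate parameter $c$ estimate the (log) partition function, and consider maximizing $\log p_\theta(x) = -E_\theta(x) - c$
  - It is known that when the following objective is maximized, $\theta$ converges to the maximum-likelihood solution and $c$ converges to $\log Z(\theta)$:
    $J(\theta) = \mathbb{E}_{p_{data}(x)}\left[\log \frac{p_\theta(x)}{p_\theta(x) + q(x)}\right] + \mathbb{E}_{q(x)}\left[\log \frac{q(x)}{p_\theta(x) + q(x)}\right]$
    ‣ Here $q(x)$ is some noise distribution (Gaussian noise, for example)
    ‣ Intuitively, the model is trained to tell samples from the data apart from noise
    ‣ There is in fact a connection to GANs (discussed later)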
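The NCE objective can be evaluated from log-densities alone, because $p/(p+q)$ is a sigmoid of $\log p - \log q$. The sketch below is a toy illustration, not the setup from any paper: the quadratic energy, the standard normal noise distribution, and the names `nce_objective` and `log_sigmoid` are all assumptions.

```python
import numpy as np

def log_sigmoid(t):
    # Numerically stable log(1 / (1 + exp(-t))).
    return -np.logaddexp(0.0, -t)

def nce_objective(energy_fn, c, log_q, x_data, x_noise):
    # log p_theta(x) = -E_theta(x) - c, where c is a learned estimate of log Z.
    log_p = lambda x: -energy_fn(x) - c
    # p / (p + q) = sigmoid(log p - log q), q / (p + q) = sigmoid(log q - log p)
    data_term = log_sigmoid(log_p(x_data) - log_q(x_data)).mean()
    noise_term = log_sigmoid(log_q(x_noise) - log_p(x_noise)).mean()
    return data_term + noise_term

# Toy setup (assumed): energy x^2 / 2, noise q = N(0, 1), data also from N(0, 1).
energy = lambda x: 0.5 * x**2
log_q = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
rng = np.random.default_rng(0)
x_data = rng.standard_normal(1000)
x_noise = rng.standard_normal(1000)

c_true = 0.5 * np.log(2 * np.pi)  # exact log Z for this energy
J_at_truth = nce_objective(energy, c_true, log_q, x_data, x_noise)
J_off = nce_objective(energy, c_true + 1.0, log_q, x_data, x_noise)
```

In this toy case the model equals the noise distribution when `c = c_true`, so the objective reaches $2\log\frac{1}{2}$ there and is lower for a wrong value of `c`.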

11.

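Issues with NCE
• How should $q(x)$ be chosen?
  - Conditions $q(x)$ should satisfy:
    ① Its density is easy to compute
    ② It is easy to sample from
    ③ It should preferably be close to the data distribution $p_{data}(x)$
  - ① and ② are fairly easy, but ③ is hard
    ‣ In fact, if a distribution close to the data distribution were available from the start, there would be no need to train an EBM at all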

12.

History of EBMs

13.

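Background to the emergence of EBMs: pre-training
• In early deep learning, pre-training was indispensable for training multi-layer models
• There were two broad families of pre-training methods:
  - Reconstruction-based: train layer by layer to minimize the reconstruction error
    e.g. Autoencoder, Denoising AE
  - EBM-based: treat each neuron as a binary random variable and train layer by layer to maximize the log-likelihood
    e.g. Restricted Boltzmann Machine, Deep Boltzmann Machine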

14.

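Restricted Boltzmann Machine (RBM)
• Treat each hidden unit $h_i$ as a binary random variable with $P(h_i = 1 \mid v) = \sigma(v^\top W_{:,i} + c_i)$; the energy function is then
  $E(v, h) = -b^\top v - c^\top h - v^\top W h$
• Train to maximize the likelihood obtained by marginalizing over all $h$ and normalizing:
  $p(v) = \sum_h p(v, h), \quad p(v, h) = \frac{1}{Z} \exp(-E(v, h))$
  (since each $h_i$ is binary, the marginalization is easy to compute)
• Training is usually done with CD
• Stacking these in multiple layers and training them gives the Deep Boltzmann Machine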
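The claim that marginalizing over binary hidden units is easy can be made concrete: the sum over all $2^n$ hidden configurations factorizes into a product over units, giving a closed-form free energy. The sketch below checks this against brute-force enumeration on a tiny RBM. The sizes, random parameters, and function names are assumptions made for illustration.

```python
import itertools
import numpy as np

def rbm_energy(v, h, W, b, c):
    # E(v, h) = -b^T v - c^T h - v^T W h
    return -b @ v - c @ h - v @ W @ h

def rbm_free_energy(v, W, b, c):
    # F(v) = -log sum_h exp(-E(v, h))
    #      = -b^T v - sum_i log(1 + exp(c_i + v^T W[:, i]))
    return -b @ v - np.sum(np.logaddexp(0.0, c + v @ W))

def hidden_prob(v, W, c):
    # P(h_i = 1 | v) = sigmoid(c_i + v^T W[:, i])
    return 1.0 / (1.0 + np.exp(-(c + v @ W)))

rng = np.random.default_rng(0)
n_v, n_h = 4, 3  # tiny sizes so brute-force enumeration is feasible
W = rng.normal(size=(n_v, n_h))
b = rng.normal(size=n_v)
c = rng.normal(size=n_h)
v = rng.integers(0, 2, size=n_v).astype(float)

# Brute force: enumerate all 2^n_h hidden configurations.
brute = -np.log(sum(
    np.exp(-rbm_energy(v, np.array(h, dtype=float), W, b, c))
    for h in itertools.product([0, 1], repeat=n_h)
))
closed_form = rbm_free_energy(v, W, b, c)
```

The unnormalized marginal is $\exp(-F(v))$; the partition function $Z$ is still intractable, which is why CD is used for training.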

15.

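EBMs after the RBM
• Pre-training with EBMs such as RBMs and DBMs was very effective at the time, but it fell out of use with the arrival of ReLU and dropout and with advances in initialization methods
• As generative models, too, they lost their presence once VAEs, GANs, and others appeared
• They are still widely used as models of the brain in computational neuroscience
• Why are they attracting attention again now?
  ➡ The way the energy function is used has changed, and EBMs have become usable for a variety of purposes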

16.

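EBMs then and now
Older EBMs (RBM etc.)
• Treat each neuron in the hidden layers as a binary random variable and define the energy function over all of them
• Train using the value marginalized over the hidden neurons
  $E(v, h^{(1)}, h^{(2)}, h^{(3)}) = -v^\top W^{(1)} h^{(1)} - h^{(1)\top} W^{(2)} h^{(2)} - h^{(2)\top} W^{(3)} h^{(3)}$
Recent EBMs
• Define the energy function itself with a NN (the NN as a whole is regarded as a single deterministic function)
• Train using the output of the energy function directly
  $E(v) = \mathrm{NN}(v) = w^{(n)} \left( \cdots \phi\left( W^{(2)} \phi\left( W^{(1)} v + b^{(1)} \right) + b^{(2)} \right) \cdots \right) + b^{(n)}$
The training methods are shared, but note that the way the models are used differs considerably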

17.

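Examples of recent EBMs
• Implicit Generation and Modeling with Energy-Based Models (NeurIPS 2019)
  - EBMs became able to generate clean images
  - Training is CD-based
  - Careful regularization and similar tricks improve generation quality considerably
  - Not covered in detail in this talk
[Figure: 32x32 Imagenet]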

18.

Flow Contrastive Estimation of Energy-Based Models

19.

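Bibliographic information
• Authors
  - Ruiqi Gao, Erik Nijkamp, Diederik P. Kingma, Zhen Xu, Andrew M. Dai, Ying Nian Wu
• NeurIPS 2019 Bayesian Deep Learning Workshop
• A new work from Kingma
• Trains an EBM with an NCE-based method while training a flow model at the same time
• Brings together many insights from generative modeling, which makes it extremely interesting
[Figure: generated samples (flow)]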

20.

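Noise Contrastive Estimation (recap)
• First, let a separate parameter $c$ estimate the (log) partition function, and consider maximizing $\log p_\theta(x) = -E_\theta(x) - c$
  - It is known that when the following objective is maximized, $\theta$ converges to the maximum-likelihood solution and $c$ converges to $\log Z(\theta)$:
    $J(\theta) = \mathbb{E}_{p_{data}(x)}\left[\log \frac{p_\theta(x)}{p_\theta(x) + q(x)}\right] + \mathbb{E}_{q(x)}\left[\log \frac{q(x)}{p_\theta(x) + q(x)}\right]$
    ‣ Here $q(x)$ is some noise distribution (Gaussian noise, for example)
    ‣ Intuitively, the model is trained to tell samples from the data apart from noise
    ‣ There is in fact a connection to GANs (discussed later)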

21.

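Issues with NCE (recap)
• How should $q(x)$ be chosen?
  - Conditions $q(x)$ should satisfy:
    ① Its density is easy to compute
    ② It is easy to sample from
    ③ It should preferably be close to the data distribution $p_{data}(x)$
  - ① and ② are fairly easy, but ③ is hard
    ‣ In fact, if a distribution close to the data distribution were available from the start, there would be no need to train an EBM at all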

22.

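Flow Contrastive Estimation (FCE)
• The main idea is to use a flow model as the noise distribution $q(x)$ while training it at the same time
  - If you are not familiar with flows, see Suzuki-san's slides:
    https://www.slideshare.net/DeepLearningJP2016/dlflowbased-deepgenerative-models
• The flow model $q_\alpha(x)$ could be trained by ordinary likelihood maximization, but in FCE it is trained to minimize the NCE objective, the opposite of the EBM:
  $V(\theta, \alpha) = \mathbb{E}_{p_{data}(x)}\left[\log \frac{p_\theta(x)}{p_\theta(x) + q_\alpha(x)}\right] + \mathbb{E}_{p(z)}\left[\log \frac{q_\alpha(g_\alpha(z))}{p_\theta(g_\alpha(z)) + q_\alpha(g_\alpha(z))}\right]$
  - In other words, the EBM and the flow are trained adversarially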

23.

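What is the flow learning in FCE?
  $V(\theta, \alpha) = \mathbb{E}_{p_{data}(x)}\left[\log \frac{p_\theta(x)}{p_\theta(x) + q_\alpha(x)}\right] + \mathbb{E}_{p(z)}\left[\log \frac{q_\alpha(g_\alpha(z))}{p_\theta(g_\alpha(z)) + q_\alpha(g_\alpha(z))}\right]$
• Looked at closely, this expression closely resembles the GAN objective
  - $\frac{p_\theta(x)}{p_\theta(x) + q_\alpha(x)}$ is the probability that $x$ is a sample from the EBM
  - $\frac{q_\alpha(g_\alpha(z))}{p_\theta(g_\alpha(z)) + q_\alpha(g_\alpha(z))}$ is the probability that $g_\alpha(z)$ is a sample from the flow
• Minimizing this means training so that samples from the EBM and samples from the flow become indistinguishable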

24.

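Minimizing V = minimizing JSD
• As EBM training progresses, the EBM distribution asymptotically approaches the data distribution, so the flow ends up being trained adversarially against the true data distribution
  ➡ Same as a GAN
• As with GANs, when the EBM distribution matches the data distribution, minimizing $V$ is equivalent to minimizing the Jensen-Shannon Divergence (JSD):
  $JSD(q_\alpha \| p_{data}) = KL(p_{data} \| (p_{data} + q_\alpha)/2) + KL(q_\alpha \| (p_{data} + q_\alpha)/2)$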
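The equivalence can be checked by substituting $p_\theta = p_{data}$ into $V$. The derivation below is a sketch using the JSD as written on this slide (without the conventional factor of 1/2). It uses the fact that for $x = g_\alpha(z)$ with $z \sim p(z)$, the expectation over $p(z)$ equals an expectation over $q_\alpha(x)$.

```latex
\begin{align*}
V(\theta, \alpha)\Big|_{p_\theta = p_{data}}
  &= \mathbb{E}_{p_{data}(x)}\left[\log \frac{p_{data}(x)}{p_{data}(x) + q_\alpha(x)}\right]
   + \mathbb{E}_{q_\alpha(x)}\left[\log \frac{q_\alpha(x)}{p_{data}(x) + q_\alpha(x)}\right] \\
  &= \mathbb{E}_{p_{data}(x)}\left[\log \frac{p_{data}(x)}{(p_{data}(x) + q_\alpha(x))/2}\right]
   + \mathbb{E}_{q_\alpha(x)}\left[\log \frac{q_\alpha(x)}{(p_{data}(x) + q_\alpha(x))/2}\right]
   - 2 \log 2 \\
  &= KL\left(p_{data} \,\|\, (p_{data} + q_\alpha)/2\right)
   + KL\left(q_\alpha \,\|\, (p_{data} + q_\alpha)/2\right) - 2 \log 2 \\
  &= JSD(q_\alpha \| p_{data}) - 2 \log 2
\end{align*}
```

Since $2 \log 2$ does not depend on $\alpha$, minimizing $V$ over the flow parameters minimizes the JSD.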

25.

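Advantages of FCE
• An EBM and a flow model can be trained at the same time
  - A flow makes sampling easy, but the need to compute the Jacobian constrains the architecture, so its expressive power is somewhat limited
  - An EBM has high expressive power, but sampling from it requires MCMC or similar, which is cumbersome
  ➡ Since both are obtained, we get the best of both worlds
    ‣ For example, use the EBM for density estimation and the flow for sampling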

26.

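Experiment 1: Density estimation on synthetic 2D data
• Visualizes the densities of models trained on data drawn from distributions like the one shown at the far left [figure]
  - Glow-MLE: Glow trained by maximum likelihood
  - Glow-FCE: Glow trained with FCE
  - EBM-FCE: EBM trained with FCE
• The EBM trained with FCE estimates the density most cleanly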

27.

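Experiment 1: Density estimation on synthetic 2D data
• Learning curves of the EBM's density estimation accuracy [figure]
• Surprisingly, convergence is faster when both models are trained jointly with FCE from random initialization (rand) than when Glow is first pre-trained by maximum likelihood and then trained with FCE (trained)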

28.

Experiment 2: Real image data
[Figure: images generated by Glow trained with FCE]
[Tables: FID comparison; negative log-likelihood on test data]

29.

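FCE summary
• Proposes Flow Contrastive Estimation (FCE), which adversarially trains a flow as the noise distribution at the same time as training an EBM with NCE
• Makes it possible to exploit the strengths of both, for example sampling with the flow and estimating density with the EBM
• The flow model is trained with the same criterion (JSD) as a GAN generator, but unlike GANs, where training is unstable because the generator's gradient vanishes once the discriminator classifies perfectly, FCE looks likely to train stably
• It is also a significant contribution as a method for training EBMs stably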

30.

Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One

31.

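Bibliographic information
• Authors
  - Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky
• ICLR 2020 accepted (8, 8, 6)
• The paper shows that by considering an energy function for the joint distribution of data $x$ and label $y$, a discriminative model $p(y \mid x)$ and a generative model $p(x)$ are obtained at the same time
• Obvious once pointed out, but not something one would have thought of
• Most of the experiments are omitted here, so please read the original paper if you are interested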

32.

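Joint Energy based Model (JEM)
• The main insight is that the values fed into the final softmax of a discriminative model can be regarded as a negative energy function:
  $p_\theta(y \mid x) = \frac{\exp(f_\theta(x)[y])}{\sum_{y'} \exp(f_\theta(x)[y'])}$
• Using this, the joint distribution of $x$ and $y$ is
  $p_\theta(x, y) = \frac{\exp(f_\theta(x)[y])}{Z(\theta)}, \quad Z(\theta) = \int \sum_{y'} \exp(f_\theta(x)[y'])\,dx$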

33.

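Joint Energy based Model (JEM)
• The log-likelihood of the joint distribution is the sum of the log-likelihoods of the discriminative model and the generative model, so both can simply be maximized together:
  $\log p_\theta(x, y) = \log p_\theta(x) + \log p_\theta(y \mid x)$
• The second term is ordinary training of a discriminative model
• For the first term, the energy function $E_\theta(x)$ of $x$ is easily obtained as below, so it can be trained with ordinary CD (it should probably work with NCE as well):
  $E_\theta(x) = -\mathrm{LogSumExp}_y\left(f_\theta(x)[y]\right) = -\log \sum_y \exp\left(f_\theta(x)[y]\right)$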
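The relationship between classifier logits, $p_\theta(y \mid x)$, and $E_\theta(x)$ can be sketched in a few lines. This is an illustration only: the logits are made-up numbers standing in for the output $f_\theta(x)$ of some classifier, and `jem_quantities` is a hypothetical helper name, not code from the paper.

```python
import numpy as np

def logsumexp(a, axis=-1):
    # Numerically stable log(sum(exp(a))) along the given axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def jem_quantities(logits):
    # logits has shape (batch, n_classes) and plays the role of f_theta(x)[y].
    lse = logsumexp(logits, axis=-1)
    energy_x = -lse                           # E_theta(x) = -LogSumExp_y f_theta(x)[y]
    log_p_y_given_x = logits - lse[:, None]   # log softmax
    return energy_x, log_p_y_given_x

# Made-up logits for two inputs and three classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
energy_x, log_p_y_given_x = jem_quantities(logits)

# Unnormalized log joint: f_theta(x)[y] = -E_theta(x) + log p(y | x)
reconstructed = -energy_x[:, None] + log_p_y_given_x
```

The last line mirrors the decomposition $\log p_\theta(x, y) = \log p_\theta(x) + \log p_\theta(y \mid x)$ up to the shared constant $\log Z(\theta)$.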

34.

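Advantages of JEM
• A discriminative model and a generative model are obtained at the same time
  - Fixing the label in the energy function also allows class-conditional generation
• Extends easily to semi-supervised learning
  - For unlabeled data, simply train only the generative-model side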

35.

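Experiment: Joint training of discriminative and generative models
• Both classification and generation reach accuracy comparable to models trained for each task alone [Table: CIFAR10]
• Class-conditional generated images [figure]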

36.

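Issues with JEM
• Training an EBM with CD is, after all, difficult for data such as images
  - Because of the partition function the likelihood cannot be computed, so it is hard to verify whether training is going well
  - Training that relies on MCMC sampling is, as expected, unstable
    ‣ Apparently quite sensitive to fine-grained tuning
    ‣ Using FCE from the first half of this talk might solve this (?)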

37.

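Overall summary
• Summarized recent advances in energy-based models
• Unlike the days of RBMs as pre-training models, EBMs are now being used in considerably more flexible ways
• EBMs have been hard to train stably, but I think FCE, covered in the first half, contributes a great deal to stable EBM training
  - Personally, I am curious what would happen if FCE were used to train JEM
• Research is likely to keep advancing, for example on connections with mutual-information-based methods that also use NCE
• Will an EBM boom actually arrive (?)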