[DL Reading Group] Learning Deep Mean Field Games for Modeling Large Population Behavior


Slide overview

2018/03/23
Deep Learning JP:
http://deeplearning.jp/seminar-2/


Text of each slide
1.

DEEP LEARNING JP [DL Papers] "Learning Deep Mean Field Games for Modeling Large Population Behavior", or: the intersection of machine learning and the modeling of collective processes. http://deeplearning.jp/

2.

Paper: "Learning Deep Mean Field Games for Modeling Large Population Behavior"
• Authors: Jiachen Yang, Xiaojing Ye, Rakshit Trivedi, Huan Xu, Hongyuan Zha (Georgia Institute of Technology and Georgia State University)
• Venue: ICLR 2018 (Oral)
• Review scores: 10, 8, 8
• Keywords: Collective Behavior

3.

Summary:
• Background: modeling collective behavior with Mean Field Games (MFG)
  • Pros: a tractable description of very large populations of interacting agents
  • Cons: the reward function must be specified a priori (it is not learned from data), and prior work is limited to toy problems
• Proposal: inference of an MFG via Markov Decision Process (MDP) optimization
  • Defines a discrete-time graph-state MFG and shows that solving it is equivalent to solving a single-agent MDP
• Experiments: evolution of topic popularity on Twitter; compared against VAR and RNN baselines

4.

Motivation 1: understanding and predicting collective behavior
• Examples: the Arab Spring, the Black Lives Matter movement, fake news, etc.
• Working hypothesis: collective behavior emerges from optimization by individuals
  • "Nothing takes place in the world whose meaning is not that of some maximum or minimum." (Euler)
• Reviews and discussion: https://openreview.net/forum?id=HktK4BeCZ

5.

Motivation 2: the two-way (⇄) coupling between individuals and the population
[Figure: users distributed over topics (topic1, topic2); individual topic choices shape the aggregate distribution, and the aggregate distribution in turn influences individual choices]

6.

Problem setting
• A discrete-time graph-state MFG: each agent occupies one of a finite set of states connected in a graph
• e.g., the states are the topics being discussed by the population, etc.
[Figure: agents moving between topic1 and topic2]

7.

Modeling requirements and candidate approaches
• Requirements: 1. handle large populations, 2. capture temporal evolution, 3. capture the two-way (⇄) individual-population interaction
• Candidates: time-series analysis (e.g., VAR), network analysis, Mean Field Games
• Mean Field Games are adopted here: unlike purely predictive time-series or network models, they also model why the population behaves as it does (via rewards)

8.

Mean Field Game (MFG)
• A framework for analyzing N-player games in the limit N → ∞
• Example applications: opinion dynamics on networks, etc. (survey: Guéant+ 2011)

9.

Mean Field Game (MFG)
• Reference case (Guéant 2009): a continuum of anonymous agents, obtained as the N → ∞ limit of an N-player game
• Key modeling assumption: "social interactions of the mean field type", i.e. each agent is affected by the others only through aggregate population statistics, not through any particular individual

10.

Mean Field Game (MFG)
• "Social interactions of the mean field type", illustrated: what matters to an individual is how many of the other agents make each choice, not which particular agents do; this is what makes the N → ∞ limit tractable
[Worked numeric example lost in extraction]

11.

(Related work) Multi-Agent Reinforcement Learning (MARL)
• Mean Field Multi-Agent Reinforcement Learning (Yang+ 2018)
  • In standard MARL, the joint state-action space blows up with the number of agents
  • Mean-field idea: summarize the influence of the other agents on agent j by their mean action
  • Each agent j has reward r^j(s, a) and dynamics p(s'|s, a); the value backup is Q^j(s, a) = r^j(s, a) + γ E_{s' ~ p(s'|s, a)}[ V^j(s') ]
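Below is a minimal, hedged sketch of this kind of mean-field value target in Python/NumPy; the function names, the Boltzmann averaging, and all constants are illustrative assumptions, not the implementation of Yang+ 2018.

```python
import numpy as np

# Hedged sketch of a mean-field Q-learning target in the spirit of
# Yang+ 2018; n_actions, gamma, and the soft value are assumptions.
n_actions, gamma = 5, 0.95

def mean_field_q_target(reward, q_next, neighbor_actions):
    """Bellman target r + gamma * V(s') for one agent, where the effect
    of the other agents is summarized by their mean (empirical) action."""
    # Mean action of the neighbors, encoded as an empirical distribution.
    a_bar = np.bincount(neighbor_actions, minlength=n_actions) / len(neighbor_actions)
    # q_next holds Q(s', a', a_bar) for each own action a'.
    boltzmann = np.exp(q_next - q_next.max())
    boltzmann /= boltzmann.sum()
    v_next = float(boltzmann @ q_next)      # soft value of the next state
    return reward + gamma * v_next, a_bar

target, a_bar = mean_field_q_target(
    reward=1.0,
    q_next=np.random.randn(n_actions),
    neighbor_actions=np.array([0, 2, 2, 4]),
)
```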

12.

Mean Field Game (MFG): positioning of this paper
• Prior MFG work assumes the reward function is given a priori and is largely confined to toy problems
• Contributions:
  • A reward-agnostic, data-driven treatment: the reward is learned from real observed trajectories (inverse RL)
  • An equivalence between the MFG and an MDP that makes this learning possible
  • Takes MFG beyond toy problems

13.

Problem setting: discrete-time graph-state MFG
• The population is distributed over d states (topics) that form a graph
• π_i^t: the fraction of the population in state i at time t; π^t = (π_1^t, ..., π_d^t)
• P_ij^t: the probability that an agent in state i at time t transitions to state j at time t+1 (the mean transition behavior)
• Forward dynamics: π_j^{t+1} = Σ_i P_ij^t π_i^t (a sketch follows below)
[Two-topic numeric example mostly lost in extraction; fragments such as π_1^t = 2/3 and π_2^t = 1/3 survive]
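A minimal sketch of this forward update in Python/NumPy; the two-topic numbers are illustrative stand-ins, not the slide's (partially lost) example.

```python
import numpy as np

# Forward dynamics of the discrete-time graph-state MFG:
# pi^{t+1}_j = sum_i P^t_ij pi^t_i, i.e. pi^{t+1} = (P^t)^T pi^t.
pi_t = np.array([2 / 3, 1 / 3])           # population shares of topic1, topic2
P_t = np.array([[0.8, 0.2],               # row i: transition probs out of topic i
                [0.5, 0.5]])
assert np.allclose(P_t.sum(axis=1), 1.0)  # each row is a distribution

pi_next = P_t.T @ pi_t
print(pi_next, pi_next.sum())             # [0.7, 0.3], still sums to 1
```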

14.

Problem setting: discrete-time graph-state MFG
• Agents in state i at time t receive reward r_i(π^t, P_i^t), where π^t = (π_i^t)_{i=1}^d and P_i^t = (P_{i,1}^t, ..., P_{i,d}^t) is row i of the transition matrix (with P^t = (P_1^t, ..., P_d^t))
• Key assumption: r_i(π^t, P^t) = r_i(π^t, P_i^t), i.e. other agents' transition choices affect an agent in state i only through the population distribution π^t
[Figure: two-topic illustration of the state π^t and the chosen transition row P_i^t (the ⇄ coupling)]

15.

Discrete-time graph-state MFG
• The MFG equilibrium is characterized by a coupled forward-backward system:
  • Backward Hamilton-Jacobi-Bellman (HJB) equation: V_i^t = max_{P_i^t} [ r_i(π^t, P_i^t) + Σ_j P_ij^t V_j^{t+1} ]
  • Forward Fokker-Planck equation: π_i^{t+1} = Σ_j P_ji^t π_j^t
• V_i^t is the optimal expected cumulative reward of an agent in state i at time t; V_i^T gives the terminal condition
• Given π^0 and V^T, dynamic programming over t yields the equilibrium trajectory (π^t, V^t)
• The maximizing P_i^t in the HJB equation is the Nash-maximizer (the equilibrium transition behavior); a fixed-point sketch follows below
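A hedged sketch of solving this coupled system by fixed-point iteration over the distribution trajectory; the reward function is an illustrative stand-in (the paper learns it from data), and the candidate-sampling maximizer is a crude assumption.

```python
import numpy as np

# Alternate a backward HJB value pass and a forward Fokker-Planck pass.
d, T, n_iters = 3, 16, 20
rng = np.random.default_rng(0)

def reward(i, pi, P_i):
    # Stand-in reward: likes crowded topics; entropy bonus keeps moves stochastic.
    return pi[i] - 0.1 * float(P_i @ np.log(P_i + 1e-12))

def best_row(i, pi, V_next, n_cand=200):
    # Crude Nash-maximizer: search random candidate rows of the transition matrix.
    cands = rng.dirichlet(np.ones(d), size=n_cand)
    scores = cands @ V_next + np.array([reward(i, pi, p) for p in cands])
    return cands[int(np.argmax(scores))]

pis = [np.full(d, 1 / d) for _ in range(T + 1)]  # initial guess of the trajectory
for _ in range(n_iters):
    V = np.zeros(d)                              # terminal condition V^T = 0 (assumed)
    Ps = [None] * T
    for t in reversed(range(T)):                 # backward HJB pass
        Ps[t] = np.stack([best_row(i, pis[t], V) for i in range(d)])
        V = np.array([reward(i, pis[t], Ps[t][i]) + Ps[t][i] @ V for i in range(d)])
    for t in range(T):                           # forward Fokker-Planck pass
        pis[t + 1] = Ps[t].T @ pis[t]
```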

16.

Inference on MFG via MDP optimization
• Idea: instead of solving the forward-backward system directly, recover the MFG trajectory by optimizing an equivalent MDP

17.

Inference on MFG via MDP optimization
• The MFG is recast as a single-agent, deterministic MDP over the whole population
  • The MFG forward path becomes an MDP trajectory
• Settings (see the sketch below):
  • States: π^t, the population distribution at time t
  • Actions: P^t, the full transition matrix chosen at time t
  • Dynamics: π_i^{t+1} = Σ_j P_ji^t π_j^t
  • Reward: R(π^t, P^t) = Σ_{i=1}^d π_i^t r_i(π^t, P_i^t), the population average of the per-state rewards
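A sketch of one step of this single-agent MDP; the per-state reward r_i below is an assumed stand-in (in the paper it is learned from data with a neural network).

```python
import numpy as np

def r_i(i, pi, P_row):
    # Illustrative per-state reward, depending only on pi and row i of P.
    return pi[i] - 0.1 * float(P_row @ np.log(P_row + 1e-12))

def mdp_step(pi, P):
    """Deterministic MDP transition: state pi, action P -> (pi', R)."""
    R = sum(pi[i] * r_i(i, pi, P[i]) for i in range(len(pi)))  # population-averaged reward
    pi_next = P.T @ pi                                         # Fokker-Planck dynamics
    return pi_next, R

pi = np.array([0.5, 0.3, 0.2])
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])
pi_next, R = mdp_step(pi, P)
```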

18.

Inference on MFG via MDP optimization
• Theorem (informal): solving this MDP solves the MFG: the optimal value function satisfies the backward HJB equation, the induced state trajectory satisfies the forward Fokker-Planck equation, and the optimal action P^t is exactly the Nash-maximizer of the MFG

19.

Inference on MFG via MDP optimization
• Why the MDP view helps:
  1. The machinery of single-agent RL (value functions, policy optimization, inverse RL) applies directly; the whole system collapses into one Bellman optimality equation: V*(π^t) = max_P [ R(π^t, P) + V*(π^{t+1}) ]
  2. Unknown rewards can be learned from observed trajectories
  3. The awkward two-way (⇄) coupling between the forward and backward equations is handled implicitly by MDP optimization
• A toy end-to-end sketch follows below
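A hedged end-to-end sketch: once the MFG is a single-agent deterministic MDP, any standard policy-optimization method applies; naive random search over a stationary softmax policy stands in here for the actor-critic / inverse-RL machinery actually used in the paper, and the reward is again an assumed stand-in.

```python
import numpy as np

d, T = 3, 16
rng = np.random.default_rng(1)

def rollout(pi0, logits):
    """Return of a stationary policy whose action is the row-softmax of logits."""
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)            # rows are distributions
    pi, ret = pi0, 0.0
    for _ in range(T):
        ret += sum(pi[i] * (pi[i] - 0.1 * P[i] @ np.log(P[i])) for i in range(d))
        pi = P.T @ pi                            # deterministic dynamics
    return ret

pi0 = np.full(d, 1 / d)
best = rng.standard_normal((d, d))
for _ in range(200):                             # crude policy improvement
    cand = best + 0.3 * rng.standard_normal((d, d))
    if rollout(pi0, cand) > rollout(pi0, best):
        best = cand
```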

20.

Experiments
• Data: Twitter, with d = 15 topics
• n_timesteps = 16 per day; n_episodes = 27 days, i.e. trajectories of shape 27 × 16 × 15
• Training via Guided Cost Learning (Finn+ 2016), alternating between
  • learning a deep reward function from the demonstrated trajectories (inverse RL), and
  • optimizing the forward-path policy under the current reward
• Baselines: Vector Autoregression (VAR) and an RNN, trained on the same data (one trajectory per episode/day, etc.); a VAR(1) sketch follows below
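A minimal sketch of the VAR baseline on distribution trajectories, using the shapes from this slide (27 episodes × 16 steps × 15 topics); the data here is synthetic, and the lag order of 1 is an assumption.

```python
import numpy as np

n_ep, T, d = 27, 16, 15
rng = np.random.default_rng(0)
data = rng.dirichlet(np.ones(d), size=(n_ep, T))  # stand-in trajectories

X = data[:, :-1].reshape(-1, d)                   # all pi^t
Y = data[:, 1:].reshape(-1, d)                    # matching pi^{t+1}
A, *_ = np.linalg.lstsq(X, Y, rcond=None)         # fit pi^{t+1} ~ pi^t A

one_step_pred = X @ A                             # one-step-ahead predictions
```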

21.

Experiments
[Table/figure residue: experimental variants labeled S0, S2, and A0 ("state-action"); details lost in extraction]

22.

Experiments
• Metric: Jensen-Shannon divergence between the predicted and observed topic distributions (computed as below)
• The learned MFG model outperforms the VAR and RNN baselines
• Interpretation: the MFG's built-in individual ⇄ population structure acts as a useful inductive bias, whereas the RNN can fit the training trajectories without generalizing as well
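The evaluation metric named on this slide, as a self-contained function; base-2 logarithms are an assumption (they bound the divergence in [0, 1]).

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * (np.log2(a + eps) - np.log2(b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(js_divergence([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))  # small value near 0
```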

23.

Experiments
[Additional result figures; content lost in extraction]

24.

Conclusion
• Proposed a discrete-time graph-state MFG for modeling large-population behavior
• Showed that inference in the MFG reduces to optimizing (and inverting) a single-agent MDP, taking MFG beyond toy problems
• Limitation: the assumption r_i(π^t, P^t) = r_i(π^t, P_i^t) restricts how other agents' actions can enter the reward
• Future work: richer dynamics models and network-based social structure

25.

Discussion
[Presenter's comments largely lost in extraction; they touch on when a full MFG is warranted versus a simpler model such as VAR]

26.

References
• Guéant, O. (2009). A reference case for mean field games models. Journal de Mathématiques Pures et Appliquées, 92(3), 276-294. doi:10.1016/j.matpur.2009.04.008
• Guéant, O., Lasry, J.-M., Lions, P.-L. (2011). Mean Field Games and Applications. In: Paris-Princeton Lectures on Mathematical Finance 2010. Lecture Notes in Mathematics, vol. 2003. Springer, Berlin, Heidelberg.
• Finn, C., Levine, S., Abbeel, P. (2016). Guided cost learning: Deep inverse optimal control via policy optimization. In International Conference on Machine Learning, pp. 49-58.
• Yang, Y., Luo, R., Li, M., Zhou, M., Zhang, W., Wang, J. (2018). Mean Field Multi-Agent Reinforcement Learning. arXiv.

27.

Appendix: further reading
• The original MFG paper (Lasry & Lions): https://link.springer.com/content/pdf/10.1007%2Fs11537-007-0657-8.pdf
• The MFG reference case (Guéant 2009): https://www.sciencedirect.com/science/article/pii/S002178240900138X
• An accessible exposition on Terence Tao's blog: https://terrytao.wordpress.com/2010/01/07/mean-field-equations/
  • From the blog post: "The causal mechanism for such waves is somewhat strange, though, due to the presence of the backward propagating equation – in some sense, the wave continues to propagate because the audience members expect it to continue to propagate, and act accordingly. (One wonders if these sorts of equations could provide a model for things like asset price bubbles, which seem to be governed by a similar mechanism.)"