[DL Reading Group] Learning Deep Mean Field Games for Modeling Large Population Behavior


Slide overview

2018/03/23
Deep Learning JP:
http://deeplearning.jp/seminar-2/


Text of each slide
1.

DEEP LEARNING JP [DL Papers] "Learning Deep Mean Field Games for Modeling Large Population Behavior", or: the intersection of machine learning and the modeling of collective processes. http://deeplearning.jp/

2.

Paper: "Learning Deep Mean Field Games for Modeling Large Population Behavior"
• Authors: Jiachen Yang, Xiaojing Ye, Rakshit Trivedi, Huan Xu, Hongyuan Zha (Georgia Institute of Technology and Georgia State University)
• Venue: ICLR 2018 (Oral)
• Review scores: 10, 8, 8
• Keywords: Collective Behavior

3.

Summary:
• Background: modeling collective behavior with Mean Field Games (MFG)
  • Pros: a tractable description of very large populations of interacting agents
  • Cons: the reward function must be specified a priori (it is not learned from data), and prior work is limited to toy problems
• Proposal: inference of an MFG via Markov Decision Process (MDP) optimization
  • Defines a discrete-time graph-state MFG and shows that solving it is equivalent to solving a single-agent MDP
• Experiments: evolution of topic popularity on Twitter; compared against VAR and RNN baselines

4.

Motivation 1: understanding and predicting collective behavior
• Examples: the Arab Spring, the Black Lives Matter movement, fake news, etc.
• Working hypothesis: collective behavior emerges from optimization by individuals
  • "Nothing takes place in the world whose meaning is not that of some maximum or minimum." (Euler)
• Reviews and discussion: https://openreview.net/forum?id=HktK4BeCZ

5.

Motivation 2: the two-way (⇄) coupling between individuals and the population
[Figure: users distributed over topics (topic1, topic2); individual topic choices shape the aggregate distribution, and the aggregate distribution in turn influences individual choices]

6.

Problem setting
• A discrete-time graph-state MFG: each agent occupies one of a finite set of states connected in a graph
• e.g., the states are the topics being discussed by the population, etc.
[Figure: agents moving between topic1 and topic2]

7.

Modeling requirements and candidate approaches
• Requirements: 1. handle large populations, 2. capture temporal evolution, 3. capture the two-way (⇄) individual-population interaction
• Candidates: time-series analysis (e.g., VAR), network analysis, Mean Field Games
• Mean Field Games are adopted here: unlike purely predictive time-series or network models, they also model why the population behaves as it does (via rewards)

8.

Mean Field Game (MFG)
• A framework for analyzing N-player games in the limit N → ∞
• Example applications: opinion dynamics on networks, etc. (survey: Guéant+ 2011)

9.

Mean Field Game (MFG)
• Reference case (Guéant 2009): a continuum of anonymous agents, obtained as the N → ∞ limit of an N-player game
• Key modeling assumption: "social interactions of the mean field type", i.e. each agent is affected by the others only through aggregate population statistics, not through any particular individual

10.

Mean Field Game (MFG)
• "Social interactions of the mean field type", illustrated: what matters to an individual is how many of the other agents make each choice, not which particular agents do; this is what makes the N → ∞ limit tractable
[Worked numeric example lost in extraction]

11.

(Related work) Multi-Agent Reinforcement Learning (MARL)
• Mean Field Multi-Agent Reinforcement Learning (Yang+ 2018)
  • In standard MARL, the joint state-action space blows up with the number of agents
  • Mean-field idea: summarize the influence of the other agents on agent j by their mean action
  • Each agent j has reward r^j(s, a) and dynamics p(s'|s, a); the value backup is Q^j(s, a) = r^j(s, a) + γ E_{s' ~ p(s'|s, a)}[ V^j(s') ]
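Below is a minimal, hedged sketch of this kind of mean-field value target in Python/NumPy; the function names, the Boltzmann averaging, and all constants are illustrative assumptions, not the implementation of Yang+ 2018.

```python
import numpy as np

# Hedged sketch of a mean-field Q-learning target in the spirit of
# Yang+ 2018; n_actions, gamma, and the soft value are assumptions.
n_actions, gamma = 5, 0.95

def mean_field_q_target(reward, q_next, neighbor_actions):
    """Bellman target r + gamma * V(s') for one agent, where the effect
    of the other agents is summarized by their mean (empirical) action."""
    # Mean action of the neighbors, encoded as an empirical distribution.
    a_bar = np.bincount(neighbor_actions, minlength=n_actions) / len(neighbor_actions)
    # q_next holds Q(s', a', a_bar) for each own action a'.
    boltzmann = np.exp(q_next - q_next.max())
    boltzmann /= boltzmann.sum()
    v_next = float(boltzmann @ q_next)      # soft value of the next state
    return reward + gamma * v_next, a_bar

target, a_bar = mean_field_q_target(
    reward=1.0,
    q_next=np.random.randn(n_actions),
    neighbor_actions=np.array([0, 2, 2, 4]),
)
```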

12.

Mean Field Game (MFG): positioning of this paper
• Prior MFG work assumes the reward function is given a priori and is largely confined to toy problems
• Contributions:
  • A reward-agnostic, data-driven treatment: the reward is learned from real observed trajectories (inverse RL)
  • An equivalence between the MFG and an MDP that makes this learning possible
  • Takes MFG beyond toy problems

13.

Problem setting: discrete-time graph-state MFG
• The population is distributed over d states (topics) that form a graph
• π_i^t: the fraction of the population in state i at time t; π^t = (π_1^t, ..., π_d^t)
• P_ij^t: the probability that an agent in state i at time t transitions to state j at time t+1 (the mean transition behavior)
• Forward dynamics: π_j^{t+1} = Σ_i P_ij^t π_i^t (a sketch follows below)
[Two-topic numeric example mostly lost in extraction; fragments such as π_1^t = 2/3 and π_2^t = 1/3 survive]
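A minimal sketch of this forward update in Python/NumPy; the two-topic numbers are illustrative stand-ins, not the slide's (partially lost) example.

```python
import numpy as np

# Forward dynamics of the discrete-time graph-state MFG:
# pi^{t+1}_j = sum_i P^t_ij pi^t_i, i.e. pi^{t+1} = (P^t)^T pi^t.
pi_t = np.array([2 / 3, 1 / 3])           # population shares of topic1, topic2
P_t = np.array([[0.8, 0.2],               # row i: transition probs out of topic i
                [0.5, 0.5]])
assert np.allclose(P_t.sum(axis=1), 1.0)  # each row is a distribution

pi_next = P_t.T @ pi_t
print(pi_next, pi_next.sum())             # [0.7, 0.3], still sums to 1
```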

14.

Problem setting: discrete-time graph-state MFG
• Agents in state i at time t receive reward r_i(π^t, P_i^t), where π^t = (π_i^t)_{i=1}^d and P_i^t = (P_{i,1}^t, ..., P_{i,d}^t) is row i of the transition matrix (with P^t = (P_1^t, ..., P_d^t))
• Key assumption: r_i(π^t, P^t) = r_i(π^t, P_i^t), i.e. other agents' transition choices affect an agent in state i only through the population distribution π^t
[Figure: two-topic illustration of the state π^t and the chosen transition row P_i^t (the ⇄ coupling)]

15.

Discrete-time graph-state MFG
• The MFG equilibrium is characterized by a coupled forward-backward system:
  • Backward Hamilton-Jacobi-Bellman (HJB) equation: V_i^t = max_{P_i^t} [ r_i(π^t, P_i^t) + Σ_j P_ij^t V_j^{t+1} ]
  • Forward Fokker-Planck equation: π_i^{t+1} = Σ_j P_ji^t π_j^t
• V_i^t is the optimal expected cumulative reward of an agent in state i at time t; V_i^T gives the terminal condition
• Given π^0 and V^T, dynamic programming over t yields the equilibrium trajectory (π^t, V^t)
• The maximizing P_i^t in the HJB equation is the Nash-maximizer (the equilibrium transition behavior); a fixed-point sketch follows below
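A hedged sketch of solving this coupled system by fixed-point iteration over the distribution trajectory; the reward function is an illustrative stand-in (the paper learns it from data), and the candidate-sampling maximizer is a crude assumption.

```python
import numpy as np

# Alternate a backward HJB value pass and a forward Fokker-Planck pass.
d, T, n_iters = 3, 16, 20
rng = np.random.default_rng(0)

def reward(i, pi, P_i):
    # Stand-in reward: likes crowded topics; entropy bonus keeps moves stochastic.
    return pi[i] - 0.1 * float(P_i @ np.log(P_i + 1e-12))

def best_row(i, pi, V_next, n_cand=200):
    # Crude Nash-maximizer: search random candidate rows of the transition matrix.
    cands = rng.dirichlet(np.ones(d), size=n_cand)
    scores = cands @ V_next + np.array([reward(i, pi, p) for p in cands])
    return cands[int(np.argmax(scores))]

pis = [np.full(d, 1 / d) for _ in range(T + 1)]  # initial guess of the trajectory
for _ in range(n_iters):
    V = np.zeros(d)                              # terminal condition V^T = 0 (assumed)
    Ps = [None] * T
    for t in reversed(range(T)):                 # backward HJB pass
        Ps[t] = np.stack([best_row(i, pis[t], V) for i in range(d)])
        V = np.array([reward(i, pis[t], Ps[t][i]) + Ps[t][i] @ V for i in range(d)])
    for t in range(T):                           # forward Fokker-Planck pass
        pis[t + 1] = Ps[t].T @ pis[t]
```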

16.

Inference on MFG via MDP optimization
• Idea: instead of solving the forward-backward system directly, recover the MFG trajectory by optimizing an equivalent MDP

17.

Inference on MFG via MDP optimization
• The MFG is recast as a single-agent, deterministic MDP over the whole population
  • The MFG forward path becomes an MDP trajectory
• Settings (see the sketch below):
  • States: π^t, the population distribution at time t
  • Actions: P^t, the full transition matrix chosen at time t
  • Dynamics: π_i^{t+1} = Σ_j P_ji^t π_j^t
  • Reward: R(π^t, P^t) = Σ_{i=1}^d π_i^t r_i(π^t, P_i^t), the population average of the per-state rewards
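A sketch of one step of this single-agent MDP; the per-state reward r_i below is an assumed stand-in (in the paper it is learned from data with a neural network).

```python
import numpy as np

def r_i(i, pi, P_row):
    # Illustrative per-state reward, depending only on pi and row i of P.
    return pi[i] - 0.1 * float(P_row @ np.log(P_row + 1e-12))

def mdp_step(pi, P):
    """Deterministic MDP transition: state pi, action P -> (pi', R)."""
    R = sum(pi[i] * r_i(i, pi, P[i]) for i in range(len(pi)))  # population-averaged reward
    pi_next = P.T @ pi                                         # Fokker-Planck dynamics
    return pi_next, R

pi = np.array([0.5, 0.3, 0.2])
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])
pi_next, R = mdp_step(pi, P)
```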

18.

Inference on MFG via MDP optimization
• Theorem (informal): solving this MDP solves the MFG: the optimal value function satisfies the backward HJB equation, the induced state trajectory satisfies the forward Fokker-Planck equation, and the optimal action P^t is exactly the Nash-maximizer of the MFG

19.

Inference on MFG via MDP optimization
• Why the MDP view helps:
  1. The machinery of single-agent RL (value functions, policy optimization, inverse RL) applies directly; the whole system collapses into one Bellman optimality equation: V*(π^t) = max_P [ R(π^t, P) + V*(π^{t+1}) ]
  2. Unknown rewards can be learned from observed trajectories
  3. The awkward two-way (⇄) coupling between the forward and backward equations is handled implicitly by MDP optimization
• A toy end-to-end sketch follows below
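A hedged end-to-end sketch: once the MFG is a single-agent deterministic MDP, any standard policy-optimization method applies; naive random search over a stationary softmax policy stands in here for the actor-critic / inverse-RL machinery actually used in the paper, and the reward is again an assumed stand-in.

```python
import numpy as np

d, T = 3, 16
rng = np.random.default_rng(1)

def rollout(pi0, logits):
    """Return of a stationary policy whose action is the row-softmax of logits."""
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)            # rows are distributions
    pi, ret = pi0, 0.0
    for _ in range(T):
        ret += sum(pi[i] * (pi[i] - 0.1 * P[i] @ np.log(P[i])) for i in range(d))
        pi = P.T @ pi                            # deterministic dynamics
    return ret

pi0 = np.full(d, 1 / d)
best = rng.standard_normal((d, d))
for _ in range(200):                             # crude policy improvement
    cand = best + 0.3 * rng.standard_normal((d, d))
    if rollout(pi0, cand) > rollout(pi0, best):
        best = cand
```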

20.

Experiments
• Data: Twitter, with d = 15 topics
• n_timesteps = 16 per day; n_episodes = 27 days, i.e. trajectories of shape 27 × 16 × 15
• Training via Guided Cost Learning (Finn+ 2016), alternating between
  • learning a deep reward function from the demonstrated trajectories (inverse RL), and
  • optimizing the forward-path policy under the current reward
• Baselines: Vector Autoregression (VAR) and an RNN, trained on the same data (one trajectory per episode/day, etc.); a VAR(1) sketch follows below
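A minimal sketch of the VAR baseline on distribution trajectories, using the shapes from this slide (27 episodes × 16 steps × 15 topics); the data here is synthetic, and the lag order of 1 is an assumption.

```python
import numpy as np

n_ep, T, d = 27, 16, 15
rng = np.random.default_rng(0)
data = rng.dirichlet(np.ones(d), size=(n_ep, T))  # stand-in trajectories

X = data[:, :-1].reshape(-1, d)                   # all pi^t
Y = data[:, 1:].reshape(-1, d)                    # matching pi^{t+1}
A, *_ = np.linalg.lstsq(X, Y, rcond=None)         # fit pi^{t+1} ~ pi^t A

one_step_pred = X @ A                             # one-step-ahead predictions
```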

21.

Experiments
[Table/figure residue: experimental variants labeled S0, S2, and A0 ("state-action"); details lost in extraction]

22.

Experiments
• Metric: Jensen-Shannon divergence between the predicted and observed topic distributions (computed as below)
• The learned MFG model outperforms the VAR and RNN baselines
• Interpretation: the MFG's built-in individual ⇄ population structure acts as a useful inductive bias, whereas the RNN can fit the training trajectories without generalizing as well
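The evaluation metric named on this slide, as a self-contained function; base-2 logarithms are an assumption (they bound the divergence in [0, 1]).

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * (np.log2(a + eps) - np.log2(b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(js_divergence([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))  # small value near 0
```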

23.

Experiments
[Additional result figures; content lost in extraction]

24.

Conclusion
• Proposed a discrete-time graph-state MFG for modeling large-population behavior
• Showed that inference in the MFG reduces to optimizing (and inverting) a single-agent MDP, taking MFG beyond toy problems
• Limitation: the assumption r_i(π^t, P^t) = r_i(π^t, P_i^t) restricts how other agents' actions can enter the reward
• Future work: richer dynamics models and network-based social structure

25.

Discussion
[Presenter's comments largely lost in extraction; they touch on when a full MFG is warranted versus a simpler model such as VAR]

26.

References
• Guéant, O. (2009). A reference case for mean field games models. Journal de Mathématiques Pures et Appliquées, 92(3), 276-294. doi:10.1016/j.matpur.2009.04.008
• Guéant, O., Lasry, J.-M., Lions, P.-L. (2011). Mean Field Games and Applications. In: Paris-Princeton Lectures on Mathematical Finance 2010. Lecture Notes in Mathematics, vol. 2003. Springer, Berlin, Heidelberg.
• Finn, C., Levine, S., Abbeel, P. (2016). Guided cost learning: Deep inverse optimal control via policy optimization. In International Conference on Machine Learning, pp. 49-58.
• Yang, Y., Luo, R., Li, M., Zhou, M., Zhang, W., Wang, J. (2018). Mean Field Multi-Agent Reinforcement Learning. arXiv.

27.

Appendix: further reading
• The original MFG paper (Lasry & Lions): https://link.springer.com/content/pdf/10.1007%2Fs11537-007-0657-8.pdf
• The MFG reference case (Guéant 2009): https://www.sciencedirect.com/science/article/pii/S002178240900138X
• An accessible exposition on Terence Tao's blog: https://terrytao.wordpress.com/2010/01/07/mean-field-equations/
  • From the blog post: "The causal mechanism for such waves is somewhat strange, though, due to the presence of the backward propagating equation – in some sense, the wave continues to propagate because the audience members expect it to continue to propagate, and act accordingly. (One wonders if these sorts of equations could provide a model for things like asset price bubbles, which seem to be governed by a similar mechanism.)"