[DL Reading Group] Composable Deep Reinforcement Learning for Robotic Manipulation


April 6, 2018

Slide overview

2018/04/06
Deep Learning JP:
http://deeplearning.jp/seminar-2/


Text of each slide
1.

DEEP LEARNING JP [DL Papers] "Composable Deep Reinforcement Learning for Robotic Manipulation" Iori Yanokura http://deeplearning.jp/

2.
Paper information
• Authors: Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, and Sergey Levine
• UC Berkeley (Sergey Levine's group)
• Project page
  • https://sites.google.com/view/composing-real-world-policies/
  • https://github.com/haarnoja/softqlearning
• Overview
  • Applies Soft Q-Learning (maximum-entropy deep RL) to manipulation tasks on a real robot
  • Shows that separately trained maximum-entropy policies can be composed to solve new combined tasks without additional training

3.

Maximum Entropy RL
• Standard RL maximizes the expected sum of rewards; maximum entropy RL additionally maximizes the entropy of the policy at every visited state (Ziebart 2010); the objective is written below
• Reference: http://bair.berkeley.edu/blog/2017/10/06/soft-q-learning/
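For reference, the maximum-entropy RL objective in its standard form (following Ziebart 2010 and the BAIR post linked above; $\alpha$ is a temperature weighting the entropy bonus):

$$\pi^*_{\mathrm{MaxEnt}} = \arg\max_{\pi} \sum_t \mathbb{E}_{(s_t, a_t)\sim\rho_\pi}\big[\, r(s_t, a_t) + \alpha\, \mathcal{H}(\pi(\cdot\mid s_t)) \,\big]$$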

4.

Soft Q-Learning
• The optimal maximum-entropy policy can be expressed through a soft Q-function satisfying the soft Bellman equation (Haarnoja et al., 2017), shown below
• The resulting policy is energy-based: actions are sampled in proportion to the exponentiated soft Q-values
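The soft Bellman equation and the induced energy-based policy, as given in Haarnoja et al. (2017):

$$Q^*_{\mathrm{soft}}(s_t, a_t) = r(s_t, a_t) + \gamma\, \mathbb{E}_{s_{t+1}}\big[ V^*_{\mathrm{soft}}(s_{t+1}) \big]$$
$$V^*_{\mathrm{soft}}(s) = \alpha \log \int_{\mathcal{A}} \exp\Big(\tfrac{1}{\alpha} Q^*_{\mathrm{soft}}(s, a)\Big)\, da, \qquad \pi^*(a\mid s) = \exp\Big(\tfrac{1}{\alpha}\big(Q^*_{\mathrm{soft}}(s, a) - V^*_{\mathrm{soft}}(s)\big)\Big)$$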

5.
COMPOSITIONALITY OF MAXIMUM ENTROPY POLICIES
• Compositionality: combine separately trained policies so that the combined policy solves the conjunction of their tasks
• Maximum entropy RL is well suited to this: a soft policy keeps probability mass on every near-optimal action, so behaviors that satisfy both constituent tasks survive in the composition
• Useful in multi-objective settings: train one policy per objective, then compose them to address the combined objective without retraining

6.
COMPOSITIONALITY OF MAXIMUM ENTROPY POLICIES
• Given soft Q-functions trained for the individual rewards, the combined task is defined by the averaged reward
• Compose by averaging the constituent Q-functions into Q_Σ and acting with the corresponding soft policy ∝ exp(Q_Σ / α), as written out below
• Q_Σ is not, in general, the optimal soft Q-function Q*_C of the combined task; it tends to overestimate it (bounded on the next slides)
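Written out, the composition (the paper presents the two-task case; the $n$-task average shown here is the natural generalization, an assumption on my part):

$$r_C(s,a) = \frac{1}{n}\sum_{i=1}^{n} r_i(s,a), \qquad Q_\Sigma(s,a) = \frac{1}{n}\sum_{i=1}^{n} Q^*_i(s,a), \qquad \pi_\Sigma(a\mid s) \propto \exp\Big(\tfrac{1}{\alpha} Q_\Sigma(s,a)\Big)$$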

7.

Bounding the Sub-Optimality of Composed Policies
• Q_Σ generally overestimates the optimal soft Q-function Q*_C of the combined task
• Appendix A of the paper bounds this gap using a divergence (KL-type) between the constituent policies

8.

Bounding the Sub-Optimality of Composed Policies (continued)
• The correction term C* accumulates the discounted (γ) divergence D between the constituent policies over future states; it yields the lower bound written below
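The bound has the following form ($Q_\Sigma$ is the averaged Q-function from slide 6, $Q^*_C$ the optimal soft Q-function of the combined task, and $C^*$ the divergence-based correction term defined in Appendix A of the paper):

$$Q_\Sigma(s,a) \;\geq\; Q^*_C(s,a) \;\geq\; Q_\Sigma(s,a) - C^*(s,a)$$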

9.

Experiments
• Simulation: MuJoCo
• Real robot: 7-DoF Sawyer
• Actions: torque command at each joint
• Observations
• Q-function and policy networks: 100 or 200 units (see the sketch below)
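As a rough illustration of the setup above, a minimal PyTorch sketch of a soft Q-network over joint torques; the observation dimension, the use of two hidden layers, and ReLU activations are assumptions, not details from the slide:

```python
import torch
import torch.nn as nn

class SoftQNetwork(nn.Module):
    """Minimal soft Q-function sketch: maps (observation, action) to a scalar.

    hidden=100 or 200 matches the unit counts mentioned on the slide;
    obs_dim is a placeholder, act_dim=7 corresponds to 7-DoF torque commands.
    """
    def __init__(self, obs_dim: int, act_dim: int = 7, hidden: int = 200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Concatenate observation and torque command, return Q(s, a).
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)
```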

10.

B. Composing Policies for Pushing in Simulation
• MuJoCo simulation
• Two pushing policies, each trained for a different target location, are composed; the composed policy pushes the object toward the combined target

11.

C. SQL for Real-World Manipulation
• Sawyer robot
• Tasks: Reaching, Lego stacking
• Compositionality of soft policies: compose a stacking policy with an obstacle-avoidance policy

12.

C. SQL for Real-World Manipulation: compositionality of soft policies
• "Avoid a fixed obstacle" policy composed with the "Lego stacking" policy (a rough composition sketch follows below)
• Video: https://www.youtube.com/watch?time_continue=10&v=wdexoLS2cWU
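A minimal sketch of how two trained soft Q-functions could be composed at execution time. Sampling uniform candidate actions and softmax-weighting them is a crude stand-in for the paper's amortized SVGD sampler, and every name below (composed_action, q_avoid, q_stack) is hypothetical:

```python
import torch

def composed_action(q_avoid, q_stack, obs, alpha=1.0, n_candidates=64, act_dim=7):
    """Sample an action from the composed soft policy pi ∝ exp(Q_Σ / alpha).

    q_avoid, q_stack: trained soft Q-functions (e.g., SoftQNetwork instances);
    obs: a single observation tensor of shape (obs_dim,).
    """
    # Uniform candidate torques in [-1, 1]^act_dim (placeholder proposal).
    acts = 2 * torch.rand(n_candidates, act_dim) - 1
    obs_batch = obs.reshape(1, -1).expand(n_candidates, -1)
    # Average the two Q-functions, then weight candidates by the soft policy.
    q_sigma = 0.5 * (q_avoid(obs_batch, acts) + q_stack(obs_batch, acts))
    weights = torch.softmax(q_sigma / alpha, dim=0)
    idx = torch.multinomial(weights, 1).item()
    return acts[idx]
```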

13.
Conclusion
• Compared with other model-free deep RL methods (DDPG, NAF), Soft Q-learning learns manipulation tasks with good sample efficiency and can be trained directly on a real robot
• With SQL, separately trained policies can be composed to solve new combined tasks, and the sub-optimality of the composed policy can be bounded