[DL Reading Group] Manipulation-Independent Representations(MIR) for Successful Cross Embodiment Visual Imitation


May 21, 21


Slide overview

2021/05/21
Deep Learning JP:
http://deeplearning.jp/seminar-2/


Text of each page

DEEP LEARNING JP [DL Papers] Manipulation-Independent Representations(MIR) for Successful Cross Embodiment Visual Imitation XIN ZHANG, Matsuo Lab http://deeplearning.jp/


Contents
1. Bibliographic information
2. Introduction
3. Related Works
4. Manipulator-Independent Representations for Visual Imitation
5. Experiment, Evaluation
6. Discussion

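Bibliographic information
● Title:
  ○ Manipulator-Independent Representations for Visual Imitation
● Authors:
  ○ Yuxiang Zhou, Yusuf Aytar, Konstantinos Bousmalis
● Affiliation: DeepMind
● Submitted: 2021/3/18 (arXiv)
● Summary
  ○ The goal is to learn a good representation that can be used for cross-embodiment visual imitation.
  ○ The paper proposes Manipulator-Independent Representations (MIR), which satisfy three conditions: (1) they do not depend on the environment, (2) they incorporate spatio-temporal concepts, and (3) they allow actions to be projected.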

https://arxiv.org/pdf/2103.09016.pdf

Imitation Learning: Easy to teach a new skill to a robot
Simple Remote, easy Easy Humanoid, mobile robot. Need to develop the interface Mapping the joint from different kind robot(human)
Recent Advances in Robot Learning from Demonstration (Ravichandar et al. 2020)

Introduction: Manipulation Imitation Learning

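Related work
- Seen from another angle, imitation learning can be broadly divided into two types:
  1. Learn a generalized policy from a sufficient amount of demonstration data.
  2. Imitate a specific motion from a previously unseen demonstration.
- Type 1: learn a generalized policy
  - Behavioral Cloning
  - Inverse reinforcement learning
- Type 2: identify the task from a trajectory and imitate it
  - One-shot Imitation Learning
  - Trajectory tracking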

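Proposed method: MIR
1. Manipulator-Independent Representations (learn the MIR representation)
  a. Cross-Domain Alignment
    i. So that the representation can cope with the domain gap
  b. Temporal Smoothness
    i. So that spatio-temporal concepts are incorporated well
  c. Actionable Representations
    i. So that the representation can be acted on and used in later learning
2. Cross-Embodiment Imitation via RL (imitation learning using the MIR representation)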

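1.a: Cross-Domain Alignment
- To handle the domain gap, domain-randomized simulated environments are used.
- Trained naively, the representation space and the policy become tightly coupled.
- The proposed MIR aims to learn a representation space that can be used with any manipulator.
- That is, a representation space that contains no information specific to the manipulator's body, the action space, or the task at hand.
- In other words, a representation with two properties is wanted:
  1. It is not affected by the domain.
  2. It can project an understanding of the environment.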

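1.a: Cross-Domain Alignment
- Reconsidering imitation learning, there are two ways to think about it:
  1. mimic the movements of the manipulator (trajectory-focused)
  2. replicate the effect of the manipulator on the environment (outcome-focused)
- Case study: the object is dropped while being carried to the goal. What should the imitator do?
- Claim: rather than imitating the trajectory, what matters is reproducing the change toward the goal state.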

10.

1.a: Cross-Domain Alignment
(Top): to encode information about the manipulator
(Bottom): to capture changes in the environment

11.

1.b: Temporal Smoothness

12.

1.b: Temporal Smoothness
TCN: enforces temporal smoothness within a single domain
TSCN: generalizes TCN to different domains
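The slide text does not reproduce the TCN or TSCN loss, so the following is only a minimal sketch of a generic time-contrastive triplet loss, assuming time-aligned (paired) embeddings from two domains. The function names, the `neg_gap` and `margin` parameters, and the choice of squared Euclidean distance are illustrative assumptions, not the paper's definition. The anchor is frame t in domain A, the positive is the same time step in domain B, and the negative is a temporally distant frame in domain A.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: the positive should be closer than the negative by `margin`."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def time_contrastive_loss(emb_a, emb_b, t, neg_gap=3, margin=1.0):
    """Hypothetical cross-domain time-contrastive loss for one time step.

    emb_a, emb_b: lists of embedding vectors from two time-aligned trajectories
    (assumed to be the same episode rendered in two domains).
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("trajectories must be time-aligned and equally long")
    # Pick a temporally distant frame in domain A as the negative.
    t_neg = t + neg_gap if t + neg_gap < len(emb_a) else t - neg_gap
    if t_neg < 0:
        raise ValueError("trajectory too short for the requested neg_gap")
    return triplet_loss(emb_a[t], emb_b[t], emb_a[t_neg], margin)


# Usage: well-aligned embeddings give zero loss, misaligned ones are penalized.
emb_a = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
emb_b_aligned = [list(v) for v in emb_a]
emb_b_reversed = list(reversed(emb_a))
loss_aligned = time_contrastive_loss(emb_a, emb_b_aligned, t=0)
loss_reversed = time_contrastive_loss(emb_a, emb_b_reversed, t=0)
```

Passing the same trajectory shifted by one step as `emb_b` would give a single-domain variant in the spirit of TCN; the cross-domain positive is what makes the sketch domain-agnostic.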

13.

1.c : Actionable Representations

14.

1.c : Actionable Representations
Goal Conditioned Policy: single domain
Cross-Domain Goal-Conditional Policies (CD-GCP): extension to different domains
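One way to picture the step from a same-domain goal-conditioned policy to a cross-domain one is in how the training tuples are built. The sketch below assumes time-aligned paired trajectories in two domains and takes the goal frame from the other domain. The function name, the `horizon` parameter, and the tuple layout are hypothetical, and the slides do not state how CD-GCP is actually trained.

```python
def make_goal_conditioned_pairs(obs_a, actions_a, obs_b, horizon):
    """Build (observation, goal, action) training tuples.

    obs_a, actions_a: observations and actions of one trajectory in domain A.
    obs_b: time-aligned observations of the paired trajectory in domain B.
    Passing obs_b = obs_a yields ordinary same-domain tuples.
    """
    if len(obs_a) != len(obs_b) or len(obs_a) != len(actions_a):
        raise ValueError("trajectories must be time-aligned and equally long")
    pairs = []
    for t in range(len(obs_a) - horizon):
        # Observation and action come from domain A, the goal from domain B.
        pairs.append((obs_a[t], obs_b[t + horizon], actions_a[t]))
    return pairs


# Usage with placeholder frame labels instead of images.
obs_a = ["a0", "a1", "a2", "a3"]
obs_b = ["b0", "b1", "b2", "b3"]
actions_a = ["u0", "u1", "u2", "u3"]
pairs = make_goal_conditioned_pairs(obs_a, actions_a, obs_b, horizon=2)
```

Because the goal never comes from the same domain as the observation, a policy trained on such tuples cannot rely on domain-specific appearance to read the goal.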

15.

2. Cross-Embodiment Imitation via RL

16.

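2. Cross-Embodiment Imitation via RL
The actual imitation process:
- A demonstration is given and embedded with the trained MIR.
- A goal N steps ahead is sampled from the demonstration.
- A reward is computed for reaching that goal from the current observation o.
- The RL algorithm is Maximum a Posteriori Policy Optimization (MPO).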
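The formulas for goal sampling and the reward are not reproduced in the slide text. The sketch below only illustrates the general pattern of the steps above, assuming the reward is the negative Euclidean distance between the embedded observation and the embedded goal. That reward form, the clipping at the end of the demonstration, and both function names are assumptions for illustration. The MPO update itself is not shown.

```python
import math


def sample_goal_index(t, n_steps, demo_len):
    """Index of the goal frame n_steps ahead of t, clipped to the last demo frame."""
    return min(t + n_steps, demo_len - 1)


def embedding_reward(obs_emb, goal_emb):
    """Assumed reward: negative Euclidean distance in embedding space.

    It is 0 when the observation embedding equals the goal embedding
    and decreases as the two move apart.
    """
    return -math.sqrt(sum((o - g) ** 2 for o, g in zip(obs_emb, goal_emb)))


# Usage: demo_emb stands in for a demonstration already embedded by a trained encoder.
demo_emb = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 4.0]]
goal_idx = sample_goal_index(t=1, n_steps=2, demo_len=len(demo_emb))
reward = embedding_reward([0.0, 0.0], demo_emb[goal_idx])
```

In an RL loop, this reward would be evaluated at every step against the current goal frame, and the goal index would advance along the demonstration as the episode progresses.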

17.

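Experiments and Data
- 7,194 demonstrations were collected, and the method is evaluated on two tasks in which only the domain is varied.
  - A task of taking a stacked object down
  - A task of stacking an object on top of another object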

18.

Evaluation: Comparison of Imitation Performance
Methods that do not use trajectories paired across domains (top two):
- Naive Goal-Conditioned Policies (GCP)
- Temporal Distance Classification (TDC)
Methods that use trajectories paired across domains (bottom three):
- Time-Contrastive Networks (TCN)
- Cross-Modal Distance Classification (CMC)
- MIR

19.

Evaluation: Ablation Study
MIR is composed of CD-GCP and TSCN.

20.

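Conclusion
- MIR was proposed for cross-embodiment visual imitation.
- It generalized to a manipulator with an unseen morphology (Jaco hand).
- It also succeeded to some extent with human-hand demonstrations.
- Three elements make MIR robust to domain shift:
  - focus on the change of object configurations (preparation of the environments)
  - temporally smooth (TSCN)
  - actionable (CD-GCPs)
Impressions
- A study on manipulation that focuses on changes of state.
- The method was validated in a good experimental setting, but further research is likely to follow.
- Could the idea be applied not only to manipulation but also to behavior in a more general sense?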

21.

Appendix

22.

Appendix