[DL Reading Group] Code as Policies: Language Model Programs for Embodied Control

November 11, 2022

@deep learning jp

Slide overview

2022/11/11
Deep Learning JP
http://deeplearning.jp/seminar-2/

Text of each page

DEEP LEARNING JP [DL Papers] Code as Policies: Language Model Programs for Embodied Control Keno Harada, M2, the University of Tokyo http://deeplearning.jp/

Bibliographic information
Paper title: Code as Policies: Language Model Programs for Embodied Control
Authors: Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng (Robotics at Google)
Summary: Uses program generation by a large language model to produce robot policy programs from instructions written as comments and a few-shot prompt. By carefully designing the pre-prepared action and perception APIs and the prompt text, it becomes possible to write policies for tasks that require a perception-action feedback loop.
Links:
https://code-as-policies.github.io/
https://ai.googleblog.com/2022/11/robots-that-write-their-own-code.html

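Background: challenges in planning + action with large language models
Existing approaches cannot flexibly design policies for tasks (instructions) that require a perception-action feedback loop.
• Skills are prepared in advance and task planning is delegated to a large language model (SayCan, etc.)
- The model only selects and orders the pre-prepared skills
- Adding a skill requires behavior cloning (BC) or reinforcement learning (RL) with large amounts of data
Tasks that the current pipeline cannot execute
• Tasks where perception and action are coupled: "put down the apple when you see the orange"
• Tasks that reflect common sense: "move faster"
• Tasks that consider relative spatial relationships: "move the apple a bit more to the left"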

Focus on program generation with large language models
Figure labels: prompt, instruction, output
From Code as Policies: Language Model Programs for Embodied Control

https://arxiv.org/abs/2209.07753

Related work: using a large language model to describe the subtasks of a task and to select the subtask that fits the scene
From Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

https://arxiv.org/abs/2204.01691

Related work: incorporating object detection results into a large language model
From Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

https://arxiv.org/abs/2204.00598

Related work: program generation with language models
From Evaluating Large Language Models Trained on Code

https://arxiv.org/abs/2107.03374

Differences from related work
From Code as Policies: Language Model Programs for Embodied Control

https://arxiv.org/abs/2209.07753

Proposed method
• Prompting Language Model Programs
- Components of the prompt
• Example Language Model Programs (Low-level)
- Use of third-party libraries that appear in the training data, made possible by using a code-writing LLM
- Use of in-house libraries through careful function naming and well-designed Hints/Examples
- Language reasoning that connects task instructions to code
• Example Language Model Programs (High-level)
- while loops, nested functions, hierarchical generation

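Components of the prompt
• Hints
- Type hints indicating which APIs can be called and how they can be called
import numpy as np
from utils import get_obj_names, put_first_on_second
• Examples
- Pairs of a natural-language instruction (a # comment) and a program that carries it out
- By adding past instructions and example programs to the prompt, instructions such as "undo the last action" can also be handled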
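The two prompt components above can be sketched as plain string assembly. This is a minimal illustration, not the paper's actual prompt: `build_prompt`, the single example pair, and the `history` list are hypothetical names introduced here, and the call to a code-writing model is left out entirely.

```python
# Hints: import lines telling the model which APIs exist (taken from the slide).
HINTS = (
    "import numpy as np\n"
    "from utils import get_obj_names, put_first_on_second\n"
)

# Examples: (instruction, program) pairs. Illustrative only, not the paper's prompt.
EXAMPLES = [
    ("put the red block on the blue bowl.",
     "put_first_on_second('red block', 'blue bowl')"),
]


def build_prompt(hints, examples, instruction):
    """Concatenate hints, example pairs, and the new instruction as a comment."""
    parts = [hints]
    for text, code in examples:
        parts.append(f"# {text}\n{code}\n")
    # The new instruction goes last; the model is expected to continue with code.
    parts.append(f"# {instruction}\n")
    return "\n".join(parts)


# Keeping earlier instruction/program pairs in the prompt gives the model the
# context it needs for follow-ups such as "undo the last action."
history = list(EXAMPLES)
prompt = build_prompt(HINTS, history, "undo the last action.")
```

After each executed instruction, the new (instruction, program) pair would be appended to `history` so that later prompts carry the session context.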

Low-level: third-party libraries
From Code as Policies: Language Model Programs for Embodied Control

https://arxiv.org/abs/2209.07753
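As a rough idea of what low-level code using a third-party library could look like, the sketch below answers a spatial query with NumPy. The object names, coordinates, and the query itself are invented for illustration and are not taken from the paper's figure.

```python
import numpy as np

# Hypothetical object positions (x, y) in metres; not data from the paper.
positions = {
    "red block": np.array([0.10, 0.40]),
    "blue bowl": np.array([0.35, 0.05]),
    "green block": np.array([0.30, 0.45]),
}

# name of the object closest to the point (0.3, 0.5).
target = np.array([0.3, 0.5])
names = list(positions)
dists = [np.linalg.norm(positions[n] - target) for n in names]
closest = names[int(np.argmin(dists))]
```

The point is that a code-writing model has presumably seen calls like `np.linalg.norm` and `np.argmin` in its training data, so no robot-specific API is needed for this kind of geometry.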

Low-level: in-house library, language reasoning
From Code as Policies: Language Model Programs for Embodied Control

https://arxiv.org/abs/2209.07753

High-level: control flow
From Code as Policies: Language Model Programs for Embodied Control

https://arxiv.org/abs/2209.07753
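A minimal sketch of control flow in generated policy code: a `while` loop re-reads perception on every iteration, so the action depends on the latest observation. The `state` dictionary and the toy `get_pos` and `put_first_on_second` functions are stand-ins written for this example; they only mimic the role of the real perception and action APIs.

```python
# Hypothetical simulated world state: object name -> [x, y] in metres.
state = {"red block": [0.00, 0.20], "blue bowl": [0.12, 0.20]}


def get_pos(name):
    # Toy perception: return a copy of the object's current position.
    return list(state[name])


def put_first_on_second(name, target_pos):
    # Toy action: move the object directly to the target position.
    state[name] = list(target_pos)


# while the red block is to the left of the blue bowl,
# move it to the right 5cm at a time.
moves = 0
while get_pos("red block")[0] < get_pos("blue bowl")[0]:
    x, y = get_pos("red block")
    put_first_on_second("red block", [x + 0.05, y])
    moves += 1
```

A fixed skill sequence chosen in advance could not express this, because the number of moves is only known once the loop condition is checked against perception.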

High-level: nested function
From Code as Policies: Language Model Programs for Embodied Control

https://arxiv.org/abs/2209.07753

High-level: hierarchical generation
From Code as Policies: Language Model Programs for Embodied Control

https://arxiv.org/abs/2209.07753
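One way to picture hierarchical generation is the sketch below: parse the generated code, find functions that are called but not yet defined, ask the model to write each one, and repeat on the newly generated bodies. This is an assumption-laden illustration, not the authors' implementation: `FAKE_LLM` is a dictionary standing in for a code-writing model, and `undefined_calls` and `generate_hierarchically` are hypothetical helper names.

```python
import ast

# Stand-in for a code-writing LLM: maps a function name to canned source.
# A real system would query a model here; these bodies are made up.
FAKE_LLM = {
    "get_total": "def get_total(xs):\n    return add_all(xs)\n",
    "add_all": "def add_all(xs):\n    return sum(xs)\n",
}


def undefined_calls(source, known):
    """Names called as plain functions in `source` that are not yet known."""
    tree = ast.parse(source)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    called = {
        n.func.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    }
    return sorted(called - defined - set(known))


def generate_hierarchically(source, namespace):
    """Define every undefined function (recursively), then run `source`."""
    for name in undefined_calls(source, namespace):
        generate_hierarchically(FAKE_LLM[name], namespace)
    exec(source, namespace)


# `sum` is listed explicitly so it counts as an already-available API.
namespace = {"sum": sum}
generate_hierarchically("result = get_total([1, 2, 3])", namespace)
```

Here the top-level line calls `get_total`, which is undefined, so it is generated; its body calls `add_all`, which is generated in turn. Running model-written code through `exec` is only reasonable in a sandboxed setting.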

High-level
From Code as Policies: Language Model Programs for Embodied Control

https://arxiv.org/abs/2209.07753

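Experiments
• Confirm the effectiveness of hierarchical program generation
- Check the quality of program generation itself on code-generation benchmarks
• Compare with existing methods on manipulation tasks
• Confirm that the proposed method can be easily applied to different robots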

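RoboCodeGen: newly proposed and evaluated
• Adds program generation problems that involve spatial and geometric information
• Generated programs are allowed, and encouraged, to use external libraries
• No docstrings
From Code as Policies: Language Model Programs for Embodied Control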
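A benchmark of this kind can be scored by running each generated program against a hand-written check and reporting the fraction that pass. The sketch below assumes that scoring scheme; the two items, their checks, and the `passes` helper are hypothetical and are not problems from RoboCodeGen.

```python
# Hypothetical benchmark items: generated source plus a hand-written check.
# Neither the problems nor the checks are taken from RoboCodeGen itself.
items = [
    {
        "code": "def midpoint(a, b):\n"
                "    return [(x + y) / 2 for x, y in zip(a, b)]\n",
        "check": lambda ns: ns["midpoint"]([0, 0], [2, 4]) == [1.0, 2.0],
    },
    {
        # Deliberately buggy generation: the comparison is reversed.
        "code": "def is_left_of(a, b):\n    return a[0] > b[0]\n",
        "check": lambda ns: ns["is_left_of"]((0, 0), (1, 0)) is True,
    },
]


def passes(item):
    """Run generated code in a fresh namespace and apply its check."""
    ns = {}
    try:
        exec(item["code"], ns)
        return bool(item["check"](ns))
    except Exception:
        # Any crash in the generated code counts as a failure.
        return False


pass_rate = sum(passes(it) for it in items) / len(items)
```

Because the problems carry no docstrings, the model has only the function name and the instruction comment to go on, which is why function naming matters in this setting.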

Flat vs Hierarchical (use of undefined functions)
The hierarchical structuring of the prompt is the distinctive contribution of the proposed method
From Code as Policies: Language Model Programs for Embodied Control

https://arxiv.org/abs/2209.07753

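Confirmed higher generalization performance than existing methods
• Confirm the effectiveness of hierarchical program generation
- Check the quality of program generation itself on code-generation benchmarks
U: Unseen, S: Seen, A: Attribute (object features), I: Instruction (instruction text)
From Code as Policies: Language Model Programs for Embodied Control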

https://arxiv.org/abs/2209.07753

Confirmed higher generalization performance than existing methods
From Code as Policies: Language Model Programs for Embodied Control

https://arxiv.org/abs/2209.07753

Confirmed higher generalization performance than existing methods
From Code as Policies: Language Model Programs for Embodied Control

https://arxiv.org/abs/2209.07753

Application to a mobile manipulator
# take the coca cola can from the cart and put it in the middle of the fruits on the table.
From Code as Policies: Language Model Programs for Embodied Control

https://arxiv.org/abs/2209.07753
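For the instruction above, "the middle of the fruits" could plausibly be computed as the mean of the fruit positions. The sketch below assumes that interpretation; the detections, the `FRUITS` set, and the one-line "pick and place" are invented for illustration and do not reproduce the robot's real APIs.

```python
import numpy as np

# Hypothetical detections (x, y in metres); invented for this sketch.
detections = {
    "apple": np.array([0.2, 0.1]),
    "banana": np.array([0.4, 0.1]),
    "orange": np.array([0.3, 0.4]),
    "coca cola can": np.array([1.0, 1.0]),
}
FRUITS = {"apple", "banana", "orange"}  # assumed to come from some classifier

# take the coca cola can from the cart and put it in the middle of the fruits.
fruit_positions = np.stack([p for n, p in detections.items() if n in FRUITS])
place_pos = fruit_positions.mean(axis=0)
detections["coca cola can"] = place_pos  # toy stand-in for pick and place
```

Deciding which detected objects count as fruits is the kind of common-sense step that would be left to the language model in the actual system.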

Bonus
From Code as Policies: Language Model Programs for Embodied Control

https://arxiv.org/abs/2209.07753

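Summary
Generates robot policy programs from instructions written as comments and a few-shot prompt. The pre-prepared action and perception APIs and the prompt text are carefully designed.
Limitations
- Bounded by the APIs and prompt text prepared in advance
- Reportedly weak at behaviors whose level of abstraction is not covered by the Examples
Impressions
The prompt engineer's descriptive skill is put to the test (Appendix A is fun)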