[DL Reading Group] Data Retrieval with Importance Weights for Few-Shot Imitation Learning


January 22, 2026


Text of each slide
1.

DEEP LEARNING JP [DL Papers]
Data Retrieval with Importance Weights for Few-Shot Imitation Learning
Jeremy Siburian, Matsuo-Iwasawa Lab, M1
http://deeplearning.jp/

2.

Paper Overview
• Paper Title: Data Retrieval with Importance Weights for Few-Shot Imitation Learning
• Authors: Amber Xie¹, Rahul Chand¹, Dorsa Sadigh¹, Joey Hejna¹ (¹Stanford University)
• Venue: Conference on Robot Learning (CoRL) 2025 (Oral)
• Links:
  – ArXiv: https://arxiv.org/abs/2509.01657
  – Project Page: https://rahulschand.github.io/iwr/
Disclaimer: All credits for images, figures, tables, and other contents belong to the original authors.

3.

Introduction
• Imitation learning (IL) in robotics is highly data-hungry, often requiring hundreds to thousands of demonstrations per task, making it expensive and hard to scale to new tasks and environments.
• Few-shot IL addresses this issue by augmenting a small target dataset with relevant samples retrieved from large, existing prior datasets (e.g., DROID, OpenX, Bridge).
[Figures: DROID [Khazatsky et al. 2024]; Open X-Embodiment [O'Neill et al. 2024]]
Motivation: How can we leverage large existing prior datasets for quickly learning new tasks?

4.

Related Works
Retrieval-based IL augments a small target dataset by retrieving relevant state–action samples from large prior datasets using a learned latent representation.
[Figures: FlowRetrieval [Lin et al. 2024]; Behavior Retrieval [Du et al. 2024]; SAILOR [Nasiriany et al. 2022]]
Previous retrieval-based IL methods rely on heuristic nearest-neighbor distance in latent space, leading to high-variance, noise-sensitive, and distributionally biased data selection.

5.

Method
• This paper introduces Importance Weighted Retrieval (IWR), an importance sampling-inspired method for retrieval.
• IWR retrieves prior data by how well it matches the target task distribution, not just by distance, by leveraging density-ratio (importance weight) estimation.
• The IWR framework involves four main steps: (1) Representation Learning, (2) Importance Weight Estimation, (3) Data Retrieval, and (4) Policy Learning.

6.

Method (1): Representation Learning
• Learn a low-dimensional latent representation of state–action pairs (or short sequences) to enable efficient retrieval.
• Use a Variational Autoencoder (VAE) to ensure a smooth and compact latent space.
• Why a VAE?
  – VAEs impose a continuous, approximately Gaussian latent space through the KL regularization term.
  – This encourages local smoothness: nearby points in latent space correspond to similar state–action behaviors.
• This step is shared with prior retrieval methods (e.g., BR, FR, SAILOR).
• IWR does not modify representation learning; it only changes how retrieval is performed.
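The role of the KL term described above can be made concrete with a minimal NumPy sketch of the VAE objective (reconstruction error plus KL divergence to a standard normal). This is an illustrative toy, not the paper's implementation; the batch, latent dimension, and `beta` weight are assumptions.

```python
import numpy as np

def vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """ELBO-style VAE loss: reconstruction error + KL(q(z|x) || N(0, I)).

    The KL term is what pulls latents toward a smooth, approximately
    Gaussian shape, which the later retrieval step relies on.
    """
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=-1))  # per-sample MSE
    # Closed-form KL between N(mu, diag(exp(logvar))) and N(0, I)
    kl = -0.5 * np.mean(np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=-1))
    return recon + beta * kl

# Hypothetical batch: 4 state-action vectors, 8-d latent posterior stats.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 10))
mu = np.zeros((4, 8))
logvar = np.zeros((4, 8))       # posterior exactly matches the prior
loss = vae_loss(x, x, mu, logvar)
```

With a perfect reconstruction and a posterior equal to the prior, both terms vanish; any mean shift in `mu` makes the KL term strictly positive.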

7.

Method (2): Importance Weight Estimation
• Goal: estimate how relevant each prior sample is to the target task.
• Distance-based retrieval uses only the nearest target point, leading to high variance.
• Failure case (right figure): a sample close to many target points can be missed by nearest neighbors.
• IWR models the full target distribution using a Gaussian KDE in latent space.
• Importance weights are the density ratio w(z) = p_target(z) / p_prior(z) between the estimated target and prior densities.
Key Insight: Nearest neighbors estimate similarity, while importance weights estimate probability.
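The density-ratio idea above can be sketched in a few lines of NumPy: fit a Gaussian KDE to the target latents and to the prior latents, then score each prior point by the ratio. The bandwidth, cluster locations, and function names here are illustrative assumptions, not the paper's values.

```python
import numpy as np

def gaussian_kde(points, queries, bandwidth=0.5):
    """Evaluate a Gaussian KDE fit on `points` at each query point
    (up to a constant factor that cancels in the density ratio)."""
    # Pairwise squared distances: shape (n_queries, n_points)
    d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth**2)).mean(axis=1)

def importance_weights(z_target, z_prior, bandwidth=0.5):
    """w(z) = p_target(z) / p_prior(z) for each prior latent z."""
    p_t = gaussian_kde(z_target, z_prior, bandwidth)
    p_p = gaussian_kde(z_prior, z_prior, bandwidth)
    return p_t / (p_p + 1e-12)   # small epsilon for numerical safety

rng = np.random.default_rng(0)
z_target = rng.normal(loc=0.0, size=(50, 2))              # target latents near origin
z_prior = np.vstack([rng.normal(loc=0.0, size=(50, 2)),   # relevant prior cluster
                     rng.normal(loc=5.0, size=(50, 2))])  # irrelevant prior cluster
w = importance_weights(z_target, z_prior)
```

Prior samples in the cluster overlapping the target distribution receive much larger weights than the distant cluster, even though no single nearest neighbor is consulted.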

8.

Method (3): Data Retrieval
• Use the importance weights as the retrieval score for prior data.
• Retrieve samples whose importance weights exceed a threshold.
• Retrieval selects data that matches the target task distribution, not just nearest neighbors.
• Can be applied at multiple granularities: individual state–action pairs, action chunks, or short sub-trajectories.
• This step replaces the L2-distance-based retrieval used in prior methods.
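Thresholded retrieval on importance weights can be sketched as follows, here using a quantile cutoff so a fixed fraction of prior data is kept; the 25% fraction is an illustrative choice, not the paper's setting.

```python
import numpy as np

def retrieve(weights, frac=0.25):
    """Keep the prior samples whose importance weight lies in the top
    `frac` fraction, i.e., those best matching the target distribution."""
    threshold = np.quantile(weights, 1.0 - frac)
    return np.flatnonzero(weights >= threshold)

# Hypothetical importance weights for 8 prior samples.
weights = np.array([0.1, 2.0, 0.05, 1.5, 0.2, 0.8, 3.0, 0.01])
idx = retrieve(weights, frac=0.25)   # indices of the top-25% samples
```

Raising `frac` retrieves more prior data; as the ablations later note, retrieving too much pulls in low-weight, potentially harmful samples.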

9.

Method (4): Policy Learning
• Train the policy using behavior cloning on a mixture of target demonstrations D_t and retrieved prior data D_ret.
• Optimize a weighted imitation objective to balance target and retrieved data.
• Retrieved data provides broader state–action coverage and improves robustness.
• Policy architecture and training procedure are unchanged (e.g., Diffusion Policy).
• Performance gains come purely from improved data retrieval, not policy modifications.
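One simple way to realize the weighted mixture of D_t and D_ret during behavior cloning is per-batch proportional sampling, sketched below; the mixing ratio `alpha` and array stand-ins for (state, action) samples are assumptions for illustration.

```python
import numpy as np

def sample_batch(target, retrieved, batch_size=8, alpha=0.5, rng=None):
    """Draw a BC training batch with a fraction `alpha` from the small
    target set D_t and the rest from the retrieved prior data D_ret,
    so the target task is not drowned out by the larger retrieved set."""
    if rng is None:
        rng = np.random.default_rng()
    n_t = int(round(alpha * batch_size))
    i_t = rng.integers(0, len(target), size=n_t)
    i_r = rng.integers(0, len(retrieved), size=batch_size - n_t)
    return np.concatenate([target[i_t], retrieved[i_r]])

target = np.arange(10)           # stand-ins for target (state, action) samples
retrieved = np.arange(100, 200)  # stand-ins for retrieved prior samples
batch = sample_batch(target, retrieved, batch_size=8, alpha=0.5,
                     rng=np.random.default_rng(0))
```

Because only the data pipeline changes, the same batches feed an unmodified policy learner such as Diffusion Policy.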

10.

Experiments
Experiment Objectives
1. Performance: Does IWR improve policy success over existing retrieval methods?
2. Generality: Can IWR be applied to different retrieval representations (BR, FR, SAILOR)?
3. Ablations: Why does IWR help? (bias, variance, temporal diversity)
Experimental Setup
Simulation
• Robomimic Square: 10 target demos
• LIBERO: 5 hardest LIBERO-10 tasks, 5 demos each
Real-World
• Bridge V2 (WidowX): Corn (5 demos), Carrot (10 demos), Eggplant (20 demos, long-horizon)
• ~130k prior transitions from sink tasks

11.

Experiments
Baseline Comparisons
1. Behavior Cloning (BC) is trained only to imitate D_t (target data).
2. Behavior Retrieval (BR) learns a VAE over state–action pairs and retrieves data from D_prior based on L2 distance in the latent space.
3. Flow Retrieval (FR) learns a VAE over optical flow between frames and actions. Like BR, it retrieves data via L2 distance in latent space.
4. SAILOR (SR) learns a skill-based latent space by compressing state–action chunks and retrieves data via L2 distance in latent space.
Training Details
• All retrieval baselines use L2 distance-based retrieval.
• Policies are trained using Diffusion Policy [Chi et al. 2023].

12.

Results
Q1: Does IWR improve policy success over existing retrieval methods?
Simulation Results
• IWR achieves the highest success rates across all simulated benchmarks.
• On Robomimic Square:
  – Behavior Cloning nearly fails due to limited data.
  – IWR significantly outperforms distance-based retrieval, especially when incorrect prior data is harmful.
• On LIBERO tasks:
  – IWR consistently improves over BR, FR, and SAILOR.
  – Gains are largest in tasks with diverse and imbalanced prior data, where nearest-neighbor retrieval struggles.

13.

Results
Q1: Does IWR improve policy success over existing retrieval methods?
Real-World Results
• IWR delivers substantial improvements over all baselines on real robot tasks.
• Achieves up to ~30% higher success rates compared to standard retrieval methods.
• Particularly effective for the long-horizon task (Eggplant):
  – 100% partial success rate
  – Highest full task completion among methods
• Demonstrates that better retrieval directly translates to better real-world performance.

14.

Results
Q1: Does IWR improve policy success over existing retrieval methods?
Task relevance (left plots):
• BR retrieves a large portion of irrelevant or harmful tasks.
• IWR retrieves more task-relevant demonstrations.
Temporal distribution (right plots):
• BR over-samples early timesteps (reach motions).
• IWR produces a more balanced distribution across task phases.
These improvements lead to:
• More informative training data
• Better coverage of critical subtasks

15.

Results
Q2: Can IWR be applied to different retrieval representations?
• IWR consistently improves performance when applied to BR, FR, and SAILOR by replacing L2-distance retrieval with importance weighting (Table 2).
• It works best with smooth, VAE-based latent representations; performance gains are limited for non-smooth embeddings such as BYOL (Table 10).

16.

Results
Q3: Why does IWR help?
• Importance Weights: Removing normalization by the prior distribution degrades performance, showing that density ratios are critical (Table 3).
• Bandwidth Parameters: KDE bandwidth ablations show IWR is robust, with smoother density estimates slightly improving performance (Table 3).
• Retrieval Thresholds: The amount of data retrieved affects policy performance; too much prior data introduces harmful samples (Table 4).

17.

Summary & Limitations
• The paper introduces Importance Weighted Retrieval (IWR), an importance sampling-inspired method for retrieval.
• IWR estimates both target and prior data distributions using Gaussian KDEs, and retrieves data based on importance weights (density ratios) rather than raw distances.
• Experimental results show that IWR selects better data for retrieval, as evidenced by improved performance across both simulated and real tasks, including a long-horizon task.
• Limitations & Future Work:
  – Does not further evaluate what makes an effective latent space
  – Evaluation is largely limited to pick-and-place tasks due to large-scale data constraints
  – IWR assumes the use of Gaussian KDEs, which can become computationally intractable and numerically unstable in higher dimensions