[DL Reading Group] One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers


April 14, 2023


Slide overview: Deep Learning JP, http://deeplearning.jp/seminar-2/


Slide text

DEEP LEARNING JP [DL Papers] One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers Yuting Lin, Kokusai Kogyo Co., Ltd. (国際航業) http://deeplearning.jp/


Bibliographic information • Title – One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers • Authors – Rui Gong¹, Qin Wang¹, Dengxin Dai², Luc Van Gool¹ ³ – ¹Computer Vision Lab, ETH Zurich; ²MPI for Informatics; ³VISICS, KU Leuven • Submitted – 2022/12/14 (arXiv) • Paper – https://arxiv.org/abs/2212.07292


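Introduction • Motivation – Tackle tasks in which collecting target-domain data is difficult • A method for one-shot unsupervised domain adaptation (OSUDA) – Generates a pseudo-target domain that combines the spatial structure of the source domain with the style of the target – Proposes a mechanism called class-aware cross-domain transformers to extract domain-invariant features – Can be extended into a one-shot domain generalization (OSDG) method by using, as input, an image that merely looks similar to the target domain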

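Overview of the proposed method • Built on a pseudo-label-based self-training strategy (mean-teacher framework) – Proposes intermediate domain randomization (IDR) to reduce the domain gap – The teacher network estimates the final output; the student network updates the teacher network (the teacher's weights are an exponential moving average of the student's) – Proposes an attention mechanism for capturing domain-invariant information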
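The mean-teacher loop described above can be sketched in a few lines. This is a minimal, framework-free illustration, assuming weights are flat lists of floats and the teacher's per-pixel logits are given; the EMA rate `alpha`, the confidence `threshold`, and the DACS-style confidence weight are hypothetical choices, not values from the slides.

```python
import math
from typing import List, Tuple


def ema_update(teacher: List[float], student: List[float], alpha: float = 0.99) -> List[float]:
    """Mean-teacher step: teacher <- alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_labels(teacher_logits: List[List[float]], threshold: float = 0.9) -> Tuple[List[int], float]:
    """Per-pixel argmax of the teacher's softmax, plus the fraction of pixels above `threshold`.

    The fraction can weight the student's pseudo-label loss (a DACS-style heuristic, assumed here).
    """
    labels: List[int] = []
    confident = 0
    for logits in teacher_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best)
        if probs[best] > threshold:
            confident += 1
    return labels, confident / len(teacher_logits)


# Usage: one EMA step, then pseudo-labels for two pixels with two classes.
teacher = ema_update([0.0, 1.0], [1.0, 1.0], alpha=0.9)  # approximately [0.1, 1.0]
labels, weight = pseudo_labels([[2.0, 0.0], [0.0, 0.1]], threshold=0.8)
print(labels, weight)  # [0, 1] 0.5
```

Only the student receives gradients; the teacher changes slowly through `ema_update`, which is what makes its pseudo-labels more stable than the student's own predictions.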

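Method details - creating the pseudo-target domain • Pseudo-Target Domain for Style Alignment – Image translation renders the source domain in the style of the one-shot target image (data augmentation) • Needed because the one-shot constraint makes overfitting likely – Pseudo-target domain: $\hat{x}^{s}_{i} = \mathcal{S}(x^{s}_{i} \mid x^{t})$, where $x^{s}_{i}$ is the $i$-th source image, $x^{t}$ is the one-shot target image, and $\mathcal{S}$ is the style-translation model – Generated with the off-the-shelf method MUNIT (using a weighted perceptual loss) – Trained with cross-entropy on the pseudo-target domain ($\mathcal{L}_{pt}$) • Reduces the domain gap caused by style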

https://github.com/nvlabs/MUNIT
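The slides generate $\hat{x}^{s}_{i} = \mathcal{S}(x^{s}_{i} \mid x^{t})$ with MUNIT, a learned translation network. As a much simpler, hypothetical stand-in for $\mathcal{S}$, the sketch below matches each source channel's mean and standard deviation to the one-shot target's (an AdaIN-style statistics transfer applied directly to pixels, not MUNIT). It illustrates the property the pseudo-target domain relies on: pixel positions are unchanged, so the source label map can be reused as-is for $\mathcal{L}_{pt}$.

```python
import statistics
from typing import List

# An image is a list of channels; each channel is a flattened list of H*W pixel values.
Image = List[List[float]]


def match_channel_stats(source: Image, target: Image, eps: float = 1e-6) -> Image:
    """Hypothetical stand-in for S(x_s | x_t): give each source channel the target's mean/std.

    Only intensities change; every pixel stays where it was, so source labels still align.
    """
    stylized: Image = []
    for src_ch, tgt_ch in zip(source, target):
        mu_s, sd_s = statistics.fmean(src_ch), statistics.pstdev(src_ch)
        mu_t, sd_t = statistics.fmean(tgt_ch), statistics.pstdev(tgt_ch)
        stylized.append([(v - mu_s) / (sd_s + eps) * sd_t + mu_t for v in src_ch])
    return stylized


source_img = [[0.0, 1.0, 2.0, 3.0]]      # one channel, four pixels
target_img = [[10.0, 10.0, 14.0, 14.0]]  # the one-shot "target style"
source_labels = [0, 0, 1, 1]             # reused unchanged for the stylized image

pseudo_target = match_channel_stats(source_img, target_img)
print([round(v, 3) for v in pseudo_target[0]])  # [9.317, 11.106, 12.894, 14.683]
```

A learned translator such as MUNIT can change texture far more richly than this global statistic transfer, but the training signal is the same: the stylized image is paired with the original source labels.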

Method details - class-mixed sampling • The pseudo-target domain cannot close the domain gap caused by spatial structure • Class-mixed sampling randomizes the source spatial structure on top of the pseudo-target domain

Method details - class-mixed sampling • Sample c classes from a pseudo-target image and build a mask from them • Generate the intermediate domain sample • Using the pseudo-label $\tilde{y}^{s}_{j}$ prevents overfitting to the source domain • The intermediate domain can also be trained with cross-entropy ($\mathcal{L}_{idr}$)
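The class-mixed sampling step can be sketched as ClassMix-style cut-and-paste on flattened single-channel images. This sketch rests on assumptions: following the slide's notation, image A is a pseudo-target image $\hat{x}^{s}_{i}$ with its source label $y^{s}_{i}$, and image B is a second pseudo-target image $\hat{x}^{s}_{j}$ labeled with the teacher's pseudo-label $\tilde{y}^{s}_{j}$; the exact pairing and the helper names (`class_mix`, `pixel_cross_entropy`) are hypothetical. The mixed label then feeds a per-pixel cross-entropy standing in for $\mathcal{L}_{idr}$.

```python
import math
import random
from typing import List, Tuple


def class_mix(
    img_a: List[float], label_a: List[int],
    img_b: List[float], label_b: List[int],
    num_classes: int, rng: random.Random,
) -> Tuple[List[float], List[int], List[int]]:
    """Paste the pixels of `num_classes` randomly chosen classes of A onto B."""
    present = sorted(set(label_a))
    chosen = set(rng.sample(present, min(num_classes, len(present))))
    mask = [1 if y in chosen else 0 for y in label_a]  # binary mask M built from A's labels
    mixed_img = [a if m else b for a, b, m in zip(img_a, img_b, mask)]
    mixed_lbl = [ya if m else yb for ya, yb, m in zip(label_a, label_b, mask)]
    return mixed_img, mixed_lbl, mask


def pixel_cross_entropy(probs: List[List[float]], labels: List[int]) -> float:
    """Mean per-pixel cross-entropy, given softmax probabilities per pixel."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


rng = random.Random(0)
img_a, y_a = [0.1, 0.2, 0.3, 0.4], [0, 0, 1, 2]         # A: pseudo-target image, source labels
img_b, y_b_pseudo = [0.9, 0.8, 0.7, 0.6], [1, 1, 1, 0]  # B: pseudo-target image, teacher pseudo-labels
mixed_img, mixed_lbl, mask = class_mix(img_a, y_a, img_b, y_b_pseudo, num_classes=1, rng=rng)

# Loss of a dummy student prediction on the intermediate-domain sample.
student_probs = [[0.7, 0.2, 0.1]] * len(mixed_lbl)
loss_idr = pixel_cross_entropy(student_probs, mixed_lbl)
```

Because the mask follows class boundaries, pasted objects keep their shape while their surroundings come from a different scene, which is how the source layout gets randomized.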

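Method details - Class-Aware Cross-Domain Transformers • Learning domain-invariant information also matters • Existing methods that focus on local information (e.g., local patch-wise prototypical matching) struggle to learn global invariant information – Transformers can capture global information • Cross transformer: the pseudo-target sample provides the queries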

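Method details - Class-Aware Cross-Domain Transformers • Proposes Class-Aware Cross-Domain Attention (CACDA) – Learns domain-invariant information from the style information of the pseudo-target sample and the spatial-structure information of the intermediate domain sample – Trained with cross-entropy ($\mathcal{L}_{cd}$)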
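The cross-domain attention described in these two slides can be sketched as scaled dot-product cross-attention in which pseudo-target tokens supply the queries and intermediate-domain tokens supply the keys and values. This is a minimal sketch under stated assumptions: learned query/key/value projections, multiple heads, and the rest of the transformer block are omitted, and the optional same-class mask is only one hypothetical reading of "class-aware", not the paper's confirmed mechanism.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]  # rows are tokens, columns are feature dimensions


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # finite: at least one score is always left unmasked
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(
    queries: Matrix, keys: Matrix, values: Matrix,
    query_classes: Optional[List[int]] = None,
    key_classes: Optional[List[int]] = None,
) -> Matrix:
    """softmax(Q K^T / sqrt(d)) V: Q from pseudo-target tokens, K and V from intermediate-domain tokens.

    If class ids are given (hypothetical class-aware variant), each query attends only to keys
    of its own class; if no key shares its class, it falls back to attending to all keys.
    """
    d = len(queries[0])
    out: Matrix = []
    for qi, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        if query_classes is not None and key_classes is not None:
            allowed = [kc == query_classes[qi] for kc in key_classes]
            if any(allowed):
                scores = [s if ok else -math.inf for s, ok in zip(scores, allowed)]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))])
    return out


pseudo_target_tokens = [[1.0, 0.0]]             # queries
intermediate_tokens = [[1.0, 0.0], [0.0, 1.0]]  # keys
intermediate_values = [[1.0, 0.0], [0.0, 1.0]]  # values
print(cross_attention(pseudo_target_tokens, intermediate_tokens, intermediate_values,
                      query_classes=[0], key_classes=[0, 1]))  # [[1.0, 0.0]]
```

Unlike local patch-wise matching, every query here can attend to every key in the other image, which is the global view the slides attribute to transformers.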


Results - OSUDA • Achieves SOTA • Even outperforms few-shot methods


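Results - pseudo-target generation • Setting a high weight on the perceptual loss brings the generated images closer to the target style • The non-learning-based Fourier-transform approach produces many artifacts – but it is effective for OSDG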
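The slides do not name the non-learned Fourier baseline. One well-known option is FDA-style amplitude swapping (Fourier Domain Adaptation), which replaces the low-frequency amplitude spectrum of a source image with the target's while keeping the source phase. The sketch below assumes that variant and, for brevity, works on a single 1-D signal (one image row) with a naive O(n²) DFT; `band` is a hypothetical parameter for the size of the swapped low-frequency region.

```python
import cmath
import math
from typing import List


def dft(x: List[complex]) -> List[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum: List[complex]) -> List[complex]:
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def low_freq_amplitude_swap(source: List[float], target: List[float], band: int) -> List[float]:
    """Keep the source phase; take the amplitude from the target for frequencies within `band` of DC."""
    n = len(source)
    src_spec = dft([complex(v) for v in source])
    tgt_spec = dft([complex(v) for v in target])
    mixed = []
    for k in range(n):
        dist_from_dc = min(k, n - k)  # k and n - k are mirror frequencies, so they are swapped together
        amp = abs(tgt_spec[k]) if dist_from_dc <= band else abs(src_spec[k])
        mixed.append(cmath.rect(amp, cmath.phase(src_spec[k])))
    # Swapping mirror pairs together keeps the spectrum conjugate-symmetric, so the result is real.
    return [z.real for z in idft(mixed)]


row = low_freq_amplitude_swap([1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 10.0], band=0)
print([round(v, 6) for v in row])  # [8.5, 9.5, 10.5, 11.5]
```

With `band=0` only the DC term (overall brightness) is transferred; larger bands transfer more global appearance. Because the swap ignores image content, a hard cut in the spectrum can introduce ringing, one plausible source of the artifacts the slide mentions.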


Results - OSDG • Also achieves SOTA on OSDG


Results – ablation study • Confirms the effectiveness of the proposed method – Domain-invariant information was successfully learned from style and spatial-structure information


Results – ablation study • Class Mixed Sampling based IDR vs. other IDR methods – Sampling the spatial structure is also effective


Results – ablation study • Comparison to Cross-Domain Transformer Variants – Applying cross-attention to the intermediate domain randomization (IDR) samples seems to promote capturing domain-invariant information (?) – Using the pseudo domain as the attention target is more effective • Because the pseudo domain has a smaller gap to the source


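Summary • Proposes a one-shot unsupervised domain adaptation method – Generates a pseudo-target domain via style transfer – Generates intermediate-domain samples that also randomize spatial structure – Extracts domain-invariant information with Class-Aware Cross-Domain Attention applied to the pseudo-target domain and the intermediate-domain samples – Can be extended to one-shot domain generalization • Impressions – Does performance depend on which one-shot image is used (?) The paper gives no information on this – Rather than capturing spatial structure as such, the intermediate-domain samples act as a representation halfway between the pseudo target and the source; this intermediate representation makes learning easier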