[Reading Group] Learning Continuous Image Representation with Local Implicit Image Function (CVPR2021)

November 19, 2021

deep learning

Slide summary

2021/11/19
Deep Learning JP:
http://deeplearning.jp/seminar-2/

Text of each page

DEEP LEARNING JP [DL Papers]
Learning Continuous Image Representation with Local Implicit Image Function (CVPR2021)
Presenter: Kazutoshi Akita (Toyota Technological Institute, Intelligent Information Media Lab)
http://deeplearning.jp/

Background
• Continuous-function representation of 3D shapes
$ax^2 + by^2 + cz^2 = d$
Source: http://sssiii.seesaa.net/article/407308186.html

Background
• Continuous-function representation of 3D shapes
$a_1 x^2 + b_1 y^2 + c_1 z^2 = d_1$
$a_2 x^2 + b_2 y^2 + c_2 z^2 = d_2$
$a_3 x^2 + b_3 y^2 + c_3 z^2 = d_3$
$\dots$
Source: Interpolating and Approximating Implicit Surfaces from Polygon Soup - U.C. Berkeley Computer Graphics Research

http://graphics.berkeley.edu/papers/Shen-IAI-2004-08/

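Background
• Implicit neural representation: a mapping from coordinates to signals
$a_1 x^2 + b_1 y^2 + c_1 z^2 = d_1$
$a_2 x^2 + b_2 y^2 + c_2 z^2 = d_2$
$a_3 x^2 + b_3 y^2 + c_3 z^2 = d_3$
$\dots$
$f_\theta(\boldsymbol{x}) = \begin{cases} 0 \\ 1 \end{cases}$
The function is acquired implicitly by a neural network (MLP) ⇒ Implicit Neural Representation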
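To make the idea of a coordinate-to-signal mapping concrete, here is a minimal sketch of an MLP $f_\theta$ that takes a 3D coordinate and returns 0 or 1. Everything in it is an assumption made for illustration: the names `make_mlp`, `mlp_forward`, and `f_theta`, the layer sizes, the random (untrained) weights, and the choice to threshold the output at zero. A real implicit representation would train the weights so that the output matches the target shape.

```python
import random


def make_mlp(sizes, seed=0):
    """Create random weights for a small MLP, e.g. sizes=[3, 8, 1].

    The weights are untrained; they only illustrate the data flow.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def mlp_forward(layers, x):
    """Apply the affine layers, with ReLU after every layer except the last."""
    h = list(x)
    for idx, (weights, biases) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if idx < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h


def f_theta(layers, xyz):
    """Map a 3D coordinate to a binary signal (assumed threshold at 0)."""
    return 1 if mlp_forward(layers, xyz)[0] > 0.0 else 0


layers = make_mlp([3, 8, 1])
signal = f_theta(layers, (0.1, 0.2, 0.3))  # either 0 or 1
```

The shape is stored in the weights rather than in an explicit list of equations, which is the sense in which the representation is implicit.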

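Overview
• Conventional super-resolution can only upscale by integer factors because of the CNN architecture
• Proposes super-resolution that can upscale by an arbitrary factor, using a continuous-function representation based on Implicit Neural Representation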
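Because a continuous representation can be queried at any coordinate, a non-integer scale only changes where the queries are placed. The sketch below lists the output pixel centers for a given scale, expressed in input-pixel units. The function name `query_centers`, the coordinate convention (input pixel `(i, j)` has its center at `(i + 0.5, j + 0.5)`), and the use of `round` for the output size are assumptions for illustration, not details taken from the slides.

```python
def query_centers(in_h, in_w, scale):
    """Return the centers of all output pixels in input-pixel units.

    Assumed convention: the input image spans [0, in_h] x [0, in_w].
    """
    out_h = round(in_h * scale)
    out_w = round(in_w * scale)
    return [
        [((i + 0.5) * in_h / out_h, (j + 0.5) * in_w / out_w)
         for j in range(out_w)]
        for i in range(out_h)
    ]


# A 2x2 input upscaled by 2.5 gives a 5x5 grid of query coordinates.
grid = query_centers(2, 2, 2.5)
```

Each coordinate in `grid` would then be passed to the image function to obtain one output pixel.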

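Proposed method
• Local implicit image function (LIIF): $s = f_\theta(z, x)$
– $s$: RGB value
– $z$: latent code
– $x$: 2D coordinate
[Figure labels: $x_q$, $M^{(i)}$, $f_\theta$, $z^*$]
• Super-resolution formulation: $I^{(i)}(x_q) = f_\theta(z^*, x_q - v^*)$
– $z^*$: nearest latent code from $x_q$
– $v^*$: coordinate of $z^*$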
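The formulation $I^{(i)}(x_q) = f_\theta(z^*, x_q - v^*)$ can be sketched as a lookup followed by a decoder call. The code below assumes that latent codes sit at pixel centers `(i + 0.5, j + 0.5)` of a feature map stored as nested lists; `nearest_latent`, `liif_query`, and `toy_decoder` are hypothetical names, and `toy_decoder` is a hand-written stand-in for the trained MLP that returns a single intensity instead of an RGB value.

```python
import math


def nearest_latent(feature_map, x_q):
    """Return (z_star, v_star): the latent code nearest to x_q and its center."""
    h, w = len(feature_map), len(feature_map[0])
    i = min(max(int(math.floor(x_q[0])), 0), h - 1)
    j = min(max(int(math.floor(x_q[1])), 0), w - 1)
    return feature_map[i][j], (i + 0.5, j + 0.5)


def liif_query(f_theta, feature_map, x_q):
    """Evaluate f_theta(z_star, x_q - v_star) for a single query coordinate."""
    z_star, v_star = nearest_latent(feature_map, x_q)
    rel = (x_q[0] - v_star[0], x_q[1] - v_star[1])
    return f_theta(z_star, rel)


def toy_decoder(z, rel):
    """Hypothetical stand-in for the trained MLP (not the authors' decoder)."""
    return z[0] + 0.1 * rel[0] + 0.1 * rel[1]


# 2x2 feature map with one channel per latent code.
feature_map = [[[1.0], [2.0]],
               [[3.0], [4.0]]]
value = liif_query(toy_decoder, feature_map, (0.75, 1.25))
```

Passing the relative coordinate $x_q - v^*$ rather than the absolute one means the decoder only has to describe the image locally around each latent code.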

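Proposed method
• Technique 1: Feature unfolding
– The 8 neighboring pixels are also merged by concatenation to form the latent code
– This enriches the latent code $z^*$
[Figure labels: $x_q$, $f_\theta$, concat, $z^*$]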
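Feature unfolding can be sketched as concatenating each latent code with its 3x3 neighborhood, which turns a C-channel code into a 9C-channel code. The function name `unfold_features`, the nested-list layout, and the zero padding outside the border are assumptions of this sketch; the slide does not say how border pixels are handled.

```python
def unfold_features(feature_map):
    """Concatenate every latent code with its 8 neighbors (3x3 window).

    Positions outside the feature map are filled with zeros (assumption).
    """
    h, w = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    zero = [0.0] * channels
    unfolded = []
    for i in range(h):
        row = []
        for j in range(w):
            z = []
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    inside = 0 <= ni < h and 0 <= nj < w
                    z.extend(feature_map[ni][nj] if inside else zero)
            row.append(z)
        unfolded.append(row)
    return unfolded


feature_map = [[[1.0], [2.0]],
               [[3.0], [4.0]]]
rich = unfold_features(feature_map)  # each code now has 9 channels
```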

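Proposed method
• Technique 2: Local ensemble
– Using only the nearest latent code makes the latent code switch abruptly, which looks unnatural
– Ensemble over the 4 surrounding latent codes
[Figure labels: $z^*_{00}$, $z^*_{01}$, $z^*_{10}$, $z^*_{11}$, $S_{00}$, $S_{01}$, $S_{10}$, $x_q$, $f_\theta$]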
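One way to realize the 4-code ensemble is area weighting: each of the four predictions is weighted by the area of the rectangle between $x_q$ and the diagonally opposite latent code, so a closer code gets a larger weight. This sketch assumes that the $S$ labels in the figure denote those areas, that latent codes sit at pixel centers `(i + 0.5, j + 0.5)`, and that out-of-range neighbors reuse the nearest border code. The name `local_ensemble` and the scalar-valued decoder are illustrative only.

```python
import math


def local_ensemble(f_theta, feature_map, x_q):
    """Blend the predictions of the 4 latent codes surrounding x_q."""
    h, w = len(feature_map), len(feature_map[0])
    i0 = math.floor(x_q[0] - 0.5)  # row index of the top-left neighbor
    j0 = math.floor(x_q[1] - 0.5)  # column index of the top-left neighbor
    preds, areas = {}, {}
    for a in (0, 1):
        for b in (0, 1):
            center = (i0 + a + 0.5, j0 + b + 0.5)
            rel = (x_q[0] - center[0], x_q[1] - center[1])
            # Clamp indices at the border (assumption: reuse the border code).
            ci = min(max(i0 + a, 0), h - 1)
            cj = min(max(j0 + b, 0), w - 1)
            preds[(a, b)] = f_theta(feature_map[ci][cj], rel)
            areas[(a, b)] = abs(rel[0]) * abs(rel[1])
    total = sum(areas.values())  # equals 1 in these pixel units
    # Each prediction is weighted by the area tied to the opposite corner.
    return sum(preds[(a, b)] * areas[(1 - a, 1 - b)] / total
               for a in (0, 1) for b in (0, 1))


feature_map = [[[0.0], [1.0]],
               [[2.0], [3.0]]]
blended = local_ensemble(lambda z, rel: z[0], feature_map, (1.0, 1.0))
```

With a decoder that ignores the relative coordinate, as in the usage line, the result reduces to bilinear interpolation of the latent values, which shows why the output changes smoothly when $x_q$ crosses the boundary between two latent codes.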

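Proposed method
• Technique 3: Cell decoding
– $s = f_\theta(z, x)$ becomes $s = f_\theta(z, [x, c])$
– $c = [c_h, c_w]$: height and width of the query pixel
– For x4 super-resolution, $c = [\frac{1}{4}, \frac{1}{4}]$
– Qualitatively, does this condition the decoder on the upscaling factor?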
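Cell decoding only changes what is fed to the decoder: the query pixel's height and width are appended to the relative coordinate. The sketch below follows the slide's convention that the cell is measured in input-pixel units, so an x4 query has $c = [\frac{1}{4}, \frac{1}{4}]$. The helper names `cell_for_scale` and `decoder_input` and the flat-list input layout are assumptions for illustration.

```python
def cell_for_scale(scale_h, scale_w):
    """Height and width of one query pixel, in input-pixel units."""
    return (1.0 / scale_h, 1.0 / scale_w)


def decoder_input(z, rel, cell):
    """Build the decoder input for s = f_theta(z, [x, c]) as one flat list."""
    return list(z) + [rel[0], rel[1], cell[0], cell[1]]


cell = cell_for_scale(4, 4)  # (0.25, 0.25) for x4 super-resolution
features = decoder_input([1.0, 2.0], (0.1, -0.1), cell)
```

Because the cell shrinks as the scale grows, the same latent code and relative coordinate can produce different outputs at different magnifications.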

Proposed method
• Training

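Experimental results
• Quantitative evaluation
– At scales seen in training (in-distribution), performance is on par with MetaSR
– At scales not seen in training (out-of-distribution), performance exceeds MetaSR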

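Experimental results
• Qualitative evaluation
– Even at a scale not seen in training (x30), the super-resolved output is more natural and sharper than that of other methods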

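Experimental results
• Ablation of each technique
– -c: without cell decoding
– -u: without feature unfolding
– -e: without local ensemble
– -d: LIIF layers reduced from 5 to 3
– Cell decoding lowers performance in some cases
– The other techniques improve performance when used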

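Experimental results
• Qualitative evaluation of cell decoding
– For x30 super-resolution, cell-1/30 is the appropriate setting
– With an appropriate cell decoding, sharp super-resolution is possible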

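Summary
• Proposes a super-resolution model that acquires a continuous image representation using Implicit Neural Representation and can upscale by factors that are not limited to integers.
• Generates high-detail super-resolved images even at scales (e.g. x30) beyond the scales used in training (x1-x4)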