Fundamentals of Computational Life Science 9 (舘野 賢)

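Kobe University, Computational Life Science 1.5: "AI Drug Discovery: Fundamentals and Applications of Artificial Intelligence and Machine Learning in Drug Discovery" (Wednesday, November 2). 舘野 賢, Chief Researcher, Central Pharmaceutical Research Institute, Japan Tobacco Inc.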

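What is drug discovery with artificial intelligence (AI)?
AI is advancing dramatically!
• Go, shogi, etc.: AlphaGo → defeated a Go champion!
• Image processing: the ability to recognize people has surpassed that of humans!
• Natural language processing: not only translation between languages, but also writing novels! It can even do simple arithmetic! It is also being applied to image processing and other tasks!
Can AI be used in drug discovery? → Yes!
AlphaFold2 → deep learning has made it possible to predict the 3D structures of proteins!?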

3.

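What is drug discovery with AI?
Can AI be used in drug discovery? → Yes! On the other hand, what about Japan's presence in the AI field???
So... whenever the content is advanced and important, I will try to explain unfamiliar (difficult!?) concepts in an accessible way (!). Because the explanations will be correspondingly intuitive, please consult textbooks and papers for mathematically more rigorous treatments. (For example, I will also try, as far as possible, to explain how AI works internally.)
The Japanese and English text on the following slides is meant for review rather than for reading during the lecture: please focus on the highlighted terms, figures, and tables. I will try to present the material so that understanding can proceed in that way.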

4.

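What is drug discovery with AI? AI is already being used in every area of drug discovery! In which fields?
• Molecular design of drugs → Chemistry and Biophysics: physical, physicochemical, chemical, and biological properties of drugs; synthesis technology for compounds.
• Pharmacological effects and pharmacokinetics of drugs → Biochemistry and Biology: efficacy, safety (toxicity), and pharmacokinetics and pharmacodynamics (PK and PD) of compounds.
• Identification of targets for drug development → Biology and Genome Science: transcriptome, proteome, epigenetics, interactome, etc.; protein-protein interactions (PPI).
• Drug production technology → Engineering: manufacturing processes, formulation, (crystal) structure analysis, quality control, etc.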

5.

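What is drug discovery with AI? AI is already being used in every area of drug discovery! Today's topics:
• Molecular design (I): identifying binding sites and ligands (affinity) from the amino acid sequence of a protein.
• Molecular design (II): 3D structural docking (& affinity) of protein-compound complexes.
• Electronic structure of drugs: pharmacological effects, PK and PD, safety, molecular design (III), etc. → the basis for understanding the various properties and biological reactivity of substances → the basis for exploration such as screening and reactivity.
• Searching for drug discovery targets → applying genome science too! Transcriptome, proteome, interactome, epigenetics, etc. → multi-omics analysis.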

6.

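<Attention Values of Amino Acid Residues and Ligand Atoms> Does the model learn the interaction sites of amino acid residues and compounds?
[Figure: illustration of the attention analysis. Amino acid residues and compound atoms are each vectorized. Recognition of amino acid residues: attention values, with residues within 8 Å of the ligand marked by red bars and residues within 4 Å marked by blue bars. Recognition of the ligand: attention values at the interaction site.]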

7.

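<Attention Values of Amino Acid Residues and Ligand Atoms> Does the model learn the interaction sites of amino acid residues and compounds?
[Figure: example for avidin-biotin (2AVI). Recognition of amino acid residues: attention values, with residues within 8 Å of the ligand marked by red bars and residues within 4 Å marked by blue bars. Recognition of the ligand: attention values at the interaction site.]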

8.

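What is drug discovery with AI? AI is already being used in every area of drug discovery! Today's topics: to understand AI drug discovery...
• What is AI doing? What kinds of systems are used?
• With those systems: what computations are performed? What has become possible? What challenges remain?
• What should we study in order to learn AI?
• Which tactics are important for building AI?
Let us reconfirm the basic approach of today's talk...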

9.

(Slide 3 repeated: the basic approach of today's talk.)

10.

Brief History of Artificial Mathematical Neural Networks
• Beginning of mathematical models in the neural network field (1943): the artificial neuron, a mathematical model of the neuron proposed by McCulloch and Pitts (W. McCulloch: neurophysiologist; W. Pitts: mathematician).
• 1st era of artificial neural network models (1950s to 1960s): the perceptron, a feed-forward neural network model proposed by F. Rosenblatt (1957-58).
D. Mourtzis and J. Angelopoulos, Int. J. Adv. Manuf. Syst. (2020)

11.

Brief History of Artificial Mathematical Neural Networks
• 2nd era of artificial neural network models (1980s to 1990s): feed-forward multilayer networks trained by backpropagation; Hopfield networks, Boltzmann machines, Cognitron (1975). Cf. "expert systems": systems that apply the knowledge accumulated in relevant, specific fields.
• 3rd era of artificial neural network models (2010s to the present): "deep learning" triggered the beginning of this new era.
D. Mourtzis and J. Angelopoulos, Int. J. Adv. Manuf. Syst. (2020)

12.

Brief History of Artificial Mathematical Neural Networks: Neuron Model Proposed by McCulloch and Pitts (1943)
[Figure: a biological neuron and the McCulloch-Pitts neuron model (1943), with inputs x_1, ..., x_n, weights w_1, ..., w_n, weighted sum g, activation function f, and output y. Sources: https://link.springer.com/article/10.1007/BF02478259; T. Fukuchi et al.]
Summation of inputs:
$$g = \sum_{i=1}^{n} x_i w_i$$
Activation function, the Heaviside step function (https://www.sciencedirect.com/topics/mathematics/heaviside-unit-step):
$$y = f(g) = \begin{cases} 1 & (g > 0) \\ 0 & (g \le 0) \end{cases}$$
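A minimal Python sketch of the unit defined above: the weighted sum g followed by the Heaviside step with threshold 0. The AND-gate weights, including a constant input x_0 = 1 that acts as a bias, are our own illustrative choice and do not come from the slide.

```python
def heaviside(g):
    """Heaviside step as on the slide: 1 if g > 0, else 0."""
    return 1 if g > 0 else 0

def mp_neuron(x, w):
    """McCulloch-Pitts neuron: g = sum_i x_i w_i, then y = f(g)."""
    g = sum(xi * wi for xi, wi in zip(x, w))
    return heaviside(g)

def and_gate(a, b):
    # Illustrative weights: constant input 1 with weight -1.5 acts as a threshold,
    # so g > 0 only when both a and b are 1.
    return mp_neuron([1, a, b], [-1.5, 1.0, 1.0])

truth_table = {(a, b): and_gate(a, b) for a in (0, 1) for b in (0, 1)}
print(truth_table)  # {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
```

Folding the threshold into a constant input is a common convenience; it lets the step function keep its fixed threshold of 0 as written on the slide.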

13.

Perceptron-Type Neural Network Activation Functions and their Derivatives https://www.researchgate.net/publication/346898697_Thermodynamics-based_Artificial_Neural_Networks_for_constitutive_modeling
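The linked figure compares activation functions and their derivatives. As a hedged sketch (the choice of sigmoid, tanh and ReLU is ours, not read off the figure), the derivatives can be written in closed form and checked against central finite differences:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # sigma'(z) = sigma(z)(1 - sigma(z))

def d_tanh(z):
    return 1.0 - math.tanh(z) ** 2  # tanh'(z) = 1 - tanh(z)^2

def relu(z):
    return max(0.0, z)

def d_relu(z):
    return 1.0 if z > 0 else 0.0    # convention: derivative 0 at the kink z = 0

ACTIVATIONS = {
    "sigmoid": (sigmoid, d_sigmoid),
    "tanh": (math.tanh, d_tanh),
    "relu": (relu, d_relu),
}

def central_difference(f, z, h=1e-6):
    return (f(z + h) - f(z - h)) / (2 * h)

for name, (f, df) in ACTIVATIONS.items():
    for z in (-2.0, -0.5, 0.7, 3.0):    # points away from the ReLU kink
        assert abs(df(z) - central_difference(f, z)) < 1e-6, name

print("max sigmoid' =", d_sigmoid(0.0), "max tanh' =", d_tanh(0.0))
```

The maxima 1/4 (sigmoid) and 1 (tanh) at z = 0 are the bounds that reappear in the vanishing-gradient discussion.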

14.

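Backpropagation: Multi-Layer Perceptron-Type Neural Network (3 layers)
[Figure: input layer (units k, outputs y_k) → weights w_jk → hidden layer (units j, outputs y_j) → weights w_ij → output layer (units i, outputs y_i), compared with the supervisory signals ŷ_i.]
Forward pass (the input layer passes its inputs on unchanged, y_k = x_k):
$$x_j = \sum_k w_{jk} x_k,\quad y_j = \sigma(x_j),\qquad x_i = \sum_j w_{ij} y_j,\quad y_i = \sigma(x_i)$$
Activation function (sigmoid function) and its derivative:
$$\sigma(z) = \frac{1}{1+e^{-z}},\qquad \sigma'(z) = \frac{d\sigma(z)}{dz} = \frac{e^{-z}}{(1+e^{-z})^2} = \frac{1}{1+e^{-z}}\left(1 - \frac{1}{1+e^{-z}}\right) = \sigma(z)\bigl(1-\sigma(z)\bigr)$$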

15.

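Backpropagation: Multi-Layer Perceptron-Type Neural Network (3 layers)
$$x_j = \sum_k w_{jk} x_k,\quad y_j = \sigma(x_j),\qquad x_i = \sum_j w_{ij} y_j,\quad y_i = \sigma(x_i)$$
[Figure: the same network, with the weights collected into the matrices W1 = (w_jk) between the input and hidden layers and W2 = (w_ij) between the hidden and output layers; the outputs y_i are compared with the supervisory signals ŷ_i.]
Each layer is therefore a product of a matrix and a vector.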
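A minimal sketch of the forward pass as two matrix-vector products, W1 = (w_jk) then W2 = (w_ij), each followed by the sigmoid. The layer sizes and weight values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Matrix-vector product: entry j is row j of W dotted with v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(W1, W2, x_in):
    """x_j = sum_k w_jk x_k, y_j = sigma(x_j); x_i = sum_j w_ij y_j, y_i = sigma(x_i)."""
    x_hidden = matvec(W1, x_in)               # W1 = (w_jk): hidden x input
    y_hidden = [sigmoid(z) for z in x_hidden]
    x_out = matvec(W2, y_hidden)              # W2 = (w_ij): output x hidden
    y_out = [sigmoid(z) for z in x_out]
    return y_hidden, y_out

# Hypothetical network: 3 inputs, 2 hidden units, 1 output.
W1 = [[0.2, -0.4, 0.1],
      [0.5, 0.3, -0.2]]
W2 = [[1.0, -1.0]]
y_hidden, y_out = forward(W1, W2, [1.0, 0.5, -1.0])
print(y_hidden, y_out)
```

Writing each layer as `matvec` makes the "products of matrices and vectors" view explicit; libraries do the same with batched matrix products.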

16.

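Backpropagation
Error function / loss function (ŷ_i: supervisory signals):
$$E = \frac{1}{2}\sum_i (y_i - \hat{y}_i)^2$$
If the weights change with time t, then
$$\frac{dE}{dt} = \sum_{i,j} \frac{\partial E}{\partial w_{ij}} \frac{dw_{ij}}{dt}.$$
Here, if we set
$$\frac{dw_{ij}}{dt} = -\frac{\partial E}{\partial w_{ij}} \qquad (*)$$
we obtain
$$\frac{dE}{dt} = -\sum_{i,j} \left(\frac{\partial E}{\partial w_{ij}}\right)^2 \le 0,$$
so E does not increase. (*) moves the weights along the negative gradient, where for a function f(r) of r = ᵗ(x, y, z)
$$\nabla f(\mathbf{r}) = \operatorname{grad} f(\mathbf{r}) = {}^t\!\left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right).$$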

17.

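Gradient (grad f(r)): Steepest-Descent Minimization of a Multivariable Function
$$\mathbf{r} = {}^t(x, y, z),\qquad \nabla f(\mathbf{r}) = \operatorname{grad} f(\mathbf{r}) = {}^t\!\left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right)$$
[Figure: the iso-surface f(r) = E1 (constant), with grad f(r) drawn perpendicular to it.]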

18.

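Gradient (grad f(r)): Steepest-Descent Minimization of a Multivariable Function
$$\mathbf{r} = {}^t(x, y, z),\qquad \nabla f(\mathbf{r}) = \operatorname{grad} f(\mathbf{r}) = {}^t\!\left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right)$$
[Figure: two iso-surfaces f(r) = E1 and f(r) = E2 with E1 > E2; moving along −grad f(r) leads from the surface E1 toward the lower surface E2.]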
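A small steepest-descent sketch on a hypothetical quadratic f(x, y, z) with its minimum at (1, −0.5, 0). Each step moves along −grad f, and the recorded values of f never increase, which is the discrete counterpart of dE/dt ≤ 0 on the previous slide.

```python
def f(r):
    """Hypothetical test function with its minimum at r = (1, -0.5, 0)."""
    x, y, z = r
    return (x - 1.0) ** 2 + 2.0 * (y + 0.5) ** 2 + 3.0 * z ** 2

def grad_f(r):
    """grad f = t(df/dx, df/dy, df/dz)."""
    x, y, z = r
    return (2.0 * (x - 1.0), 4.0 * (y + 0.5), 6.0 * z)

def steepest_descent(r, eta=0.1, n_steps=100):
    """Repeatedly step along -grad f; eta plays the role of the time step."""
    values = [f(r)]
    for _ in range(n_steps):
        r = tuple(ri - eta * gi for ri, gi in zip(r, grad_f(r)))
        values.append(f(r))
    return r, values

r_min, values = steepest_descent((3.0, 2.0, -1.0))
print(r_min, values[:3])
```

The continuous-time argument guarantees descent for any path; with a finite step, the step size eta must be small enough relative to the curvature (here eta = 0.1 is safe), otherwise the iterates can overshoot.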

19.

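Backpropagation: Output-Layer Weights w_ij
$$x_j = \sum_k w_{jk} x_k,\quad y_j = \sigma(x_j),\qquad x_i = \sum_j w_{ij} y_j,\quad y_i = \sigma(x_i),\qquad E = \frac{1}{2}\sum_i (y_i - \hat{y}_i)^2$$
$$\frac{dE}{dt} = \sum_{i,j} \frac{\partial E}{\partial w_{ij}} \frac{dw_{ij}}{dt},\qquad \frac{dw_{ij}}{dt} = -\frac{\partial E}{\partial w_{ij}} \quad (*)$$
By the chain rule,
$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial y_i}\frac{\partial y_i}{\partial x_i}\frac{\partial x_i}{\partial w_{ij}} = (y_i - \hat{y}_i)\,\sigma(x_i)\bigl(1-\sigma(x_i)\bigr)\,y_j$$
(since x_i = Σ_j w_ij y_j, differentiating partially with respect to w_ij leaves only y_j). Hence (*) becomes
$$\frac{dw_{ij}}{dt} = -(y_i - \hat{y}_i)\,\sigma(x_i)\bigl(1-\sigma(x_i)\bigr)\,y_j,$$
and discretizing with the time step Δt gives the update rule
$$\Delta w_{ij} = w_{ij}^{(n+1)} - w_{ij}^{(n)} = -\Delta t\,(y_i - \hat{y}_i)\,\sigma(x_i)\bigl(1-\sigma(x_i)\bigr)\,y_j.$$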
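A hedged sketch that checks the output-layer formula numerically on a hypothetical 3-2-2 network: the analytic gradient (y_i − ŷ_i) σ(x_i)(1 − σ(x_i)) y_j is compared with central finite differences of E, and one discretized update Δw_ij = −Δt ∂E/∂w_ij is applied.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_and_loss(W1, W2, x_in, target):
    """E = 1/2 sum_i (y_i - yhat_i)^2 together with the layer outputs y_j, y_i."""
    y_h = [sigmoid(sum(w * x for w, x in zip(row, x_in))) for row in W1]  # y_j
    y_o = [sigmoid(sum(w * y for w, y in zip(row, y_h))) for row in W2]   # y_i
    E = 0.5 * sum((y - t) ** 2 for y, t in zip(y_o, target))
    return E, y_h, y_o

def grad_output_weights(W1, W2, x_in, target):
    """dE/dw_ij = (y_i - yhat_i) * sigma(x_i)(1 - sigma(x_i)) * y_j."""
    _, y_h, y_o = forward_and_loss(W1, W2, x_in, target)
    # sigma(x_i) is y_i itself, so sigma(x_i)(1 - sigma(x_i)) = y_i (1 - y_i)
    return [[(y_o[i] - target[i]) * y_o[i] * (1.0 - y_o[i]) * y_h[j]
             for j in range(len(y_h))]
            for i in range(len(y_o))]

def numeric_grad_output_weights(W1, W2, x_in, target, h=1e-6):
    """Central finite differences over w_ij, used only as a check."""
    grad = []
    for i in range(len(W2)):
        row = []
        for j in range(len(W2[i])):
            Wp = [r[:] for r in W2]
            Wm = [r[:] for r in W2]
            Wp[i][j] += h
            Wm[i][j] -= h
            Ep = forward_and_loss(W1, Wp, x_in, target)[0]
            Em = forward_and_loss(W1, Wm, x_in, target)[0]
            row.append((Ep - Em) / (2 * h))
        grad.append(row)
    return grad

# Hypothetical network (3 inputs, 2 hidden, 2 outputs) and training pair.
W1 = [[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]]
W2 = [[1.0, -1.0], [0.3, 0.8]]
x_in, target = [1.0, 0.5, -1.0], [1.0, 0.0]

analytic = grad_output_weights(W1, W2, x_in, target)
numeric = numeric_grad_output_weights(W1, W2, x_in, target)

# One discretized update: w_ij^(n+1) = w_ij^(n) - dt * dE/dw_ij
dt = 0.5
W2_next = [[w - dt * g for w, g in zip(wr, gr)] for wr, gr in zip(W2, analytic)]
```

A finite-difference check like this is a cheap way to catch index or sign mistakes in a hand-derived gradient.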

20.

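Backpropagation: Hidden-Layer Weights w_jk
[Figure: input layer (k) → w_jk → hidden layer (j) → w_ij → output layer (i); the outputs y_i are compared with ŷ_i.]
$$\frac{dE}{dt} = \sum_{j,k} \frac{\partial E}{\partial w_{jk}} \frac{dw_{jk}}{dt},\qquad \frac{dw_{jk}}{dt} = -\frac{\partial E}{\partial w_{jk}} \quad (*)$$
By the chain rule (w_jk affects every output y_i through y_j, so the contributions are summed over i),
$$\frac{\partial E}{\partial w_{jk}} = \sum_i \frac{\partial E}{\partial y_i}\frac{\partial y_i}{\partial x_i}\frac{\partial x_i}{\partial y_j}\frac{\partial y_j}{\partial x_j}\frac{\partial x_j}{\partial w_{jk}} = \sum_i (y_i - \hat{y}_i)\,\sigma(x_i)\bigl(1-\sigma(x_i)\bigr)\,w_{ij}\,\sigma(x_j)\bigl(1-\sigma(x_j)\bigr)\,x_k.$$
Hence (*) becomes
$$\frac{dw_{jk}}{dt} = -\sum_i (y_i - \hat{y}_i)\,\sigma(x_i)\bigl(1-\sigma(x_i)\bigr)\,w_{ij}\,\sigma(x_j)\bigl(1-\sigma(x_j)\bigr)\,x_k,$$
$$\Delta w_{jk} = w_{jk}^{(n+1)} - w_{jk}^{(n)} = -\Delta t \sum_i (y_i - \hat{y}_i)\,\sigma(x_i)\bigl(1-\sigma(x_i)\bigr)\,w_{ij}\,\sigma(x_j)\bigl(1-\sigma(x_j)\bigr)\,x_k.$$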
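The same numerical check for the hidden-layer weights, again on a hypothetical 3-2-2 network. Note the sum over the output units i: the gradient only matches the finite differences when every output that w_jk influences is included.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_and_loss(W1, W2, x_in, target):
    """E = 1/2 sum_i (y_i - yhat_i)^2 together with the layer outputs y_j, y_i."""
    y_h = [sigmoid(sum(w * x for w, x in zip(row, x_in))) for row in W1]
    y_o = [sigmoid(sum(w * y for w, y in zip(row, y_h))) for row in W2]
    return 0.5 * sum((y - t) ** 2 for y, t in zip(y_o, target)), y_h, y_o

def grad_hidden_weights(W1, W2, x_in, target):
    """dE/dw_jk = sum_i (y_i - yhat_i) y_i(1 - y_i) w_ij * y_j(1 - y_j) * x_k."""
    _, y_h, y_o = forward_and_loss(W1, W2, x_in, target)
    delta_out = [(y - t) * y * (1.0 - y) for y, t in zip(y_o, target)]
    grad = []
    for j in range(len(y_h)):
        # w_jk reaches every output i through y_j, so sum over i
        back = sum(delta_out[i] * W2[i][j] for i in range(len(y_o)))
        delta_j = back * y_h[j] * (1.0 - y_h[j])
        grad.append([delta_j * x_k for x_k in x_in])
    return grad

def numeric_grad_hidden_weights(W1, W2, x_in, target, h=1e-6):
    """Central finite differences over w_jk, used only as a check."""
    grad = []
    for j in range(len(W1)):
        row = []
        for k in range(len(W1[j])):
            Wp = [r[:] for r in W1]
            Wm = [r[:] for r in W1]
            Wp[j][k] += h
            Wm[j][k] -= h
            row.append((forward_and_loss(Wp, W2, x_in, target)[0]
                        - forward_and_loss(Wm, W2, x_in, target)[0]) / (2 * h))
        grad.append(row)
    return grad

W1 = [[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]]     # hypothetical weights and data
W2 = [[1.0, -1.0], [0.3, 0.8]]
x_in, target = [1.0, 0.5, -1.0], [1.0, 0.0]
analytic = grad_hidden_weights(W1, W2, x_in, target)
numeric = numeric_grad_hidden_weights(W1, W2, x_in, target)
```

The quantity `delta_out` computed once and reused for every hidden unit is exactly the "back" in backpropagation: errors are propagated from the output layer toward the input.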

21.

Vanishing/Exploding Gradient Problems
The hidden-layer update from the previous slide,
$$\Delta w_{jk} = -\Delta t \sum_i (y_i - \hat{y}_i)\,\sigma(x_i)\bigl(1-\sigma(x_i)\bigr)\,w_{ij}\,\sigma(x_j)\bigl(1-\sigma(x_j)\bigr)\,x_k,$$
is a product of activation-function derivatives σ(x)(1 − σ(x)) and weights w_ij. With every additional layer, one more such pair of factors is multiplied in.

22.

Vanishing/Exploding Gradient Problems
Gradients vanish or explode because backpropagation repeatedly multiplies the derivatives of the activation functions by the weights! The following three are the basic ways to avoid this:
• ReLU (used as the activation function)
• Batch normalization (cf. mini-batch)
• Skip connections (→ residual networks; cf. ODEnet), etc.
Reference: Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd ed. (Japanese edition)
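A toy illustration, assuming a hypothetical chain of single units y_l = act(w y_{l−1}): by the chain rule, dy_L/dy_0 is a product of w · act'(x_l) over the layers. With the sigmoid each factor is at most w/4, so the product shrinks geometrically; with ReLU and positive inputs each factor is exactly w, so it stays at 1 for w = 1 and explodes for w > 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    return max(0.0, z)

def d_relu(z):
    return 1.0 if z > 0 else 0.0

def chain_gradient(depth, act, d_act, w=1.0, y0=0.5):
    """dy_L/dy_0 for a chain of scalar units y_l = act(w * y_{l-1}):
    the product of w * act'(x_l) over all layers (chain rule)."""
    y, grad = y0, 1.0
    for _ in range(depth):
        x = w * y
        grad *= w * d_act(x)
        y = act(x)
    return grad

for depth in (5, 10, 20):
    print(depth,
          chain_gradient(depth, sigmoid, d_sigmoid),    # at most 0.25**depth
          chain_gradient(depth, relu, d_relu),          # stays 1 for positive inputs
          chain_gradient(depth, relu, d_relu, w=2.0))   # grows like 2**depth
```

The last column shows that ReLU alone does not prevent explosion; weight scale still matters, which is one motivation for normalization layers and skip connections.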

23.

Regularization
Error function / loss function with a regularization term R(w):
$$E = \frac{1}{2}\sum_i (y_i - \hat{y}_i)^2 + R(\mathbf{w})$$
[Figure: the multi-layer perceptron (input layer k, w_jk, hidden layer j, w_ij, output layer i; outputs y_i vs. supervisory signals ŷ_i).]
Regularization plays an important role in avoiding overfitting during training (learning). → L1 and L2 norms are often used, e.g. R(w) = λ Σ|w| (L1) or R(w) = (λ/2) Σ w² (L2), with λ > 0 setting the strength.
Overfitting: the model gives correct outputs on the training dataset but wrong outputs on an unseen test dataset (at inference).
Reference: Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd ed. (Japanese edition)
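A minimal sketch of the effect of an L2 term, assuming a hypothetical one-parameter model y = w x fitted by gradient descent. The penalty (λ/2) w² adds λw to the gradient, which pulls the weight toward 0; the minimizer moves from Σxy/Σx² to Σxy/(Σx² + λ).

```python
def fit_slope(xs, ys, lam=0.0, lr=0.01, n_steps=5000):
    """Gradient descent on E(w) = 1/2 sum_n (w x_n - y_n)^2 + (lam/2) w^2.
    The L2 term contributes lam * w to the gradient ("weight decay");
    an L1 term lam * |w| would instead contribute lam * sign(w)."""
    w = 0.0
    for _ in range(n_steps):
        grad = sum((w * x - y) * x for x, y in zip(xs, ys)) + lam * w
        w -= lr * grad
    return w

xs = [0.0, 1.0, 2.0, 3.0]           # hypothetical training data
ys = [0.1, 1.1, 1.9, 3.2]
w_plain = fit_slope(xs, ys)          # minimizer: sum(x*y) / sum(x^2) = 14.5 / 14
w_ridge = fit_slope(xs, ys, lam=5.0) # minimizer: sum(x*y) / (sum(x^2) + lam) = 14.5 / 19
print(w_plain, w_ridge)
```

Shrinking the weights limits how closely the model can chase noise in the training set, which is the mechanism behind the overfitting remedy described above.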

24.

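Vector, Matrix, and Scalar Product
1. Vector
$$\mathbf{r} = \begin{pmatrix} x \\ y \end{pmatrix} = x\begin{pmatrix}1\\0\end{pmatrix} + y\begin{pmatrix}0\\1\end{pmatrix} = x\,\mathbf{e}_1 + y\,\mathbf{e}_2$$
{e_1, e_2} is called a basis set.
If the basis set {e_i} (i = 1, 2) is rotated by θ into {e'_i} (i = 1, 2), then r = x' e'_1 + y' e'_2: the same vector r now has different coordinates (x', y').
A basis set whose vectors are mutually perpendicular and of length 1 is called an orthonormal basis set. More generally, when the basis vectors are not orthogonal, the coordinate system is called an oblique coordinate system. → On to tensors.
[Figure: r drawn in the basis e_1, e_2 and in the basis e'_1, e'_2 rotated by θ about the origin O.]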
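A short sketch of the change of basis above: for an orthonormal basis, the new coordinate along e'_i is simply the scalar product r · e'_i, and recombining x' e'_1 + y' e'_2 returns the same vector. The vector (2, 1) and the angle π/6 are arbitrary choices.

```python
import math

def rotated_basis(theta):
    """Orthonormal basis {e'_1, e'_2} obtained by rotating {e_1, e_2} by theta."""
    e1p = (math.cos(theta), math.sin(theta))
    e2p = (-math.sin(theta), math.cos(theta))
    return e1p, e2p

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def coordinates_in(basis, r):
    """Valid for an orthonormal basis: the coordinate along e'_i is r . e'_i."""
    return tuple(dot(r, e) for e in basis)

r = (2.0, 1.0)                              # r = 2 e_1 + 1 e_2
basis = rotated_basis(math.pi / 6)
x_p, y_p = coordinates_in(basis, r)
# Same vector, new coordinates: r = x' e'_1 + y' e'_2
r_back = tuple(x_p * a + y_p * b for a, b in zip(*basis))
print((x_p, y_p), r_back)
```

For an oblique (non-orthogonal) basis the projection shortcut no longer works, and the coordinates must be found by solving a small linear system instead.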

25.

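Vector, Matrix, and Scalar Product
2. Matrix
$$\mathbf{A} = (\mathbf{a}_1\ \mathbf{a}_2)$$
A: a 2 × 2 matrix ⇒ think of it as two 2-component column vectors a_1 and a_2 placed side by side (!).
Applying the basis set e_1, e_2 from the previous slide gives
$$\mathbf{A}\mathbf{e}_1 = \mathbf{A}\begin{pmatrix}1\\0\end{pmatrix} = \mathbf{a}_1,\qquad \mathbf{A}\mathbf{e}_2 = \mathbf{A}\begin{pmatrix}0\\1\end{pmatrix} = \mathbf{a}_2,$$
so a_1 and a_2 can be extracted. ⇒ A matrix is vectors placed side by side!
Conversely, writing e_1, e_2 side by side in the equations above gives the matrix form
$$\mathbf{A}(\mathbf{e}_1\ \mathbf{e}_2) = (\mathbf{a}_1\ \mathbf{a}_2).$$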

26.

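Vector, Matrix, and Scalar Product
More generally, replacing e_1, e_2 by arbitrary vectors p_1, p_2,
$$\mathbf{A}\mathbf{p}_1 = \mathbf{A}\begin{pmatrix}p_1^x\\p_1^y\end{pmatrix} = \mathbf{b}_1,\qquad \mathbf{A}\mathbf{p}_2 = \mathbf{A}\begin{pmatrix}p_2^x\\p_2^y\end{pmatrix} = \mathbf{b}_2.$$
Collecting these as a matrix product gives A(p_1 p_2) = (b_1 b_2) ⇒ again, a matrix is vectors placed side by side.
Ex) Application to the product of 2 × 2 matrices ⇒ decompose it into two 2 × 1 column vectors:
$$(\mathbf{a}_1\ \mathbf{a}_2)\begin{pmatrix}2 & -3\\ 5 & 4\end{pmatrix} = (2\mathbf{a}_1 + 5\mathbf{a}_2\quad -3\mathbf{a}_1 + 4\mathbf{a}_2)$$
where each column on the right, like a_1 and a_2, is a 2 × 1 column vector.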
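A small check of the column view, with hypothetical columns a_1 = (1, −2) and a_2 = (4, 3): each column of (a_1 a_2) M is the linear combination of a_1 and a_2 given by the corresponding column of M, and A e_1 picks out a_1.

```python
def matmul(A, B):
    """Standard product: (AB)_{rc} = sum_k A_{rk} B_{kc}."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]

def column(M, c):
    return [row[c] for row in M]

def combine(coeffs, vectors):
    """Linear combination sum_k coeffs[k] * vectors[k] of column vectors."""
    return [sum(a * v[r] for a, v in zip(coeffs, vectors)) for r in range(len(vectors[0]))]

A = [[1, 4], [-2, 3]]          # hypothetical columns a1 = (1, -2), a2 = (4, 3)
M = [[2, -3], [5, 4]]          # the example matrix from the slide
a1, a2 = column(A, 0), column(A, 1)
AM = matmul(A, M)
assert column(AM, 0) == combine([2, 5], [a1, a2])      # 2 a1 + 5 a2
assert column(AM, 1) == combine([-3, 4], [a1, a2])     # -3 a1 + 4 a2
assert matmul(A, [[1], [0]]) == [[1], [-2]]            # A e1 = a1
print(AM)
```

Integer entries keep the equality checks exact; with floats one would compare up to a tolerance.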

27.

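Vector, Matrix, and Scalar Product
Ex) Let A be a 2 × 2 matrix such that
$$\mathbf{A}\begin{pmatrix}2\\5\end{pmatrix} = \begin{pmatrix}-3\\4\end{pmatrix},\qquad \mathbf{A}\begin{pmatrix}-2\\3\end{pmatrix} = \begin{pmatrix}1\\-1\end{pmatrix}.$$
Find
$$\mathbf{A}\begin{pmatrix}2 & -2\\ 5 & 3\end{pmatrix}.$$
Answer) Placing the two given conditions side by side immediately gives
$$\mathbf{A}\begin{pmatrix}2 & -2\\ 5 & 3\end{pmatrix} = \begin{pmatrix}-3 & 1\\ 4 & -1\end{pmatrix}.$$
※ The problem can be solved without setting A = (a b; c d) and computing its entries, but...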
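For comparison with the "set A = (a b; c d)" route, a sketch with exact fractions: the answer A P = B is read off directly, and A itself can be recovered as A = B P⁻¹ (P is invertible here since det P = 16).

```python
from fractions import Fraction as F

P = [[F(2), F(-2)], [F(5), F(3)]]     # columns p1 = (2, 5), p2 = (-2, 3)
B = [[F(-3), F(1)], [F(4), F(-1)]]    # columns A p1 = (-3, 4), A p2 = (1, -1)

# Answer to the exercise: A P = (A p1  A p2) = B, read off directly.
answer = B

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(X, Y):
    return [[sum(X[r][k] * Y[k][c] for k in range(2)) for c in range(2)] for r in range(2)]

# The longer route: recover A itself from A P = B, i.e. A = B P^{-1}.
A = matmul(B, inverse_2x2(P))
assert matmul(A, P) == answer
print(A)  # [[Fraction(-7, 8), Fraction(-1, 4)], [Fraction(17, 16), Fraction(3, 8)]]
```

The exercise illustrates the point of the previous slides: thinking of A P column by column makes the answer immediate, while computing A explicitly takes an inversion.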

28.

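Vector, Matrix, and Scalar Product
3. Scalar Product
$$\mathbf{a} = \begin{pmatrix}a_1\\a_2\end{pmatrix},\quad \mathbf{b} = \begin{pmatrix}b_1\\b_2\end{pmatrix},\qquad \mathbf{a}\cdot\mathbf{b} = |\mathbf{a}||\mathbf{b}|\cos\theta,$$
where θ is the angle between a and b. Using the law of cosines, a·b = a_1 b_1 + a_2 b_2.
Orthogonal projection vector of a onto b:
$$\mathbf{h} = \frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{b}|^2}\,\mathbf{b}$$
• Its magnitude is |h| = |a·b|/|b| = |a||cos θ|, which equals |a| cos θ for 0 ≤ θ ≤ π/2.
• Its direction is the same as that of b (for an acute angle θ).
That is, the scalar product a·b can be written, using the orthogonal projection h of a onto b, as a·b = |h||b| (for an obtuse angle, h points opposite to b and a·b = −|h||b|).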
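A quick numerical check of the projection formula with arbitrary vectors a = (3, 1) and b = (2, 0): the component form, the |a||b| cos θ form, and |h||b| all give the same scalar product.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def projection(a, b):
    """Orthogonal projection of a onto b: h = (a.b / |b|^2) b."""
    s = dot(a, b) / dot(b, b)
    return [s * bi for bi in b]

a, b = [3.0, 1.0], [2.0, 0.0]
h = projection(a, b)                                  # [3.0, 0.0]
theta = math.acos(dot(a, b) / (norm(a) * norm(b)))
print(dot(a, b), norm(a) * norm(b) * math.cos(theta), norm(h) * norm(b))
```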

29.

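Vector, Matrix, and Scalar Product
3. Scalar Product
Consider unit vectors e_1 and e_{θ_i}, where e_{θ_i} makes an angle θ_i with e_1:
$$\mathbf{e}_1\cdot\mathbf{e}_{\theta_i} = \cos\theta_i$$
• When θ_i = 0, e_{θ_i} = e_1 and cos θ_i is at its maximum (1).
• When θ_i = π, e_{θ_i} = −e_1 and cos θ_i is at its minimum (−1).
⇒ The scalar product e_1·e_{θ_i} measures the "similarity" between e_1 and e_{θ_i}.
[Figure: unit vectors e_{θ_1}, e_{θ_2} at angles θ_1, θ_2 from e_1 in the x-y plane, and a plot of cos θ_i falling from 1 (θ_i = 0) through 0 (θ_i = π/2) to −1 (θ_i = π).]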
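The "similarity" reading above is what is usually called cosine similarity. A minimal sketch: normalizing by the lengths makes the scalar product equal to cos θ for vectors of any length.

```python
import math

def cosine_similarity(u, v):
    """cos(theta) = u.v / (|u| |v|): 1 for same direction, 0 orthogonal, -1 opposite."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

e1 = (1.0, 0.0)
for theta in (0.0, math.pi / 2, math.pi):
    e_theta = (math.cos(theta), math.sin(theta))
    print(round(theta, 3), round(cosine_similarity(e1, e_theta), 6))
```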

30.

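Vector, Matrix, and Scalar Product
Let A and B be matrices whose rows are the vectors a_1, ..., a_n and b_1, ..., b_n, so that the transposes ᵗA = (a_1 a_2 ⋯ a_n) and ᵗB = (b_1 b_2 ⋯ b_n) have these vectors as columns.
Product of A and ᵗB:
$$\mathbf{A}\,{}^t\mathbf{B} = \begin{pmatrix}\mathbf{a}_1\\ \vdots\\ \mathbf{a}_n\end{pmatrix}(\mathbf{b}_1\ \mathbf{b}_2\ \cdots\ \mathbf{b}_n) = \begin{pmatrix}\mathbf{a}_1\cdot\mathbf{b}_1 & \cdots & \mathbf{a}_1\cdot\mathbf{b}_n\\ \vdots & \ddots & \vdots\\ \mathbf{a}_n\cdot\mathbf{b}_1 & \cdots & \mathbf{a}_n\cdot\mathbf{b}_n\end{pmatrix}$$
Product of B and ᵗA:
$$\mathbf{B}\,{}^t\mathbf{A} = \begin{pmatrix}\mathbf{b}_1\\ \vdots\\ \mathbf{b}_n\end{pmatrix}(\mathbf{a}_1\ \mathbf{a}_2\ \cdots\ \mathbf{a}_n) = \begin{pmatrix}\mathbf{b}_1\cdot\mathbf{a}_1 & \cdots & \mathbf{b}_1\cdot\mathbf{a}_n\\ \vdots & \ddots & \vdots\\ \mathbf{b}_n\cdot\mathbf{a}_1 & \cdots & \mathbf{b}_n\cdot\mathbf{a}_n\end{pmatrix}$$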
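A sketch confirming that A ᵗB is a table of scalar products a_i · b_j (the numbers are hypothetical). With the query and key vectors as the rows of A and B, this table is exactly the QKᵀ score matrix used in attention on the following slides.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """(XY)_{rc} = row r of X dotted with column c of Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Rows of A are a_1, a_2; rows of B are b_1, b_2 (hypothetical numbers).
A = [[1, 0, 2], [0, 1, -1]]
B = [[3, 1, 0], [1, 1, 1]]
AtB = matmul(A, transpose(B))
assert AtB == [[dot(a, b) for b in B] for a in A]       # (A tB)_{ij} = a_i . b_j
assert matmul(B, transpose(A)) == transpose(AtB)         # B tA = t(A tB)
print(AtB)
```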

31.

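Vector, Matrix, and Scalar Product: does the model learn the interaction sites of amino acid residues and compounds?
[Figure: illustration of the attention analysis. Amino acid residues and compound atoms are each vectorized. Recognition of amino acid residues: attention values, with residues within 8 Å of the ligand marked by red bars and residues within 4 Å marked by blue bars. Recognition of the ligand: attention values at the interaction site.]
Note: this is multi-head attention; keep it distinct from other attention mechanisms.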

32.

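Vector, Matrix, and Scalar Product: does the model learn the interaction sites of amino acid residues and compounds?
[Figure: the same attention analysis, annotated with the query (Q), key (K), and value (V) of the attention mechanism. Residues within 8 Å of the ligand are marked by red bars and residues within 4 Å by blue bars; amino acid residues and compound atoms are each vectorized.]
Note: this is multi-head attention; keep it distinct from other attention mechanisms.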

33.

Sequence data (natural language processing [NLP], etc.): RNN, LSTM, Transformer
Image data: CNN (there will probably not be time to cover this today...)
https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks

34.

Recurrent Neural Network (RNN)
1. Overview and Aims of the Technique: neural networks that can handle sequence data and time-series data.
2. Theory and Network Architecture: previous outputs are used as inputs, while the network maintains hidden states.
Géron, "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow," 2nd Edition

35.

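Recurrent Neural Network (RNN): Highlights of the Theory
Forward:
$$\mathbf{h}_t = \phi_h\!\left(\mathbf{W}_{xh}^{\top}\mathbf{X}_t + \mathbf{W}_{hh}^{\top}\mathbf{h}_{t-1} + \mathbf{b}_h\right),\qquad \hat{y}_t = \phi_o\!\left(\mathbf{W}_{yh}^{\top}\mathbf{h}_t + \mathbf{b}_y\right)$$
The weights are shared across time. It is usual to use a saturating nonlinear function (such as sigmoid or tanh) as the activation function φ_h.
Backward (backpropagation through time):
$$L = \sum_{k=1}^{T} L(y_k, \hat{y}_k),\qquad \frac{\partial L}{\partial \mathbf{W}} = \sum_{k=1}^{T}\frac{\partial L_k}{\partial \mathbf{W}} = \sum_{k=1}^{T}\sum_{t=1}^{k}\frac{\partial L_k}{\partial \mathbf{h}_k}\frac{\partial \mathbf{h}_k}{\partial \mathbf{h}_t}\frac{\partial^{+}\mathbf{h}_t}{\partial \mathbf{W}},$$
$$\frac{\partial \mathbf{h}_k}{\partial \mathbf{h}_t} = \prod_{i=t+1}^{k}\frac{\partial \mathbf{h}_i}{\partial \mathbf{h}_{i-1}} = \prod_{i=t+1}^{k}\operatorname{diag}\!\left(\phi_h'(\mathbf{a}_i)\right)\mathbf{W}_{hh}^{\top},$$
where a_i = W_xhᵀ X_i + W_hhᵀ h_{i−1} + b_h is the pre-activation and ∂⁺h_t/∂W is the immediate derivative at step t (treating h_{t−1} as a constant).
Vanishing/exploding gradient problems: |φ_h'| ≤ 1 (tanh: ≤ 1; sigmoid: ≤ 1/4), so the long product above tends to shrink toward zero, or to blow up when the weights are large. Remedies: gradient clipping; gated RNNs (GRU, LSTM, etc.).
https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks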
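A scalar sketch of the BPTT factor, assuming a hypothetical one-unit RNN h_t = tanh(w_xh x_t + w_hh h_{t−1} + b). The factor dh_T/dh_0 is the product of tanh'(a_t) · w_hh over the steps; since tanh' ≤ 1 it is bounded by |w_hh|^T, and a finite difference confirms the product formula. A simple norm-based gradient clipping function is included as well.

```python
import math

def rnn_states(w_hh, w_xh, xs, h0=0.0, b=0.0):
    """Scalar RNN h_t = tanh(w_xh * x_t + w_hh * h_{t-1} + b); returns h_1..h_T."""
    hs, h = [], h0
    for x in xs:
        h = math.tanh(w_xh * x + w_hh * h + b)
        hs.append(h)
    return hs

def bptt_factor(w_hh, w_xh, xs, h0=0.0, b=0.0):
    """dh_T/dh_0 = prod_t tanh'(a_t) * w_hh, using tanh'(a_t) = 1 - h_t**2."""
    prod = 1.0
    for h in rnn_states(w_hh, w_xh, xs, h0, b):
        prod *= (1.0 - h * h) * w_hh
    return prod

def clip_by_norm(grad, max_norm):
    """Gradient clipping: rescale grad so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    return list(grad) if norm <= max_norm else [g * max_norm / norm for g in grad]

xs = [0.5, -0.3, 0.8, 0.1] * 10          # hypothetical input sequence, T = 40
for w_hh in (0.5, 0.9):
    print(w_hh, bptt_factor(w_hh, 1.0, xs))

# Finite-difference check of the product formula at h0 = 0
eps = 1e-6
fd = (rnn_states(0.9, 1.0, xs, h0=eps)[-1] - rnn_states(0.9, 1.0, xs, h0=-eps)[-1]) / (2 * eps)
```

Clipping only caps exploding gradients; it does nothing for vanishing ones, which is why gated architectures such as LSTM and GRU follow.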

36.

Recurrent Neural Network (RNN): Applications
RNN models are mostly used in natural language processing (NLP) and speech recognition. Various applications of RNNs are summarized in the table on the right.
https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks

37.

Recurrent Neural Network (RNN): References
1) "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow," 2nd Edition, O'Reilly.
2) 「ゼロから作るDeep Learning 2」 (Deep Learning from Scratch 2), O'Reilly Japan.
3) "On the difficulty of training Recurrent Neural Networks," arXiv preprint arXiv:1211.5063v2.
4) https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks
5) https://mmuratarat.github.io/2019-02-07/bptt-of-rnn

38.

LSTM (Long Short-Term Memory)
Hochreiter, S., Schmidhuber, J., Long short-term memory. Neural Computation 1997, 9, 1735-80.
1. Overview and Aims of the Technique
• An extension of the RNN; one of the gated RNNs. "The key idea is that the network can learn what to store in the long-term state, what to throw away, and what to read from it." (Géron, p. 516)
• It addresses the vanishing gradient in backpropagation through time, a problem of the simple RNN: if the gradient becomes 0 at some point during backpropagation, all contributions from earlier steps are lost.

39.

LSTM
2. Theory and Network Architecture
• Three gate controllers & a main layer.
• The gate controllers also consist of dense layers and are trained.
• Hidden state vectors: the short-term state and the long-term state.
[Figure: a chain of LSTM cells at steps t − 1, t, t + 1; each cell receives the input, the long-term state, and the short-term state, and emits the output, the new long-term state, and the new short-term state.]
https://medium.com/@andre.holzner/lstm-cells-in-pytorch-fab924a78b1c

40.

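LSTM cell (from the PyTorch documentation, https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html)
Gates (each component ∈ [0, 1]):
$$\mathbf{i}_t = \sigma(\mathbf{W}_{ii}^{\top}\mathbf{x}_t + \mathbf{b}_{ii} + \mathbf{W}_{hi}^{\top}\mathbf{h}_{t-1} + \mathbf{b}_{hi})$$
$$\mathbf{f}_t = \sigma(\mathbf{W}_{if}^{\top}\mathbf{x}_t + \mathbf{b}_{if} + \mathbf{W}_{hf}^{\top}\mathbf{h}_{t-1} + \mathbf{b}_{hf})$$
$$\mathbf{o}_t = \sigma(\mathbf{W}_{io}^{\top}\mathbf{x}_t + \mathbf{b}_{io} + \mathbf{W}_{ho}^{\top}\mathbf{h}_{t-1} + \mathbf{b}_{ho})$$
Main layer (cell gate):
$$\mathbf{g}_t = \tanh(\mathbf{W}_{ig}^{\top}\mathbf{x}_t + \mathbf{b}_{ig} + \mathbf{W}_{hg}^{\top}\mathbf{h}_{t-1} + \mathbf{b}_{hg})$$
Long-term state and short-term state:
$$\mathbf{c}_t = \mathbf{f}_t\circ\mathbf{c}_{t-1} + \mathbf{i}_t\circ\mathbf{g}_t,\qquad \mathbf{h}_t = \mathbf{o}_t\circ\tanh(\mathbf{c}_t)$$
∘: Hadamard product; t: step; σ: sigmoid function.
Weights for the input: W_ig, W_ii, W_if, W_io. Weights for the short-term state: W_hg, W_hi, W_hf, W_ho. Biases: b_ig, etc.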
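A minimal pure-Python sketch of one LSTM step following the equations above, with parameter names mirroring W_ii, W_hf, b_ig and so on. Weights are stored as (hidden × input) rows, so `affine` computes the slide's Wᵀx + b; the random initialization and sizes are hypothetical. The usage at the end checks the forget-gate behaviour from the next slides: with f ≈ 1 and i ≈ 0, the long-term state is carried over unchanged.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, x, b):
    """W x + b, with W stored as (hidden x input) rows; this is W^T x + b above."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def lstm_cell(x, h_prev, c_prev, p):
    """One LSTM step; p maps names such as "W_if", "b_hf" to weights and biases."""
    def pre(gate):
        a = affine(p["W_i" + gate], x, p["b_i" + gate])         # input part
        r = affine(p["W_h" + gate], h_prev, p["b_h" + gate])    # short-term-state part
        return [u + v for u, v in zip(a, r)]

    i = [sigmoid(v) for v in pre("i")]     # input gate
    f = [sigmoid(v) for v in pre("f")]     # forget gate
    o = [sigmoid(v) for v in pre("o")]     # output gate
    g = [math.tanh(v) for v in pre("g")]   # main layer (cell gate)
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]  # f o c + i o g
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                    # o o tanh(c)
    return h, c

def random_params(n_in, n_hidden, seed=0):
    """Hypothetical small random weights and zero biases for all four gates."""
    rng = random.Random(seed)
    p = {}
    for gate in "ifog":
        p["W_i" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        p["W_h" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
        p["b_i" + gate] = [0.0] * n_hidden
        p["b_h" + gate] = [0.0] * n_hidden
    return p

# Run a short hypothetical input sequence through the cell.
p = random_params(n_in=2, n_hidden=3)
h, c = [0.0] * 3, [0.0] * 3
for x in ([0.5, -1.0], [0.2, 0.3], [-0.7, 0.9]):
    h, c = lstm_cell(x, h, c, p)

# Forget gate pushed to ~1 and input gate to ~0: c_{t-1} passes through unchanged.
p_keep = random_params(n_in=2, n_hidden=3)
p_keep["b_if"] = [50.0] * 3
p_keep["b_ii"] = [-50.0] * 3
c_prev = [0.3, -0.7, 1.2]
_, c_kept = lstm_cell([0.5, -1.0], [0.1, 0.2, 0.3], c_prev, p_keep)
```

Because the update to c is additive (f ∘ c + i ∘ g) rather than a repeated matrix product, gradients along the long-term state can survive many steps when f stays close to 1.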

41.

LSTM (Long Short-Term Memory): Memory
The cell receives the hidden state vectors from the previous step t − 1. There are two kinds of hidden state vectors:
• Long-term state c_{t−1}
• Short-term state h_{t−1}
Both are vectors.

42.

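LSTM (Long Short-Term Memory): Forget Gate
The forget gate f_t is a vector of scaling factors that decides how much of the long-term state c_{t−1} is carried over. Each element f_i of f_t satisfies 0 ≤ f_i ≤ 1: multiplying by f_i = 0 forgets completely, and multiplying by f_i = 1 passes the value on unchanged. The multiplication is the Hadamard product (element-wise product, written ∘). Example in 4 dimensions:
$$\mathbf{f}_t\circ\mathbf{c}_{t-1} = \begin{pmatrix}f_1\\f_2\\f_3\\f_4\end{pmatrix}\circ\begin{pmatrix}c_1\\c_2\\c_3\\c_4\end{pmatrix} = \begin{pmatrix}f_1c_1\\f_2c_2\\f_3c_3\\f_4c_4\end{pmatrix}$$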

43.

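LSTM (Long Short-Term Memory): Gate Controller
The forget gate f_t is the vector output by a gate controller (a dense layer). The controller has two inputs, the input x_t and the short-term state h_{t−1}, each with its own network parameters (σ: sigmoid function):
$$\mathbf{f}_t = \sigma(\mathbf{W}_{if}^{\top}\mathbf{x}_t + \mathbf{b}_{if} + \mathbf{W}_{hf}^{\top}\mathbf{h}_{t-1} + \mathbf{b}_{hf})$$
※ Subscripts of the network parameters: the first letter indicates whether the parameter acts on the input (i) or on the short-term state (h); the second letter, f, indicates the forget gate controller.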

44.

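LSTM (Long Short-Term Memory): Output
Compute the output g_t of the main layer (g_t is also called the cell gate):
$$\mathbf{g}_t = \tanh(\mathbf{W}_{ig}^{\top}\mathbf{x}_t + \mathbf{b}_{ig} + \mathbf{W}_{hg}^{\top}\mathbf{h}_{t-1} + \mathbf{b}_{hg})$$
The input gate i_t is the scaling factor for g_t:
$$\mathbf{i}_t = \sigma(\mathbf{W}_{ii}^{\top}\mathbf{x}_t + \mathbf{b}_{ii} + \mathbf{W}_{hi}^{\top}\mathbf{h}_{t-1} + \mathbf{b}_{hi})$$
From these, the updated long-term state is
$$\mathbf{c}_t = \mathbf{f}_t\circ\mathbf{c}_{t-1} + \mathbf{i}_t\circ\mathbf{g}_t.$$
It is passed on to the next step t + 1 and is also used to compute the short-term state h_t.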

45.

Applications of LSTM
Natural language processing, speech recognition, handwriting recognition, music composition, market prediction, ...
(Hojjat Salehinejad, Sharan Sankar, Joseph Barfett, Errol Colak, Shahrokh Valaee. Recent Advances in Recurrent Neural Networks. arXiv preprint arXiv:1801.01078, 2017. https://arxiv.org/abs/1801.01078)
..., and drug design: a molecular graph is represented as a SMILES string, e.g. CN1c2ncn(C)c2C(=O)N(C)C1=O (caffeine).
Reference: REINVENT (AstraZeneca): Marcus Olivecrona, Thomas Blaschke, Ola Engkvist, Hongming Chen, Molecular de-novo design through deep reinforcement learning, Journal of Cheminformatics 2017, 9, 48.

46.

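LSTM-Related Architecture: GRU (Gated Recurrent Unit)
• The forget-gate and input-gate controllers are merged into a single update gate controller.
• The long-term and short-term state vectors are also merged into one.
• The update gate z_t plays the role of the LSTM forget gate, and (1 − z_t) plays the role of the input gate.
• The reset gate r_t scales the hidden state vector h_{t−1} that enters the main layer; in exchange, there is no output gate.
$$\mathbf{g}_t = \tanh(\mathbf{W}_{ig}^{\top}\mathbf{x}_t + \mathbf{b}_{ig} + \mathbf{W}_{hg}^{\top}(\mathbf{r}_t\circ\mathbf{h}_{t-1}) + \mathbf{b}_{hg})$$
Reset gate:
$$\mathbf{r}_t = \sigma(\mathbf{W}_{ir}^{\top}\mathbf{x}_t + \mathbf{b}_{ir} + \mathbf{W}_{hr}^{\top}\mathbf{h}_{t-1} + \mathbf{b}_{hr})$$
Update gate:
$$\mathbf{z}_t = \sigma(\mathbf{W}_{iz}^{\top}\mathbf{x}_t + \mathbf{b}_{iz} + \mathbf{W}_{hz}^{\top}\mathbf{h}_{t-1} + \mathbf{b}_{hz})$$
Output (= new hidden state vector):
$$\mathbf{h}_t = \mathbf{z}_t\circ\mathbf{h}_{t-1} + (\mathbf{1} - \mathbf{z}_t)\circ\mathbf{g}_t$$
Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. arXiv:1409.1259, 2014. https://arxiv.org/abs/1409.1259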
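A sketch of one GRU step in exactly the form written above (reset gate applied to h_{t−1} before the W_hg product, and h_t = z ∘ h_{t−1} + (1 − z) ∘ g). Note that library implementations may place the reset gate slightly differently, so this is a sketch of the slide's equations rather than of any specific library. Parameters are hypothetical random values.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, x, b):
    """W x + b with W stored as (hidden x input) rows, i.e. W^T x + b above."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def gate_input(p, gate, x, h):
    """W_i?^T x + b_i? + W_h?^T h + b_h? for gate ? in {r, z, g}."""
    a = affine(p["W_i" + gate], x, p["b_i" + gate])
    r = affine(p["W_h" + gate], h, p["b_h" + gate])
    return [u + v for u, v in zip(a, r)]

def gru_cell(x, h_prev, p):
    """One GRU step in the form shown above."""
    r = [sigmoid(a) for a in gate_input(p, "r", x, h_prev)]    # reset gate
    z = [sigmoid(a) for a in gate_input(p, "z", x, h_prev)]    # update gate
    r_h = [rk * hk for rk, hk in zip(r, h_prev)]               # r_t o h_{t-1}
    g = [math.tanh(a) for a in gate_input(p, "g", x, r_h)]     # main layer
    # h_t = z_t o h_{t-1} + (1 - z_t) o g_t
    return [zk * hk + (1.0 - zk) * gk for zk, hk, gk in zip(z, h_prev, g)]

def random_params(n_in, n_hidden, seed=0):
    """Hypothetical small random weights and zero biases for the three gates."""
    rng = random.Random(seed)
    p = {}
    for gate in "rzg":
        p["W_i" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        p["W_h" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
        p["b_i" + gate] = [0.0] * n_hidden
        p["b_h" + gate] = [0.0] * n_hidden
    return p

p = random_params(n_in=2, n_hidden=3)
h = [0.0] * 3
for x in ([0.5, -1.0], [0.2, 0.3], [-0.7, 0.9]):
    h = gru_cell(x, h, p)

# Update gate pushed to ~1: the previous hidden state is kept (the "forget gate" role of z).
p_keep = random_params(n_in=2, n_hidden=3)
p_keep["b_iz"] = [50.0] * 3
h_prev = [0.1, -0.4, 0.6]
h_kept = gru_cell([0.5, -1.0], h_prev, p_keep)
```

Since h_t is a convex combination of h_{t−1} and g_t ∈ (−1, 1), the hidden state stays in (−1, 1) when it starts there.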

47.

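Transformer
• Transformer: major successes across broad areas such as natural language processing (NLP) and image analysis!
• The AI learns while taking into account "where to pay attention"!
Transformer by Google ("Attention Is All You Need"):
$$\mathbf{Q}_i = \mathbf{Q}\mathbf{W}_i^{Q},\quad \mathbf{K}_i = \mathbf{K}\mathbf{W}_i^{K},\quad \mathbf{V}_i = \mathbf{V}\mathbf{W}_i^{V}$$
$$\mathrm{attention}_i = \mathrm{Softmax}\!\left(\frac{\mathbf{Q}_i\mathbf{K}_i^{T}}{\sqrt{d_k}}\right)\mathbf{V}_i$$
$$\text{multi-head attention} = \mathrm{concat}(\mathrm{attention}_1, \ldots, \mathrm{attention}_h)\,\mathbf{W}^{O}$$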

48.

Transformer
1. Overview and Aims of the Technique: an encoder-decoder model built on multi-head attention, without using RNNs or CNNs.
2. Theory and Network Architecture: the Transformer architecture is shown below (Figs. 1 and 2).
[Fig. 1: Architecture of the Transformer (encoder and decoder). Fig. 2: Overview of translation (Japanese → English). * <BOS>: beginning of sentence.]

49.

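Transformer: Highlights of the Theory (1)
Multi-head attention:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
Q: query, K: key, V: value, d_k: dimension of the key. Q, K, and V (word count × 512 dimensions) are divided into 8 parts (the number of heads, h = 8), attention is calculated for each part, and finally the results are concatenated into one. Each head therefore works with d_k = 512/8 = 64 dimensions.
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O},\quad \text{where } \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$
[Fig. 3: Multi-head attention. Q, K, V (word count × 512) are multiplied by W_i^Q, W_i^K, W_i^V (512 × 64, i = 1 to 8) to give Q_i, K_i, V_i (word count × 64, i = 1 to 8).]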
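A pure-Python sketch of scaled dot-product attention and multi-head attention as defined above. The sizes (sequence length 3, d_model = 4, two heads of d_k = 2) and the random weights are hypothetical toy values standing in for 512 and 8 × 64.

```python
import math
import random

def matmul(X, Y):
    """Matrix product of nested lists: (XY)_{rc} = sum_k X_{rk} Y_{kc}."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(M):
    return [list(col) for col in zip(*M)]

def softmax(row):
    m = max(row)                               # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, transpose(K))]
    weights = [softmax(row) for row in scores] # each row sums to 1
    return matmul(weights, V), weights

def multi_head(Q, K, V, W_Q, W_K, W_V, W_O):
    """Concat(head_1, ..., head_h) W^O with head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)."""
    heads = [attention(matmul(Q, wq), matmul(K, wk), matmul(V, wv))[0]
             for wq, wk, wv in zip(W_Q, W_K, W_V)]
    concat = [sum((head[t] for head in heads), []) for t in range(len(Q))]
    return matmul(concat, W_O)

# Toy sizes (hypothetical; the original model uses d_model = 512, h = 8, d_k = 64).
seq_len, d_model, n_heads = 3, 4, 2
d_k = d_model // n_heads
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

X = rand_matrix(seq_len, d_model)              # one embedded sequence
W_Q = [rand_matrix(d_model, d_k) for _ in range(n_heads)]
W_K = [rand_matrix(d_model, d_k) for _ in range(n_heads)]
W_V = [rand_matrix(d_model, d_k) for _ in range(n_heads)]
W_O = rand_matrix(n_heads * d_k, d_model)
out = multi_head(X, X, X, W_Q, W_K, W_V, W_O)  # self-attention: Q = K = V = X
```

Dividing by √d_k keeps the scores from growing with the head dimension, which would otherwise push the softmax toward one-hot weights.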

50.

Transformer: Highlights of the Theory (2)
Masked multi-head attention: "−inf" is added to the attention scores to hide future information (prevention of cheating); since softmax(−inf) = 0, the hidden positions receive zero weight. Otherwise the calculation is the same as in multi-head attention. [Fig. 4: Masked multi-head attention, decoding from <BOS>.]
Positional encoding and feed-forward layer:
✓ (Absolute) positional encoding adds positional information with sinusoids:
$$PE_{(pos, 2i)} = \sin\!\left(pos/10000^{2i/d_{model}}\right),\qquad PE_{(pos, 2i+1)} = \cos\!\left(pos/10000^{2i/d_{model}}\right)$$
d_model: number of vector dimensions of the word embedding; 2i and 2i + 1: indices of the vector dimensions; pos: position of the word in the sequence. [Fig. 5: Positional encoding.]
✓ The feed-forward network (FFN) layer is a simple position-wise network of two linear layers with a ReLU between them:
$$\mathrm{FFN}(x) = \max(0,\ xW_1 + b_1)\,W_2 + b_2$$
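A sketch of the three components on this slide: the sinusoidal positional encoding, a causal mask applied by replacing hidden scores with −inf before the softmax, and the position-wise FFN. Sizes and weights in the usage are hypothetical.

```python
import math

def positional_encoding(n_positions, d_model):
    """PE(pos, 2i) = sin(pos / 10000**(2i/d_model)), PE(pos, 2i+1) = cos(same angle)."""
    pe = [[0.0] * d_model for _ in range(n_positions)]
    for pos in range(n_positions):
        for k in range(0, d_model, 2):          # k plays the role of 2i
            angle = pos / 10000 ** (k / d_model)
            pe[pos][k] = math.sin(angle)
            if k + 1 < d_model:
                pe[pos][k + 1] = math.cos(angle)
    return pe

def causal_mask(n):
    """mask[i][j] is True if position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask):
    """Row-wise softmax after replacing hidden scores by -inf (exp(-inf) = 0)."""
    out = []
    for row, allowed in zip(scores, mask):
        row = [s if ok else float("-inf") for s, ok in zip(row, allowed)]
        m = max(row)                            # finite: position i always sees itself
        e = [math.exp(s - m) for s in row]
        total = sum(e)
        out.append([v / total for v in e])
    return out

def ffn(x, W1, b1, W2, b2):
    """Position-wise FFN(x) = max(0, x W1 + b1) W2 + b2 for one row vector x."""
    hidden = [max(0.0, sum(xi * W1[i][j] for i, xi in enumerate(x)) + b1[j])
              for j in range(len(b1))]
    return [sum(hj * W2[j][k] for j, hj in enumerate(hidden)) + b2[k]
            for k in range(len(b2))]

pe = positional_encoding(4, 6)
weights = masked_softmax([[1.0, 2.0, 3.0]] * 3, causal_mask(3))
print(weights)   # upper triangle is exactly 0: no attention to future positions
```

In a real decoder the same mask is applied inside every head of the masked multi-head attention layer.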

51.

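Molecular Transformer
An AI network system for chemical reactions: reactants and reagents → prediction of the products.
Encoder side
① The embedding layer compresses the input compound into a 256-dimensional vector.
② The positional encoding layer adds positional information.
③ The multi-head attention layer computes self-attention, adding the dependency (correspondence) relations within the data.
④ Normalization is applied.
⑤ The position-wise feed-forward network (PFFN) applies the activation function.
⑥ Normalization is applied.
Steps ③ to ⑥ are repeated N = 4 times (layers).
Decoder side
① The embedding layer compresses the input compound into a 256-dimensional vector.
② The positional encoding layer adds positional information.
③ Masked multi-head attention computes self-attention, adding the dependency relations within the data.
④ Normalization is applied.
⑤ Multi-head attention is computed with the output so far as the query and the encoder output as the key and value, capturing the correspondence between the two different sequences.
⑥ Normalization is applied.
⑦ The PFFN transforms the result.
⑧ Normalization is applied.
Steps ③ to ⑧ are repeated N = 4 times (layers).
Ref) 1) arXiv:1706.03762v5 [cs.CL] 6 Dec 2017, "Attention Is All You Need" 2) ACS Cent. Sci. 2019, 5, 1572−1583 (Molecular Transformer)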

52.

<transformerCPI> 2D Descriptors of Compounds
[Figure: amino acid residues and compound atoms are each vectorized.]

53.

<transformerCPI> Problems: 1. Using inappropriate datasets 2. Hidden ligand bias 3. Splitting datasets inappropriately
Chen L, Cruz A, Ramsey S, Dickson CJ, Duca JS, Hornak V, et al., "Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening", PLoS ONE, 14 (2019), e0220113. https://doi.org/10.1371/journal.pone.0220113

54.

Chen L, Cruz A, Ramsey S, Dickson CJ, Duca JS, Hornak V, et al., “Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening”, PLoS ONE, 14 (2019), e0220113. https://doi.org/10.1371/journal.pone.0220113

55.

<transformerCPI> Problems: 1. Using inappropriate datasets 2. Hidden ligand bias 3. Splitting datasets inappropriately

56.

<PL transformer> Problems: 1. Using inappropriate datasets 2. Hidden ligand bias 3. Splitting datasets inappropriately
A brief supplementary note on item 1 below: even when protein 3D structures are introduced into the analysis system, the data that the "user" enters at inference time stay the same (the amino acid sequence and the compound). We intend to build a new computational scheme of this kind.
Possible Issues in Constructing a New System for P-L Analysis Based on the Transformer
1. Involvement of protein 3D structures → recognition of atoms in side chains
2. Equivalent double recognition of protein and ligand → amino acid sequences and compound chemical structures
3. Construction of a database of decoy ligands → reduction of false positives and of the bias derived from ligands
Ref) AttentionSiteDTI: Briefings in Bioinformatics, 23 (4), July 2022, bbac272.

57.

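<PL transformer> Fusing the attention learning for amino acids and ligand atoms
[Figure: attention between ligand atoms (atom) and amino acid residues (a.a.) in both directions, each with its own weights, producing queries Q0 and keys K0.]
The two directions can be computed separately, but shouldn't the weights essentially be shared!?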

58.

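<PL transformer> Fusing the attention learning for amino acids and ligand atoms
[Figure: the same atom ↔ a.a. attention with queries Q0 and keys K0, now computed with shared weights.]
Compute both with the same weights! (They can be computed separately, but the weights should essentially be shared!?)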

59.

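<Other analysis techniques> Comparison with regression models for transformerCPI
• AttentionDTA: extracts features from protein amino acid sequences and compound SMILES with Word2vec and builds a machine learning model (state of the art on the Davis dataset). https://github.com/zhaoqichang/AttentionDTA_BIBM
• OnionNet: uses the 3D structures of complexes as training data; for each spatial shell, it extracts the combinations of compound and protein atoms and builds a machine learning model. https://github.com/zhenglz/onionnet
* Davis dataset: activity values for 442 kinases × 68 ligands, i.e. 30,056 data points with no missing values.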

60.

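References
1) Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd ed. (the English edition is recommended)
2) C. M. Bishop, Pattern Recognition and Machine Learning (the English edition is recommended)
3) 須山敦志, ベイズ深層学習 (Bayesian Deep Learning), Machine Learning Professional Series → Bayesian statistics (introducing probability distributions into AI!). Ex) VAE, GPLVM, etc.
4) 河野俊丈, 曲率とトポロジ (Curvature and Topology) → differential geometry, Riemannian geometry (learn the basics in order to introduce "curved spaces" into AI!)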

61.

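Variational Autoencoder (VAE)
Some supplementary notes on the loss function. Besides the difference (error) between the input and output of the neural network, the VAE loss function ("Lagrangian") contains the regularization term introduced earlier, here expressed with the KL divergence (next page). This loss function is derived theoretically by applying the variational principle, the mean-field approximation, and so on to a Bayesian neural network (see the figure below). Note that physics and information-theory terminology are mixed here: the variational principle of physics is called variational inference in AI. For details, see Reference 3.
$$-\mathcal{L}(\phi, \theta, \mathbf{x}) = \mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x})}\!\left[\log q_\phi(\mathbf{z}|\mathbf{x}) - \log p_\theta(\mathbf{z}) - \log p_\theta(\mathbf{x}|\mathbf{z})\right] = \underbrace{D_{KL}\!\left(q_\phi(\mathbf{z}|\mathbf{x})\,\|\,p_\theta(\mathbf{z})\right)}_{\text{regularization}} - \underbrace{\mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x})}\!\left[\log p_\theta(\mathbf{x}|\mathbf{z})\right]}_{\text{reconstruction}}$$
Since q_φ(z|x) = N(μ, Σ) and p_θ(z) = N(0, I), the KL divergence can be written in closed form (regularization loss), while the reconstruction term is estimated by Monte Carlo sampling with z_n ∼ q_φ(z_n|x_n) (reconstruction loss):
$$D_{KL}\!\left(q_\phi(\mathbf{z}|\mathbf{x})\,\|\,p_\theta(\mathbf{z})\right) = \frac{1}{2}\sum_{n=1}^{N}\left(\mathrm{tr}\,\boldsymbol{\Sigma}_n + \boldsymbol{\mu}_n^{\top}\boldsymbol{\mu}_n - L - \log\det\boldsymbol{\Sigma}_n\right),\qquad \frac{1}{N}\sum_{n=1}^{N}\mathrm{Loss}(\mathbf{x}_n, \mathbf{y}_n)$$
where L is the dimension of the latent variable.
[Figure: data x_n ∈ ℝ^D → encoder f_φ(x) → mean μ_n ∈ ℝ^L and covariance Σ_n ∈ ℝ^{L×L} → latent variable z_n ∼ N(μ_n, Σ_n), with prior z_n ∼ N(0, I) → decoder g_θ(z) → output y_n ∈ ℝ^D.]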
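A hedged sketch of the regularization term for a single data point, assuming a diagonal covariance Σ = diag(var) (a common simplification; the slide writes a general Σ). The closed-form KL is checked against a Monte Carlo estimate that draws z with the reparameterization z = μ + √var · ε. The encoder outputs are hypothetical numbers, and the reconstruction term is not shown.

```python
import math
import random

def kl_to_standard_normal(mu, var):
    """Closed form KL( N(mu, diag(var)) || N(0, I) )
    = 1/2 (tr Sigma + mu^T mu - L - log det Sigma), L = latent dimension."""
    L = len(mu)
    return 0.5 * (sum(var) + sum(m * m for m in mu) - L - sum(math.log(v) for v in var))

def reparameterize(mu, var, rng):
    """z = mu + sqrt(var) * eps with eps ~ N(0, I): a sample from q(z|x) written
    as a differentiable function of mu and var."""
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu, var)]

def log_normal_diag(z, mu, var):
    """log N(z; mu, diag(var))."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (zi - m) ** 2 / v)
               for zi, m, v in zip(z, mu, var))

def kl_monte_carlo(mu, var, n_samples=100000, seed=0):
    """Monte Carlo estimate of E_q[log q(z|x) - log p(z)], to check the closed form."""
    rng = random.Random(seed)
    zeros, ones = [0.0] * len(mu), [1.0] * len(mu)
    total = 0.0
    for _ in range(n_samples):
        z = reparameterize(mu, var, rng)
        total += log_normal_diag(z, mu, var) - log_normal_diag(z, zeros, ones)
    return total / n_samples

mu, var = [1.0, 0.0], [0.5, 2.0]          # hypothetical encoder outputs for one x_n
kl_exact = kl_to_standard_normal(mu, var)
kl_mc = kl_monte_carlo(mu, var)
print(kl_exact, kl_mc)
```

The closed form is preferred in practice because it is exact and noise-free; sampling is only needed for the reconstruction term, where no closed form exists for a neural decoder.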

62.

Variational Autoencoder (VAE): Application
(Same loss function and encoder-decoder diagram as on the previous slide.)

63.

ODEnet I: Genome Science. Application to transcriptomics with a VAE-joined ODE model
Zhanlin Chen, William C. King, Aheyon Hwang, Mark Gerstein, Jing Zhang, "DeepVelo: Single-cell Transcriptomic Deep Velocity Field Learning with Neural Ordinary Differential Equations", bioRxiv, 2022. https://doi.org/10.1101/2022.02.15.480564
ODEnet (arXiv:1806.07366v5 [cs.LG] 14 Dec 2019): a variational autoencoder (VAE), trained to predict the rate of gene expression change (RNA velocity, for example, is a model representing the turnover of mRNA), is embedded in the ODE to infer continuous temporal changes in gene expression within individual cells. The system learns the vector field of gene expression values and can thereby reproduce the rates accurately.

64.

ODEnet I: Genome Science. Application to transcriptomics with a VAE-joined ODE model: Velocity Field
Zhanlin Chen, William C. King, Aheyon Hwang, Mark Gerstein, Jing Zhang, "DeepVelo: Single-cell Transcriptomic Deep Velocity Field Learning with Neural Ordinary Differential Equations", bioRxiv, 2022. https://doi.org/10.1101/2022.02.15.480564
By learning the velocity field of gene expression, a VAE-joined ODE model can predict future cellular states over time. This can be used to explore the crucial factors that induce a specific cellular state in our analysis, and the neural ODE model is therefore expected to help identify target proteins responsible for a disease. This relies on accurate, high-quality prediction by ODEnet, which other AI models usually do not achieve.
After learning the "field" in the cell state space, the AI (ODEnet) can infer cell states!

65.

ODEnet I: Genome Science. Application to transcriptomics with a VAE-joined ODE model
Gene expression x_t and its velocity, given by the learned vector field f_A:
$$\frac{d\mathbf{x}_t}{dt} = \mathbf{f}_A(\mathbf{x}_t)$$
The first-order Euler method for finding the next state x_{t+1} is
$$\mathbf{x}_{t+1} = \mathbf{x}_t + \mathbf{f}_A(\mathbf{x}_t)$$
(with time step 1; in general x_{t+Δt} = x_t + Δt f_A(x_t)). In this way the ODE can be solved within the VAE-joined model.
arXiv:1806.07366v5 [cs.LG] 14 Dec 2019; https://qiita.com/kenchin110100/items/7ceb5b8e8b21c551d69a; Zhanlin Chen, William C. King, Aheyon Hwang, Mark Gerstein, Jing Zhang, "DeepVelo: Single-cell Transcriptomic Deep Velocity Field Learning with Neural Ordinary Differential Equations", bioRxiv, 2022. https://doi.org/10.1101/2022.02.15.480564
After learning the "field" in the cell state space, the AI (ODEnet) can infer cell states!
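A minimal sketch of the Euler update, with a hypothetical two-gene velocity field dx/dt = −0.5 x standing in for the learned f_A (in a Neural ODE, f_A is a neural network, and more accurate, often adaptive, solvers are typically used). With a small step the trajectory matches the exact solution x_0 e^{−0.5 t}.

```python
import math

def euler(f, x0, dt, n_steps):
    """Explicit Euler: x_{t+dt} = x_t + dt * f(x_t); dt = 1 gives x_{t+1} = x_t + f(x_t)."""
    x = list(x0)
    trajectory = [x]
    for _ in range(n_steps):
        v = f(x)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        trajectory.append(x)
    return trajectory

def decay_field(x, rate=0.5):
    """Hypothetical velocity field dx/dt = -rate * x (stand-in for the learned f_A)."""
    return [-rate * xi for xi in x]

traj = euler(decay_field, [1.0, 2.0], dt=0.001, n_steps=2000)   # integrate to t = 2
exact = [1.0 * math.exp(-1.0), 2.0 * math.exp(-1.0)]           # x0 * exp(-0.5 * 2)
print(traj[-1], exact)
```

Euler's error shrinks in proportion to the step size, so the one-step update with dt = 1 on the slide is only a coarse approximation unless the field varies slowly.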

66.

Deep Learning of Wave Function to Construct a Precise XC Functional
Generation of ψ Coupled with Neural Networks
Solving the fractional electron problem (the delocalization error of the electron density and spin-symmetry breaking).
In a DFT calculation, the functional determines the charge density of a molecule by finding the configuration of electrons that minimizes the energy. Thus, errors in the functional can lead to errors in the calculated electron density. Moreover, when describing the breaking of chemical bonds, existing functionals tend to unrealistically prefer configurations in which a fundamental symmetry known as spin symmetry is broken.
DM21 predicts the exchange-correlation energy E_XC[ρ(r)] with a neural network (E_XC^MLP) and incorporates exact properties into the training data.
Ref: Website of DeepMind https://www.deepmind.com/blog/simulating-matter-on-the-quantum-scale-with-ai

67.

Deep Learning of Wave Function to Construct a Precise XC Functional
The following is for your reference, to give an overview of the present issues.
Nearly a century ago, Erwin Schrödinger proposed his famous equation governing the behavior of quantum-mechanical particles. Pierre Hohenberg and Walter Kohn then realised that it is not necessary to track each electron individually; instead, knowing the probability for any electron to be at each position (i.e., the electron density) is sufficient to compute all interactions exactly. After proving this, Kohn founded density functional theory (DFT):
$$E_{DFT}[\rho(\vec{r})] = T[\rho(\vec{r})] + V_{NE}[\rho(\vec{r})] + J[\rho(\vec{r})] + E_{XC}[\rho(\vec{r})]$$
where E_XC is the exchange-correlation (XC) energy.
Although DFT proves that a mapping exists, for more than 50 years the exact nature of this mapping between electron density and interaction energy (the so-called density functional) has remained unknown and has had to be approximated. Over the years, researchers have proposed many approximations to the exact functional, with varying levels of accuracy. By expressing the functional as a neural network and incorporating exact properties into the training data, DeepMind learned functionals free from important systematic errors, resulting in a better description of a broad class of chemical reactions. Both longstanding challenges (the delocalization error and spin-symmetry breaking) are related to how functionals behave when presented with a system that exhibits "fractional electron character." By using a neural network to represent the functional and tailoring the training dataset to capture the fractional electron behaviour expected for the exact functional, they found that they could solve these problems.
Ref: Website of DeepMind https://www.deepmind.com/blog/simulating-matter-on-the-quantum-scale-with-ai

68.

Deep Learning of Wave Function to Construct a Precise XC Functional
(Same text as the previous slide, with the following points highlighted.)
• The density functional has remained unknown and has had to be approximated.
• By using a neural network to represent the exact functional, we found that we could solve the problems.
• Handmade functionals (approximations) → neural networks (learned approximations to the exact functional).
Ref: Website of DeepMind https://www.deepmind.com/blog/simulating-matter-on-the-quantum-scale-with-ai

69.

Deep Learning of Wave Function to Construct a Precise XC Functional
Generation of ψ Coupled with Neural Networks: QM 3D Descriptors of Compounds
Ref: Website of DeepMind https://www.deepmind.com/blog/simulating-matter-on-the-quantum-scale-with-ai

70.

Deep Learning of Wave Function to Construct a Precise XC Functional

71.

Deep Learning of Wave Function to Construct a Precise XC Functional

72.

Manifold-Based Geometrical Techniques: Manifold Embedding of Biological Multi-Modal Data
Multi-modal data: patch clamp and RNA-seq.
Aim: a deeper understanding of the underlying complex mechanisms, across scales, behind phenotypes.
Method: an interpretable regularized learning model, deepManReg.
Data: single-cell multi-modal data (Patch-seq data), including transcriptomics and electrophysiology, for neuronal cells in the mouse brain.

73.

Manifold-Based Geometrical Techniques
Data source: multi-modal data from mouse brain neuronal cells (mouse visual cortex)
• Gene expression (4,020 neurons)
• Electrophysiology
Multi-modal feature classification (brain layer and electrophysiological property)
Findings: neurons can be classified into 28 types, distinguished by their layer-specific connection patterns

74.

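Curved Space
Is parallel transport possible in a curved space?
On the surface of the Earth:
$\mathbf{v}_1$: the tangent vector along the meridian at point O on the equator
$\mathbf{v}_2$: the tangent vector along the meridian at point O', reached by moving along the equator
When these two tangent vectors are moved to the North Pole (becoming $\mathbf{v}'_1$ and $\mathbf{v}'_2$) and their directions are compared, they differ greatly! → curvature of space → a curved space!
[Figure: the Earth with the equator, points O and O' on it, the North Pole, and the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}'_1$, $\mathbf{v}'_2$]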
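The situation on this slide can be reproduced numerically. The sketch below (an illustration, not part of the lecture) models the Earth as the unit sphere in R³ and transports a tangent vector by repeatedly projecting it onto the next tangent plane and restoring its length; this simple scheme approximates parallel transport along a curve on an embedded surface as the steps shrink. The longitude difference Δφ = π/3 between O and O' is an arbitrary assumption.

```python
import numpy as np

def transport_along(path_points, v):
    """Approximate parallel transport of tangent vector v along points of the unit sphere."""
    length = np.linalg.norm(v)
    for p in path_points[1:]:
        v = v - np.dot(v, p) * p              # orthogonal projection onto the tangent plane at p
        v = v * (length / np.linalg.norm(v))  # restore the original length
    return v

def equator_arc(phi0, phi1, n=1000):
    phi = np.linspace(phi0, phi1, n)
    return np.stack([np.cos(phi), np.sin(phi), np.zeros_like(phi)], axis=1)

def meridian_to_pole(phi, n=1000):
    lat = np.linspace(0.0, np.pi / 2, n)
    return np.stack([np.cos(lat) * np.cos(phi), np.cos(lat) * np.sin(phi), np.sin(lat)], axis=1)

def angle_between(a, b):
    return np.arctan2(np.linalg.norm(np.cross(a, b)), np.dot(a, b))

dphi = np.pi / 3                          # longitude difference between O and O' (assumed)
north = np.array([0.0, 0.0, 1.0])         # v1: tangent vector at O = (1, 0, 0), along the meridian

v1_pole = transport_along(meridian_to_pole(0.0), north)    # O straight up to the pole
v2 = transport_along(equator_arc(0.0, dphi), north)        # O to O' along the equator
v2_pole = transport_along(meridian_to_pole(dphi), v2)      # O' up to the pole

print(angle_between(v1_pole, v2_pole), dphi)
```

Neither vector "turns" on its way, yet at the North Pole they differ by exactly the longitude difference Δφ, which is the path dependence that signals curvature.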

75.

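Curved Space
Is differentiation possible in a curved space?
Tangent plane $T_x(M)$: the tangent plane at a point $x$ of a surface $M$; also called the tangent space.
Tangent vector bundle $TM$: the set of tangent vectors (the tangent spaces at all points of $M$, taken together).
$\mathbf{e}_1, \mathbf{e}_2$: basis set of the tangent plane at point $x$
$\mathbf{e}'_1, \mathbf{e}'_2$: basis set of the tangent plane at point $x'$
[Figure: surface $M$ with nearby points $x$ and $x'$ separated by $\Delta s$, their position vectors $\mathbf{r}$ and $\mathbf{r}'$ from the origin O at angle $\theta$, and the tangent-plane bases at each point]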

76.

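Curved Space
Is differentiation possible in a curved space?
Consider a vector field $\mathbf{u}$ on the surface $M$, and let the position on a curve $C$ be specified by a parameter $t$; write the field there as $\mathbf{u}(t)$. Expanding $\mathbf{u}(t)$ in the basis set of the tangent plane gives
$$\mathbf{u}(t) = \sum_{\mu} u^{\mu}\mathbf{e}_{\mu} = u^{\mu}\mathbf{e}_{\mu},$$
where the right-hand side uses Einstein's summation convention (when an index appears once as a superscript and once as a subscript, the summation symbol Σ is omitted).
[Figure: curve $C$ on the surface $M$ through points A and A', with basis vectors $\mathbf{e}_\alpha$ and $\mathbf{e}'_\alpha$ and the field values $\mathbf{u}(t)$ and $\mathbf{u}(t+\varepsilon)$]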
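In code, Einstein's summation convention maps directly onto NumPy's `einsum`, where a repeated index letter is summed over. Purely for illustration, this sketch assumes the flat plane in polar coordinates (r, θ), whose coordinate basis written in Cartesian components is e_r = (cos θ, sin θ) and e_θ = (-r sin θ, r cos θ); the component values u^μ are arbitrary.

```python
import numpy as np

r, theta = 2.0, np.pi / 6
# Coordinate basis vectors e_mu stored as rows, in Cartesian components: e[mu, i].
e = np.array([
    [np.cos(theta), np.sin(theta)],              # e_r = d(position)/dr
    [-r * np.sin(theta), r * np.cos(theta)],     # e_theta = d(position)/dtheta
])
u_components = np.array([0.5, 0.25])            # u^mu = (u^r, u^theta), arbitrary values

# u = u^mu e_mu: the repeated index m is summed, exactly as in the convention.
u_einsum = np.einsum("m,mi->i", u_components, e)
u_explicit = sum(u_components[mu] * e[mu] for mu in range(2))   # the same sum with Sigma written out

print(u_einsum, u_explicit)
```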

77.

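Curved Space
Is differentiation possible in a curved space?
In a curved space, the derivative of the basis vector $\mathbf{e}_{\mu}$ (in the direction of $\mathbf{e}_{\alpha}$) can be written as
$$\nabla_{\mathbf{e}_\alpha}\mathbf{e}_\mu = \lim_{\varepsilon\to 0}\frac{\mathbf{e}_\mu(t+\varepsilon) - \mathbf{e}_\mu^{\parallel}(t;\varepsilon)}{\varepsilon},$$
where $\mathbf{e}_\mu^{\parallel}(t;\varepsilon)$ is $\mathbf{e}_\mu(t)$ parallel-transported from $t$ to $t+\varepsilon$. Applying the green infinitesimal vector in the left figure (which shows the derivative of $\mathbf{u}(t)$) to the basis vector $\mathbf{e}_\mu$, this difference is given by $\varepsilon\nabla_{\mathbf{e}_\alpha}\mathbf{e}_\mu$. The derivative lies in the tangent space.
[Figure: along curve $C$ from A to A', the vector $\mathbf{u}(t)$, its parallel-transported copy $\mathbf{u}^{\parallel}(t;\varepsilon)$, the vector $\mathbf{u}(t+\varepsilon)$, and their difference $\varepsilon\nabla_{\mathbf{e}_\alpha}\mathbf{u}$]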

78.

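Curved Space
Is differentiation possible in a curved space?
(Repeated) Vector field: $\mathbf{u}(t) = \sum_\mu u^\mu\mathbf{e}_\mu = u^\mu\mathbf{e}_\mu$
The derivative of $\mathbf{u}(t)$ then follows from Leibniz's rule (the product rule):
$$\nabla_{\mathbf{e}_\alpha}\mathbf{u}(t) = \nabla_{\mathbf{e}_\alpha}(u^\mu\mathbf{e}_\mu) = (\partial_\alpha u^\mu)\,\mathbf{e}_\mu + u^\mu\,(\nabla_{\mathbf{e}_\alpha}\mathbf{e}_\mu).$$
Here the second term is also expanded in the basis set $\{\mathbf{e}_\beta\}$.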

79.

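Curved Space
Is differentiation possible in a curved space?
For this purpose, denote the coefficient of $\mathbf{e}_\beta$ by $\Gamma^\beta_{\mu\alpha}$, so that
$$\nabla_{\mathbf{e}_\alpha}\mathbf{e}_\mu = \Gamma^\beta_{\mu\alpha}\,\mathbf{e}_\beta.$$
(As is clear from the definition, $\Gamma^\mu_{\beta\alpha}$ should strictly be written with staggered indices, $\Gamma^{\mu}{}_{\beta\alpha}$; to save space, the compact form is used below.)
Since
$$\nabla_{\mathbf{e}_\alpha}\mathbf{u}(t) = \nabla_{\mathbf{e}_\alpha}(u^\mu\mathbf{e}_\mu) = (\partial_\alpha u^\mu)\,\mathbf{e}_\mu + u^\mu\,(\nabla_{\mathbf{e}_\alpha}\mathbf{e}_\mu),$$
substituting the expression above (and renaming the dummy indices $\mu \leftrightarrow \beta$ in the second term) gives
$$= (\partial_\alpha u^\mu)\,\mathbf{e}_\mu + u^\beta\Gamma^\mu_{\beta\alpha}\,\mathbf{e}_\mu,$$
and collecting terms we obtain
$$= (\partial_\alpha u^\mu + u^\beta\Gamma^\mu_{\beta\alpha})\,\mathbf{e}_\mu.$$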
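The coefficients Γ^β_{μα} can be computed numerically for a concrete surface. This sketch assumes the unit sphere with coordinates (θ, φ) embedded in R³. For a surface embedded in Euclidean space, the covariant derivative of a tangent vector equals the tangential part of its ordinary derivative, so the code differentiates each e_μ by central differences, removes the normal component, and solves ∇_{e_α} e_μ = Γ^β_{μα} e_β for the coefficients. The known closed forms, Γ^θ_{φφ} = -sin θ cos θ and Γ^φ_{θφ} = Γ^φ_{φθ} = cot θ (all others zero), serve as a check.

```python
import numpy as np

def position(q):
    """Point on the unit sphere for coordinates q = (theta, phi)."""
    th, ph = q
    return np.array([np.sin(th) * np.cos(ph), np.sin(th) * np.sin(ph), np.cos(th)])

def basis(q):
    """Analytic coordinate basis e_mu = d(position)/dq^mu; rows are e_theta, e_phi."""
    th, ph = q
    return np.array([
        [np.cos(th) * np.cos(ph), np.cos(th) * np.sin(ph), -np.sin(th)],
        [-np.sin(th) * np.sin(ph), np.sin(th) * np.cos(ph), 0.0],
    ])

def christoffel(q, h=1e-5):
    """Gamma[beta, mu, alpha] with nabla_{e_alpha} e_mu = Gamma^beta_{mu alpha} e_beta."""
    e = basis(q)
    normal = position(q)                              # outward unit normal of the unit sphere
    gamma = np.zeros((2, 2, 2))
    for alpha in range(2):
        dq = np.zeros(2)
        dq[alpha] = h
        de = (basis(q + dq) - basis(q - dq)) / (2 * h)   # ordinary derivative of each e_mu
        for mu in range(2):
            tangential = de[mu] - np.dot(de[mu], normal) * normal   # drop the normal part
            # Expand the tangential part in the basis {e_beta} (3 equations, 2 unknowns).
            gamma[:, mu, alpha] = np.linalg.lstsq(e.T, tangential, rcond=None)[0]
    return gamma

q = np.array([np.pi / 3, 0.7])
G = christoffel(q)
theta = q[0]
print("Gamma^theta_{phi phi}:", G[0, 1, 1], "expected", -np.sin(theta) * np.cos(theta))
print("Gamma^phi_{theta phi}:", G[1, 0, 1], "expected", np.cos(theta) / np.sin(theta))
```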

80.

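Curved Space
Is differentiation possible in a curved space?
Rearranging, we finally obtain
$$\nabla_{\mathbf{e}_\alpha}\mathbf{u}(t) = \nabla_{\mathbf{e}_\alpha}(u^\mu\mathbf{e}_\mu) = (\partial_\alpha u^\mu + u^\beta\Gamma^\mu_{\beta\alpha})\,\mathbf{e}_\mu \overset{\text{def}}{=} u^\mu_{;\alpha}\,\mathbf{e}_\mu.$$
That is, the components of the derivative are given by
$$u^\mu_{;\alpha} \overset{\text{def}}{=} \partial_\alpha u^\mu + u^\beta\Gamma^\mu_{\beta\alpha}.$$
This is called covariant differentiation (the covariant derivative).
(As is clear from the definition, $\Gamma^\mu_{\beta\alpha}$ should strictly be written with staggered indices; it is written in compact form to save space.)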
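The component formula u^μ_{;α} = ∂_α u^μ + u^β Γ^μ_{βα} can be checked against an independent computation. Assumptions of this sketch: the unit sphere in (θ, φ) coordinates with its closed-form Christoffel symbols, a hypothetical smooth test field (u^θ, u^φ) = (sin φ, cos θ), and, as the reference, the tangential part of the ordinary derivative of the embedded vector u^μ e_μ in R³ (valid for surfaces embedded in Euclidean space).

```python
import numpy as np

def basis(th, ph):
    """Analytic coordinate basis of the unit sphere; rows e_theta, e_phi (Cartesian components)."""
    return np.array([
        [np.cos(th) * np.cos(ph), np.cos(th) * np.sin(ph), -np.sin(th)],
        [-np.sin(th) * np.sin(ph), np.sin(th) * np.cos(ph), 0.0],
    ])

def gamma(th):
    """G[mu, beta, alpha] = Gamma^mu_{beta alpha} of the unit sphere."""
    G = np.zeros((2, 2, 2))
    G[0, 1, 1] = -np.sin(th) * np.cos(th)                 # Gamma^theta_{phi phi}
    G[1, 0, 1] = G[1, 1, 0] = np.cos(th) / np.sin(th)     # Gamma^phi_{theta phi} = Gamma^phi_{phi theta}
    return G

def u_comp(th, ph):
    """Components u^mu of a hypothetical test vector field."""
    return np.array([np.sin(ph), np.cos(th)])

def embedded(q):
    """The same field as a vector in R^3: u^mu e_mu."""
    return u_comp(*q) @ basis(*q)

th, ph, h = 1.0, 0.4, 1e-5
q = np.array([th, ph])

# Ordinary partial derivatives d_alpha u^mu by central differences: du[mu, alpha].
du = np.zeros((2, 2))
for alpha in range(2):
    dq = np.zeros(2)
    dq[alpha] = h
    du[:, alpha] = (u_comp(*(q + dq)) - u_comp(*(q - dq))) / (2 * h)

# Covariant derivative components: u^mu_{;alpha} = d_alpha u^mu + u^beta Gamma^mu_{beta alpha}.
cov = du + np.einsum("b,mba->ma", u_comp(th, ph), gamma(th))

# Reference: tangential part of the ordinary derivative of u^mu e_mu, expanded in {e_mu}.
normal = np.array([np.sin(th) * np.cos(ph), np.sin(th) * np.sin(ph), np.cos(th)])
e = basis(th, ph)
reference = np.zeros((2, 2))
for alpha in range(2):
    dq = np.zeros(2)
    dq[alpha] = h
    d = (embedded(q + dq) - embedded(q - dq)) / (2 * h)
    d -= np.dot(d, normal) * normal            # keep only the tangent-space part
    reference[:, alpha] = np.linalg.lstsq(e.T, d, rcond=None)[0]

print(np.max(np.abs(cov - reference)))    # small: matches the covariant derivative
print(np.max(np.abs(du - reference)))     # not small: the Gamma term is needed
```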

81.

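Curved Space
Is differentiation possible in a curved space?
Vector field: $\mathbf{u}(t) = \sum_\mu u^\mu\mathbf{e}_\mu = u^\mu\mathbf{e}_\mu$
$$\nabla_{\mathbf{e}_\alpha}\mathbf{u}(t) = \nabla_{\mathbf{e}_\alpha}(u^\mu\mathbf{e}_\mu) = (\partial_\alpha u^\mu)\,\mathbf{e}_\mu + (u^\beta\Gamma^\mu_{\beta\alpha})\,\mathbf{e}_\mu$$
A derivative of $\mathbf{u}(t)$ of this form is called the covariant derivative; it follows from applying Leibniz's rule (the product rule) to the expression above. In this way, differentiation in a curved space is obtained by adding "the derivative of the basis set" to the ordinary derivative.
Cf.) grad in flat space, where the Cartesian basis is constant and only the ordinary derivative appears:
$$\mathrm{grad}\, f(\mathbf{r}) = \nabla f(\mathbf{r}) = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right)^{\top}$$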

82.

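Curved Space
Is differentiation possible in a curved space?
Vector field: $\mathbf{u}(t) = \sum_\mu u^\mu\mathbf{e}_\mu = u^\mu\mathbf{e}_\mu$
$$\nabla_{\mathbf{e}_\alpha}\mathbf{u}(t) = (\partial_\alpha u^\mu)\,\mathbf{e}_\mu + u^\beta\Gamma^\mu_{\beta\alpha}\,\mathbf{e}_\mu$$
This can be summarized in the following slogan!
$$\nabla_{\mathbf{e}_\alpha}\mathbf{u}(t) = \partial + \Gamma$$
(ordinary derivative) + (derivative of the basis set)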

83.

Curved Space
Is parallel transport possible in a curved space? (repeated from slide 74)

84.

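Curved Space
Is differentiation possible in a curved space?
$\Gamma^\mu_{\beta\alpha}$: connection coefficients (Christoffel symbols)
Riemannian connection: parallel transport between basis sets is defined by the Christoffel symbols (Riemannian manifold).
• As noted above, the result of parallel-transporting a vector depends on the path. Correspondingly, covariant derivatives do not commute. This is due to the curvature of the space.
• The curvature tensor is therefore defined from the difference produced when a vector is parallel-transported once around a closed path (Riemann curvature tensor).
Note) The Christoffel symbol itself is not a tensor, but, for example, the difference between two sets of connection coefficients (such as a variation) is a tensor.
Ex) In electromagnetism, the gauge field corresponds to a connection; specifically, the vector potential $A_\mu$ constitutes the connection.
In this way, differentiation and curvature can be defined in a curved space!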
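Path dependence can be seen by integrating the parallel-transport equation dv^μ/dt + Γ^μ_{βα} v^β dx^α/dt = 0 (the covariant derivative along a curve set to zero) with the sphere's Christoffel symbols. In this sketch the loop, the circle of colatitude θ₀ = π/4 on the unit sphere, and the fourth-order Runge-Kutta step count are assumptions chosen for illustration. After one full loop the vector does not return to itself: it is rotated by 2π(1 - cos θ₀) (mod 2π), which equals the solid angle enclosed by the loop. This is curvature showing up as holonomy.

```python
import numpy as np

THETA0 = np.pi / 4          # colatitude of the loop (assumed for illustration)

def christoffel(theta):
    """G[mu, beta, alpha] = Gamma^mu_{beta alpha} on the unit sphere, coordinates (theta, phi)."""
    G = np.zeros((2, 2, 2))
    G[0, 1, 1] = -np.sin(theta) * np.cos(theta)
    G[1, 0, 1] = G[1, 1, 0] = np.cos(theta) / np.sin(theta)
    return G

def rhs(t, v):
    """dv^mu/dt = -Gamma^mu_{beta alpha} v^beta dx^alpha/dt along x(t) = (THETA0, t)."""
    dx = np.array([0.0, 1.0])
    return -np.einsum("mba,b,a->m", christoffel(THETA0), v, dx)

def rk4(f, v, t0, t1, steps):
    """Classical fourth-order Runge-Kutta integration from t0 to t1."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = f(t, v)
        k2 = f(t + h / 2, v + h / 2 * k1)
        k3 = f(t + h / 2, v + h / 2 * k2)
        k4 = f(t + h, v + h * k3)
        v = v + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return v

v_start = np.array([1.0, 0.0])                    # unit vector along e_theta
v_end = rk4(rhs, v_start, 0.0, 2 * np.pi, 2000)   # once around the circle of latitude

# Compare in the orthonormal frame (e_theta, e_phi / sin theta), which returns to itself.
a, b = v_end[0], v_end[1] * np.sin(THETA0)
rotation = np.mod(np.arctan2(b, a), 2 * np.pi)
print(rotation, 2 * np.pi * (1 - np.cos(THETA0)))
```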

85.

Manifold-Based Geometrical Techniques
Manifold Embedding of Biological Multi-Modal Data
Single-cell multi-modal data (Patch-seq data), including transcriptomics and electrophysiology for neuronal cells in the mouse brain
Aim: a deeper understanding of the underlying complex mechanisms across scales for phenotypes
Method: an interpretable regularized learning model, deepManReg
Multi-scale mechanisms ↓ single-cell multi-omics data (high-dimensional, multi-modal data) → deepManReg:
1) Alignment of multi-modal features onto a common latent space (→ Laplacian eigenmaps)
2) Embedding of data as points on the Stiefel manifold based on a set of orthonormal vectors

86.

Manifold-Based Geometrical Techniques
Data source: multi-modal data from mouse brain neuronal cells
Characterized types → congruent met-types, based on 1) electrophysiological and 2) transcriptomic properties
↓
Different met-types innervate different cortical layers

87.

Manifold-Based Geometrical Techniques
Overview of deepManReg
Phase 1:
1) Alignment of multi-modal features onto a common latent space (concatenate the NN outputs)
2) Embedding of data as points on the Stiefel manifold based on a set of orthonormal vectors
→ Similarity between features
Phase 2: Classification

88.

Manifold-Based Geometrical Techniques
deepManReg: Phase 1
1) Alignment of multi-modal features onto a common latent space
2) Embedding of data as points on the Stiefel manifold based on a set of orthonormal vectors
Samples: single neuronal cells in the mouse visual cortex (3,654 samples)
Modal 1: gene expression (500 genes)
Modal 2: electrophysiological features (E-features, 41)
[Figure: for each modality, every feature (gene $i$ or E-feature $j$), represented by its values over the 3,654 samples, is mapped by a neural network (layer widths 3654, 512, 64, 3) into a common embedding space and aligned with Laplacian eigenmaps]

89.

Manifold-Based Geometrical Techniques
Phase 1: Manifold Alignment, Canonical Correlation Analysis (CCA)
See the description written by Chang Wang, Peter Krafft, and Sridhar Mahadevan: https://people.csail.mit.edu/pkrafft/papers/wang-et-al-2010-manifold.pdf


90.

Manifold-Based Geometrical Techniques
Phase 1: Manifold Alignment, Canonical Correlation Analysis (CCA)
Cost function: weighted Euclidean distance between embedded points ($d$: embedding dimension; gene $i$, E-feature $j$), with
$$W(i,j) = \begin{cases} 1, & \text{if } X_i \text{ and } Y_j \text{ correspond to each other} \\ 0, & \text{otherwise} \end{cases}$$
→ Laplacian eigenmaps
Note) For reference, the Laplacian of ordinary vector analysis is
$$\mathrm{div}\,\mathrm{grad}\, f(\mathbf{r}) = \nabla\cdot\nabla f(\mathbf{r}) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}$$
References
• Wang C, Krafft P and Mahadevan S. Manifold Alignment. Manifold Learning: Theory and Applications, 2011. https://people.csail.mit.edu/pkrafft/papers/wang-et-al-2010-manifold.pdf
• Belkin M and Niyogi P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
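A minimal Laplacian-eigenmaps sketch in the spirit of Belkin and Niyogi (2003): build a heat-kernel weight matrix W on a k-nearest-neighbour graph, form the graph Laplacian L = D - W (the discrete counterpart of the Laplacian noted above), and use the eigenvectors of the generalized problem L f = λ D f with the smallest non-zero eigenvalues as coordinates. The toy data (a noise-free 3-D spiral), k, and the kernel-width heuristic are assumptions of this sketch; deepManReg's own implementation may differ.

```python
import numpy as np
from scipy.linalg import eigh

def knn_heat_kernel(X, k=6, t=None):
    """Symmetric k-NN graph with heat-kernel weights exp(-||xi - xj||^2 / t)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    if t is None:
        t = np.median(sq[sq > 0])                  # heuristic kernel width (assumption)
    W = np.zeros_like(sq)
    nearest = np.argsort(sq, axis=1)[:, 1:k + 1]   # skip column 0, the point itself
    for i, neighbours in enumerate(nearest):
        W[i, neighbours] = np.exp(-sq[i, neighbours] / t)
    return np.maximum(W, W.T)                      # connect i and j if either is a neighbour

def laplacian_eigenmap(X, dim=2, k=6):
    W = knn_heat_kernel(X, k)
    D = np.diag(W.sum(axis=1))
    L = D - W                                      # graph Laplacian
    vals, vecs = eigh(L, D)                        # L f = lambda D f, eigenvalues ascending
    return vals[1:dim + 1], vecs[:, 1:dim + 1]     # drop the constant eigenvector (lambda = 0)

# Toy data: points along a spiral, i.e. a 1-D manifold embedded in R^3.
s = np.linspace(0.0, 3.0 * np.pi, 200)
X = np.column_stack([s * np.cos(s), s * np.sin(s), 0.5 * s])
eigvals, Y = laplacian_eigenmap(X, dim=2)
print(eigvals, Y.shape)
```

The first embedding coordinate varies smoothly along the spiral, i.e. it recovers the intrinsic 1-D parameter rather than the ambient 3-D layout.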


91.

Manifold-Based Geometrical Techniques
Phase 1: Manifold Alignment, Canonical Correlation Analysis (CCA)
Cost function: weighted Euclidean distance between embedded points ($d$: embedding dimension; gene $i$, E-feature $j$), with
$$W(i,j) = \begin{cases} 1, & \text{if } X_i \text{ and } Y_j \text{ correspond to each other} \\ 0, & \text{otherwise} \end{cases}$$
This cost cannot be minimized as it stands, because it has the trivial solution $\mathbb{F} = 0$. An orthogonality constraint is therefore imposed to exclude that solution, and manifold alignment can then be performed with Laplacian eigenmaps on the joint graph Laplacian (see the description by Chang Wang, Peter Krafft, and Sridhar Mahadevan in the references). This can be solved by eigendecomposition.
Note) Laplacian:
$$\mathrm{div}\,\mathrm{grad}\, f(\mathbf{r}) = \nabla\cdot\nabla f(\mathbf{r}) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}$$
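The joint-graph construction can be sketched as follows, loosely following the manifold-alignment formulation of Wang, Krafft and Mahadevan: the within-modality graphs and the correspondence matrix W(i, j) are stacked into one block weight matrix, the trivial solution is excluded by the constraint Fᵀ D F = I, and the embedding comes from the generalized eigenproblem of the joint Laplacian. The two toy "modalities" (the same 1-D parameter seen through different maps into R³ and R²), the binary k-NN graphs, and μ = 1 are assumptions of this sketch, not deepManReg's settings.

```python
import numpy as np
from scipy.linalg import eigh

def knn_graph(Z, k=5):
    """Binary symmetric k-nearest-neighbour adjacency matrix."""
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    W = np.zeros_like(sq)
    nearest = np.argsort(sq, axis=1)[:, 1:k + 1]
    for i, nb in enumerate(nearest):
        W[i, nb] = 1.0
    return np.maximum(W, W.T)

def align(X, Y, W_xy, dim=2, mu=1.0, k=5):
    """Embed the rows of X and Y into one common dim-dimensional space."""
    W = np.block([[mu * knn_graph(X, k), W_xy],
                  [W_xy.T, mu * knn_graph(Y, k)]])   # joint weight matrix
    D = np.diag(W.sum(axis=1))
    vals, vecs = eigh(D - W, D)                      # joint Laplacian; constraint F^T D F = I
    F = vecs[:, 1:dim + 1]                           # skip the constant (trivial) eigenvector
    return F[:len(X)], F[len(X):]

t = np.linspace(0.0, 1.0, 40)
X = np.column_stack([np.cos(3 * t), np.sin(3 * t), t])   # modality 1 (in R^3)
Y = np.column_stack([t, t ** 2])                         # modality 2 (in R^2)
W_xy = np.eye(40)                                        # W(i, j) = 1 iff X_i and Y_j correspond

FX, FY = align(X, Y, W_xy)
paired = np.mean(np.linalg.norm(FX - FY, axis=1))
all_pairs = np.mean(np.linalg.norm(FX[:, None, :] - FY[None, :, :], axis=-1))
print(paired, all_pairs)
```

Corresponding points from the two modalities end up close together in the shared space, which is what makes features from different modalities comparable.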

92.

Manifold-Based Geometrical Techniques
Phase 1: Manifold Alignment, Canonical Correlation Analysis (CCA)
Cost function: weighted Euclidean distance between embedded points ($d$: embedding dimension; gene $i$, E-feature $j$), with
$$W(i,j) = \begin{cases} 1, & \text{if } X_i \text{ and } Y_j \text{ correspond to each other} \\ 0, & \text{otherwise} \end{cases}$$
This cost cannot be minimized as it stands, because it has the trivial solution $\mathbb{F} = 0$. An orthogonality constraint is therefore imposed to exclude that solution, and manifold alignment can then be performed with Laplacian eigenmaps on the joint graph Laplacian (see the description by Chang Wang, Peter Krafft, and Sridhar Mahadevan in the references). This can be solved by eigendecomposition.
The set of matrices satisfying this orthogonality constraint is exactly a Stiefel manifold! So, rather than using eigendecomposition, let's optimize by computing the gradient on the Stiefel manifold!

93.

Manifold-Based Geometrical Techniques
Phase 1: Optimization on the manifold
Cost function → Laplacian eigenmaps → eigendecomposition (high computational cost)
↓
Optimization on the manifold (manifold alignment)
Stiefel manifold $S_{n,p}$ (Stiefel E, Commentarii Mathematici Helvetici, 1935): the set of $n \times p$ matrices satisfying $X^{\top}X = \mathbb{I}$
Features → a single point on $S_{n,p}$

94.

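Manifold-Based Geometrical Techniques
Gradient Descent on the Manifold
The cost function $\ell(\hat{\mathbb{F}}) = \mathrm{tr}(\hat{\mathbb{F}}^{\top}\hat{L}\hat{\mathbb{F}})$ is a function defined on $\mathbb{R}^{n\times p}$, not on $S_{n,p}$, so its gradient $\nabla_{\hat{\mathbb{F}}}\ell$ does not necessarily lie in the tangent space $T_X S_{n,p}$.
→ Project it orthogonally onto the tangent space, $\pi: \mathbb{R}^{n\times p} \to T_X S_{n,p}$, to obtain $\tilde{\nabla}_{\hat{\mathbb{F}}}\ell = \pi(\nabla_{\hat{\mathbb{F}}}\ell)$.
For "curved spaces" and differentiation in them (covariant derivatives), see the earlier part of this material and reference 4.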
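A minimal sketch of gradient descent on the Stiefel manifold for ℓ(F) = tr(FᵀLF): take the Euclidean gradient 2LF, project it onto the tangent space with π(G) = G - F sym(FᵀG), where sym(A) = (A + Aᵀ)/2 (the orthogonal projection for the Euclidean metric of the ambient space), take a step, and map back onto S_{n,p} with a QR-based retraction. The test matrix (a path-graph Laplacian), the step size, and the iteration count are assumptions. Since the minimum of ℓ on S_{n,p} is the sum of the p smallest eigenvalues of L, the result can be checked against an eigendecomposition. This is plain deterministic Riemannian gradient descent, not the stochastic variant analyzed by Bonnabel.

```python
import numpy as np

def sym(A):
    return 0.5 * (A + A.T)

def project_tangent(F, G):
    """Orthogonal projection of G onto T_F S_{n,p} = {Z : F^T Z + Z^T F = 0}."""
    return G - F @ sym(F.T @ G)

def retract(Y):
    """QR retraction: the Q factor, with column signs fixed so that diag(R) > 0."""
    Q, R = np.linalg.qr(Y)
    return Q * np.sign(np.diag(R))

def stiefel_descent(L, p, steps=5000, seed=0):
    n = L.shape[0]
    rng = np.random.default_rng(seed)
    F = retract(rng.standard_normal((n, p)))        # random starting point on S_{n,p}
    step = 0.5 / np.linalg.eigvalsh(L)[-1]          # conservative step size (assumption)
    for _ in range(steps):
        euclid_grad = 2.0 * L @ F                   # gradient of tr(F^T L F) in R^{n x p}
        riem_grad = project_tangent(F, euclid_grad) # pi(nabla l), a tangent vector
        F = retract(F - step * riem_grad)           # step, then return to the manifold
    return F

n, p = 20, 3
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # path-graph Laplacian
L[0, 0] = L[-1, -1] = 1.0
F = stiefel_descent(L, p)
cost = np.trace(F.T @ L @ F)
print(cost, np.sum(np.linalg.eigvalsh(L)[:p]))
```

Every iterate satisfies FᵀF = I by construction, so the orthogonality constraint never has to be enforced by a penalty.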

95.

Manifold-Based Geometrical Techniques
Similarity between different modal data
Similarity between features: distance matrix $\mathbb{D}$, with $\mathbb{D}_{ij} = \sum_k \left(\mathbb{F}(i,k) - \mathbb{F}(j,k)\right)^2$
→ Normalization (transition probability)
→ A network consisting of features from different modalities
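The distance matrix and its normalization take only a few lines. The slide does not give the exact normalization, so this sketch assumes a Gaussian kernel exp(-𝔻_ij/σ), no self-loops, and row normalization, which turns each row into the transition probabilities of a random walk over the combined gene/E-feature network. σ and the random stand-in embedding (8 hypothetical features in 3 dimensions) are placeholders.

```python
import numpy as np

def feature_network(F, sigma=1.0):
    """Rows of F are embedded features (genes and E-features stacked together)."""
    diff = F[:, None, :] - F[None, :, :]
    D = np.sum(diff ** 2, axis=-1)            # D_ij = sum_k (F(i,k) - F(j,k))^2
    K = np.exp(-D / sigma)                    # similarity (assumed Gaussian kernel)
    np.fill_diagonal(K, 0.0)                  # no self-transitions (assumption)
    P = K / K.sum(axis=1, keepdims=True)      # row-normalize: transition probabilities
    return D, P

rng = np.random.default_rng(0)
F = rng.standard_normal((8, 3))               # hypothetical: 8 features in a 3-D embedding
D, P = feature_network(F)
print(np.round(P.sum(axis=1), 6))
```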

96.

Manifold-Based Geometrical Techniques
Phase 2: Classification
Regularized Learning with Networks of Features (Sandler T et al., NIPS, 2008)
Loss: cross entropy + L2 + regularization by the feature network
Classification model $U$: layer widths 541 (samples × features: 500 genes + 41 E-features) → 200 → 50 → 5
Prediction: layer in the visual cortex
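Regularizing with a network of features can be sketched on a linear model. Following the general idea of Sandler et al. (2008), the penalty assumed here is ‖(I - P)w‖², which pulls each feature's weight toward the P-weighted average of its neighbours' weights; it is added to a binary cross-entropy loss and an L2 term. The toy data, the feature network (features 0 and 1 linked to each other, features 2 to 9 in a ring), the binary rather than 5-class labels, and all hyper-parameters are assumptions; the slide's classification model U is a multi-layer network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, P, lam_net=0.01, lam_l2=1e-3, lr=0.5, epochs=1000):
    """Logistic regression: cross entropy + L2 + feature-network penalty ||(I - P) w||^2."""
    n, m = X.shape
    M = np.eye(m) - P
    w, b = np.zeros(m), 0.0
    history = []
    for _ in range(epochs):
        prob = sigmoid(X @ w + b)
        eps = 1e-12                                   # guards log(0)
        ce = -np.mean(y * np.log(prob + eps) + (1 - y) * np.log(1 - prob + eps))
        history.append(ce + lam_l2 * w @ w + lam_net * np.sum((M @ w) ** 2))
        grad_w = X.T @ (prob - y) / n + 2 * lam_l2 * w + 2 * lam_net * M.T @ (M @ w)
        grad_b = np.mean(prob - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history

rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X = rng.standard_normal((n_samples, n_features))
y = (X[:, 0] + X[:, 1] > 0).astype(float)            # toy labels driven by features 0 and 1

# Feature network: {0, 1} form one module, features 2..9 form a ring; rows -> transition probabilities.
A = np.zeros((n_features, n_features))
A[0, 1] = A[1, 0] = 1.0
others = np.arange(2, n_features)
A[others, np.roll(others, 1)] = 1.0
A[others, np.roll(others, -1)] = 1.0
P = A / A.sum(axis=1, keepdims=True)

w, b, history = train(X, y, P)
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(f"loss {history[0]:.3f} -> {history[-1]:.3f}, training accuracy {accuracy:.2f}")
```

Because the penalty vanishes when linked features share a weight, it favours solutions that respect the feature network, here giving features 0 and 1 similar weights.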

97.

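Manifold-Based Geometrical Techniques
deepManReg: summary
• Features are extracted from multi-modal data, and samples are classified using regularization based on the similarity among those features.
• To compare features from different modalities, the features were mapped into the same space: Laplacian eigenmaps → Stiefel manifold (manifold alignment).
• Analysis methods that embed features from different modalities into a single space for comparison are important! (There is also room for improvement in the theory.)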

98.

Manifold-Based Geometrical Techniques
Reference
• Nguyen ND, Huang J and Wang D. A Deep Manifold-regularized Learning Model for Improving Phenotype Prediction from Multi-modal Data. Nature Computational Science, 2022.
• Wang C, Krafft P and Mahadevan S. Manifold Alignment. Manifold Learning: Theory and Applications, 2011. https://people.csail.mit.edu/pkrafft/papers/wang-et-al-2010-manifold.pdf
• Belkin M and Niyogi P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
• Bonnabel S. Stochastic gradient descent on Riemannian manifolds. IEEE Transactions on Automatic Control, 2013.
• 杉田勝実, 岡本良夫, 関根松夫. 理論物理のための微分幾何学 (Differential Geometry for Theoretical Physics). 森北出版 (Morikita Publishing), 2007.


99.

Our Tactics
Optimization of Multiple Modal Parameters in Physics, Chemistry, Biology, etc.
• Deep Learning (DL) AI Systems and Machine Learning (ML) Techniques
• Data Analysis on Curved, General Spaces (e.g., the Riemannian Manifold) → Unification of Multimodal Data. Ex) Multi-Omics Data, etc.
• Multi-Scale Computer Simulations → Data Augmentation, High Resolution. Ex) Quantum Simulation, Non-Equilibrium Molecular Dynamics (MD) Simulation, etc.
• Fundamentals and Principles in Natural Sciences

100.

Fundamental Mathematical and Physical Sciences Oriented to Joint AI- and First-Principles-Driven DD Work
Fundamental Lecture (main contents and reference books)
• Mathematics (basics; textbooks): complex numbers, Fourier analysis, differential and integral calculus of multivariable functions, vector analysis
• Linear algebra (basics; textbooks)
• Physics (basics): mechanics, analytical mechanics, electromagnetism, thermodynamics, statistical physics, quantum physics
• Mathematical statistics (basics)
Reference books: 物理のための数学 (岩波物理入門コース); 物理の数学 (岩波基礎物理シリーズ); 1冊でマスター 大学の線形代数; 物理学 (サイエンス社); よくわかる解析力学; 基幹物理学 (培風館); 理論電磁気学; マッカーリ and サイモン 物理化学―分子論的アプローチ (McQuarrie and Simon, Physical Chemistry: A Molecular Approach; listed three times on the slide); 1冊でマスター 大学の統計学
Practical Training Course of Computational Analysis (questions posed: What is a field? What is a space?)
• PyQ
• Linux: Architecture of Supercomputer and Operating System (OS)
• Machine Learning: Database Management System (DBMS) and Singular Value Decomposition (SVD)
• Deep Learning (AI): Neural Network System for Pattern Recognition
• Computer Graphics: Visualization (CG) of Molecular Interactions of Protein and Compounds
• State Space Model (状態空間)
• Curved Space (曲がった空間) (Maybe flat spaces are actually the "rare" ones!?)
MD and ML Seminars (main contents and textbooks): Molecular Dynamics, Quantum Chemistry, Machine Learning
Textbooks: タンパク質計算科学 ― 基礎と創薬への応用; コンピュータ・シミュレーションの基礎 (第2版): 分子のミクロな性質を解明するために; 新しい量子化学―電子構造の理論入門〈上〉; 密度汎関数法 (title incomplete in the source); scikit-learn と TensorFlow による実践機械学習; ベイズ深層学習; 理論物理のための微分幾何学


R-CCS　計算科学研究推進室
