Deep Learning with Implicit Gradients
Shohei Taniguchi, Matsuo Lab (M1)
Overview
• This talk covers two papers that apply implicit gradients to deep learning:
- Meta-Learning with Implicit Gradients
‣ iMAML: computes the MAML meta-gradient without backpropagating through the inner update
- RNNs Evolving on an Equilibrium Manifold: A Panacea for Vanishing and Exploding Gradients?
‣ ERNN
Outline
1. Implicit functions and implicit differentiation
2. A warm-up example of implicit gradients
‣ Implicit Reparameterization Gradients
3. Meta-Learning with Implicit Gradients
4. RNNs Evolving on an Equilibrium Manifold: A Panacea for Vanishing and Exploding Gradients?
Explicit vs. Implicit Functions
• Explicit function: y is written directly as a function of x (e.g. a quadratic)
- y = f(x), e.g. y = ax² + bx + c
- An ordinary NN is an explicit function of its input
• Implicit function: x and y are related only through an equation (e.g. a circle)
- f(x, y) = 0, e.g. x² + y² = r²
Implicit Differentiation and the Implicit Function Theorem
• Implicit differentiation: differentiate both sides of f(x, y) = 0 with respect to x to get dy/dx without ever solving for y:
dy/dx = −(∂f/∂x)/(∂f/∂y) = −f_x/f_y
• Implicit function theorem: if f(x₀, y₀) = 0 and f_y(x₀, y₀) ≠ 0, then there exist neighborhoods x₀ ∈ U, y₀ ∈ V and a function g : U → V such that
{(x, g(x)) | x ∈ U} = {(x, y) ∈ U × V | f(x, y) = 0}
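As a sanity check, here is a minimal numerical sketch of the circle example (plain NumPy; the variable names are ours, not the slides'):

```python
# Check dy/dx = −f_x/f_y on the circle x² + y² = r², where
# f(x, y) = x² + y² − r², so f_x = 2x and f_y = 2y.
import numpy as np

r, x = 2.0, 1.0
y = np.sqrt(r**2 - x**2)                   # upper branch, so f_y = 2y ≠ 0

dydx_implicit = -(2 * x) / (2 * y)         # −f_x/f_y
dydx_explicit = -x / np.sqrt(r**2 - x**2)  # d/dx of y = √(r² − x²)

print(dydx_implicit, dydx_explicit)        # both ≈ −0.5774
```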
Two Caveats
• Caveat 1: the theorem is only local
- f(x, y) = 0 defines a single function y = g(x) only near a point (x₀, y₀) where f_y(x₀, y₀) ≠ 0
- Circle example: x² + y² − r² = 0 gives y = √(r² − x²) on the upper arc, but at (r, 0) we have f_y(r, 0) = 2 × 0 = 0, and globally y = ±√(r² − x²) is not a single-valued function
• Caveat 2: in the multivariate case, f_y is a Jacobian matrix and the condition becomes its invertibility
Two Ways Implicit Gradients Are Used (Agenda)
1. Differentiating a variable that is only defined implicitly
- Implicit Reparameterization Gradients (warm-up)
- iMAML
2. Defining the model's state itself implicitly
- ERNN
Implicit Reparameterization Gradients
• Accepted at NeurIPS 2018
• Authors
- Michael Figurnov, Shakir Mohamed, Andriy Mnih
- DeepMind
• Extends the reparameterization trick to distributions whose inverse CDF is not available in closed form
• A good warm-up for the implicit-gradient machinery behind iMAML and ERNN
Reparameterization Trick
• Used to train VAEs: we maximize the ELBO
𝔼_{q(z; ϕ)}[log p(x|z)] − KL(q(z; ϕ) ‖ p(z))
• The reparameterization trick standardizes the Gaussian q so that the sampling distribution no longer depends on ϕ:
ϵ = f(z; ϕ) = (z − μ_ϕ)/σ_ϕ, ϵ ∼ 𝒩(0, 1)
• The gradient with respect to ϕ can then move inside an expectation over ϵ:
∇ϕ 𝔼_{q(z; ϕ)}[log p(x|z)] = 𝔼_{p(ϵ)}[∇ϕ log p(x|z)|_{z=f⁻¹(ϵ; ϕ)}]
• This requires both f and f⁻¹ to be tractable and differentiable
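For reference, a minimal sketch of the trick itself (PyTorch; the toy loss stands in for log p(x|z), and all names are illustrative):

```python
# Gaussian reparameterization: z = μ + σϵ with ϵ ~ N(0, 1), so gradients
# with respect to the variational parameters flow through z.
import torch

mu = torch.tensor(0.5, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)

eps = torch.randn(())             # ϵ does not depend on ϕ = (μ, σ)
z = mu + log_sigma.exp() * eps    # z = f⁻¹(ϵ; ϕ)

loss = -(z - 1.0) ** 2            # stand-in for log p(x|z)
loss.backward()
print(mu.grad, log_sigma.grad)    # ∇ϕ of the sampled objective
```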
Implicit Reparameterization Gradients
• Generalization: take f to be the CDF of q, so ϵ ∼ U(0, 1)
- In principle this standardizes any univariate distribution
- But for many distributions (e.g. the Gamma) the inverse CDF z = f⁻¹(ϵ; ϕ) has no closed form
- So explicit reparameterization cannot be applied directly
• Rewriting the gradient with the chain rule isolates the problematic term:
∇ϕ 𝔼_{q(z; ϕ)}[log p(x|z)] = 𝔼_{p(ϵ)}[∇ϕ log p(x|z)] = 𝔼_{p(ϵ)}[∇z log p(x|z) ∇ϕz]
• Everything here is easy to compute except ∇ϕz
Implicit Reparameterization Gradients
• Key observation: ϵ = f(z; ϕ) ⇔ f(z; ϕ) − ϵ = 0, which defines z implicitly as a function of ϕ
- Apply implicit differentiation, exactly as with dy/dx earlier
- The derivative of the CDF with respect to z is just the density q(z; ϕ)
∇ϕz = −∇ϕ f(z; ϕ)/∇z f(z; ϕ) = −∇ϕ f(z; ϕ)/q(z; ϕ)
• So we can sample z from q(z; ϕ) directly and never need the inverse f⁻¹
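A minimal numerical check of this formula, using the Gaussian so the explicit answer is known (SciPy; finite differences stand in for autodiff, and the names are illustrative):

```python
# For a Gaussian, z = μ + σϵ_gauss, so the ground truth is
# ∇μz = 1 and ∇σz = (z − μ)/σ; compare with −∇ϕ f/q(z; ϕ).
import numpy as np
from scipy import stats

mu, sigma, eps = 0.5, 2.0, 0.3
z = stats.norm.ppf(eps, loc=mu, scale=sigma)    # z = f⁻¹(ϵ; ϕ)
q = stats.norm.pdf(z, loc=mu, scale=sigma)      # ∇z f(z; ϕ) = q(z; ϕ)

h = 1e-6                                        # finite-difference step
dF_dmu = (stats.norm.cdf(z, mu + h, sigma) - stats.norm.cdf(z, mu - h, sigma)) / (2 * h)
dF_dsig = (stats.norm.cdf(z, mu, sigma + h) - stats.norm.cdf(z, mu, sigma - h)) / (2 * h)

print(-dF_dmu / q, 1.0)                         # ∇μz ≈ 1
print(-dF_dsig / q, (z - mu) / sigma)           # ∇σz ≈ ϵ_gauss
```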
Meta-Learning with Implicit Gradients
• Accepted at NeurIPS 2019
• Authors
- Aravind Rajeswaran, Chelsea Finn, Sham Kakade, Sergey Levine
- Includes the first author of the original MAML paper
• Uses implicit gradients to remove the computational bottleneck of MAML
Model-Agnostic Meta-Learning (MAML)
• Learn an initialization θ that adapts quickly to new tasks
- Each task i adapts θ with a single gradient step on its training set (one-step adaptation)
- The meta-objective is the post-adaptation test loss, averaged over M tasks:
θ*_ML := argmin_{θ∈Θ} F(θ),  where F(θ) = (1/M) Σ_{i=1}^M ℒ(𝒜lg_i(θ), 𝒟_i^test)
𝒜lg_i(θ) = θ − α∇θℒ(θ, 𝒟_i^tr)
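To make the two-level structure concrete, a minimal MAML sketch on scalar toy tasks (PyTorch; the quadratic losses and task targets are our stand-ins, not the paper's setup):

```python
# One MAML meta-step: adapt per task with one gradient step, then
# differentiate the post-adaptation loss through that step.
import torch

theta = torch.tensor(0.0, requires_grad=True)
alpha, eta = 0.1, 0.01                    # inner / outer step sizes
tasks = [1.0, -1.0, 3.0]                  # toy task targets

def loss(phi, target):                    # stand-in for ℒ(·, 𝒟)
    return (phi - target) ** 2

meta_loss = 0.0
for t in tasks:
    g = torch.autograd.grad(loss(theta, t), theta, create_graph=True)[0]
    phi = theta - alpha * g               # 𝒜lg_i(θ), kept in the graph
    meta_loss = meta_loss + loss(phi, t)  # test loss after adaptation

meta_grad = torch.autograd.grad(meta_loss / len(tasks), theta)[0]
with torch.no_grad():
    theta -= eta * meta_grad              # θ ← θ − η dθF(θ)
print(theta)
```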
Problems with MAML
• Computing the meta-gradient ∇θF(θ) requires differentiating through the inner update 𝒜lg_i(θ)
- This involves second-order derivatives, and memory grows with the number of inner steps
• This is the main computational bottleneck of MAML
• FOMAML is a first-order approximation that simply drops the second-order terms
- For details on FOMAML, see
https://www.slideshare.net/DeepLearningJP2016/dl1maml
• iMAML instead computes the exact meta-gradient implicitly
Inner Loop
• iMAML replaces the fixed one-step update with explicit regularized optimization
• A proximal term keeps the adapted parameters close to the initialization θ:
𝒜lg⋆(θ) = argmin_{ϕ′∈Φ} G_i(ϕ′, θ)
G_i(ϕ′, θ) = ℒ̂(ϕ′) + (λ/2)‖ϕ′ − θ‖²
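A sketch of this proximal inner loop with plain SGD (PyTorch; the quadratic task loss is a toy stand-in):

```python
# Minimize G(ϕ′, θ) = ℒ̂(ϕ′) + (λ/2)‖ϕ′ − θ‖² for one task.
import torch

lam = 1.0
theta = torch.randn(5)

def task_loss(phi):                      # stand-in for ℒ̂(ϕ′)
    return ((phi - 2.0) ** 2).sum()

phi = theta.clone().requires_grad_(True)
opt = torch.optim.SGD([phi], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    G = task_loss(phi) + (lam / 2) * ((phi - theta) ** 2).sum()
    G.backward()
    opt.step()

# stationarity check: ∇ℒ̂(ϕ) + λ(ϕ − θ) ≈ 0 at the solution
print(torch.autograd.grad(task_loss(phi), phi)[0] + lam * (phi.detach() - theta))
```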
Outer Loop
• The outer loop is the same gradient update as in MAML:
θ ← θ − η dθF(θ) = θ − η (1/M) Σ_{i=1}^M (d𝒜lg_i(θ)/dθ) ∇ϕℒ_i(𝒜lg_i(θ))  (ϕ = 𝒜lg_i(θ))
• The expensive part is the Jacobian d𝒜lg_i(θ)/dθ of the inner-loop solution with respect to the initialization
➡ Obtain it by implicit differentiation instead of backpropagating through the inner loop
Outer Loop
• The inner-loop solution is characterized by a stationarity condition, independent of the optimization path:
ϕ_i ≡ 𝒜lg⋆_i(θ) = argmin_{ϕ′∈Φ} G_i(ϕ′, θ)
∇ϕ′G_i(ϕ′, θ)|_{ϕ′=ϕ_i} = 0 ⟺ ∇ℒ̂(ϕ_i) + λ(𝒜lg⋆_i(θ) − θ) = 0
• Differentiating this condition with respect to θ gives ∇²ℒ̂(ϕ_i)(d𝒜lg⋆(θ)/dθ) + λ(d𝒜lg⋆(θ)/dθ − I) = 0, hence
d𝒜lg⋆(θ)/dθ = (I + (1/λ)∇²ℒ̂(ϕ_i))⁻¹
• The right-hand side depends only on the adapted parameters ϕ_i, not on how the inner loop found them
Outer Loop
• Two obstacles remain:
① The inner loop only returns an approximate adapted solution (e.g. via SGD), so the stationarity condition holds only approximately
② Forming the inverse (I + (1/λ)∇²ℒ̂(ϕ_i))⁻¹ explicitly costs cubic time in the number of parameters
• Fortunately, all we ever need is the matrix-vector product
(I + (1/λ)∇²ℒ̂(ϕ_i))⁻¹ ∇ϕℒ_i(𝒜lg_i(θ))
Conjugate Gradient (CG) Method
• An iterative method for solving a linear system with a symmetric positive-definite matrix A:
Ax = b ⋯(1)
• Solving (1) is equivalent to minimizing the quadratic f(x) = (1/2)xᵀAx − bᵀx, which CG does with the updates
x₀ = 0, r₀ = b − Ax₀, p₀ = r₀
α_k = (r_kᵀp_k)/(p_kᵀAp_k)
x_{k+1} = x_k + α_k p_k
r_{k+1} = r_k − α_k A p_k
p_{k+1} = r_{k+1} + (r_{k+1}ᵀr_{k+1})/(r_kᵀr_k) p_k
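A direct transcription of these updates (NumPy; a sketch, not a production solver):

```python
# Conjugate gradient for Ax = b, A symmetric positive definite.
import numpy as np

def conjugate_gradient(A, b, num_steps=5):
    x = np.zeros_like(b)
    r = b - A @ x                        # r₀ = b − Ax₀
    p = r.copy()                         # p₀ = r₀
    for _ in range(num_steps):
        Ap = A @ p
        alpha = (r @ p) / (p @ Ap)       # α_k
        x = x + alpha * p                # x_{k+1}
        r_new = r - alpha * Ap           # r_{k+1}
        p = r_new + ((r_new @ r_new) / (r @ r)) * p  # p_{k+1}
        r = r_new
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # SPD example
b = np.array([1.0, 1.0])
print(conjugate_gradient(A, b), np.linalg.solve(A, b))
```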
Computing the Meta-Gradient with CG
• The per-task meta-gradient is
g_i = (I + (1/λ)∇²ℒ̂(ϕ_i))⁻¹ ∇ϕℒ_i(𝒜lg_i(θ))
• Instead of inverting the matrix, solve the linear system
(I + (1/λ)∇²ℒ̂(ϕ_i)) g_i = ∇ϕℒ_i(𝒜lg_i(θ))
for g_i with CG (about 5 iterations suffice)
- Each CG step touches the matrix only through a Hessian-vector product (to form the residual r_k), never the Hessian itself
- The error from the approximate adapted solution 𝒜lg_i(θ) (obstacle ① on the previous slide) is analyzed in Appendix E
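A sketch of the per-task solve using only Hessian-vector products (PyTorch double backward; the quartic loss and the vector v are toy stand-ins):

```python
# Solve (I + ∇²ℒ̂(ϕ)/λ) g = v with CG, where the matrix is only ever
# touched through Hessian-vector products.
import torch

lam = 1.0
phi = torch.randn(5, requires_grad=True)       # adapted parameters ϕ_i

def train_loss(p):                             # stand-in for ℒ̂
    return (p ** 4).sum()

grad = torch.autograd.grad(train_loss(phi), phi, create_graph=True)[0]

def matvec(u):                                 # u ↦ (I + ∇²ℒ̂/λ)u
    hvp = torch.autograd.grad(grad, phi, grad_outputs=u, retain_graph=True)[0]
    return u + hvp / lam

v = torch.randn(5)                             # stand-in for ∇ϕℒ_i(𝒜lg_i(θ))
g, r = torch.zeros(5), v.clone()               # x₀ = 0 ⇒ r₀ = v
p = r.clone()
for _ in range(5):                             # ~5 CG iterations
    Ap = matvec(p)
    alpha = (r @ p) / (p @ Ap)
    g = g + alpha * p
    r_new = r - alpha * Ap
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new
print(g)                                       # ≈ implicit meta-gradient g_i
```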
Advantages of iMAML
• The inner loop can be any optimization algorithm
➡ Only the adapted parameters matter, not the optimization path
• The outer loop never backpropagates through the inner loop
➡ Memory does not grow with the number of inner-loop steps
‣ MAML has to store the entire inner-loop trajectory
‣ iMAML only needs Hessian-free computations (Hessian-vector products) at the adapted parameters
Compute and Memory Comparison
• The paper compares compute and memory costs:
- iMAML's memory is O(1) in the number of inner-loop steps
- Relative to FOMAML, iMAML pays extra compute for the CG solve
- MAML's memory grows with the number of inner-loop steps
(FOMAML ??)
Experiments: Omniglot
• Few-shot classification on Omniglot
- iMAML is evaluated with both gradient-descent and Hessian-free inner loops
- iMAML stays strong as the number of ways (classes) grows
- FOMAML degrades in comparison
Experiments: Mini-ImageNet
• Few-shot classification on Mini-ImageNet
- Also compared against Reptile (a first-order method like FOMAML)
- ??
Summary: iMAML
• Uses implicit differentiation to obtain the exact MAML meta-gradient from the adapted parameters alone
• Decouples the outer loop from the inner-loop optimization path
• Memory stays constant in the number of inner-loop steps
Two Ways Implicit Gradients Are Used (Agenda)
1. Differentiating a variable that is only defined implicitly
- Implicit Reparameterization Gradients
- iMAML
2. Defining the model's state itself implicitly
- ERNN (next)
RNNs Evolving on an Equilibrium Manifold:
A Panacea for Vanishing and Exploding Gradients?
• Authors
- Anil Kag, Ziming Zhang, Venkatesh Saligrama
- Boston University and MERL
• Rejected from NeurIPS 2019
• Proposes an RNN whose hidden state evolves on an equilibrium manifold, as a remedy for vanishing and exploding gradients
Why RNN Gradients Vanish and Explode
• A vanilla RNN updates its state as
h_k = ϕ(Uh_{k−1} + Wx_k + b)
with a saturating activation ϕ such as sigmoid or tanh
• Backpropagation through time multiplies per-step Jacobians:
∂h_m/∂h_n = ∏_{m≥k>n} ∂h_k/∂h_{k−1} = ∏_{m≥k>n} ∇ϕ(Uh_{k−1} + Wx_k + b) U
so gradients shrink or blow up exponentially in the sequence length
• LSTM and GRU mitigate, but do not eliminate, this problem
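A quick numerical illustration of this Jacobian product for a tanh RNN (NumPy sketch; the weight scale is chosen to make the decay visible):

```python
# Accumulate J = ∏ ∇ϕ(a) U and watch its norm shrink with depth.
import numpy as np

rng = np.random.default_rng(0)
d = 16
U = rng.normal(scale=0.5 / np.sqrt(d), size=(d, d))
W = rng.normal(scale=1.0 / np.sqrt(d), size=(d, d))

h, J = np.zeros(d), np.eye(d)            # J = ∂h_k/∂h_0
for k in range(50):
    a = U @ h + W @ rng.normal(size=d)
    h = np.tanh(a)
    J = np.diag(1.0 - h ** 2) @ U @ J    # ∇tanh(a) = diag(1 − tanh²(a))
    if (k + 1) % 10 == 0:
        print(k + 1, np.linalg.norm(J))  # decays toward 0
```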
Viewing the RNN as an ODE
• An RNN with a skip connection can be read as an Euler discretization of an ordinary differential equation (ODE):
dh(t)/dt ≜ h′(t) = ϕ(Uh(t) + Wx_k + b)
⟹ h_k = h_{k−1} + ηϕ(Uh_{k−1} + Wx_k + b)
• This is the same viewpoint as Neural ODEs
- See also:
https://www.slideshare.net/DeepLearningJP2016/dlneural-ordinary-differential-equations
Equilibria of the ODE
• Instead of integrating the ODE, consider its equilibrium points:
dh/dt = f(h, x), f(h, x) = 0 ⋯(1)
• Equation (1) relates h and x implicitly: around a point (h₀, x₀) satisfying (1) with f_h(h₀, x₀) invertible, the implicit function theorem says (1) defines a function h = g(x) near (h₀, x₀)
➡ The equilibria form a manifold (locally the graph of g)
• ERNN makes the hidden state evolve on this equilibrium manifold
ERNN
• ERNN defines the state transition through the equilibrium of an ODE re-centered at the previous state:
h′(t) = ϕ(U(h(t) + h_{k−1}) + Wx_k + b) − γ(h(t) + h_{k−1})
• The next state h_k is the solution of h′(t) = 0, i.e. of
f(h_{k−1}, h) = ϕ(U(h + h_{k−1}) + Wx_k + b) − γ(h + h_{k−1}) = 0
• Implicit differentiation gives the state-to-state Jacobian: f depends on h and h_{k−1} only through their sum, so ∂f/∂h_{k−1} = ∂f/∂h and
∂h/∂h_{k−1} = −(∂f/∂h)⁻¹(∂f/∂h_{k−1}) = −I
➡ The Jacobian is exactly −I at every step, so the gradient product neither vanishes nor explodes
When Is ∂f/∂h Invertible?
• The implicit function theorem needs ∂f/∂h to be invertible, where
∂f/∂h = ∇ϕ(U(h + h_{k−1}) + Wx_k + b) U − γI
• The paper gives sufficient conditions:
1. On the activation ϕ (sigmoid and tanh are fine)
2. On the weight matrix U
Solving for the Equilibrium
• In practice h_k is computed by a fixed-point iteration on the equilibrium condition:
h_k^(0) = 0
h_k^(i+1) = h_k^(i) + η_k^(i)[ϕ(U(h_k^(i) + h_{k−1}) + Wx_k + b) − γ(h_k^(i) + h_{k−1})]
• Around 5 iterations are enough in practice
• The step sizes η_k^(i) are treated as trainable parameters
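A minimal sketch of this ERNN state update (NumPy; constant rather than learned step sizes, so purely illustrative):

```python
# One ERNN transition: find h_k with h′(t) = 0 via the fixed-point
# iteration above (5 steps, constant step size η for simplicity).
import numpy as np

def ernn_step(h_prev, x, U, W, b, gamma=1.0, eta=0.5, num_iters=5):
    h = np.zeros_like(h_prev)                     # h_k^(0) = 0
    for _ in range(num_iters):
        resid = np.tanh(U @ (h + h_prev) + W @ x + b) - gamma * (h + h_prev)
        h = h + eta * resid                       # h_k^(i+1)
    return h

rng = np.random.default_rng(0)
d = 8
U = rng.normal(scale=0.3 / np.sqrt(d), size=(d, d))
W = rng.normal(scale=1.0 / np.sqrt(d), size=(d, d))
b = np.zeros(d)

h = np.zeros(d)
for _ in range(10):                               # run a short input sequence
    h = ernn_step(h, rng.normal(size=d), U, W, b)
print(h)
```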
Gradient Norms on HAR-2
Figure: norm of ∂h_T/∂h_1 for an RNN and an ERNN on HAR-2 (log scale)
• The RNN's gradient norm vanishes or explodes over long sequences
• The ERNN's gradient norm stays close to 1
Figure: further comparison between the RNN and the ERNN
Experiments
• ERNN achieves SoTA-level results on the benchmarks tested
Summary: ERNN
• Defining the next state as the equilibrium of an ODE makes the state-to-state Jacobian exactly −I, removing vanishing and exploding gradients in a plain RNN
• The overhead is only a handful of fixed-point iterations per step
• Achieves SoTA-level results on RNN benchmarks
• It is unclear to the presenter why the paper was not accepted
Summary & Impressions
• Implicit differentiation makes it possible to take gradients of variables defined by equations rather than by explicit formulas
• iMAML and ERNN both turn this classical tool into practical deep-learning machinery
• Implicit gradients seem likely to find further applications