VAE-type Deep Generative
Models (Especially RNN + VAE)
Kenta Oono oono@preferred.jp
Preferred Networks Inc.
25th Jun. 2016
Tokyo Webmining @FreakOut
1/34
Notations
• x: observable (visible) variables
• z: latent (hidden) variables
• D = {x1, x2, …, xN}: training dataset
• KL(q || p): KL divergence between two distributions q and p
• θ: parameters of generative model
• φ: parameters of inference model
• pθ: probability distribution modelled by generative model
• qφ: probability distribution modelled by inference model
• N(µ, σ²): Gaussian Distribution with mean µ and standard deviation σ
• Ber(p): Bernoulli Distribution with parameter p
• A := B, B =: A : Define A by B.
• Ex~p[ f (x)] : Expectation of f(x) with respect to x drawn from p. Namely, ∫ f(x) p(x) dx.
2/34
Abbreviations
• NN: Neural Network
• RNN: Recurrent Neural Network
• CNN: Convolutional Neural Network
• ELBO: Evidence Lower BOund
• AE: Auto Encoder
• VAE: Variational Auto Encoder
• LSTM: Long Short-Term Memory
• NLL: Negative Log-Likelihood
3/34
Agenda
• Mathematical formulation of generative models.
• Variational Auto Encoder (VAE)
• Variants of VAE
• VRAE, VRNN, DRAW, Convolutional DRAW, alignDRAW
• Chainer implementation of (Convolutional) DRAW
• Other VAE-like models
• Inverse DRAW, VAE + GAN
• Conclusion
4/34
Generative models and discriminative
models
• Discriminative model
• Models p(z | x)
• e.g. SVM, Logistic Regression, Naïve Bayes Classifier etc.
• Generative model ← Todayʼs Topic
• Models p(x, z) or p(x)
• e.g. RBM, HMM, VAE etc.
5/34
Recent trends in generative models with NNs
• Helmholtz machine type ← Todayʼs Topic
• Model p(x, z) as p(z) p(x | z)
• Prepare two NNs: Generative model and Inference model
• Use variational inference and train models to maximize ELBO
• e.g. VAE, ADGM, DRAW, IWAE, VRNN etc.
• Generative Adversarial Network (GAN) type
• Model p(x, z) as p(z) p(x | z)
• Prepare two NNs: Generator and Discriminator
• Train models by solving min-max problem
• e.g. GAN, DCGAN, LAPGAN, f-GAN, InfoGAN etc.
• Auto regressive type
• Model p(x) as Πi p(xi | x1, …, xi-1)
• e.g. Pixel RNN, MADE, NADE etc.
6/34
NN as a probabilistic model
• We assume p(x, z) is parameterized by a NN whose parameters (e.g. weights, biases) are θ, and denote it by pθ(x, z).
• Training reduces to finding θ that maximizes some objective function.
7/34
NN as a probabilistic model (example)
• prior: pθ(z) = N(0, 1)
• generation: pθ(x | z) = N(x | µθ(z), σθ²(z))
• µθ and σθ are deterministic NNs that take z as input and output scalar values.
• Although pθ(x | z) is simple, pθ(x) can represent a complex distribution.
8/34
pθ(x) = ∫ pθ(x | z) pθ(z) dz = ∫ N(x | µθ(z), σθ²(z)) pθ(z) dz
[Figure: generation pθ(x | z) — sample z ~ N(0, 1), compute µ and σ² with deterministic NNs, then sample x ~ N(x | µθ(z), σθ²(z)).]
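As a concrete illustration, here is a minimal NumPy sketch of this generative process; the two small networks standing in for µθ and σθ (weights W1, W_mu, W_s, etc.) are hypothetical stand-ins, not the models from any paper.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical deterministic NNs (one shared hidden layer) for mu and sigma^2.
W1, b1 = rng.randn(16, 1), np.zeros(16)
W_mu, b_mu = rng.randn(1, 16), np.zeros(1)
W_s, b_s = rng.randn(1, 16), np.zeros(1)

def mu(z):
    return W_mu @ np.tanh(W1 @ z + b1) + b_mu

def sigma2(z):
    # exp keeps the variance positive
    return np.exp(W_s @ np.tanh(W1 @ z + b1) + b_s)

# Generation: z ~ N(0, 1), then x ~ N(mu(z), sigma2(z)).
z = rng.randn(1)
x = mu(z) + np.sqrt(sigma2(z)) * rng.randn(1)

# The marginal p(x) = ∫ N(x | mu(z), sigma2(z)) N(z | 0, 1) dz is an infinite
# mixture of Gaussians over z, so it can be far from Gaussian itself.
```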
Difficulty of generative models
• Posterior pθ(z | x) is intractable.
9/34
pθ(z | x) = pθ(x | z) pθ(z) / pθ(x)   (Bayesʼ Thm.)
          = pθ(x | z) pθ(z) / ∫ pθ(x, z′) dz′
          = pθ(x | z) pθ(z) / ∫ pθ(x | z′) pθ(z′) dz′
[Figure: z → x; pθ(x | z) is easy to sample from, but pθ(z | x) is intractable.]
• In typical situations, we cannot calculate the integral analytically.
• When z′ is high-dimensional, the integral is also difficult to estimate numerically (e.g. by MCMC).
Variational inference
• Instead of the posterior distribution pθ(z | x), we consider a set of distributions {qφ(z | x)}φ∈Φ.
• Φ is some set of parameters.
• In addition to θ, we try to find a φ such that qφ(z | x) approximates pθ(z | x) well during training.
• Choice of qφ(z | x)
• Easy to evaluate or sample from.
• e.g. Mean field approximation
• e.g. VAE : NN with params. φ
10/34
Note: To fully describe the distribution qφ, we also need to specify qφ(x). Typically we employ the empirical distribution of the training dataset.
[Figure: the inference model qφ(z | x) approximates the intractable posterior pθ(z | x) of the generative model.]
Evidence Lower BOund (ELBO)
• Consider single training example x.
11/34
L(x; θ) := log pθ(x)
         = log ∫ pθ(x, z) dz
         = log ∫ qφ(z | x) [pθ(x, z) / qφ(z | x)] dz
         ≧ ∫ qφ(z | x) log [pθ(x, z) / qφ(z | x)] dz   (Jensen)
         =: L~(x; θ, φ)
• The difference L(x; θ) − L~(x; θ, φ) equals KL(qφ(z | x) || pθ(z | x)).
• Instead of L(x; θ), we maximize L~(x; θ, φ) with respect to θ and φ.
• We call L~ the Evidence Lower BOund (ELBO).
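In practice the integral inside L~ is estimated by Monte Carlo: draw z ~ qφ(z | x) and average log pθ(x, z) − log qφ(z | x). A schematic sketch, where the three callables (sample_q, log_p_joint, log_q) are hypothetical placeholders for the actual model:

```python
def elbo_estimate(x, sample_q, log_p_joint, log_q, n_samples=100):
    """Monte Carlo estimate of L~(x; θ, φ) = E_{z~q}[log p(x, z) - log q(z | x)]."""
    total = 0.0
    for _ in range(n_samples):
        z = sample_q(x)                       # z ~ q(z | x)
        total += log_p_joint(x, z) - log_q(z, x)
    return total / n_samples                  # unbiased; ≤ log p(x) in expectation
```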
Agenda
• Mathematical formulation of generative models.
• Variational Auto Encoder (VAE)
• Variants of VAE: RNN + VAE
• VRAE, VRNN, DRAW, Convolutional DRAW, alignDRAW
• Chainer implementation of (Convolutional) DRAW
• Other VAE-like models
• Inverse DRAW, VAE + GAN
• Conclusion
12/34
Variational AutoEncoder (VAE)
[Kingma+13]
• Use NN as an inference model.
• Training with backpropagation.
• How to calculate gradient?
• REINFORCE (a.k.a Likelihood Ratio (LR))
• Control Variate
• Reparameterization trick [Kingma+13]
(a.k.a Stochastic Gradient Variational
Bayes (SGVB) [Rezende+14])
13/34
Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint
arXiv:1312.6114.
Rezende, D. J., Mohamed, S., & Wierstra, D. (2014). Stochastic backpropagation and approximate
inference in deep generative models. arXiv preprint arXiv:1401.4082.
[Figure: x → Encoder (= inference model) → z → Decoder (= generative model) → x′.]
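The reparameterization trick rewrites z ~ N(µφ(x), σφ²(x)) as z = µφ(x) + σφ(x)·ε with ε ~ N(0, 1), so the sampling step becomes a deterministic, differentiable function of the parameters plus external noise. A minimal sketch (function name is my own):

```python
import numpy as np

def reparameterized_sample(mu, ln_var, rng=np.random):
    # eps carries all the randomness and is independent of the parameters,
    # so gradients w.r.t. mu and ln_var pass straight through this function.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * ln_var) * eps
```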
Training Procedure
• ELBO L~(x; θ, φ) equals Ez~q(z | x) [log p(x | z)] - KL(q(z | x) || p(z))
• 1st term: Reconstruction Loss
• 2nd term: Regularization Loss
14/34
[Figure: x → inference model qφ (NN + sampling) → z → generative model pθ (NN + sampling) → x′.]
1. The input is fed to the inference model.
2. The inference model tries to make its posterior close to the prior of the generative model: calculate the regularization loss.
3. The latent variable is passed to the generative model.
4. The generative model tries to reconstruct the input data: calculate the reconstruction loss.
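The four steps above map directly onto code. A minimal sketch in the style of Chainer's VAE example — the layer sizes and names are my own assumptions, while F.gaussian, F.gaussian_kl_divergence, and F.bernoulli_nll are real Chainer functions implementing reparameterized sampling, the regularization term, and the Bernoulli reconstruction term:

```python
import chainer
import chainer.functions as F
import chainer.links as L

class VAE(chainer.Chain):
    def __init__(self, n_in=784, n_h=100, n_z=20):
        super(VAE, self).__init__()
        with self.init_scope():
            self.enc = L.Linear(n_in, n_h)        # inference model q_phi
            self.enc_mu = L.Linear(n_h, n_z)
            self.enc_ln_var = L.Linear(n_h, n_z)
            self.dec = L.Linear(n_z, n_h)         # generative model p_theta
            self.dec_out = L.Linear(n_h, n_in)

    def __call__(self, x):
        # 1. Feed the input to the inference model.
        h = F.tanh(self.enc(x))
        mu, ln_var = self.enc_mu(h), self.enc_ln_var(h)
        # 2. Regularization loss: KL(q(z | x) || N(0, I)).
        reg = F.gaussian_kl_divergence(mu, ln_var) / x.shape[0]
        # 3. Sample z with the reparameterization trick and pass it down.
        z = F.gaussian(mu, ln_var)
        logit = self.dec_out(F.tanh(self.dec(z)))
        # 4. Reconstruction loss: -E[log p(x | z)] with Ber(sigmoid(logit)).
        rec = F.bernoulli_nll(x, logit) / x.shape[0]
        return rec + reg                          # negative ELBO (minimize)
```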
Generation
• We can generate data points with trained generative models.
15/34
[Figure: 1. Sample z from the prior pθ(z) (e.g. N(0, 1)). 2. Propagate z down through the generative model pθ (NN + sampling) to obtain x′.]
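Generation reuses only the decoder of the VAE sketch above; in practice the model would be trained first.

```python
import numpy as np
import chainer
import chainer.functions as F

model = VAE()                                    # the sketch from the previous slide
z = np.random.randn(10, 20).astype(np.float32)  # 1. sample z ~ N(0, I) from the prior
with chainer.no_backprop_mode():                 # 2. propagate down
    x_new = F.sigmoid(model.dec_out(F.tanh(model.dec(z))))
# x_new holds the Bernoulli means, e.g. pixel intensities in [0, 1].
```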
Agenda
• Mathematical formulation of generative models.
• Variational Auto Encoder (VAE)
• Variants of VAE: RNN + VAE
• VRAE, VRNN, DRAW, Convolutional DRAW, alignDRAW
• Chainer implementation of (Convolutional) DRAW
• Other VAE-like models
• Inverse DRAW, VAE + GAN
• Conclusion
16/34
Variational Recurrent AutoEncoder (VRAE)
[Fabius+14]
• A modification of VAE in which the two models (the inference model and the generative model) are replaced with RNNs.
17/34
Fabius, O., & van Amersfoort, J. R. (2014). Variational recurrent auto-
encoders. arXiv preprint arXiv:1412.6581.
[Figure: the encoder RNN reads x1, …, xT and maps its final hidden state hT to z; the decoder RNN, initialized from z via h0, emits x1′, …, xT′ step by step.]
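A schematic of the two RNNs, with the recurrent cells and output layers left as hypothetical callables (rnn_step, to_mu, to_ln_var, z_to_h, to_x):

```python
def vrae_encode(xs, rnn_step, to_mu, to_ln_var, h0):
    """Encoder: run an RNN over x_1..x_T; h_T parameterizes q(z | x_1..x_T)."""
    h = h0
    for x in xs:
        h = rnn_step(h, x)
    return to_mu(h), to_ln_var(h)

def vrae_decode(z, z_to_h, rnn_step, to_x, x0, T):
    """Decoder: initialize h_0 from z, then emit x'_1..x'_T one step at a time."""
    h, x, outs = z_to_h(z), x0, []
    for _ in range(T):
        h = rnn_step(h, x)
        x = to_x(h)
        outs.append(x)
    return outs
```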
Variational RNN (VRNN) [Chung+15]
• Inference and generative
models share the hidden
state h and update it
throughout time. Latent
variable z is sampled from
the state.
18/34
Chung, J., Kastner, K., Dinh, L., Goel, K., Courville, A. C., & Bengio, Y. (2015). A recurrent latent variable
model for sequential data. In Advances in neural information processing systems (pp. 2980-2988).
[Figure: encoder and decoder share the state ht; at each step, zt is sampled, xt′ is generated, and ht−1 is updated to ht using (xt, zt).]
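One VRNN time step in schematic form, following the factorization in [Chung+15]: the prior, the inference model, and the generative model are all conditioned on the shared state h_{t−1}, which is then updated from (x_t, z_t). All callables here are hypothetical placeholders.

```python
def vrnn_step(h_prev, x_t, prior_net, enc_net, dec_net, rnn_cell, sample):
    mu_p, ln_var_p = prior_net(h_prev)        # prior p(z_t | h_{t-1}),
                                              # not a fixed N(0, I) as in VAE
    mu_q, ln_var_q = enc_net(x_t, h_prev)     # inference q(z_t | x_t, h_{t-1})
    z_t = sample(mu_q, ln_var_q)              # reparameterized sample
    x_params = dec_net(z_t, h_prev)           # generation p(x_t | z_t, h_{t-1})
    h_t = rnn_cell(h_prev, x_t, z_t)          # shared state update
    return h_t, z_t, x_params, (mu_p, ln_var_p, mu_q, ln_var_q)
```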
DRAW [Gregor+15]
• “Generative model of natural images that operates by
making a large number of small contributions to an additive
canvas using an attention model”.
• Inference and generative models are independent RNNs.
19/34
Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015). DRAW: A
recurrent neural network for image generation. arXiv preprint arXiv:1502.04623.
DRAW without attention [Gregor+15]
20/34
[Figure: at each step t, the encoder RNN reads x (together with the previous decoder state) and produces zt; the decoder RNN consumes zt and produces Δct, which is added to the canvas: ct = ct−1 + Δct. After the final step, x′ is generated from σ(cT).]
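The essential loop of DRAW without attention, with the encoder/decoder RNNs, the latent sampler, and the write layer as hypothetical callables: the canvas accumulates additive updates, and the final image is read off through a sigmoid.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def draw_forward(x, T, enc_step, sample_z, dec_step, write):
    c = np.zeros_like(x)                      # blank canvas c_0
    h_enc = h_dec = None
    for _ in range(T):
        err = x - sigmoid(c)                  # what is still missing
        h_enc = enc_step(h_enc, x, err, h_dec)
        z = sample_z(h_enc)                   # z_t ~ q(z_t | x), reparameterized
        h_dec = dec_step(h_dec, z)
        c = c + write(h_dec)                  # c_t = c_{t-1} + Δc_t
    return sigmoid(c)                         # means of Ber(σ(c_T))
```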
DRAW [Gregor+15]
21/34
[Figure: the same loop as above, with attention: a read attention window at extracts a patch rt from x (and from the error image) for the encoder, and a write attention window places Δct onto the canvas ct = ct−1 + Δct; x′ is generated from σ(cT).]
Convolutional DRAW [Gregor+16]
• A variant of DRAW with the following modifications:
• Linear connections are replaced with convolutions (including the connections inside the LSTMs).
• The read and write attention mechanisms are removed.
• Instead of sampling from the standard Gaussian distribution as in DRAW, the prior of the generative model depends on the decoderʼs state (see the KL sketch below).
• But the details of the implementation are not fully described in the paper ...
22/34
Gregor, K., Besse, F., Rezende, D. J., Danihelka, I., & Wierstra, D. (2016).
Towards Conceptual Compression. arXiv preprint arXiv:1604.08772.
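Since the per-step prior is a decoder-dependent Gaussian N(µt^p, σt^p²) rather than N(0, I), the regularization term becomes a KL divergence between two diagonal Gaussians, which has a closed form. A sketch (function name is my own):

```python
import numpy as np

def gaussian_kl(mu_q, ln_var_q, mu_p, ln_var_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians."""
    var_q, var_p = np.exp(ln_var_q), np.exp(ln_var_p)
    return 0.5 * np.sum(
        ln_var_p - ln_var_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
```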
alignDRAW [Mansimov+15]
• Generate image from its caption.
23/34
Mansimov, E., Parisotto, E., Ba, J. L., & Salakhutdinov, R. (2015). Generating images
from captions with attention. arXiv preprint arXiv:1511.02793.
Implementation of convolutional DRAW with Chainer
24/34
[Figures: reconstructions, generated samples, and generated samples from a variant with linear connections.]
My implementation of
convolutional DRAW
25/34
[Figure: architecture of the implementation. Encoder: x (with the running error x − σ(ct)) and label y are embedded and fed to an encoder LSTM ht^e, which outputs (µt^e, σt^e²); zt is sampled from them. Decoder: zt and y are embedded and fed to a decoder LSTM ht^d, which outputs (µt^d, σt^d²) and Δct; the canvas is updated as ct+1 = ct + Δct, and xt+1′ = σ(ct+1) is scored with an NLL loss. Connections are convolution, deconvolution, linear, identity, and sampling layers.]
Agenda
• Mathematical formulation of generative models.
• Variational Auto Encoder (VAE)
• Variants of VAE: RNN + VAE
• VRAE, VRNN, DRAW, Convolutional DRAW, alignDRAW
• Chainer implementation of (Convolutional) DRAW
• Other VAE-like models
• Inverse DRAW, VAE + GAN
• Conclusion
26/34
VAE + GAN [Larsen+15]
• Use generative model of VAE as
the generator of GAN.
27/34
Larsen, A. B. L., Sønderby, S. K., & Winther, O. (2015). Autoencoding beyond
pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300.
Inverse DRAW
• An OpenAI request for research: https://openai.com/requests-for-research/#inverse-draw
28/34
cf. InfoGAN[Chen+16]
• Make latent variables of GAN interpretable.
29/34
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. (2016). InfoGAN:
Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets.
arXiv preprint arXiv:1606.03657.
Agenda
• Mathematical formulation of generative models.
• Variational Auto Encoder (VAE)
• Variants of VAE: RNN + VAE
• VRAE, VRNN, DRAW, Convolutional DRAW, alignDRAW
• Chainer implementation of (Convolutional) DRAW
• Other VAE-like models
• Inverse DRAW, VAE + GAN
• Conclusion
30/34
Challenges of VAE-like generative models
• Compared to GAN, the images generated by VAE-like models
are said to be blurry.
• Difficulty of evaluation.
• The following common evaluation criteria can be largely independent of one another in some situations [Theis+15]:
• average log-likelihood
• Parzen window estimates
• visual fidelity of samples
• Only a lower bound of the log-likelihood can be evaluated exactly.
• Generation of high-dimensional images is still challenging.
31/34
Theis, L., Oord, A. V. D., & Bethge, M. (2015). A note on the
evaluation of generative models. arXiv preprint arXiv:1511.01844.
Many, many topics are not covered today.
• VAE + Gaussian Process
• VAE-DGP, Variational GP, Recurrent GP
• Tighter lower bound of log-likelihood
• Importance Weighted AE
• Generative models with more complex prior distributions
• Hierarchical Variational Model, Auxiliary Deep Generative Model, Hamiltonian Variational Inference, Normalizing Flow, Gradient Flow, Inverse Autoregressive Flow
• Automatic Variational Inference
32/34
Related conferences, workshops and blogs
• NIPS 2015
• Advances in Approximate Bayesian Inference (AABI)
• http://approximateinference.org/accepted/
• Black Box Learning and Inference
• http://www.blackboxworkshop.org
• ICLR 2016
• http://www.iclr.cc/doku.php?id=iclr2016:main
• OpenAI
• Blog: Generative Models
• https://openai.com/blog/generative-models/
33/34
Summary
• VAE is a generative model that parameterizes the inference and generative models with NNs and optimizes them by maximizing the ELBO of the log-likelihood.
• Many variants of VAE have been proposed recently, including VRAE, VRNN, and (Convolutional) DRAW.
• Introduced an implementation of a generative model with Chainer.
34/34