1
Recent Advances in Autoencoder-Based Representation Learning
Presenter: Tatsuya Matsushima @__tmats__ , Matsuo Lab
Recent Advances in Autoencoder-Based Representation Learning
• https://arxiv.org/abs/1812.05069 (Submitted on 12 Dec 2018)
• Michael Tschannen, Olivier Bachem, Mario Lucic
• ETH Zurich, Google Brain
• NeurIPS 2018 Workshop (Bayesian Deep Learning)
• http://bayesiandeeplearning.org/
• 19 3 accept
2
TL;DR
• A survey of recent autoencoder-based representation learning methods
• Most methods can be cast as regularized variants of the VAE objective
• Methods are organized by the meta-prior they try to encode
• A Rate-Distortion view clarifies the tradeoffs involved
3
• Related: a previous reading-group survey on state representation learning (SRL)
• [DL輪読会] slides:
https://www.slideshare.net/DeepLearningJP2016/dl-124128933
• Unlike the SRL survey, this paper focuses on VAEs and VAE variants
4
4
VAE
5
VAE
Variational Autoencoder (VAE) [Kingma+ 2014a]
• A deep latent-variable model trained with amortized variational inference
• Maximizing the evidence lower bound (ELBO) is equivalent to minimizing a KL to the true posterior
• The negative ELBO gives the VAE loss:
6
ℒ_VAE(θ, ϕ) = 𝔼_p̂(x)[𝔼_qϕ(z|x)[−log pθ(x|z)]] + 𝔼_p̂(x)[D_KL(qϕ(z|x)∥p(z))]
※ Relation between the VAE loss and the ELBO:
𝔼_p̂(x)[−log pθ(x)] = ℒ_VAE(θ, ϕ) − 𝔼_p̂(x)[D_KL(qϕ(z|x)∥pθ(z|x))]
so −ℒ_VAE lower-bounds the expected log-likelihood 𝔼_p̂(x)[log pθ(x)]; here p̂(x) denotes the empirical data distribution.
VAE
Estimating the VAE loss
• Term 1 (reconstruction) is estimated by sampling z(i) ∼ qϕ(z|x(i)) with the reparametrization trick
• Term 2 (the KL) is available in closed form for suitable choices of qϕ(z|x) and p(z)
• e.g., for qϕ(z|x) = 𝒩(μϕ(x), diag(σϕ(x))) and p(z) = 𝒩(0, I), the KL has a closed form (see the sketch below)
7
ℒ_VAE(θ, ϕ) = 𝔼_p̂(x)[𝔼_qϕ(z|x)[−log pθ(x|z)]] + 𝔼_p̂(x)[D_KL(qϕ(z|x)∥p(z))]
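A minimal PyTorch sketch of this loss, assuming a Bernoulli decoder and illustrative layer sizes (the architecture is not from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=32, h_dim=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))

    def loss(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Term 1: reparametrization trick, z = mu + sigma * eps with eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        recon = F.binary_cross_entropy_with_logits(self.dec(z), x, reduction="sum") / x.size(0)
        # Term 2: closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian
        kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()
        return recon + kl

vae = VAE()
x = torch.rand(8, 784)   # toy batch with values in [0, 1]
print(vae.loss(x))       # scalar estimate of L_VAE
```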
f-divergence
• An f-divergence measures the discrepancy between two distributions px and py
• The KL divergence is a special case
• When densities are unavailable, f-divergences can be estimated with the density-ratio trick
• This is the mechanism underlying GANs (next slide)
8
For a convex function f with f(1) = 0 and distributions px, py:
D_f(px∥py) = ∫ f(px(x)/py(x)) py(x) dx
Choosing f(t) = t log t recovers the KL: D_f(px∥py) = D_KL(px∥py)
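As a quick check, substituting f(t) = t log t into the definition recovers the KL:

```latex
D_f(p_x \,\|\, p_y)
  = \int \frac{p_x(x)}{p_y(x)} \log\!\frac{p_x(x)}{p_y(x)}\; p_y(x)\,dx
  = \int p_x(x) \log\!\frac{p_x(x)}{p_y(x)}\,dx
  = D_{\mathrm{KL}}(p_x \,\|\, p_y)
```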
Estimating the KL with the density-ratio trick (as in GANs)
• Assign a label c ∈ {0, 1} to samples from the two distributions: px(x) = p(x|c = 1), py(x) = p(x|c = 0)
• Train a discriminator Sη(x) to predict p(c = 1|x)
• Assuming balanced classes, Bayes' rule turns the class-posterior ratio into the density ratio:
px(x)/py(x) = p(x|c = 1)/p(x|c = 0) = p(c = 1|x)/p(c = 0|x) ≈ Sη(x)/(1 − Sη(x))
• Given N i.i.d. samples x(i) ∼ px, the KL is then estimated as
D_KL(px∥py) ≈ (1/N) Σ_{i=1}^{N} log(Sη(x(i))/(1 − Sη(x(i))))
9
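A runnable sketch of this estimator on a toy pair of Gaussians (the discriminator architecture and training budget are arbitrary assumptions); note that log(Sη/(1 − Sη)) is exactly the discriminator's pre-sigmoid logit:

```python
import torch
import torch.nn as nn

# Toy example: p_x = N(1, 1), p_y = N(0, 1); the true KL(p_x || p_y) is 0.5.
x_samples = torch.randn(4096, 1) + 1.0   # labeled c = 1
y_samples = torch.randn(4096, 1)         # labeled c = 0

# Discriminator S_eta(x) ~ p(c = 1 | x), trained by logistic regression.
S = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(S.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
for _ in range(2000):
    opt.zero_grad()
    logits = torch.cat([S(x_samples), S(y_samples)])
    labels = torch.cat([torch.ones_like(x_samples), torch.zeros_like(y_samples)])
    bce(logits, labels).backward()
    opt.step()

# KL(p_x || p_y) ~= (1/N) sum_i log(S(x_i) / (1 - S(x_i))); since the network
# outputs pre-sigmoid logits, this is just the mean logit on samples from p_x.
with torch.no_grad():
    kl_est = S(x_samples).mean()
print(kl_est.item())   # roughly 0.5 if the discriminator trained well
```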
Maximum Mean Discrepancy (MMD)
• MMD compares two distributions px, py through kernel mean embeddings
• e.g., with a linear feature map it reduces to the distance between the means
• Richer kernels compare higher-order moments of the distributions
10
Given a kernel k : 𝒳 × 𝒳 → ℝ with RKHS ℋ and feature map φ : 𝒳 → ℋ:
MMD(px, py) = ‖𝔼_{x∼px}[φ(x)] − 𝔼_{y∼py}[φ(y)]‖²_ℋ
For 𝒳 = ℋ = ℝ^d and φ(x) = x, this reduces to the squared distance between the means:
MMD(px, py) = ‖μ_px − μ_py‖²₂
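A small sketch of an MMD estimator with an RBF kernel (the bandwidth σ is an assumption; in practice it is often set with the median heuristic):

```python
import torch

def rbf_kernel(a: torch.Tensor, b: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Gram matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    d2 = torch.cdist(a, b).pow(2)
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd2(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased estimator of MMD^2 = E[k(x,x')] - 2 E[k(x,y)] + E[k(y,y')]."""
    return (rbf_kernel(x, x, sigma).mean()
            - 2 * rbf_kernel(x, y, sigma).mean()
            + rbf_kernel(y, y, sigma).mean())

x = torch.randn(512, 2)          # p_x = N(0, I)
y = torch.randn(512, 2) + 2.0    # p_y = N(2, I)
print(mmd2(x, y), mmd2(x, torch.randn(512, 2)))  # large vs. near zero
```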
Meta-Prior VAE
11
Meta-Prior
Meta-priors [Bengio+ 2013]
• Ideally, representations would be learned with supervision from the downstream task
• But labels are scarce, and the downstream task is often unknown in advance
• → Instead, encode generic assumptions about the world (meta-priors) into the learning objective
12
Meta-Prior [Bengio+ 2013]
Disentanglement
• Assumes the data is generated from independent factors of variation
• e.g., object shape, position, and color
• A disentangled representation separates these factors into distinct latent dimensions
• e.g., traversing one latent dimension should change one factor (and only that factor)
13
Meta-Prior [Bengio+ 2013]
Other meta-priors covered by the survey:
• Hierarchical organization of explanatory factors
• Shared factors across tasks (semi-supervised learning)
• Clustering structure in the data
• Temporal and spatial coherence
14
Meta-Prior
• How meta-priors are encoded into autoencoder objectives (overview figure from the paper)
15
Meta-Prior
• The dominant mechanisms: regularizers (e.g., for disentanglement), the choice of prior, and the encoder/decoder architecture
• Different meta-priors call for different mechanisms
16
17
Regularization-based VAE framework
Many methods that encode meta-priors add regularizers to the VAE objective
• R1 acts on the encoding distribution qϕ(z|x); R2 acts on the aggregate (marginal) posterior qϕ(z)
• Setting λ1 = λ2 = 0 recovers the plain VAE (see the sketch after this list)
18
ℒ_VAE(θ, ϕ) + λ1 𝔼_p̂(x)[R1(qϕ(z|x))] + λ2 R2(qϕ(z))
with z ∼ qϕ(z|x) and the aggregate posterior qϕ(z) = 𝔼_p̂(x)[qϕ(z|x)] = (1/N) Σ_{i=1}^{N} qϕ(z|x(i))
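In code, the framework is a base VAE loss plus two pluggable regularizers. A sketch with assumed names (R1 acts on per-sample posterior parameters; R2 acts on a batch of z samples, which approximates qϕ(z)):

```python
import torch

def regularized_vae_loss(vae_loss, mu, logvar, z, R1=None, R2=None,
                         lam1=0.0, lam2=0.0):
    """L_VAE + lam1 * E_p(x)[R1(q(z|x))] + lam2 * R2(q(z)).
    R1 sees the per-sample posterior parameters (mu, logvar); R2 sees a
    batch of z samples, approximating the aggregate posterior q(z)."""
    loss = vae_loss
    if R1 is not None:
        loss = loss + lam1 * R1(mu, logvar)
    if R2 is not None:
        loss = loss + lam2 * R2(z)
    return loss

# Example R1: closed-form KL(q(z|x) || N(0, I)); with lam1 > 0 this
# already recovers the beta-VAE objective of a later slide.
def kl_to_standard_normal(mu, logvar):
    return (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()
```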
Regularized VAE objective (annotated)
19
ℒ_VAE(θ, ϕ) + λ1 𝔼_p̂(x)[R1(qϕ(z|x))] + λ2 R2(qϕ(z))
Both regularizers are optional; methods differ in which terms they use
Regularizing the aggregate posterior
• Regularizers R2 act on the aggregate (marginal) posterior qϕ(z)
• Typically they match qϕ(z) to a target distribution under some divergence
20
Disentanglement
Formalizing disentanglement
• Assume the data is generated from factors v and nuisance variables w: x ∼ p(x|v, w), with conditionally independent factors p(v|x) = ∏_j p(v_j|x)
• The goal is for qϕ(z|x) to recover the factors v in separate dimensions; methods differ in the loss used to encourage this
21
Disentanglement
Approaches to disentangling
• Purely unsupervised disentanglement is provably impossible without inductive biases [Locatello+ 2018]
• (In practice, methods rely on implicit biases in models and data)
• Regularization-based approaches fall into three groups:
• (a) Upweighting the KL term of the ELBO
• (b) Penalizing the mutual information between x and z
• (c) Enforcing independence (between latent groups, or from nuisance variables)
22
(a) Upweighting the KL term of the ELBO
β-VAE [Higgins+ 2017]
• Upweights the second (KL) term of the VAE loss
• This pushes qϕ(z|x) toward the factorized prior p(z), encouraging disentanglement at the cost of reconstruction quality
23
ℒ_VAE(θ, ϕ) = 𝔼_p̂(x)[𝔼_qϕ(z|x)[−log pθ(x|z)]] + 𝔼_p̂(x)[D_KL(qϕ(z|x)∥p(z))]
ℒ_β-VAE(θ, ϕ) = ℒ_VAE(θ, ϕ) + λ1 𝔼_p̂(x)[D_KL(qϕ(z|x)∥p(z))]
Figure: [Higgins+ 2017]
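β-VAE then amounts to reweighting the KL term of the slide-7 loss. A sketch (reading the objective above as an effective weight β = 1 + λ1 is an interpretation, not the paper's notation):

```python
import torch

def beta_vae_loss(recon_nll: torch.Tensor, mu: torch.Tensor,
                  logvar: torch.Tensor, beta: float = 4.0) -> torch.Tensor:
    """recon_nll: batch-mean of E_q[-log p(x|z)]. beta > 1 strengthens the
    pull of q(z|x) toward the factorized prior p(z) = N(0, I)."""
    kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()
    return recon_nll + beta * kl
```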
(b) Penalizing the mutual information between x and z
The KL term of the VAE loss decomposes into two parts [Hoffman+ 2016]:
• a mutual-information term and a KL between the aggregate posterior and the prior
• FactorVAE [Kim+ 2018] exploits this decomposition (next slide)
• Related: β-TCVAE [Chen+ 2018], InfoVAE [Zhao+ 2017a], DIP-VAE [Kumar+ 2018]
24
ℒ_VAE(θ, ϕ) = 𝔼_p̂(x)[𝔼_qϕ(z|x)[−log pθ(x|z)]] + 𝔼_p̂(x)[D_KL(qϕ(z|x)∥p(z))]
𝔼_p̂(x)[D_KL(qϕ(z|x)∥p(z))] = I_qϕ(x; z) + D_KL(qϕ(z)∥p(z))
where I_qϕ(x; z) is the mutual information between x and z, and the second term matches the aggregate posterior qϕ(z) to the prior p(z)
(b) Penalizing the mutual information between x and z
FactorVAE [Kim+ 2018]
• The β-VAE loss penalizes both D_KL(qϕ(z)∥p(z)) and the (useful) mutual information I_qϕ(x; z)
• FactorVAE instead penalizes only the total correlation of the aggregate posterior
• The total correlation is estimated with a discriminator via the density-ratio trick (sketch below)
• [DL輪読会] Disentangling by Factorising:
https://www.slideshare.net/DeepLearningJP2016/dldisentangling-by-factorising
25
TC(qϕ(z)) = D_KL(qϕ(z)∥∏_j qϕ(z_j))
ℒ_FactorVAE(θ, ϕ) = ℒ_VAE(θ, ϕ) + λ2 TC(qϕ(z))
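A sketch of the TC estimator (hyperparameters and architecture are assumptions): samples from ∏_j qϕ(z_j) are built by shuffling each latent dimension independently across the batch, and the density-ratio trick converts the discriminator's logits into a TC estimate.

```python
import torch
import torch.nn as nn

def permute_dims(z: torch.Tensor) -> torch.Tensor:
    """Shuffle each latent dimension independently across the batch,
    turning samples from q(z) into samples from prod_j q(z_j)."""
    return torch.stack([col[torch.randperm(z.size(0))] for col in z.t()], dim=1)

z_dim = 8
D = nn.Sequential(nn.Linear(z_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
opt = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def tc_step(z: torch.Tensor) -> torch.Tensor:
    """One discriminator update, then a TC estimate via the density-ratio
    trick: TC(q(z)) ~= mean of D's logits on samples from q(z)."""
    z_perm = permute_dims(z).detach()
    opt.zero_grad()
    logits = torch.cat([D(z.detach()), D(z_perm)])
    labels = torch.cat([torch.ones(z.size(0), 1), torch.zeros(z_perm.size(0), 1)])
    bce(logits, labels).backward()
    opt.step()
    return D(z).mean()  # add lam2 * this to L_VAE for the FactorVAE objective

z = torch.randn(256, z_dim)   # stand-in for a batch of encoder samples
print(tc_step(z))
```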
(c) Enforcing independence
HSIC-VAE [Lopez+ 2018]
• Uses the Hilbert-Schmidt independence criterion (HSIC) [Gretton+ 2005], a kernel-based dependence measure (estimator in Appendix A of the paper)
• Penalizes dependence between groups of latents zG = {zk}_{k∈G}:
ℒ_HSIC-VAE(θ, ϕ) = ℒ_VAE(θ, ϕ) + λ2 HSIC(qϕ(z_G1), qϕ(z_G2))
• A penalty HSIC(qϕ(z), p(s)) can likewise remove information about a sensitive variable s with distribution p(s)
• Related: HFVAE [Esmaeili+ 2018]
26
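A sketch of the biased HSIC estimator from [Gretton+ 2005] (the RBF bandwidth is an assumption):

```python
import torch

def hsic(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased HSIC estimator [Gretton+ 2005]: tr(K H L H) / (n - 1)^2,
    with RBF Gram matrices K, L and the centering matrix H."""
    n = x.size(0)
    K = torch.exp(-torch.cdist(x, x).pow(2) / (2 * sigma ** 2))
    L = torch.exp(-torch.cdist(y, y).pow(2) / (2 * sigma ** 2))
    H = torch.eye(n) - torch.ones(n, n) / n
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2

zg1 = torch.randn(256, 4)
zg2_dep = zg1[:, :2] + 0.1 * torch.randn(256, 2)   # dependent group
zg2_ind = torch.randn(256, 2)                      # independent group
print(hsic(zg1, zg2_dep), hsic(zg1, zg2_ind))      # large vs. near zero
```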
PixelGAN-AE [Makhzani+ 2017]
• Uses a PixelCNN [van den Oord+ 2016] decoder
• Removes the mutual-information term from the VAE loss, based on the KL decomposition:
𝔼_p̂(x)[D_KL(qϕ(z|x)∥p(z))] = I_qϕ(x; z) + D_KL(qϕ(z)∥p(z))
ℒ_PixelGAN-AE(θ, ϕ) = ℒ_VAE(θ, ϕ) − I_qϕ(x; z)
• The remaining D_KL(qϕ(z)∥p(z)) term is approximated with a GAN-style discriminator
• Related: VIB [Alemi+ 2016], Information dropout [Achille+ 2018]
27
Figure: [Makhzani+ 2017]
Variational Fair Autoencoder (VFAE) [Louizos+ 2016]
• Learns representations invariant to a sensitive variable s
• Adds an MMD penalty to the VAE loss that matches the group-conditional aggregate posteriors q(z|s = k) across values of s
• Invariance to s removes unwanted information from z
• Replacing the MMD with HSIC gives HSIC-VAE [Lopez+ 2018]
• See also: Fader Network [Lample+ 2017], DC-IGN [Kulkarni+ 2015]
28
ℒ_VFAE(θ, ϕ) = ℒ_VAE + λ2 Σ_{ℓ=2}^{K} MMD(qϕ(z|s = ℓ), qϕ(z|s = 1))
where qϕ(z|s = ℓ) = (1/|{i : s(i) = ℓ}|) Σ_{i:s(i)=ℓ} qϕ(z|x(i), s(i))
29
• Overview of the covered methods (table from the paper); legend:
30
H: Hierarchical / A: Autoregressive / N: Normal / C: Categorical / L: Learned prior
Semi-supervised VAE
M2 [Kingma+ 2014b]
• Treats the label y as a partially observed latent variable alongside z
• The inference model factorizes as qϕ(z, y|x) = qϕ(z|y, x) qϕ(y|x)
• For unlabeled x, y is marginalized out of the loss; for labeled x, qϕ(y|x) is additionally trained as a classifier
• Can be stacked on a pretrained M1 model (the M1+M2 model)
• DL Hacks: Semi-supervised Learning with Deep Generative Models
https://www.slideshare.net/YuusukeIwasawa/dl-hacks2015-0421iwasawa
• Semi-Supervised Learning with Deep Generative Models in pixyz:
https://qiita.com/kogepan102/items/22b685ce7e9a51fbab98
31
qϕ(z, y|x) = qϕ(z|y, x) qϕ(y|x)
VLAE
Variational Lossy Autoencoder (VLAE) [Chen+ 2017]
• A sufficiently powerful autoregressive decoder pθ(x|z) can model the data on its own, so z is ignored
• VLAE therefore restricts the decoder to local statistics, forcing global structure into z
• e.g., the decoder factorizes as pθ(x|z) = ∏_j pθ(x_j | z, x_W(j)), where W(j) is a small local window preceding position j
• Related: PixelVAE [Gulrajani+ 2017], LadderVAE [Sønderby+ 2016], VLaAE [Zhao+ 2017b]
32
33
Encoding meta-priors via the prior p(z)
• The choice of prior itself can encode a meta-prior
• e.g., a mixture prior encourages clustering (e.g., MNIST classes); the structured VAE (SVAE) [Johnson+ 2016] uses graphical-model priors
34
Prior p(z) legend: N: Normal / C: Categorical / M: Mixture / G: Graphical model / L: Learned prior
JointVAE [Dupont 2018]
• Disentangles continuous latents z and discrete latents c
• The encoder factorizes as qϕ(c|x) qϕ(z|x)
• Samples from the categorical qϕ(c|x) are drawn with the Gumbel-Softmax relaxation (sketch below)
• The KL term splits across the two codes and is upweighted as in β-VAE:
D_KL(qϕ(z|x)qϕ(c|x)∥p(z)p(c)) = D_KL(qϕ(z|x)∥p(z)) + D_KL(qϕ(c|x)∥p(c))
• Related: VQ-VAE [van den Oord+ 2017]
35
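A sketch of the Gumbel-Softmax relaxation used to sample from qϕ(c|x) (the temperature value is an assumption; PyTorch also ships torch.nn.functional.gumbel_softmax):

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Differentiable sample from a relaxed categorical:
    softmax((logits + Gumbel noise) / tau)."""
    g = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    return F.softmax((logits + g) / tau, dim=-1)

logits = torch.randn(8, 10, requires_grad=True)   # e.g., 10 discrete classes
c = gumbel_softmax_sample(logits)                 # soft one-hot; gradients flow
print(c.sum(dim=-1))                              # each row sums to 1
```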
36
Other related models
• Denoising Autoencoder (DAE) [Vincent+ 2008]
• Disentangled sequence models: [Yingzhen+ 2018], [Hsieh+ 2018]
• Video prediction and dynamics models: [Villegas+ 2017], [Denton+ 2017], [Fraccaro+ 2017]
37
Discriminator-based inference
• Instead of a reconstruction loss, a discriminator distinguishes pairs (x, z) from the generative joint pθ(x|z)p(z) and the inference joint qϕ(z|x)p̂(x)
• Adversarially Learned Inference (ALI) [Dumoulin+ 2017]
• Bidirectional GAN (BiGAN) [Donahue+ 2017]
38
Figures: [Dumoulin+ 2017], [Donahue+ 2017]
Rate-Distortion-Usefulness Tradeoff
39
Rate-Distortion Tradeoff
Encoding meta-priors involves tradeoffs
• e.g., β-VAE [Higgins+ 2017] and Fader Network [Lample+ 2017] trade reconstruction fidelity for structure
• This is formalized as the "Rate-Distortion Tradeoff" [Alemi+ 2018a]
40
40
Rate-Distortion Tradeoff
•
• Distortion:
• Rate: KL
• VAE ELBO
41
H = −
∫
p(x)log p(x)dx = Ep(x)[−log p(x)]
D = −
∬
p(x)qϕ(z|x)log pθ(x|z)dxdz = Ep(x) [ 𝔼qϕ(z|x) [−log pθ(x|z)]]
R =
∬
p(x)qϕ(z|x)log
qϕ(z|x)
p(z)
dxdz = 𝔼p(x) [DKL (qθ(q|x)∥p(z))]
qϕ(z|x) p(z)
ELBO = − ℒVAE = − (D + R)
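These quantities are easy to track during training. A sketch for the diagonal-Gaussian VAE of slide 7 (encoder/decoder are assumed callables; the Bernoulli decoder is an illustrative choice):

```python
import torch
import torch.nn.functional as F

def rate_and_distortion(x, encoder, decoder):
    """Monte-Carlo estimates of distortion D and rate R per data point;
    the ELBO is -(D + R). `encoder(x)` -> (mu, logvar); `decoder(z)` ->
    Bernoulli logits (both signatures are illustrative assumptions)."""
    mu, logvar = encoder(x)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparametrization
    # D = E_q(z|x)[-log p(x|z)], estimated with a single z sample
    D = F.binary_cross_entropy_with_logits(decoder(z), x, reduction="sum") / x.size(0)
    # R = E[KL(q(z|x) || N(0, I))], closed form for a diagonal Gaussian
    R = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()
    return D, R
```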
Rate-Distortion Tradeoff
Rate-Distortion Tradeoff [Alemi+ 2018a]
• Rate and distortion span a feasible region bounded by H − D ≤ R
• All points on the optimal frontier D = H − R achieve the same ELBO, so the ELBO cannot distinguish low-rate from high-rate solutions
• [Alemi+ 2018a] therefore propose targeting a desired rate σ directly:
min_{ϕ,θ} D + |σ − R|
42
Figure: [Alemi+ 2018a]
Rate-Distortion Tradeoff
Which rate is appropriate?
• A low rate gives a strongly compressed z (useful e.g. for clustering) but may discard task-relevant information
• A high rate keeps more information in z (useful e.g. for reconstruction-heavy tasks) at the cost of compression
• The right point on the rate-distortion curve depends on the downstream task
43
Rate-Distortion-Usefulness Tradeoff
Rate-Distortion-Usefulness Tradeoff
• The paper proposes a third axis, "usefulness", in addition to rate and distortion
• Rate and distortion alone do not determine how useful a representation is downstream
• Representation learning methods should be located in the full R-D-usefulness space
44
Rate-Distortion-Usefulness Tradeoff
Usefulness
• Usefulness is hard to formalize: it depends on the (often unknown) downstream task
• If a downstream target y is available, one can measure a task-specific distortion D_y of predicting y from z
• [Alemi+ 2018b] study the resulting R − D_y tradeoff
45
D_y = −∬ p(x, y) qϕ(z|x) log pθ(y|z) dx dy dz = 𝔼_p(x,y)[𝔼_qϕ(z|x)[−log pθ(y|z)]]
46
Summary
• The survey organizes autoencoder-based representation learning around meta-priors
• Most methods regularize the (aggregate) posterior, change the prior, or change the architecture
• Learning useful representations appears to require at least some (weak) supervision
• The Rate-Distortion view exposes the tradeoffs, with "usefulness" as a needed third axis
47
Thoughts
• The Rate-Distortion-Usefulness view is a useful lens
• What should be encoded in z depends on the task, e.g. GQN-style scene representations
• Connections between meta-priors and meta-learning look worth exploring
• [DL輪読会] Meta-Learning Probabilistic Inference for Prediction:
https://www.slideshare.net/DeepLearningJP2016/dlmetalearning-probabilistic-inference-for-prediction-126167192
• Quantifying "usefulness" remains an open problem
• Implementations: Pixyz and Pixyzoo (next slide)
48
Pixyz & Pixyzoo
Pixyz: https://github.com/masa-su/pixyz
• A library for building deep generative models (on top of PyTorch)
• Lets you write models close to their mathematical definitions
Pixyzoo: https://github.com/masa-su/pixyzoo
• A collection of model implementations in Pixyz
• Includes GQN, VIB, and more
• [DLHacks] PyTorch, Pixyz Generative Query Network:
https://www.slideshare.net/DeepLearningJP2016/dlhackspytorch-pixyzgenerative-query-network-126329901
49
Appendix
50
References
[Achille+ 2018] A. Achille and S. Soatto, “Information dropout: Learning optimal representations through noisy computation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018. https://ieeexplore.ieee.org/document/8253482
[Alemi+ 2016] A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, “Deep variational information bottleneck,” in International Conference on Learning Representations, 2016. https://openreview.net/forum?id=HyxQzBceg
[Alemi+ 2018a] A. Alemi, B. Poole, I. Fischer, J. Dillon, R. A. Saurous, and K. Murphy, “Fixing a broken ELBO,” in Proc. of the International Conference on Machine Learning, 2018, pp. 159–168. http://proceedings.mlr.press/v80/alemi18a.html
[Alemi+ 2018b] A. A. Alemi and I. Fischer, “TherML: Thermodynamics of machine learning,” arXiv:1807.04162, 2018. https://arxiv.org/abs/1807.04162
[Bengio+ 2013] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013. https://ieeexplore.ieee.org/document/6472238
[Chen+ 2017] X. Chen, D. P. Kingma, T. Salimans, Y. Duan, P. Dhariwal, J. Schulman, I. Sutskever, and P. Abbeel, “Variational lossy autoencoder,” in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=BysvGP5ee
[Chen+ 2018] T. Q. Chen, X. Li, R. Grosse, and D. Duvenaud, “Isolating sources of disentanglement in variational autoencoders,” in Advances in Neural Information Processing Systems, 2018. http://papers.nips.cc/paper/7527-isolating-sources-of-disentanglement-in-variational-autoencoders
51
[Denton+ 2017] E. L. Denton and V. Birodkar, “Unsupervised learning of disentangled representations from video,” in Advances in Neural Information Processing Systems, 2017, pp. 4414–4423. https://papers.nips.cc/paper/7028-unsupervised-learning-of-disentangled-representations-from-video
[Donahue+ 2017] J. Donahue, P. Krähenbühl, and T. Darrell, “Adversarial feature learning,” in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=BJtNZAFgg
[Dumoulin+ 2017] V. Dumoulin, I. Belghazi, B. Poole, O. Mastropietro, A. Lamb, M. Arjovsky, and A. Courville, “Adversarially learned inference,” in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=B1ElR4cgg
[Dupont 2018] E. Dupont, “Learning disentangled joint continuous and discrete representations,” in Advances in Neural Information Processing Systems, 2018. http://papers.nips.cc/paper/7351-learning-disentangled-joint-continuous-and-discrete-representations
[Esmaeili+ 2018] B. Esmaeili, H. Wu, S. Jain, A. Bozkurt, N. Siddharth, B. Paige, D. H. Brooks, J. Dy, and J.-W. van de Meent, “Structured disentangled representations,” arXiv:1804.02086, 2018. https://arxiv.org/abs/1804.02086
[Fraccaro+ 2017] M. Fraccaro, S. Kamronn, U. Paquet, and O. Winther, “A disentangled recognition and nonlinear dynamics model for unsupervised learning,” in Advances in Neural Information Processing Systems, 2017, pp. 3601–3610. https://papers.nips.cc/paper/6951-a-disentangled-recognition-and-nonlinear-dynamics-model-for-unsupervised-learning
[Gretton+ 2005] A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf, “Measuring statistical dependence with Hilbert-Schmidt norms,” in International Conference on Algorithmic Learning Theory. Springer, 2005, pp. 63–77. https://link.springer.com/chapter/10.1007/11564089_7
[Gulrajani+ 2017] I. Gulrajani, K. Kumar, F. Ahmed, A. A. Taiga, F. Visin, D. Vazquez, and A. Courville, “PixelVAE: A latent variable model for natural images,” in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=BJKYvt5lg
References
52
[Higgins+ 2017] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner, “beta-VAE: Learning basic visual concepts with a constrained variational framework,” in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=Sy2fzU9gl
[Hoffman+ 2016] M. D. Hoffman and M. J. Johnson, “ELBO surgery: yet another way to carve up the variational evidence lower bound,” in Workshop in Advances in Approximate Bayesian Inference, NIPS, 2016. http://approximateinference.org/accepted/HoffmanJohnson2016.pdf
[Hsieh+ 2018] J.-T. Hsieh, B. Liu, D.-A. Huang, L. Fei-Fei, and J. C. Niebles, “Learning to decompose and disentangle representations for video prediction,” in Advances in Neural Information Processing Systems, 2018. http://papers.nips.cc/paper/7333-learning-to-decompose-and-disentangle-representations-for-video-prediction
[Johnson+ 2016] M. Johnson, D. K. Duvenaud, A. Wiltschko, R. P. Adams, and S. R. Datta, “Composing graphical models with neural networks for structured representations and fast inference,” in Advances in Neural Information Processing Systems, 2016, pp. 2946–2954. https://papers.nips.cc/paper/6379-composing-graphical-models-with-neural-networks-for-structured-representations-and-fast-inference
[Kim+ 2018] H. Kim and A. Mnih, “Disentangling by factorising,” in Proc. of the International Conference on Machine Learning, 2018, pp. 2649–2658. http://proceedings.mlr.press/v80/kim18b.html
[Kingma+ 2014a] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” in International Conference on Learning Representations, 2014. https://openreview.net/forum?id=33X9fd2-9FyZd
[Kingma+ 2014b] D. P. Kingma, S. Mohamed, D. J. Rezende, and M. Welling, “Semi-supervised learning with deep generative models,” in Advances in Neural Information Processing Systems, 2014, pp. 3581–3589. https://papers.nips.cc/paper/5352-semi-supervised-learning-with-deep-generative-models
References
53
[Kulkarni+ 2015] T. D. Kulkarni, W. F. Whitney, P. Kohli, and J. Tenenbaum, “Deep convolutional inverse graphics network,” in Advances in Neural Information Processing Systems, 2015, pp. 2539–2547. https://papers.nips.cc/paper/5851-deep-convolutional-inverse-graphics-network
[Kumar+ 2018] A. Kumar, P. Sattigeri, and A. Balakrishnan, “Variational inference of disentangled latent concepts from unlabeled observations,” in International Conference on Learning Representations, 2018. https://openreview.net/forum?id=H1kG7GZAW
[Lample+ 2017] G. Lample, N. Zeghidour, N. Usunier, A. Bordes, L. Denoyer et al., “Fader networks: Manipulating images by sliding attributes,” in Advances in Neural Information Processing Systems, 2017, pp. 5967–5976. https://papers.nips.cc/paper/7178-fader-networksmanipulating-images-by-sliding-attributes
[Locatello+ 2018] F. Locatello, S. Bauer, M. Lucic, S. Gelly, B. Schölkopf, and O. Bachem, “Challenging common assumptions in the unsupervised learning of disentangled representations,” arXiv:1811.12359, 2018. https://arxiv.org/abs/1811.12359
[Lopez+ 2018] R. Lopez, J. Regier, M. I. Jordan, and N. Yosef, “Information constraints on auto-encoding variational bayes,” in Advances in Neural Information Processing Systems, 2018. https://papers.nips.cc/paper/7850-information-constraints-on-auto-encoding-variational-bayes
[Louizos+ 2016] C. Louizos, K. Swersky, Y. Li, M. Welling, and R. Zemel, “The variational fair autoencoder,” in International Conference on Learning Representations, 2016. https://arxiv.org/abs/1511.00830
[Makhzani+ 2017] A. Makhzani and B. J. Frey, “PixelGAN autoencoders,” in Advances in Neural Information Processing Systems, 2017, pp. 1975–1985. https://papers.nips.cc/paper/6793-pixelgan-autoencoders
[Sønderby+ 2016] C. K. Sønderby, T. Raiko, L. Maaløe, S. K. Sønderby, and O. Winther, “Ladder variational autoencoders,” in Advances in Neural Information Processing Systems, 2016, pp. 3738–3746. https://papers.nips.cc/paper/6275-ladder-variational-autoencoders
References
54
[van den Oord+ 2016] A. van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, and A. Graves, “Conditional image generation with PixelCNN decoders,” in Advances in Neural Information Processing Systems, 2016, pp. 4790–4798. https://papers.nips.cc/paper/6527-conditional-image-generation-with-pixelcnn-decoders
[van den Oord+ 2017] A. van den Oord, O. Vinyals et al., “Neural discrete representation learning,” in Advances in Neural Information Processing Systems, 2017, pp. 6306–6315. https://papers.nips.cc/paper/7210-neural-discrete-representation-learning
[Villegas+ 2017] R. Villegas, J. Yang, S. Hong, X. Lin, and H. Lee, “Decomposing motion and content for natural video sequence prediction,” in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=rkEFLFqee
[Vincent+ 2008] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proc. of the International Conference on Machine Learning, 2008, pp. 1096–1103. https://dl.acm.org/citation.cfm?id=1390294
[Yingzhen+ 2018] L. Yingzhen and S. Mandt, “Disentangled sequential autoencoder,” in Proc. of the International Conference on Machine Learning, 2018, pp. 5656–5665. http://proceedings.mlr.press/v80/yingzhen18a.html
[Zhao+ 2017a] S. Zhao, J. Song, and S. Ermon, “InfoVAE: Information maximizing variational autoencoders,” arXiv:1706.02262, 2017. https://arxiv.org/abs/1706.02262
[Zhao+ 2017b] S. Zhao, J. Song, and S. Ermon, “Learning hierarchical features from deep generative models,” in Proc. of the International Conference on Machine Learning, 2017, pp. 4091–4099. http://proceedings.mlr.press/v70/zhao17c.html
References
55

  • 55. [van den Oord+ 2016] A. van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, and A. Graves, “Conditional image generation with PixelCNN decoders,” in Advances in Neural Information Processing Systems, 2016, pp. 4790–4798. https:// papers.nips.cc/paper/6527-conditional-image-generation-with-pixelcnn-decoders [van den Oord+ 2017] A. van den Oord, O. Vinyals et al., “Neural discrete representation learning,” in Advances in Neural Information Processing Systems, 2017, pp. 6306–6315. https://papers.nips.cc/paper/7210-neural-discrete-representation- learning [Villegas+ 2017] R. Villegas, J. Yang, S. Hong, X. Lin, and H. Lee, “Decomposing motion and content for natural video sequence prediction,” in International Conference on Learning Representations, 2017. https://openreview.net/forum? id=rkEFLFqee [Vincent+ 2008] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proc. of the International Conference on Machine Learning, 2008, pp. 1096–1103. https:// dl.acm.org/citation.cfm?id=1390294 [Yingzhen+ 2018] L. Yingzhen and S. Mandt, “Disentangled sequential autoencoder,” in Proc. of the International Conference on Machine Learning, 2018, pp. 5656–5665. http://proceedings.mlr.press/v80/yingzhen18a.html [Zhao+ 2017a] S.Zhao, J.Song, and S.Ermon,“InfoVAE: Information maximizing variational autoencoders,” arXiv:1706.02262, 2017. https://arxiv.org/abs/1706.02262 [Zhao+ 2017b] S. Zhao, J. Song, and S. Ermon, “Learning hierarchical features from deep generative models,” in Proc. of the International Conference on Machine Learning, 2017, pp. 4091–4099. http://proceedings.mlr.press/v70/zhao17c.html References 55