Generative Adversarial Networks
Aaron Mishkin
UBC MLRG 2018W2
1
Generative Adversarial Networks
“Two imaginary celebrities that were dreamed up by a random
number generator.”
https://research.nvidia.com/publication/2017-10 Progressive-Growing-of
2
Why care about GANs?
Why to spend your limited time learning about GANs:
• GANs are achieving state-of-the-art results in a large variety
of image generation tasks.
• There’s been a veritable explosion in GAN publications over
the last few years – many people are very excited!
• GANs are stimulating new theoretical interest in min-max
optimization problems and “smooth games”.
3
Why care about GANs: Hyper-realistic Image Generation
StyleGAN: image generation with hierarchical style transfer [3].
https://arxiv.org/abs/1812.04948 4
Why care about GANs: Conditionally Generative Models
Conditional GANs: high-resolution image synthesis via semantic
labeling [8].
Input: Segmentation Output: Synthesized Image
https://research.nvidia.com/publication/2017-12 High-Resolution-Image-Synthesis
5
Why care about GANs: Image Super Resolution
SRGAN: Photo-realistic super-resolution [4].
Bicubic Interp. SRGAN Original Image
https://arxiv.org/abs/1609.04802
6
Why care about GANs: Publications
Approximately 500 GAN papers as of September 2018!
See https://github.com/hindupuravinash/the-gan-zoo for the exhaustive list of papers. 7
Generative Models
Generative Modeling
Generative Models estimate the probabilistic process that
generated a set of observations D.
• D = {(xi, yi)}_{i=1}^n : supervised generative models learn the
joint distribution p(xi, yi), often to compute p(yi | xi).
• D = {xi}_{i=1}^n : unsupervised generative models learn the
distribution of D for clustering, sampling, etc. We can:
• directly estimate p(xi),
• introduce latents yi and estimate p(xi, yi).
8
Generative Modeling: Unsupervised Parametric Approaches
• Direct Estimation: Choose a parameterized family p(x | θ)
and learn θ by maximizing the log-likelihood
θ∗ = arg maxθ Σ_{i=1}^n log p(xi | θ).
• Latent Variable Models: Define a joint distribution
p(x, y | θ) and learn θ by maximizing the log-marginal
likelihood
θ∗ = arg maxθ Σ_{i=1}^n log ∫ p(xi, yi | θ) dyi.
Both approaches require that p(x | θ) is easy to evaluate.
9
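As a concrete aside (not part of the original slides), here is a minimal NumPy sketch of direct estimation: fitting a univariate Gaussian p(x | θ) by maximizing the log-likelihood, which for this family has a closed-form maximizer.

```python
import numpy as np

# Toy direct estimation: fit p(x | theta) = N(x; mu, sigma^2) to observations
# D = {x_i} by maximizing sum_i log p(x_i | theta). The Gaussian MLE is
# available in closed form (sample mean and 1/n-variance).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)   # synthetic observations

mu_hat = data.mean()       # arg max over mu
sigma_hat = data.std()     # arg max over sigma (ddof=0, i.e. the MLE)

def log_likelihood(x, mu, sigma):
    """sum_i log N(x_i; mu, sigma^2)."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - 0.5 * ((x - mu) / sigma) ** 2)

print(mu_hat, sigma_hat, log_likelihood(data, mu_hat, sigma_hat))
```

For richer families the arg max has no closed form and is found by gradient ascent on the same objective.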
Generative Modeling: Models for (Very) Complex Data
How can we learn such models for very complex data?
https://www.researchgate.net/figure/Heterogeneousness-and-diversity-of-the-CIFAR-10-entries-in-their-10-
10
Generative Modeling: Normalizing Flows and VAEs
Design parameterized densities with huge capacity!
• Normalizing flows: sequence of non-linear transformations to
a simple distribution pz(z),
p(x | θ0:k) = pz(z), where z = f^{-1}_{θk} ∘ · · · ∘ f^{-1}_{θ1} ∘ f^{-1}_{θ0}(x).
Each f^{-1}_{θj} must be invertible with a tractable log-det. Jacobian.
• VAEs: latent-variable models where inference networks
specify parameters
p(x, y | θ) = p(x | fθ(y))py(y).
The marginal likelihood is maximized via the ELBO.
11
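To make the change-of-variables idea behind normalizing flows concrete, here is a minimal sketch with a single invertible affine transformation; the flow f_θ(z) = a·z + b and its parameter values are assumptions for illustration only.

```python
import numpy as np

# One-step "flow": x = f_theta(z) = a*z + b with a standard normal base p_z.
# The density of x is p_z(f^{-1}(x)) times the Jacobian of the inverse map,
# which is why flows need invertible maps with tractable log-det Jacobians.
a, b = 2.0, -1.0                                   # hypothetical parameters

def log_p_x(x):
    z = (x - b) / a                                # f_theta^{-1}(x)
    log_p_z = -0.5 * (z**2 + np.log(2 * np.pi))    # standard normal base
    log_det_jac = -np.log(np.abs(a))               # log |dz/dx|
    return log_p_z + log_det_jac

print(log_p_x(np.array([0.0, 1.0, 3.0])))
```

Stacking many such invertible steps (with non-linear maps) gives the high-capacity densities the slide describes.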
GANs
GANs: Density-Free Models
Generative Adversarial Networks (GANs) instead use an
unrestricted generator Gθg(z) such that
p(x | θg) = pz({z}), where {z} = G^{-1}_{θg}(x).
• Problem: the inverse image of Gθg(z) may be huge!
• Problem: it’s likely intractable to preserve volume through
G(z; θg).
So, we can’t evaluate p(x | θg ) and we can’t learn θg by maximum
likelihood.
12
GANs: Discriminators
GANs learn by comparing model samples with examples from D.
• Sampling from the generator is easy:
x̂ = Gθg (ẑ), where ẑ ∼ pz(z).
• Given a sample x̂, a discriminator tries to distinguish it from
true examples:
D(x) = Pr (x ∼ pdata) .
• The discriminator “supervises” the generator network.
13
GANs: Generator + Discriminator
https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-
training-upc-2016
14
GANs: Goodfellow et al. (2014)
• Let z ∈ R^m and pz(z) be a simple base distribution.
• The generator Gθg(z) : R^m → D̃ is a deep neural network.
• D̃ is the manifold of generated examples.
• The discriminator Dθd(x) : D ∪ D̃ → (0, 1) is also a deep
neural network.
https://arxiv.org/abs/1511.06434
15
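A minimal PyTorch-style sketch of this setup; the MLP layers, sizes, and flattened-image data dimension are assumptions (the DCGAN work linked above uses convolutional networks instead).

```python
import torch
import torch.nn as nn

LATENT_DIM, DATA_DIM = 64, 784            # assumed sizes (e.g. 28x28 images)

# Generator G_thetag : R^m -> data space.
G = nn.Sequential(
    nn.Linear(LATENT_DIM, 256), nn.ReLU(),
    nn.Linear(256, DATA_DIM), nn.Tanh(),  # outputs scaled to [-1, 1]
)

# Discriminator D_thetad : data space -> (0, 1).
D = nn.Sequential(
    nn.Linear(DATA_DIM, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),      # probability that x is real
)

z = torch.randn(16, LATENT_DIM)           # z ~ p_z(z), simple base distribution
x_fake = G(z)                             # samples from the implicit model
scores = D(x_fake)                        # discriminator's estimate of Pr(real)
```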
GANs: Saddle-Point Optimization
Saddle-Point Optimization: learn Gθg(z) and Dθd(x) jointly via
the objective V(θd, θg):
min_{θg} max_{θd}  E_{pdata}[log Dθd(x)] + E_{pz(z)}[log(1 − Dθd(Gθg(z)))],
where the first term is the likelihood of true data and the second
is the likelihood of generated data.
16
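As a sketch of how this objective is estimated in practice, the code below computes a Monte Carlo estimate of V(θd, θg) from one batch; it assumes networks like G and D from the earlier sketch and a hypothetical batch x_real of training data.

```python
import torch

def value_fn(D, G, x_real, latent_dim=64):
    """Batch estimate of V(theta_d, theta_g): D ascends it, G descends it."""
    z = torch.randn(x_real.shape[0], latent_dim)      # z ~ p_z(z)
    real_term = torch.log(D(x_real)).mean()           # E_pdata[log D(x)]
    fake_term = torch.log(1 - D(G(z))).mean()         # E_pz[log(1 - D(G(z)))]
    return real_term + fake_term
```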
GANs: Optimal Discriminators
Claim: Given Gθg defining an implicit distribution pg = p(x | θg),
the optimal discriminator is
D∗(x) = pdata(x) / (pdata(x) + pg(x)).
Proof Sketch:
V(θd, θg) = ∫_D pdata(x) log D(x) dx + ∫_{D̃} p(z) log(1 − D(Gθg(z))) dz
          = ∫_{D ∪ D̃} [pdata(x) log D(x) + pg(x) log(1 − D(x))] dx.
Maximizing the integrand for every x is sufficient and gives the result
(see bonus slides).
Previous Slide: https://commons.wikimedia.org/wiki/File:Saddle point.svg
17
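The claim can also be checked numerically; this small sketch (not from the slides) fixes pdata(x) and pg(x) at a single point and maximizes the pointwise integrand over D(x) by grid search.

```python
import numpy as np

# Pointwise check: p_data*log(D) + p_g*log(1 - D) is maximized at
# D = p_data / (p_data + p_g). Density values below are arbitrary.
p_data, p_g = 0.7, 0.2
d_grid = np.linspace(1e-4, 1 - 1e-4, 10_000)
integrand = p_data * np.log(d_grid) + p_g * np.log(1 - d_grid)

print(d_grid[np.argmax(integrand)])      # ~0.778
print(p_data / (p_data + p_g))           # 0.777...
```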
GANs: Jensen-Shannon Divergence and Optimal Generators
Given an optimal discriminator D∗(x), the generator objective is
C(θg) = E_{pdata}[log D∗θd(x)] + E_{pg(x)}[log(1 − D∗θd(x))]
      = E_{pdata}[log (pdata(x) / (pdata(x) + pg(x)))]
        + E_{pg(x)}[log (pg(x) / (pdata(x) + pg(x)))]
      ∝ (1/2) KL(pdata ‖ (pdata + pg)/2) + (1/2) KL(pg ‖ (pdata + pg)/2),
which is the Jensen-Shannon divergence between pdata and pg.
C(θg) achieves its global minimum at pg = pdata given an optimal
discriminator!
18
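For intuition, here is a small discrete-distribution sketch (an aside; the slides work with continuous densities) showing that the Jensen-Shannon divergence is zero exactly when the two distributions match, which is why pg = pdata is the global minimum.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for strictly positive discrete distributions."""
    return np.sum(p * np.log(p / q))

def jsd(p, q):
    """JSD(p, q) = 0.5*KL(p || m) + 0.5*KL(q || m) with m = (p + q)/2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p_data = np.array([0.5, 0.3, 0.2])
print(jsd(p_data, np.array([0.2, 0.5, 0.3])))   # positive
print(jsd(p_data, p_data.copy()))               # 0.0
```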
GANs: Learning Generators and Discriminators
Putting these results to use in practice:
• High-capacity discriminators Dθd approximate the
Jensen-Shannon divergence when close to the global maximum.
• Dθd is a “differentiable program”.
• We can use Dθd to learn Gθg with our favourite gradient
descent method.
https://arxiv.org/abs/1511.06434
19
GANs: Training Procedure
for i = 1 . . . N do
  for k = 1 . . . K do
    • Sample m noise samples {z1, . . . , zm} ∼ pz(z).
    • Sample m examples {x1, . . . , xm} from pdata(x).
    • Update the discriminator Dθd by ascending its objective:
      θd = θd + αd ∇θd (1/m) Σ_{i=1}^m [log D(xi) + log(1 − D(G(zi)))].
  end for
  • Sample m noise samples {z1, . . . , zm} ∼ pz(z).
  • Update the generator Gθg by descending its objective:
    θg = θg − αg ∇θg (1/m) Σ_{i=1}^m log(1 − D(G(zi))).
end for
20
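A runnable toy version of this procedure in PyTorch; the 1-D Gaussian “data”, tiny MLPs, and hyperparameters are assumptions chosen only to keep the sketch short.

```python
import torch
import torch.nn as nn

LATENT_DIM, BATCH, K, N, LR = 8, 64, 1, 2000, 1e-3
G = nn.Sequential(nn.Linear(LATENT_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_d = torch.optim.SGD(D.parameters(), lr=LR)
opt_g = torch.optim.SGD(G.parameters(), lr=LR)

def sample_data(m):                        # stand-in for p_data(x)
    return torch.randn(m, 1) * 0.5 + 2.0

for i in range(N):
    for k in range(K):                     # K discriminator steps
        x = sample_data(BATCH)
        z = torch.randn(BATCH, LATENT_DIM)
        # Minimizing the negative objective = ascending the discriminator's
        # objective from the slide.
        d_loss = -(torch.log(D(x)) + torch.log(1 - D(G(z)))).mean()
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    z = torch.randn(BATCH, LATENT_DIM)     # one generator step
    g_loss = torch.log(1 - D(G(z))).mean() # descend the minimax objective
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```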
Problems (c. 2016)
Problems with GANs
• Vanishing gradients: the discriminator becomes “too good”
and the generator gradient vanishes.
• Non-Convergence: the generator and discriminator oscillate
without reaching an equilibrium.
• Mode Collapse: the generator distribution collapses to a
small set of examples.
• Mode Dropping: the generator distribution doesn’t fully
cover the data distribution.
21
Problems: Vanishing Gradients
• The minimax objective saturates when Dθd is close to perfect:
V(θd, θg) = E_{pdata}[log Dθd(x)] + E_{pz(z)}[log(1 − Dθd(Gθg(z)))].
• A non-saturating heuristic objective for the generator is
J(Gθg) = −E_{pz(z)}[log Dθd(Gθg(z))].
https://arxiv.org/abs/1701.00160 22
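The gradient difference between the two objectives can be seen directly; the sketch below differentiates both with respect to D(G(z)) at a few illustrative values (an aside, not from the slides).

```python
import torch

# When the discriminator confidently rejects a sample, D(G(z)) is near 0:
# the saturating loss log(1 - D) is nearly flat there, while the
# non-saturating loss -log(D) still gives a large gradient.
d_of_gz = torch.tensor([0.01, 0.5, 0.99], requires_grad=True)

saturating = torch.log(1 - d_of_gz).sum()
grad_sat = torch.autograd.grad(saturating, d_of_gz, retain_graph=True)[0]

non_saturating = -torch.log(d_of_gz).sum()
grad_nonsat = torch.autograd.grad(non_saturating, d_of_gz)[0]

print(grad_sat)      # ~[-1.01, -2.0, -100]: tiny where D(G(z)) is small
print(grad_nonsat)   # ~[-100, -2.0, -1.01]: large where D(G(z)) is small
```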
Problems: Addressing Vanishing Gradients
Solutions:
• Change Objectives: use the non-saturating heuristic
objective, maximum-likelihood cost, etc.
• Limit Discriminator: restrict the capacity of the
discriminator.
• Schedule Learning: try to balance training Dθd and Gθg.
23
Problems: Non-Convergence
Simultaneous gradient descent is not guaranteed to converge for
minimax objectives.
• Goodfellow et al. only showed convergence when updates are
made in the function space [2].
• The parameterization of Dθd and Gθg results in a highly
non-convex objective.
• In practice, training tends to oscillate – updates “undo” each
other.
24
Problems: Addressing Non-Convergence
Solutions: Lots and lots of hacks!
https://github.com/soumith/ganhacks
25
Problems: Mode Collapse and Mode Dropping
One Explanation: SGD may optimize the max-min objective
max_{θd} min_{θg}  E_{pdata}[log Dθd(x)] + E_{pz(z)}[log(1 − Dθd(Gθg(z)))].
Intuition: the generator maps all z values to the x̂ that is most
likely to fool the discriminator.
https://arxiv.org/abs/1701.00160
26
A Possible Solution
A Possible Solution: Alternative Divergences
There is a large variety of divergence measures for distributions:
• f-Divergences: (e.g. Jensen-Shannon, Kullback-Leibler)
Df(P ‖ Q) = ∫_χ q(x) f(p(x)/q(x)) dx
• GANs [2], f-GANs [7], and more.
• Integral Probability Metrics: (e.g. Earth Mover’s Distance,
Maximum Mean Discrepancy)
γF(P ‖ Q) = sup_{f ∈ F} ∫ f dP − ∫ f dQ
• Wasserstein GANs [1], Fisher GANs [6], Sobolev GANs [5], and
more.
27
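As one concrete integral probability metric, here is a sample-based (biased) estimate of the squared maximum mean discrepancy with an RBF kernel; the kernel bandwidth and toy data are assumptions for illustration.

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    """RBF kernel matrix between rows of a and b."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * d2)

def mmd2(x, y, gamma=0.5):
    """Biased estimate of MMD^2(P, Q) from samples x ~ P and y ~ Q."""
    return rbf(x, x, gamma).mean() + rbf(y, y, gamma).mean() \
        - 2 * rbf(x, y, gamma).mean()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(500, 2))
y_same = rng.normal(0.0, 1.0, size=(500, 2))
y_diff = rng.normal(2.0, 1.0, size=(500, 2))
print(mmd2(x, y_same), mmd2(x, y_diff))   # near zero vs clearly positive
```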
A Possible Solution: Wasserstein GANs
Wasserstein GANs: Strong theory and excellent empirical results.
• “In no experiment did we see evidence of mode collapse for
the WGAN algorithm.” [1]
https://arxiv.org/abs/1701.07875
28
Summary
Summary
Recap:
• GANs are a class of density-free generative models with
(mostly) unrestricted generator functions.
• Introducing adversarial discriminator networks allows GANs to
learn by minimizing the Jensen-Shannon divergence.
• Concurrently learning the generator and discriminator is
challenging due to:
• Vanishing gradients,
• Non-convergence due to oscillation,
• Mode collapse and mode dropping.
• A variety of alternative objective functions are being proposed.
29
Acknowledgements and References
There are lots of excellent references on GANs:
• Sebastian Nowozin’s presentation at MLSS 2018.
• NIPS 2016 tutorial on GANs by Ian Goodfellow.
• A nice explanation of Wasserstein GANs by Alex Irpan.
30
Bonus: Optimal Discriminators Cont.
The integrand
h(D(x)) = pdata(x) log D(x) + pg(x) log(1 − D(x))
is concave for D(x) ∈ (0, 1), since
∂²h/∂D(x)² = −pdata(x)/D(x)² − pg(x)/(1 − D(x))² < 0.
We take the derivative and compute a stationary point in the domain:
∂h(D(x))/∂D(x) = pdata(x)/D(x) − pg(x)/(1 − D(x)) = 0
⇒ D(x) = pdata(x) / (pdata(x) + pg(x)).
By concavity, this maximizes the integrand over the domain of the
discriminator, completing the proof.
31
References i
Martin Arjovsky, Soumith Chintala, and Léon Bottou.
Wasserstein GAN.
arXiv preprint arXiv:1701.07875, 2017.
Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David
Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio.
Generative adversarial networks.
arXiv preprint arXiv:1406.2661, 2014.
Tero Karras, Samuli Laine, and Timo Aila.
A style-based generator architecture for generative adversarial
networks.
arXiv preprint arXiv:1812.04948, 2018.
32
References ii
Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew
Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes
Totz, Zehan Wang, et al.
Photo-realistic single image super-resolution using a generative
adversarial network.
In Proceedings of the IEEE conference on computer vision and pattern
recognition, pages 4681–4690, 2017.
Youssef Mroueh, Chun-Liang Li, Tom Sercu, Anant Raj, and Yu Cheng.
Sobolev GAN.
arXiv preprint arXiv:1711.04894, 2017.
Youssef Mroueh and Tom Sercu.
Fisher GAN.
In Advances in Neural Information Processing Systems, pages 2513–2523,
2017.
33
References iii
Sebastian Nowozin, Botond Cseke, and Ryota Tomioka.
f-GAN: Training generative neural samplers using variational
divergence minimization.
In Advances in neural information processing systems, pages 271–279,
2016.
Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz,
and Bryan Catanzaro.
High-resolution image synthesis and semantic manipulation with
conditional gans.
In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pages 8798–8807, 2018.
34