Score-based Generative Models
이상윤
Generative Models
“Learning causal relation”
https://web.eecs.umich.edu/~justincj/slides/eecs498/498_FA2019_lecture19.pdf
Generative Models
GAN: focuses more on sampling than on modeling the likelihood
VAE: maximizes a lower bound on the likelihood via a surrogate loss
Normalizing Flow: Exact likelihood maximization via invertible transformations
https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html
Generative Models - limitations
GAN: High fidelity but hard to train
VAE: Not exact likelihood maximization
Normalizing Flow: Lack of flexibility (must be invertible)
https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html
Motivations
Sampling without computing the probability density, through Langevin
dynamics (similar to gradient ascent)
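A minimal sketch of unadjusted Langevin dynamics in PyTorch (names such as `langevin_sample` are illustrative, not from the slides): each step is half a gradient-ascent step on $\log p(x)$ plus injected Gaussian noise, using only the score.

```python
import torch

def langevin_sample(score, x0, step=1e-2, n_steps=1000):
    """Approximately sample from p(x) given only score(x) = grad_x log p(x)."""
    x = x0.clone()
    for _ in range(n_steps):
        z = torch.randn_like(x)
        # gradient-ascent step on log p(x), plus injected Gaussian noise
        x = x + 0.5 * step * score(x) + (step ** 0.5) * z
    return x

# Sanity check: for a standard Gaussian, the score is exactly -x.
sample = langevin_sample(score=lambda x: -x, x0=torch.zeros(2))
```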
Motivations
Why do we need the PDF?
- MLE is equivalent to minimizing the KL divergence to the real data distribution (expanded below).
- In a normalized PDF, all densities compete with each other: raising the density at one point necessarily lowers it elsewhere.
https://web.eecs.umich.edu/~justincj/slides/eecs498/498_FA2019_lecture19.pdf
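Expanding the first bullet makes the equivalence explicit; the data entropy term does not depend on $\theta$:

```latex
\mathrm{KL}\left(p_{\mathrm{data}} \,\|\, p_\theta\right)
  = \underbrace{\mathbb{E}_{p_{\mathrm{data}}}\!\left[\log p_{\mathrm{data}}(x)\right]}_{\text{constant in } \theta}
  \;-\; \mathbb{E}_{p_{\mathrm{data}}}\!\left[\log p_\theta(x)\right]
```

so minimizing the KL divergence over $\theta$ is exactly maximizing the expected log-likelihood.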
Motivations
$p_\theta(x) = \dfrac{e^{-f_\theta(x)}}{Z_\theta}$
$Z_\theta = \int e^{-f_\theta(x)}\,dx$ : normalizing constant
$f_\theta(x)$ : unnormalized (energy) function
- The integral defining the normalizing constant is intractable in general.
- Autoregressive models and normalizing flows handle this problem via special structures that guarantee a unit normalizing constant.
-> It is difficult to define a flexible, high-capacity probability density function.
Motivations
Score-based models naturally bypass the problem: the score of an energy-based model does not involve $Z_\theta$ at all, since $\nabla_x \log p_\theta(x) = -\nabla_x f_\theta(x) - \nabla_x \log Z_\theta = -\nabla_x f_\theta(x)$.
Score matching
The score of a distribution is its log-density gradient, $s(x) = \nabla_x \log p(x)$.
Train a model $s_\theta(x)$ to match the data score by minimizing the Fisher divergence
$\tfrac{1}{2}\,\mathbb{E}_{p(x)}\big[\|s_\theta(x) - \nabla_x \log p(x)\|^2\big]$.
Integration by parts removes the unknown data score, giving the equivalent objective
$\mathbb{E}_{p(x)}\big[\mathrm{tr}(\nabla_x s_\theta(x)) + \tfrac{1}{2}\|s_\theta(x)\|^2\big]$.
Minimizing $\|s_\theta(x)\|^2$ drives the gradient to zero at data points; the trace term regularizes them toward local maxima of the model density.
But computing the Hessian trace $\mathrm{tr}(\nabla_x s_\theta(x))$
needs $O(D)$ backprops in $D$ dimensions!
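A minimal sketch of the exact objective (PyTorch; `score_net` is an assumed stand-in for any network mapping $x$ to $s_\theta(x)$), showing where the $O(D)$ cost comes from: one backward pass per diagonal entry of the Jacobian.

```python
import torch

def exact_score_matching_loss(score_net, x):
    """tr(grad_x s_theta(x)) + 0.5 * ||s_theta(x)||^2, averaged over the batch."""
    x = x.requires_grad_(True)
    s = score_net(x)                                  # model score, shape (B, D)
    norm_term = 0.5 * (s ** 2).sum(dim=1)
    trace = torch.zeros(x.shape[0], device=x.device)
    for i in range(x.shape[1]):                       # O(D) backward passes
        grad_i = torch.autograd.grad(s[:, i].sum(), x, create_graph=True)[0]
        trace = trace + grad_i[:, i]                  # accumulate d s_i / d x_i
    return (trace + norm_term).mean()
```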
Score matching - sliced score matching
Project the score onto random vectors $v \sim p_v$ and minimize
$\mathbb{E}_{p_v}\,\mathbb{E}_{p(x)}\big[v^\top \nabla_x s_\theta(x)\, v + \tfrac{1}{2}\,(v^\top s_\theta(x))^2\big]$.
The term $v^\top \nabla_x s_\theta(x)\, v$ equals a Hessian-vector product (https://en.wikipedia.org/wiki/Hessian_automatic_differentiation)
-> a single backprop
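A minimal sketch of the sliced objective (PyTorch, Gaussian projection vectors; `score_net` is again an assumed stand-in): the Hessian-vector product comes from one backward pass, regardless of $D$.

```python
import torch

def sliced_score_matching_loss(score_net, x):
    x = x.requires_grad_(True)
    v = torch.randn_like(x)                     # random projection directions
    s = score_net(x)                            # model score, shape (B, D)
    sv = (s * v).sum()                          # sum over batch of v^T s_theta(x)
    # One backprop yields (grad_x s_theta) v for every sample in the batch.
    hvp = torch.autograd.grad(sv, x, create_graph=True)[0]
    trace_term = (hvp * v).sum(dim=1)           # v^T (grad_x s_theta) v
    norm_term = 0.5 * (s * v).sum(dim=1) ** 2   # 0.5 * (v^T s_theta)^2
    return (trace_term + norm_term).mean()
```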
Score matching - denoising score matching
Predict the score of the perturbed distribution $q_\sigma(\tilde{x}) = \int q_\sigma(\tilde{x} \mid x)\,p(x)\,dx$ instead of $p(x)$:
$\tfrac{1}{2}\,\mathbb{E}_{q_\sigma(\tilde{x} \mid x)\,p(x)}\big[\|s_\theta(\tilde{x}) - \nabla_{\tilde{x}} \log q_\sigma(\tilde{x} \mid x)\|^2\big]$
Minimizing the objective above gives the optimal score of the perturbed distribution,
but results can be noisy if $\sigma$ is large.
Let $q_\sigma(\tilde{x} \mid x)$ be Gaussian, $\mathcal{N}(\tilde{x};\, x,\, \sigma^2 I)$; then $\nabla_{\tilde{x}} \log q_\sigma(\tilde{x} \mid x) = -\dfrac{\tilde{x} - x}{\sigma^2}$.
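With the Gaussian choice, the objective needs no derivatives of the network at all; a minimal sketch (PyTorch, assumed `score_net`):

```python
import torch

def denoising_score_matching_loss(score_net, x, sigma=0.1):
    z = torch.randn_like(x)
    x_tilde = x + sigma * z              # sample from q_sigma(x_tilde | x)
    target = -z / sigma                  # = -(x_tilde - x) / sigma^2
    s = score_net(x_tilde)
    return 0.5 * ((s - target) ** 2).sum(dim=1).mean()
```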
Pitfalls
Now that we have a score function, let's sample through Langevin dynamics:
$x_{t+1} = x_t + \dfrac{\epsilon}{2}\,\nabla_x \log p(x_t) + \sqrt{\epsilon}\, z_t, \quad z_t \sim \mathcal{N}(0, I)$
Under some conditions, it is proved that $x_T$ is an exact sample from $p(x)$ (as $\epsilon \to 0$ and $T \to \infty$)!
But there are some problems...
Pitfalls - manifold hypothesis
The manifold hypothesis: data lie on a low-dimensional manifold embedded in the high-dimensional ambient space.
Problems
- Score function is inaccurate in the low-density regions.
- It is difficult to recover the relative weights between modes.
Pitfalls - inaccurate score function
$\nabla_x \log p(x)$ is not well-defined in low-density regions.
Since the score matching objective is estimated via Monte Carlo sampling from $p(x)$,
the model observes few samples from low-density regions, so the learned score there is inaccurate.
Score matching - recovering relative weights
Langevin dynamics fails to recover the relative weights between the two modes: where the modes barely overlap, the score of the mixture is nearly independent of the mixture weights.
Perturbed distribution
Low-density regions can be filled in by injecting noise.
But how should the noise strength be chosen? -> Use multiple scales and gradually decrease the variance.
Noise Conditional Score Network (NCSN)
A single network, conditioned on the noise level, estimates the scores of multiple perturbed data distributions.
Annealed Langevin Dynamics
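A minimal sketch of annealed Langevin dynamics following Algorithm 1 of Song & Ermon (2019); `score_net(x, sigma)` is an assumed noise-conditional network, and the step size is rescaled per noise level.

```python
import torch

def annealed_langevin(score_net, shape, sigmas, eps=2e-5, T=100):
    """sigmas: decreasing noise levels sigma_1 > ... > sigma_L."""
    x = torch.rand(shape)                          # initialize from uniform noise
    for sigma in sigmas:                           # anneal from coarse to fine
        alpha = eps * (sigma / sigmas[-1]) ** 2    # alpha_i = eps * sigma_i^2 / sigma_L^2
        for _ in range(T):
            z = torch.randn_like(x)
            x = x + 0.5 * alpha * score_net(x, sigma) + (alpha ** 0.5) * z
    return x
```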
Results
Generative Modeling by Estimating Gradients of the Data Distribution, NeurIPS 2019 (Oral)
Results
Score-Based Generative Modeling through Stochastic Differential Equations, ICLR 2021 (Oral)
Inverse Problem Solving
An inverse problem is a Bayesian inference problem: we want $p(x \mid y)$ when $p(y \mid x)$ is given, e.g.,
super-resolution, colorization, inpainting...
$\nabla_x \log p(x \mid y) = \underbrace{\nabla_x \log p(x)}_{\text{score matching}} + \underbrace{\nabla_x \log p(y \mid x)}_{\text{known forward process}}$
We can sample from $p(x \mid y)$ via Langevin dynamics!
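A minimal sketch of posterior sampling under these assumptions: `score_net` is the unconditional score model and `log_likelihood(x, y)` is a hypothetical differentiable implementation of the known forward process $\log p(y \mid x)$.

```python
import torch

def posterior_langevin(score_net, log_likelihood, y, x0, step=1e-4, n_steps=1000):
    x = x0.clone()
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        # grad_x log p(y|x) from the known, differentiable forward process
        grad_ll = torch.autograd.grad(log_likelihood(x, y).sum(), x)[0]
        posterior_score = score_net(x).detach() + grad_ll   # grad_x log p(x|y)
        z = torch.randn_like(x)
        x = x.detach() + 0.5 * step * posterior_score + (step ** 0.5) * z
    return x
```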
Inverse Problem Solving
Inverse problem solving without retraining!
Conclusion
Very flexible compared to NF or AR: the score network can be any function,
so modern DL architectures (ResNet, U-Net, etc.) can be used.
Sampling targets the exact $p(x)$, unlike VAEs, which optimize a surrogate lower bound.
GAN-level fidelity without a minimax game.
Naturally solves inverse problems.
References
Song, Yang, and Stefano Ermon. "Generative modeling by estimating gradients of the data distribution." arXiv
preprint arXiv:1907.05600 (2019).
Song, Yang, et al. "Score-based generative modeling through stochastic differential equations." arXiv preprint
arXiv:2011.13456 (2020).
Yang Song's blog (https://yang-song.github.io/blog/2021/score/)
Stefano Ermon's seminar (https://www.youtube.com/watch?v=8TcNXi3A5DI&t=562s)
