Hyperparameter Optimization with Hyperband Algorithm

Deep Learning Meetup Italy

● Gilberto Batres-Estrada
Senior Data Scientist @ Trell Technologies
● AIFI: Graduate teaching fellow
● Co-author: Big Data and Machine Learning
in Quantitative Investment, Wiley (chapter on LSTM)
● MSc in Theoretical Physics, Stockholm University
● MSc in Engineering: Applied Mathematics and Statistics,
KTH Royal Institute of Technology, Stockholm

Goals for today’s talk
1. Make neural network training faster
2. Get more accurate neural networks (lower test error)
3. Free up time for exploring different architectures

Agenda
● Random Search for Hyper-Parameter Optimization
● Bayesian optimization
● Hyperband
● Other methods
● Implementations and examples

Random Search
Proposed by James Bergstra and Yoshua Bengio
http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf

Bayesian Optimization
Model the conditional probability p(y | λ),
where y is an evaluation metric such as test error and
λ is a set of hyperparameters.

Sequential Model-Based Algorithm Configuration SMAC
SMAC uses random forests to model the conditional probability p(y | λ)
as a Gaussian distribution (Hutter et al., 2011)

Tree Structured Parzen Estimator (TPE)
TPE is a non-standard Bayesian optimization algorithm based on tree-structured
Parzen density estimators (Bergstra et al., 2011)

Spearmint
Uses Gaussian processes (GP) to model the conditional probability p(y | λ)
and performs slice sampling over the GP's hyperparameters (Snoek et al., 2012)

Hyperband
Successive Halving
Hyperband extends Successive Halving (Jamieson and Talwalkar, 2016) and uses it as a
subroutine:
● Uniformly allocate a budget to a set of hyperparameter configurations
● Evaluate the performance of all configurations
● Throw out the worst half
● Repeat until one configuration remains
The algorithm allocates exponentially more resources to more promising configurations.
Lisha Li et al. (2018) http://jmlr.org/papers/volume18/16-558/16-558.pdf
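
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's reference code: toy_loss is a hypothetical stand-in for training a model, and doubling the per-configuration budget each round is an assumption chosen so that every round costs roughly the same in total.

```python
import math
import random


def successive_halving(configs, evaluate, min_budget=1):
    """Repeatedly keep the better half of `configs`, doubling the budget each round."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Evaluate every surviving configuration with the current budget.
        losses = [evaluate(c, budget) for c in survivors]
        # Rank by loss (lower is better) and throw out the worst half.
        ranked = sorted(range(len(survivors)), key=lambda i: losses[i])
        keep = max(1, len(survivors) // 2)
        survivors = [survivors[i] for i in ranked[:keep]]
        # Survivors get twice the budget in the next round (assumed schedule).
        budget *= 2
    return survivors[0]


def toy_loss(lr, budget):
    """Hypothetical loss: best near lr = 1e-2, improving as the budget grows."""
    return (math.log10(lr) + 2) ** 2 + 1.0 / budget


rng = random.Random(0)
candidates = [10 ** rng.uniform(-5, 0) for _ in range(16)]
best_lr = successive_halving(candidates, toy_loss)
print(best_lr)
```

With 16 candidates the loop runs four rounds (16, 8, 4, 2 configurations), so the last two survivors receive eight times the budget of the first round.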

Hyperband
● get_hyperparameter_configuration(n): returns a set of n i.i.d. samples from some
distribution defined over the hyperparameter configuration space. In practice, sample each hyperparameter
uniformly from a predefined space (a hypercube with min and max bounds for each hyperparameter).
● run_then_return_val_loss(t, r): a function that takes a hyperparameter configuration t
and resource allocation r as input and returns the validation loss after training the configuration for the
allocated resources.
● top_k(configs, losses, k): a function that takes a set of configurations as well as their
associated losses and returns the top k performing configurations.
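
The three functions above can be combined into a sketch of the Hyperband outer loop, loosely following Algorithm 1 in Li et al. The bodies of get_hyperparameter_configuration and run_then_return_val_loss are hypothetical toy stand-ins (a single learning rate and a synthetic loss), and max_resource=81 with eta=3 are illustrative values, not taken from these slides.

```python
import math
import random

_rng = random.Random(0)  # fixed seed so the sketch is reproducible


def get_hyperparameter_configuration(n):
    """Sample n configurations; here only a log-uniform learning rate (toy assumption)."""
    return [{"lr": 10 ** _rng.uniform(-5, 0)} for _ in range(n)]


def run_then_return_val_loss(t, r):
    """Hypothetical validation loss after training configuration t with resource r."""
    return (math.log10(t["lr"]) + 2) ** 2 + 1.0 / r


def top_k(configs, losses, k):
    """Return the k configurations with the lowest loss."""
    ranked = sorted(range(len(configs)), key=lambda i: losses[i])
    return [configs[i] for i in ranked[:k]]


def hyperband(max_resource=81, eta=3):
    # s_max is the largest s with eta**s <= max_resource (integer loop avoids float log issues).
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1
    best_loss, best_config = float("inf"), None
    # Each bracket s trades off many configurations (large s) against more resource each (small s).
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) * eta ** s / (s + 1))  # initial number of configurations
        r = max_resource / eta ** s                      # initial resource per configuration
        configs = get_hyperparameter_configuration(n)
        # Successive Halving with elimination rate eta inside the bracket.
        for i in range(s + 1):
            n_i = n // eta ** i
            r_i = r * eta ** i
            losses = [run_then_return_val_loss(t, r_i) for t in configs]
            for t, loss in zip(configs, losses):
                if loss < best_loss:
                    best_loss, best_config = loss, t
            configs = top_k(configs, losses, n_i // eta)
    return best_config, best_loss


best_config, best_loss = hyperband()
print(best_config, best_loss)
```

The sketch tracks the lowest loss seen across all brackets, because the last round of a bracket may keep zero configurations once n_i // eta reaches 0.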

Hyperband: Implementation

Finding the right hyperparameter configuration
Takeaways from Figure 2: more resources are needed to differentiate between the two configurations when
either:
1. The envelope functions are wider
2. The terminal losses are closer together

Example from the Paper: LeNet, Parameter Space

Experiment in the Paper
CNN used in Snoek et al. (2012) and Domhan et al. (2015)
Data sets (training, validation, test set sizes)
● CIFAR-10 (40k, 10k, 10k)
● Rotated MNIST with background images (MRBI)
(Larochelle et al., 2007) (10k, 2k, 50k)
● Street View House Numbers (SVHN) (600k, 6k, 26k)

Keras Tuner: Hyperparameter search
https://keras-team.github.io/keras-tuner/
Source code for Hyperband:
https://github.com/keras-team/keras-tuner/blob/master/kerastuner/tuners/hyperband.py

Other Methods: Cyclical Learning Rate
Leslie N. Smith
https://arxiv.org/pdf/1506.01186.pdf
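
As a rough illustration of the idea, the triangular policy from Smith's paper can be written as a plain Python function. The function name and the default values for base_lr, max_lr and step_size are illustrative assumptions, not settings from this talk.

```python
import math


def triangular_clr(iteration, base_lr=0.001, max_lr=0.006, step_size=2000):
    """Learning rate at a given training iteration under the triangular policy.

    The rate rises linearly from base_lr to max_lr over step_size iterations,
    then falls back to base_lr over the next step_size iterations, and repeats.
    """
    # Index of the current cycle (1-based); one cycle spans 2 * step_size iterations.
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Distance from the peak of the current cycle, scaled to [0, 1].
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


for it in (0, 1000, 2000, 3000, 4000):
    print(it, triangular_clr(it))
```

The function depends only on the iteration counter, so it can be called once per batch from any training loop.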

Cyclical Learning Rate (CLR)
Torch:

Learning Rate Scheduler tf.keras

References
Gilberto Batres-Estrada
+46703387868
gilberto.batres-estrada@live.com
Repository https://github.com/gilberto-BE/deep_learning_italia
Cyclical Learning Rate: https://arxiv.org/pdf/1506.01186.pdf
Random Search: http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
Keras tuner: https://keras-team.github.io/keras-tuner/
Learning Rate Scheduler: fastai (pytorch high level API) https://docs.fast.ai/callbacks.one_cycle.html
Source code for Hyperband: https://github.com/keras-team/keras-tuner/blob/master/kerastuner/tuners/hyperband.py
