Getting Started with Optimization.jl
In this tutorial, we introduce the basics of Optimization.jl by showing how to easily mix local optimizers and global optimizers on the Rosenbrock equation.
The Rosenbrock equation is defined as follows:
\[f(u,p) = (p_1 - u_1)^2 + p_2 (u_2 - u_1^2)^2\]
This is a parameterized optimization problem where we want to find the vector u that minimizes f. The simplest copy-pasteable code using a quasi-Newton method (LBFGS) to solve the Rosenbrock problem is the following:
# Import the package and define the problem to optimize
using Optimization, Zygote
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
u0 = zeros(2)
p = [1.0, 100.0]
optf = OptimizationFunction(rosenbrock, AutoZygote())
prob = OptimizationProblem(optf, u0, p)
sol = solve(prob, Optimization.LBFGS())
retcode: Success
u: 2-element Vector{Float64}:
0.9999997057368228
0.999999398151528
sol.u
2-element Vector{Float64}:
0.9999997057368228
0.999999398151528
sol.objective
1.0433892998247468e-13
Tada! That's how you do it. Now let's dive in a little more into what each part means and how to customize it all to your needs.
Understanding the Solution Object
The solution object is a SciMLBase.AbstractNoTimeSolution, and thus it follows the SciMLBase Solution Interface for non-timeseries objects and is documented at the solution type page. However, for simplicity, let's show a bit of it in action.
An optimization solution has an array interface so that it acts like the array that it solves for. This array syntax is shorthand for simply grabbing the solution u. For example:
sol[1] == sol.u[1]
true
Array(sol) == sol.u
true
sol.objective returns the final cost of the optimization. We can validate this by plugging sol.u into our function:
rosenbrock(sol.u, p)
1.0433892998247468e-13
sol.objective
1.0433892998247468e-13
The sol.retcode gives us more information about the solution process.
sol.retcode
ReturnCode.Success = 1
Here it says ReturnCode.Success, which means that the solve was successful. We can learn more about the different return codes at the ReturnCode part of the SciMLBase documentation.
If we are interested in some of the statistics of the solving process, for example to help choose a better solver, we can investigate sol.stats:
sol.stats
SciMLBase.OptimizationStats
Number of iterations: 21
Time in seconds: 0.230282
Number of function evaluations: 25
Number of gradient evaluations: 25
Number of hessian evaluations: 0
That's just a bit of what's in there; check out the other pages for more information. Now let's move on to customization.
Import a different solver package and solve the problem
OptimizationOptimJL is a wrapper for Optim.jl, and OptimizationBBO is a wrapper for BlackBoxOptim.jl.
First, let's use NelderMead, a derivative-free solver from Optim.jl:
using OptimizationOptimJL
sol = solve(prob, Optim.NelderMead())
retcode: Success
u: 2-element Vector{Float64}:
0.9999634355313174
0.9999315506115275
BlackBoxOptim.jl offers derivative-free global optimization solvers that require the bounds to be set via lb and ub in the OptimizationProblem. Let's use the BBO_adaptive_de_rand_1_bin_radiuslimited() solver:
using OptimizationBBO
prob = OptimizationProblem(rosenbrock, u0, p, lb = [-1.0, -1.0], ub = [1.0, 1.0])
sol = solve(prob, BBO_adaptive_de_rand_1_bin_radiuslimited())
retcode: MaxIters
u: 2-element Vector{Float64}:
0.999999999999988
0.9999999999999781
The solution from the original solver can always be obtained via original:
sol.original
BlackBoxOptim.OptimizationResults("adaptive_de_rand_1_bin_radiuslimited", "Max number of steps (10000) reached", 10001, 1.747308219093924e9, 0.06201791763305664, BlackBoxOptim.ParamsDictChain[BlackBoxOptim.ParamsDictChain[Dict{Symbol, Any}(:RngSeed => 589245, :SearchRange => [(-1.0, 1.0), (-1.0, 1.0)], :TraceMode => :silent, :Method => :adaptive_de_rand_1_bin_radiuslimited, :MaxSteps => 10000),Dict{Symbol, Any}()],Dict{Symbol, Any}(:CallbackInterval => -1.0, :TargetFitness => nothing, :TraceMode => :compact, :FitnessScheme => BlackBoxOptim.ScalarFitnessScheme{true}(), :MinDeltaFitnessTolerance => 1.0e-50, :NumDimensions => :NotSpecified, :FitnessTolerance => 1.0e-8, :TraceInterval => 0.5, :MaxStepsWithoutProgress => 10000, :MaxSteps => 10000…)], 10174, BlackBoxOptim.ScalarFitnessScheme{true}(), BlackBoxOptim.TopListArchiveOutput{Float64, Vector{Float64}}(5.887367543277564e-28, [0.999999999999988, 0.9999999999999781]), BlackBoxOptim.PopulationOptimizerOutput{BlackBoxOptim.FitPopulation{Float64}}(BlackBoxOptim.FitPopulation{Float64}([0.9999999999998147 0.9999999999998774 … 0.9999999999998048 0.9999999999999069; 0.9999999999996932 0.9999999999998264 … 0.9999999999995917 0.9999999999997967], NaN, [4.418614664199695e-25, 5.262246551848751e-25, 2.974646685428223e-25, 2.4127839569510177e-25, 6.677009038690243e-24, 2.53859580819874e-25, 1.400295776241822e-25, 2.1711929627962705e-26, 2.493846194235881e-25, 7.341307216929095e-26 … 3.7160969269859356e-26, 8.268864660429934e-26, 5.389904460874207e-27, 5.887367543277564e-28, 3.610135651082452e-27, 9.728750373154569e-28, 5.783647741680556e-25, 2.0264401914356423e-25, 7.044236991164835e-26, 3.753031641087416e-26], 0, BlackBoxOptim.Candidate{Float64}[BlackBoxOptim.Candidate{Float64}([0.9999999999998461, 0.9999999999996908], 22, 2.388641238725278e-26, BlackBoxOptim.AdaptiveDiffEvoRandBin{3}(BlackBoxOptim.AdaptiveDiffEvoParameters(BlackBoxOptim.BimodalCauchy(Distributions.Cauchy{Float64}(μ=0.65, σ=0.1), 
Distributions.Cauchy{Float64}(μ=1.0, σ=0.1), 0.5, false, true), BlackBoxOptim.BimodalCauchy(Distributions.Cauchy{Float64}(μ=0.1, σ=0.1), Distributions.Cauchy{Float64}(μ=0.95, σ=0.1), 0.5, false, true), [0.6160741366193769, 0.6328921509386488, 0.8063688154309954, 0.3986140875593449, 0.5638862057177316, 0.8769604908360963, 0.9109635906994891, 0.6267460293777758, 0.9351746813481097, 0.5557274607805647 … 1.0, 0.7559694125647365, 1.0, 0.04044472911760044, 0.7794858320085585, 0.7805080142956227, 0.06394432744563894, 1.0, 1.0, 0.650369819458861], [0.15059910072074867, 0.916913486399358, 0.08654697317024351, 1.0, 0.7016472342439022, 1.0, 1.0, 0.1523798854278966, 0.14614121296706561, 1.0 … 0.04306434527345787, 0.19737152665661362, 0.7308796818829099, 1.0, 0.4049514547973657, 0.04570826320297758, 0.9724356294584734, 0.19729424617917873, 0.1606433392451163, 0.06830412715709144])), 0), BlackBoxOptim.Candidate{Float64}([0.9999999999998461, 0.9999999999974024], 22, 5.243581054632294e-22, BlackBoxOptim.AdaptiveDiffEvoRandBin{3}(BlackBoxOptim.AdaptiveDiffEvoParameters(BlackBoxOptim.BimodalCauchy(Distributions.Cauchy{Float64}(μ=0.65, σ=0.1), Distributions.Cauchy{Float64}(μ=1.0, σ=0.1), 0.5, false, true), BlackBoxOptim.BimodalCauchy(Distributions.Cauchy{Float64}(μ=0.1, σ=0.1), Distributions.Cauchy{Float64}(μ=0.95, σ=0.1), 0.5, false, true), [0.6160741366193769, 0.6328921509386488, 0.8063688154309954, 0.3986140875593449, 0.5638862057177316, 0.8769604908360963, 0.9109635906994891, 0.6267460293777758, 0.9351746813481097, 0.5557274607805647 … 1.0, 0.7559694125647365, 1.0, 0.04044472911760044, 0.7794858320085585, 0.7805080142956227, 0.06394432744563894, 1.0, 1.0, 0.650369819458861], [0.15059910072074867, 0.916913486399358, 0.08654697317024351, 1.0, 0.7016472342439022, 1.0, 1.0, 0.1523798854278966, 0.14614121296706561, 1.0 … 0.04306434527345787, 0.19737152665661362, 0.7308796818829099, 1.0, 0.4049514547973657, 0.04570826320297758, 0.9724356294584734, 0.19729424617917873, 
0.1606433392451163, 0.06830412715709144])), 0)], Base.Threads.SpinLock(0))))
Defining the objective function
Optimization.jl assumes that your objective function takes two arguments, objective(x, p):
- The optimization variables x.
- Other parameters p, such as hyperparameters of the cost function.
If you have no “other parameters”, you can safely disregard this argument. If your objective function is defined by someone else, you can create an anonymous function that just discards the extra parameters like this:
obj = (x, p) -> objective(x) # Pass this function into OptimizationFunction
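As a minimal sketch of this pattern (the single-argument `objective` below is a hypothetical stand-in for a function defined elsewhere, and the names here are our own), the wrapper can be used like so:

```julia
using Optimization, ForwardDiff

# Hypothetical single-argument objective, e.g. defined by someone else
objective(x) = sum(abs2, x .- 3.0)

# Adapter matching the (x, p) signature that Optimization.jl expects;
# the parameter argument p is simply ignored
obj = (x, p) -> objective(x)

optf = OptimizationFunction(obj, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf, zeros(2))
sol = solve(prob, Optimization.LBFGS())
```

Since `objective` has its minimum at `[3.0, 3.0]`, the solver should converge there regardless of the ignored parameter argument.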
Controlling Gradient Calculations (Automatic Differentiation)
Notice that both of the above methods were derivative-free methods, and thus no gradients were required to do the optimization. However, first-order optimization (i.e., using gradients) is often much more efficient. Gradients can be defined in two ways. One way is to manually provide a gradient definition in the OptimizationFunction constructor. However, the more convenient way to obtain gradients is to provide an AD backend type.
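For completeness, the manual route can be sketched as follows. This assumes the in-place `grad = (G, u, p)` keyword of OptimizationFunction; `rosenbrock_grad!` is our own illustrative name for the hand-written gradient:

```julia
using Optimization

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

# Hand-written in-place gradient with the (G, u, p) signature that
# the grad keyword of OptimizationFunction expects
function rosenbrock_grad!(G, u, p)
    G[1] = -2.0 * (p[1] - u[1]) - 4.0 * p[2] * u[1] * (u[2] - u[1]^2)
    G[2] = 2.0 * p[2] * (u[2] - u[1]^2)
    return nothing
end

optf = OptimizationFunction(rosenbrock; grad = rosenbrock_grad!)
prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0])
sol = solve(prob, Optimization.LBFGS())
```

Manual gradients avoid any AD overhead, but every change to the objective must be mirrored in the gradient by hand, which is why the AD-backend route below is usually preferred.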
For example, let's now use the OptimizationOptimJL BFGS method to solve the same problem. We will import the forward-mode automatic differentiation library (using ForwardDiff) and then specify in the OptimizationFunction to automatically construct the derivative functions using ForwardDiff.jl. This looks like:
using ForwardDiff
optf = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf, u0, p)
sol = solve(prob, BFGS())
retcode: Success
u: 2-element Vector{Float64}:
0.9999999999373614
0.999999999868622
We can inspect the original to see the statistics on the number of steps required and gradients computed:
sol.original
* Status: success
* Candidate solution
Final objective value: 7.645684e-21
* Found with
Algorithm: BFGS
* Convergence measures
|x - x'| = 3.48e-07 ≰ 0.0e+00
|x - x'|/|x'| = 3.48e-07 ≰ 0.0e+00
|f(x) - f(x')| = 6.91e-14 ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = 9.03e+06 ≰ 0.0e+00
|g(x)| = 2.32e-09 ≤ 1.0e-08
* Work counters
Seconds run: 0 (vs limit Inf)
Iterations: 16
f(x) calls: 53
∇f(x) calls: 53
Sure enough, it's a lot less than the derivative-free methods!
However, the compute cost of forward-mode automatic differentiation scales with the number of inputs, and thus as our optimization problem grows large, it slows down. To counteract this, for larger optimization problems (>100 state variables) one would normally want to use reverse-mode automatic differentiation. One common choice for reverse-mode automatic differentiation is Zygote.jl. We can demonstrate this via:
using Zygote
optf = OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = OptimizationProblem(optf, u0, p)
sol = solve(prob, BFGS())
retcode: Success
u: 2-element Vector{Float64}:
0.9999999999373614
0.999999999868622
Setting Box Constraints
In many cases, one knows the potential bounds on the solution values. In Optimization.jl, these can be supplied as the lb and ub arguments for the lower and upper bounds respectively, supplying a vector of values with one per state variable. Let's now do our gradient-based optimization with box constraints by rebuilding the OptimizationProblem:
prob = OptimizationProblem(optf, u0, p, lb = [-1.0, -1.0], ub = [1.0, 1.0])
sol = solve(prob, BFGS())
retcode: Success
u: 2-element Vector{Float64}:
0.9999999993561103
0.9999999987161009
For more information on handling constraints, in particular equality and inequality constraints, take a look at the constraints tutorial.