Getting Started with Optimization-Based ODE Parameter Estimation
In this tutorial, we will showcase how to estimate the parameters of an ordinary differential equation using DiffEqParamEstim.jl. DiffEqParamEstim.jl is a high-level tool that makes common parameter estimation tasks simple. Here, we will show how to use its cost function generation to estimate the parameters of the Lotka-Volterra equation against simulated data.
Installation
First, we will make sure DiffEqParamEstim.jl is installed correctly. To do this, we use the Julia package REPL, opened by typing `]` in the REPL so that the blue `pkg>` prompt appears. Then we type `add DiffEqParamEstim` and hit enter. This will run the package management sequence, and we will be good to go.
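Alternatively, the same installation can be done non-interactively via the Pkg API; a minimal sketch:

```julia
# Non-interactive installation, equivalent to `add DiffEqParamEstim` in the package REPL.
import Pkg
Pkg.add("DiffEqParamEstim")
```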
Required Dependencies
| Module | Description |
| --- | --- |
| DifferentialEquations.jl | The numerical differential equation solver package |
| RecursiveArrayTools.jl | Tooling for recursive arrays, such as vectors of arrays |
| Plots.jl | Tooling for plotting and visualization |
| Zygote.jl | Tooling for reverse-mode automatic differentiation (gradient calculations) |
| Optimization.jl | The numerical optimization package |
| OptimizationOptimJL.jl | The Optim.jl optimizers we will use for local optimization |
| OptimizationBBO.jl | The BlackBoxOptim.jl optimizers we will use for global optimization |
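If any of these are missing, they can all be added in one call from the Pkg API; for example:

```julia
# Install all tutorial dependencies at once (assumes a network-enabled environment).
import Pkg
Pkg.add(["DifferentialEquations", "RecursiveArrayTools", "Plots", "Zygote",
         "Optimization", "OptimizationOptimJL", "OptimizationBBO"])
```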
Parameter Estimation in the Lotka-Volterra Equation: 1 Parameter Case
We choose to optimize the parameters on the Lotka-Volterra equation. Let's start by defining the equation as a function with parameters:
using DifferentialEquations, RecursiveArrayTools, Plots, DiffEqParamEstim
using Optimization, ForwardDiff, OptimizationOptimJL, OptimizationBBO
function f(du, u, p, t)
du[1] = dx = p[1] * u[1] - u[1] * u[2]
du[2] = dy = -3 * u[2] + u[1] * u[2]
end
u0 = [1.0; 1.0]
tspan = (0.0, 10.0)
p = [1.5]
prob = ODEProblem(f, u0, tspan, p)
ODEProblem with uType Vector{Float64} and tType Float64. In-place: true
timespan: (0.0, 10.0)
u0: 2-element Vector{Float64}:
1.0
1.0
Generating Synthetic Data
We create data using the numerical result with `a = 1.5`:
sol = solve(prob, Tsit5())
t = collect(range(0, stop = 10, length = 200))
using RecursiveArrayTools # for VectorOfArray
randomized = VectorOfArray([(sol(t[i]) + 0.01randn(2)) for i in 1:length(t)])
data = convert(Array, randomized)
2×200 Matrix{Float64}:
0.997412 1.02184 1.06491 1.10862 … 0.995828 1.00714 1.04858
0.990941 0.890056 0.813912 0.729434 1.09535 1.00321 0.90241
Here, we used `VectorOfArray` from RecursiveArrayTools.jl to turn the result of an ODE into a matrix.
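An equivalent way to build the same kind of data matrix, and the approach used later in the multiple shooting example, is to sample the solution directly on the time grid with `saveat`; a minimal sketch (the noise level mirrors the `0.01` used above):

```julia
# Sample the true solution on the grid `t` and add Gaussian noise of the same
# magnitude as above; `data_alt` has the same 2×200 shape as `data`.
clean = Array(solve(prob, Tsit5(), saveat = t))
data_alt = clean .+ 0.01 .* randn(size(clean)...)
```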
If we plot the solution with the parameter at `a = 1.42`, we get the following:
newprob = remake(prob, p = [1.42])
newsol = solve(newprob, Tsit5())
plot(sol)
plot!(newsol)
Notice that after one period, this solution begins to drift very far off: this problem is sensitive to the choice of `a`.
To build the objective function for Optim.jl, we simply call the `build_loss_objective` function:
cost_function = build_loss_objective(prob, Tsit5(), L2Loss(t, data),
Optimization.AutoForwardDiff(),
maxiters = 10000, verbose = false)
(::SciMLBase.OptimizationFunction{true, ADTypes.AutoForwardDiff{nothing, Nothing}, DiffEqParamEstim.var"#29#30"{Nothing, typeof(DiffEqParamEstim.STANDARD_PROB_GENERATOR), Base.Pairs{Symbol, Integer, Tuple{Symbol, Symbol}, @NamedTuple{maxiters::Int64, verbose::Bool}}, SciMLBase.ODEProblem{Vector{Float64}, Tuple{Float64, Float64}, true, Vector{Float64}, SciMLBase.ODEFunction{true, SciMLBase.AutoSpecialize, typeof(Main.f), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}, SciMLBase.StandardODEProblem}, OrdinaryDiffEq.Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, L2Loss{Vector{Float64}, Matrix{Float64}, Nothing, Nothing, Nothing}, Nothing, Tuple{}}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}) (generic function with 1 method)
This objective function internally calls the ODE solver to get solutions to test against the data. The keyword arguments are passed directly to the solver. Note that we set `maxiters` so that the differential equation solver errors out quickly in bad regions of the parameter space, speeding up the process. If the integrator stops early (due to divergence), those parameters are given an infinite loss, so this is a quick way to discard bad parameters. We set `verbose = false` because this divergence can get noisy. The `Optimization.AutoForwardDiff()` argument is the choice of automatic differentiation, i.e., how the gradients are calculated. For more information on this choice, see the automatic differentiation choice API.

A good rule of thumb is to use `Optimization.AutoForwardDiff()` for fewer than about 100 parameters plus states, and `Optimization.AutoZygote()` for more.
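For illustration, switching the AD backend only changes that one argument; a hedged sketch for a larger model (reverse-mode sensitivities through the solver generally assume SciMLSensitivity.jl is installed and loaded, which is not part of this tutorial's dependency list):

```julia
# Same cost function construction, but with reverse-mode AD via Zygote.
# Only worthwhile for models with many parameters/states; assumes Zygote.jl and
# SciMLSensitivity.jl are available.
using Zygote, SciMLSensitivity
cost_function_zygote = build_loss_objective(prob, Tsit5(), L2Loss(t, data),
                                            Optimization.AutoZygote(),
                                            maxiters = 10000, verbose = false)
```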
Before optimizing, let's visualize our cost function by plotting it for a range of parameter values:
vals = 0.0:0.1:10.0
plot(vals, [cost_function(i) for i in vals], yscale = :log10,
xaxis = "Parameter", yaxis = "Cost", title = "1-Parameter Cost Function",
lw = 3)
Here we see that there is a very well-defined minimum in our cost function at the real parameter (because this is where the solution almost exactly fits the dataset).
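As a quick numerical sanity check on the same objective (using the scalar-call form from the plot above), the loss at the true parameter should be far smaller than at the perturbed value:

```julia
# The cost at the true parameter a = 1.5 should be far below the cost at a = 1.42.
cost_function(1.5), cost_function(1.42)
```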
Now we can use the `BFGS` algorithm to optimize the parameter starting at `a = 1.42`. We do this by creating an optimization problem and solving it with `BFGS()`:
optprob = Optimization.OptimizationProblem(cost_function, [1.42])
optsol = solve(optprob, BFGS())
retcode: Success
u: 1-element Vector{Float64}:
1.5004180956473268
Now let's see how well the fit performed:
newprob = remake(prob, p = optsol.u)
newsol = solve(newprob, Tsit5())
plot(sol)
plot!(newsol)
Note that some algorithms may be sensitive to the initial condition. For more details on using Optim.jl, see the documentation for Optim.jl.
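If a gradient-based method struggles from a particular starting point, one option is a derivative-free local method from the same package; a hedged sketch using Optim's Nelder-Mead on the same optimization problem:

```julia
# Derivative-free alternative from OptimizationOptimJL; can be more forgiving
# of the starting guess than gradient-based methods.
optsol_nm = solve(optprob, NelderMead())
optsol_nm.u
```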
Adding Bounds Constraints
We can improve our solution by noting that the Lotka-Volterra equation requires that the parameters are positive. Thus, following the Optimization.jl documentation, we can add box constraints to ensure the optimizer only checks between `0.0` and `3.0`, which improves the efficiency of our algorithm. We pass the `lb` and `ub` keyword arguments to the `OptimizationProblem` to forward these bounds to the optimizer:
lower = [0.0]
upper = [3.0]
optprob = Optimization.OptimizationProblem(cost_function, [1.42], lb = lower, ub = upper)
result = solve(optprob, BFGS())
retcode: Success
u: 1-element Vector{Float64}:
1.5004180956473046
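With finite bounds in place, the same problem can also be handed to a bounded global optimizer; a minimal sketch using the BlackBoxOptim wrapper loaded above:

```julia
# Global search within [0, 3] using OptimizationBBO; `maxiters` here is the
# optimizer's iteration budget, not the ODE solver's.
result_global = solve(optprob, BBO_adaptive_de_rand_1_bin_radiuslimited(), maxiters = 10_000)
result_global.u
```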
Estimating Multiple Parameters Simultaneously
Next, we can use the same tools to estimate multiple parameters simultaneously. Let's use the Lotka-Volterra equation with all parameters free:
function f2(du, u, p, t)
du[1] = dx = p[1] * u[1] - p[2] * u[1] * u[2]
du[2] = dy = -p[3] * u[2] + p[4] * u[1] * u[2]
end
u0 = [1.0; 1.0]
tspan = (0.0, 10.0)
p = [1.5, 1.0, 3.0, 1.0]
prob = ODEProblem(f2, u0, tspan, p)
ODEProblem with uType Vector{Float64} and tType Float64. In-place: true
timespan: (0.0, 10.0)
u0: 2-element Vector{Float64}:
1.0
1.0
We can build an objective function and solve the multiple parameter version just as before:
cost_function = build_loss_objective(prob, Tsit5(), L2Loss(t, data),
Optimization.AutoForwardDiff(),
maxiters = 10000, verbose = false)
optprob = Optimization.OptimizationProblem(cost_function, [1.3, 0.8, 2.8, 1.2])
result_bfgs = solve(optprob, BFGS())
retcode: Success
u: 4-element Vector{Float64}:
1.5003720528087503
1.0011299311146817
2.9998546807822644
1.0002390260081448
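As in the one-parameter case, we can visually check the fit by remaking the problem with the estimated parameters:

```julia
# Re-solve with the estimated parameters and compare against the noisy data.
fitprob = remake(prob, p = result_bfgs.u)
fitsol = solve(fitprob, Tsit5())
scatter(t, data', markersize = 2)   # noisy data points (both states)
plot!(fitsol)                       # fitted trajectories
```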
Alternative Cost Functions for Increased Robustness
The `build_loss_objective` approach with `L2Loss` is the most naive approach to parameter estimation. There are many others.

We can also use first differences in `L2Loss` by passing the keyword argument `differ_weight`, which sets the contribution of the differencing loss to the total loss (with `data_weight` weighting the direct data fit):
cost_function = build_loss_objective(prob, Tsit5(),
L2Loss(t, data, differ_weight = 0.3,
data_weight = 0.7),
Optimization.AutoForwardDiff(),
maxiters = 10000, verbose = false)
optprob = Optimization.OptimizationProblem(cost_function, [1.3, 0.8, 2.8, 1.2])
result_bfgs = solve(optprob, BFGS())
retcode: Success
u: 4-element Vector{Float64}:
1.5002651210131168
1.0008554416883255
3.0000388657607213
1.0003212202182212
We can also use the multiple shooting method by creating a `multiple_shooting_objective`:
function ms_f1(du, u, p, t)
du[1] = p[1] * u[1] - p[2] * u[1] * u[2]
du[2] = -3.0 * u[2] + u[1] * u[2]
end
ms_u0 = [1.0; 1.0]
tspan = (0.0, 10.0)
ms_p = [1.5, 1.0]
ms_prob = ODEProblem(ms_f1, ms_u0, tspan, ms_p)
t = collect(range(0, stop = 10, length = 200))
data = Array(solve(ms_prob, Tsit5(), saveat = t, abstol = 1e-12, reltol = 1e-12))
bound = Tuple{Float64, Float64}[(0, 10), (0, 10), (0, 10), (0, 10),
(0, 10), (0, 10), (0, 10), (0, 10),
(0, 10), (0, 10), (0, 10), (0, 10),
(0, 10), (0, 10), (0, 10), (0, 10), (0, 10), (0, 10)]
ms_obj = multiple_shooting_objective(ms_prob, Tsit5(), L2Loss(t, data),
Optimization.AutoForwardDiff();
discontinuity_weight = 1.0, abstol = 1e-12,
reltol = 1e-12)
(::SciMLBase.OptimizationFunction{true, ADTypes.AutoForwardDiff{nothing, Nothing}, DiffEqParamEstim.var"#43#48"{Nothing, Float64, DiffEqParamEstim.var"#1#2", Base.Pairs{Symbol, Float64, Tuple{Symbol, Symbol}, @NamedTuple{abstol::Float64, reltol::Float64}}, SciMLBase.ODEProblem{Vector{Float64}, Tuple{Float64, Float64}, true, Vector{Float64}, SciMLBase.ODEFunction{true, SciMLBase.AutoSpecialize, typeof(Main.ms_f1), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}, SciMLBase.StandardODEProblem}, OrdinaryDiffEq.Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, L2Loss{Vector{Float64}, Matrix{Float64}, Nothing, Nothing, Nothing}, Nothing}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}) (generic function with 1 method)
This creates an objective function that can be passed to an optimizer; from its solution we can then read off the parameter values and the initial values of the short time intervals, keeping the indexing in mind. Now we mix this with a global optimization method to improve robustness even more:
optprob = Optimization.OptimizationProblem(ms_obj, zeros(18), lb = first.(bound),
ub = last.(bound))
optsol_ms = solve(optprob, BBO_adaptive_de_rand_1_bin_radiuslimited(), maxiters = 10_000)
retcode: Failure
u: 18-element Vector{Float64}:
0.891681988758575
1.9620739986413511
2.5474315025134335
0.3429689784420859
3.985584636250989
4.281781938286505
1.2753300729796457
1.2810089440438477
3.9604493227940365
0.3481082069922014
1.7124047296091047
4.159608442216425
1.388993377385026
0.8729266104668201
6.365431376248737
0.7342708064021675
2.2367266398828476
1.3779047538577185
optsol_ms.u[(end - 1):end]
2-element Vector{Float64}:
2.2367266398828476
1.3779047538577185
Since our model has 2 parameters, we look at the last 2 indices of `optsol_ms.u` to get our parameter values; the rest of the values are the initial values of the shorter timespans, as described in the reference section. We can also use a gradient-based optimizer with the multiple shooting objective:
optsol_ms = solve(optprob, BFGS())
optsol_ms.u[(end - 1):end]
2-element Vector{Float64}:
1.5000000000037266
1.000000000000063
The objective function for the Two Stage method can be created and passed to an optimizer as follows:
two_stage_obj = two_stage_objective(ms_prob, t, data, Optimization.AutoForwardDiff())
optprob = Optimization.OptimizationProblem(two_stage_obj, [1.3, 0.8, 2.8, 1.2])
result = solve(optprob, Optim.BFGS())
retcode: Success
u: 4-element Vector{Float64}:
1.5035938533664905
0.99257311537469
2.8
1.2
The default kernel used in the method is `Epanechnikov`; the other available kernels are `Uniform`, `Triangular`, `Quartic`, `Triweight`, `Tricube`, `Gaussian`, `Cosine`, `Logistic`, and `Sigmoid`. The kernel can be selected via the `kernel` keyword argument. The `loss_func` keyword argument can be used to pass the loss function (cost function) you want to use, and passing a valid `adtype` argument enables automatic differentiation, as in the sketch below.
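For example, the call below sketches how these keyword arguments might be combined. The exact kernel identifier (written here as a `GaussianKernel()` constructor) is an assumption that may differ across DiffEqParamEstim.jl versions, so check the reference documentation before using it:

```julia
# Hedged sketch: two-stage objective with a non-default smoothing kernel.
# `GaussianKernel()` is assumed to correspond to the "Gaussian" kernel listed above.
two_stage_obj_g = two_stage_objective(ms_prob, t, data, Optimization.AutoForwardDiff();
                                      kernel = DiffEqParamEstim.GaussianKernel())
optprob_g = Optimization.OptimizationProblem(two_stage_obj_g, [1.3, 0.8, 2.8, 1.2])
solve(optprob_g, Optim.BFGS())
```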
Conclusion
There are many more choices for improving the robustness of a parameter estimation. With all of these tools, one should rarely settle for the naive “solve it with `p` and check the L2 loss” approach. Instead, use these techniques to improve the loss landscape and increase the optimizer's ability to find the globally best parameters.