Getting Started with Optimization-Based ODE Parameter Estimation

In this tutorial, we will showcase how to estimate the parameters of an ordinary differential equation using DiffEqParamEstim.jl. DiffEqParamEstim.jl is a high-level tool that makes common parameter estimation tasks simple. Here, we will show how to use its cost function generation to estimate the parameters of the Lotka-Volterra equation against simulated data.

Installation

First, we will make sure DiffEqParamEstim.jl is installed correctly. To do this, we use the Julia package REPL, opened by typing ] in the REPL and seeing pkg> appear in blue. Then we type: add DiffEqParamEstim and hit enter. This will run the package management sequence, and we will be good to go.
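
Equivalently, if you prefer a non-interactive install, the same thing can be done through the Pkg API:

using Pkg
Pkg.add("DiffEqParamEstim")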

Required Dependencies

DifferentialEquations.jl: The numerical differential equation solver package
RecursiveArrayTools.jl: Tooling for recursive arrays like vectors of arrays
Plots.jl: Tooling for plotting and visualization
ForwardDiff.jl: Tooling for forward-mode automatic differentiation (gradient calculations)
Optimization.jl: The numerical optimization package
OptimizationOptimJL.jl: The Optim optimizers we will use for local optimization
OptimizationBBO.jl: The BlackBoxOptim optimizers we will use for global optimization

Parameter Estimation in the Lotka-Volterra Equation: 1 Parameter Case

We choose to optimize the parameters on the Lotka-Volterra equation. Let's start by defining the equation as a function with parameters:

using DifferentialEquations, RecursiveArrayTools, Plots, DiffEqParamEstim
using Optimization, ForwardDiff, OptimizationOptimJL, OptimizationBBO

function f(du, u, p, t)
    du[1] = dx = p[1] * u[1] - u[1] * u[2]
    du[2] = dy = -3 * u[2] + u[1] * u[2]
end

u0 = [1.0; 1.0]
tspan = (0.0, 10.0)
p = [1.5]
prob = ODEProblem(f, u0, tspan, p)
ODEProblem with uType Vector{Float64} and tType Float64. In-place: true
timespan: (0.0, 10.0)
u0: 2-element Vector{Float64}:
 1.0
 1.0

Generating Synthetic Data

We create data using the numerical result with a=1.5:

sol = solve(prob, Tsit5())
t = collect(range(0, stop = 10, length = 200))
using RecursiveArrayTools # for VectorOfArray
randomized = VectorOfArray([(sol(t[i]) + 0.01randn(2)) for i in 1:length(t)])
data = convert(Array, randomized)
2×200 Matrix{Float64}:
 0.997412  1.02184   1.06491   1.10862   …  0.995828  1.00714  1.04858
 0.990941  0.890056  0.813912  0.729434     1.09535   1.00321  0.90241

Here, we used VectorOfArray from RecursiveArrayTools.jl to turn the result of an ODE into a matrix.
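
As a small standalone illustration (with made-up values), VectorOfArray collects a vector of state snapshots and convert(Array, ...) stacks them as the columns of a matrix:

v = VectorOfArray([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
convert(Array, v)  # 2×3 Matrix{Float64}: each snapshot becomes a column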

If we plot the solution with the parameter at a=1.42, we get the following:

newprob = remake(prob, p = [1.42])
newsol = solve(newprob, Tsit5())
plot(sol)
plot!(newsol)
[Plot: the a = 1.5 solution overlaid with the a = 1.42 solution]

Notice that after one period, this solution begins to drift very far off: this problem is sensitive to the choice of a.
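
To make this sensitivity concrete, here is a quick illustrative check of the squared-error mismatch of each solution against the data (the exact numbers depend on the random noise):

sum(abs2, Array(solve(prob, Tsit5(), saveat = t)) .- data)     # a = 1.5: small residual
sum(abs2, Array(solve(newprob, Tsit5(), saveat = t)) .- data)  # a = 1.42: much larger residual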

To build the objective function for Optim.jl, we simply call the build_loss_objective function:

cost_function = build_loss_objective(prob, Tsit5(), L2Loss(t, data),
                                     Optimization.AutoForwardDiff(),
                                     maxiters = 10000, verbose = false)
(::SciMLBase.OptimizationFunction{true, ADTypes.AutoForwardDiff{nothing, Nothing}, …}) (generic function with 1 method)

Internally, this objective function calls the ODE solver to get solutions to test against the data. The keyword arguments are passed directly to the solver. Note that we set maxiters so that the differential equation solver errors out quickly in bad regions of the parameter space, which speeds up the process: if the integrator stops early (due to divergence), those parameters are assigned an infinite loss, making this a cheap way to reject bad parameters. We set verbose = false because this divergence can get noisy. Optimization.AutoForwardDiff() is the choice of automatic differentiation, i.e., how the gradients are calculated. For more information on this choice, see the automatic differentiation choice API.
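
As a rough mental model (not the library's actual implementation), the generated objective behaves like the following hand-written loss, where solves that terminate early are assigned an infinite loss:

function manual_cost(p)
    _prob = remake(prob, p = p)
    _sol = solve(_prob, Tsit5(), saveat = t, maxiters = 10000, verbose = false)
    # If the integrator stopped early (e.g. divergence), fewer points were saved: reject
    size(Array(_sol), 2) != length(t) && return Inf
    return sum(abs2, Array(_sol) .- data)  # L2 loss against the data
end

manual_cost([1.42])  # roughly comparable to evaluating the generated cost at the same value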

Note

A good rule of thumb is to use Optimization.AutoForwardDiff() for problems with fewer than 100 parameters + states, and Optimization.AutoZygote() for larger ones.
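
For example, a larger model's objective could be constructed with reverse-mode AD instead; this is only a sketch of swapping the adtype argument (reverse-mode differentiation through the ODE solver typically also requires loading SciMLSensitivity.jl):

using Zygote
cost_function_rev = build_loss_objective(prob, Tsit5(), L2Loss(t, data),
                                         Optimization.AutoZygote(),
                                         maxiters = 10000, verbose = false)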

Before optimizing, let's visualize our cost function by plotting it for a range of parameter values:

vals = 0.0:0.1:10.0
plot(vals, [cost_function(i) for i in vals], yscale = :log10,
     xaxis = "Parameter", yaxis = "Cost", title = "1-Parameter Cost Function",
     lw = 3)
[Plot: 1-Parameter Cost Function, cost versus parameter value on a log scale]

Here we see that there is a very well-defined minimum in our cost function at the real parameter (because this is where the solution almost exactly fits the dataset).

Now we can use the BFGS algorithm to optimize the parameter starting at a=1.42. We do this by creating an optimization problem and solving that with BFGS():

optprob = Optimization.OptimizationProblem(cost_function, [1.42])
optsol = solve(optprob, BFGS())
retcode: Success
u: 1-element Vector{Float64}:
 1.5004180956473268

Now let's see how well the fit performed:

newprob = remake(prob, p = optsol.u)
newsol = solve(newprob, Tsit5())
plot(sol)
plot!(newsol)
[Plot: the data-generating solution overlaid with the solution at the fitted parameter]

Note that some optimization algorithms may be sensitive to the initial parameter guess. For more details on using Optim.jl, see the Optim.jl documentation.
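
For instance, if a gradient-based search struggles from a poor starting point, one could try a derivative-free method such as Nelder-Mead from the same Optim.jl suite (shown here on the unbounded problem defined above):

optsol_nm = solve(optprob, NelderMead())
optsol_nm.u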

Adding Bounds Constraints

We can improve our solution by noting that the Lotka-Volterra equation requires the parameters to be positive. Thus, following the Optimization.jl documentation, we can add box constraints so that the optimizer only searches between 0.0 and 3.0, which improves the efficiency of our algorithm. We pass these bounds to the optimizer via the lb and ub keyword arguments of the OptimizationProblem:

lower = [0.0]
upper = [3.0]
optprob = Optimization.OptimizationProblem(cost_function, [1.42], lb = lower, ub = upper)
result = solve(optprob, BFGS())
retcode: Success
u: 1-element Vector{Float64}:
 1.5004180956473046

Estimating Multiple Parameters Simultaneously

Next, we can use the same tools to estimate multiple parameters simultaneously. Let's use the Lotka-Volterra equation with all parameters free:

function f2(du, u, p, t)
    du[1] = dx = p[1] * u[1] - p[2] * u[1] * u[2]
    du[2] = dy = -p[3] * u[2] + p[4] * u[1] * u[2]
end

u0 = [1.0; 1.0]
tspan = (0.0, 10.0)
p = [1.5, 1.0, 3.0, 1.0]
prob = ODEProblem(f2, u0, tspan, p)
ODEProblem with uType Vector{Float64} and tType Float64. In-place: true
timespan: (0.0, 10.0)
u0: 2-element Vector{Float64}:
 1.0
 1.0

We can build an objective function and solve the multiple parameter version just as before:

cost_function = build_loss_objective(prob, Tsit5(), L2Loss(t, data),
                                     Optimization.AutoForwardDiff(),
                                     maxiters = 10000, verbose = false)
optprob = Optimization.OptimizationProblem(cost_function, [1.3, 0.8, 2.8, 1.2])
result_bfgs = solve(optprob, BFGS())
retcode: Success
u: 4-element Vector{Float64}:
 1.5003720528087503
 1.0011299311146817
 2.9998546807822644
 1.0002390260081448

Alternative Cost Functions for Increased Robustness

The build_loss_objective with L2Loss is the most naive approach to parameter estimation. There are many alternatives that can make the estimation more robust.

We can also include first differences in the L2Loss by passing the differ_weight keyword argument, which sets the contribution of the differencing loss to the total loss (data_weight weights the direct data mismatch).
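
Conceptually (and assuming this reading of the keyword semantics), the differencing term compares increments between successive time points, so the combined loss looks roughly like this sketch, where simdata is the candidate solution sampled at the data times:

# Hedged sketch, not the library implementation; diff(..., dims = 2) takes first differences in time
sketch_loss(simdata, data, data_weight, differ_weight) =
    data_weight * sum(abs2, simdata .- data) +
    differ_weight * sum(abs2, diff(simdata, dims = 2) .- diff(data, dims = 2))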

cost_function = build_loss_objective(prob, Tsit5(),
                                     L2Loss(t, data, differ_weight = 0.3,
                                            data_weight = 0.7),
                                     Optimization.AutoForwardDiff(),
                                     maxiters = 10000, verbose = false)
optprob = Optimization.OptimizationProblem(cost_function, [1.3, 0.8, 2.8, 1.2])
result_bfgs = solve(optprob, BFGS())
retcode: Success
u: 4-element Vector{Float64}:
 1.5002651210131168
 1.0008554416883255
 3.0000388657607213
 1.0003212202182212

We can also use the multiple shooting method by creating a multiple_shooting_objective:

function ms_f1(du, u, p, t)
    du[1] = p[1] * u[1] - p[2] * u[1] * u[2]
    du[2] = -3.0 * u[2] + u[1] * u[2]
end
ms_u0 = [1.0; 1.0]
tspan = (0.0, 10.0)
ms_p = [1.5, 1.0]
ms_prob = ODEProblem(ms_f1, ms_u0, tspan, ms_p)
t = collect(range(0, stop = 10, length = 200))
data = Array(solve(ms_prob, Tsit5(), saveat = t, abstol = 1e-12, reltol = 1e-12))
bound = Tuple{Float64, Float64}[(0, 10), (0, 10), (0, 10), (0, 10),
                                (0, 10), (0, 10), (0, 10), (0, 10),
                                (0, 10), (0, 10), (0, 10), (0, 10),
                                (0, 10), (0, 10), (0, 10), (0, 10), (0, 10), (0, 10)]

ms_obj = multiple_shooting_objective(ms_prob, Tsit5(), L2Loss(t, data),
                                     Optimization.AutoForwardDiff();
                                     discontinuity_weight = 1.0, abstol = 1e-12,
                                     reltol = 1e-12)
(::SciMLBase.OptimizationFunction{true, ADTypes.AutoForwardDiff{nothing, Nothing}, …}) (generic function with 1 method)

This creates an objective function that can be passed to an optimizer. From the optimizer's solution we can then extract the parameter values and the initial conditions of the short time intervals, keeping the indexing in mind. Now we mix this with a global optimization method to improve robustness even more:

optprob = Optimization.OptimizationProblem(ms_obj, zeros(18), lb = first.(bound),
                                           ub = last.(bound))
optsol_ms = solve(optprob, BBO_adaptive_de_rand_1_bin_radiuslimited(), maxiters = 10_000)
retcode: Failure
u: 18-element Vector{Float64}:
 0.891681988758575
 1.9620739986413511
 2.5474315025134335
 0.3429689784420859
 3.985584636250989
 4.281781938286505
 1.2753300729796457
 1.2810089440438477
 3.9604493227940365
 0.3481082069922014
 1.7124047296091047
 4.159608442216425
 1.388993377385026
 0.8729266104668201
 6.365431376248737
 0.7342708064021675
 2.2367266398828476
 1.3779047538577185
optsol_ms.u[(end - 1):end]
2-element Vector{Float64}:
 2.2367266398828476
 1.3779047538577185

Here, since our model has 2 parameters, we look at the last 2 indices of the result to get the parameter values; the remaining values are the initial conditions of the shorter time spans, as described in the reference section. We can also use a gradient-based optimizer with the multiple shooting objective.

optsol_ms = solve(optprob, BFGS())
optsol_ms.u[(end - 1):end]
2-element Vector{Float64}:
 1.5000000000037266
 1.000000000000063
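
To also inspect the estimated initial states of the shooting segments, one could reshape the leading entries; this assumes, as a sketch, that they are stored segment by segment with two states each ahead of the trailing parameters:

segment_u0s = reshape(optsol_ms.u[1:(end - 2)], 2, :)  # one column per shooting segment (assumed layout)
p_est = optsol_ms.u[(end - 1):end]                      # the two model parameters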

The objective function for the Two Stage (collocation) method can be created and passed to an optimizer as follows:

two_stage_obj = two_stage_objective(ms_prob, t, data, Optimization.AutoForwardDiff())
optprob = Optimization.OptimizationProblem(two_stage_obj, [1.3, 0.8, 2.8, 1.2])
result = solve(optprob, Optim.BFGS())
retcode: Success
u: 4-element Vector{Float64}:
 1.5035938533664905
 0.99257311537469
 2.8
 1.2

The default kernel used in the method is Epanechnikov; the other available kernels are Uniform, Triangular, Quartic, Triweight, Tricube, Gaussian, Cosine, Logistic, and Sigmoid, and the choice is passed via the kernel keyword argument. The loss_func keyword argument can be used to pass the loss (cost) function you want to use, and passing a valid adtype argument enables automatic differentiation.

Conclusion

There are many more choices for improving the robustness of a parameter estimation. With all of these tools available, one should rarely settle for the naive "solve it with p and check the L2 loss" approach. Instead, use these techniques to improve the loss landscape and increase the ability of optimizers to find the globally best parameters.