I ran into a strange issue where very small changes to a non-allocating function affect whether its ForwardDiff.gradient! call allocates.
Here is an MWE. It’s a vastly simplified version of my original structure, so it may look a bit contrived.
using BenchmarkTools
using StaticArrays
using ForwardDiff
function run_regions(K, B, R, DI, DU, Z)
    rshare = R / DU
    ϵIII = (K ≈ 0.0) ? 0.0 : B / K
    ΔIII = (K ≈ 0.0) ? 0.0 : R / K
    # Assemble output
    prob = (PI = K, PII = R, PIII = K, PIV = K, PV = DU)
    condexp_ϵ = (ϵI = rshare, ϵII = B, ϵIII = ϵIII, ϵIV = DI, ϵV = DI)
    condexp_δ = (δII = Z, ΔIII = ΔIII)
    return prob, condexp_ϵ, condexp_δ
end
function qU_hh(K, B, R, DI, DU, Z)
    # Only the first two of the three returned NamedTuples are used here
    prob, condexp_ϵ = run_regions(K, B, R, DI, DU, Z)
    return prob.PI * condexp_ϵ.ϵI
end
function test_func()
    f = f(x) = qU_hh(
        x[1], x[2], x[3], x[4], x[5], 0.639441367934362
    )
    arg = SVector{5}(
        0.006703564978428907,  # K
        0.007390565022486318,  # B
        0.0006867044273688475, # R
        0.013733951206491474,  # DI
        1.373408854737695e-7,  # DU
    )
    @btime $f($arg)
end
function test_deriv()
    f = f(x) = qU_hh(
        x[1], x[2], x[3], x[4], x[5], 0.639441367934362
    )
    arg = SVector{5}(
        0.006703564978428907,  # K
        0.007390565022486318,  # B
        0.0006867044273688475, # R
        0.013733951206491474,  # DI
        1.373408854737695e-7,  # DU
    )
    cfg = ForwardDiff.GradientConfig(f, arg)
    res = similar(arg)
    @btime ForwardDiff.gradient!($res, $f, $arg, $cfg)
end
julia> test_func()
2.100 ns (0 allocations: 0 bytes)
julia> test_deriv()
544.086 ns (23 allocations: 2.75 KiB)
If I EITHER (1) get rid of the ternary operators in run_regions, OR (2) make run_regions return a scalar instead of the tuple of NamedTuples it returns right now, the allocations drop to zero. Replacing ≈ with == has no effect.
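For concreteness, here is a rough sketch of what the two variants could look like. The names run_regions_noternary and run_regions_scalar are illustrative only, and both the unguarded division in (1) and the particular scalar returned in (2) are assumptions; the MWE above does not pin them down.

```julia
# Hypothetical variant (1): same outputs, but without the ternary guards.
# Assumption: K is never (approximately) zero, so the division is safe.
function run_regions_noternary(K, B, R, DI, DU, Z)
    rshare = R / DU
    ϵIII = B / K
    ΔIII = R / K
    prob = (PI = K, PII = R, PIII = K, PIV = K, PV = DU)
    condexp_ϵ = (ϵI = rshare, ϵII = B, ϵIII = ϵIII, ϵIV = DI, ϵV = DI)
    condexp_δ = (δII = Z, ΔIII = ΔIII)
    return prob, condexp_ϵ, condexp_δ
end

# Hypothetical variant (2): ternaries kept, but a single scalar is returned.
# Assumption: the scalar is the product that qU_hh computes from the tuple.
function run_regions_scalar(K, B, R, DI, DU, Z)
    rshare = R / DU
    ϵIII = (K ≈ 0.0) ? 0.0 : B / K
    ΔIII = (K ≈ 0.0) ? 0.0 : R / K
    return K * rshare
end
```

With either variant, qU_hh would need a matching one-line change (indexing the returned tuple for (1), or returning the scalar directly for (2)).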
Why does this happen? And why is either change alone enough (OR), rather than both being required (AND)?
I tried changing chunk sizes in GradientConfig, and it didn’t do anything.
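For anyone reproducing this, a minimal sketch of setting the chunk size explicitly, assuming the three-argument constructor ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{N}()). The function g and the helper gradient_with_chunk are hypothetical stand-ins, not part of my code.

```julia
using ForwardDiff

# Hypothetical stand-in for the real objective (not the MWE's qU_hh).
g(x) = x[1] * x[3] / x[5]

# Compute the gradient of `g` at `x` using an explicit chunk size N.
function gradient_with_chunk(g, x, ::Val{N}) where {N}
    cfg = ForwardDiff.GradientConfig(g, x, ForwardDiff.Chunk{N}())
    res = similar(x)
    ForwardDiff.gradient!(res, g, x, cfg)
    return res
end

x = [0.5, 1.0, 2.0, 3.0, 4.0]

# The chunk size changes how many partials are propagated per pass,
# not the resulting gradient.
grads = [gradient_with_chunk(g, x, Val(N)) for N in (1, 2, 5)]
```

Wrapping each call in @btime (with interpolated arguments, as in test_deriv) is one way to compare allocation counts across chunk sizes.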
Any advice is appreciated!
N.B.: I understand that for this specific MWE I can avoid the issue simply by combining run_regions and qU_hh into one function, but that's not easy to do in my main code. So I'm curious what causes this allocation and how I can avoid it without refactoring the functions.