ForwardDiff Allocations

I ran into a strange issue where very small changes to a non-allocating function affect whether its ForwardDiff.gradient! call allocates or not.

Here is an MWE. It’s a vastly simplified version of my original structure, so it may look a bit contrived.

using BenchmarkTools
using StaticArrays
using ForwardDiff

function run_regions(K, B, R, DI, DU, Z)
    rshare = R/DU
    
    ϵIII = (K ≈ 0.0) ? 0.0 : B / K
    ΔIII = (K ≈ 0.0) ? 0.0 : R / K

    # Assemble output
    prob = (PI = K, PII = R, PIII = K, PIV = K, PV = DU)
    condexp_ϵ = (ϵI = rshare, ϵII = B, ϵIII = ϵIII, ϵIV = DI, ϵV = DI)
    condexp_δ = (δII = Z, ΔIII = ΔIII)
    return prob, condexp_ϵ, condexp_δ
end

function qU_hh(K, B, R, DI, DU, Z)
    prob, condexp_ϵ = run_regions(K, B, R, DI, DU, Z)

    return prob.PI * condexp_ϵ.ϵI
end

function test_func()
    f(x) = qU_hh(
        x[1], x[2], x[3], x[4], x[5], 0.639441367934362
    )
    arg = SVector{5}(
        0.006703564978428907,  # K
        0.007390565022486318,  # B
        0.0006867044273688475, # R
        0.013733951206491474,  # DI
        1.373408854737695e-7,  # DU
    )
    @btime $f($arg) 
end

function test_deriv()
    f(x) = qU_hh(
        x[1], x[2], x[3], x[4], x[5], 0.639441367934362
    )
    arg = SVector{5}(
        0.006703564978428907,  # K
        0.007390565022486318,  # B
        0.0006867044273688475, # R
        0.013733951206491474,  # DI
        1.373408854737695e-7,  # DU
    )
    cfg = ForwardDiff.GradientConfig(f, arg)
    res = similar(arg)
    @btime ForwardDiff.gradient!($res, $f, $arg, $cfg) 
end

julia> test_func()
  2.100 ns (0 allocations: 0 bytes)

julia> test_deriv()
  544.086 ns (23 allocations: 2.75 KiB)

If I EITHER (1) get rid of the ternary operators in run_regions, OR (2) make run_regions output a scalar instead of the tuple of NamedTuples it currently outputs, the allocations drop to zero. Replacing ≈ with == has no effect.

Why? And why is it an OR rather than an AND?

I tried changing chunk sizes in GradientConfig, and it didn’t do anything.

Any advice is appreciated!

N.B.: I understand that for this specific MWE I could simply combine run_regions and qU_hh into one function, but that's not easy to do in my main code. So I'm curious what causes this allocation and how I can avoid it without refactoring functions.


I think this happens because your ternary operators are not type-stable: they return either a Float64 or a typeof(B / K) (which will be a Dual number at differentiation time). I'm not at my computer, but the first thing I would try is replacing the 0.0 in the first branch of each ternary with zero(Base.promote_type(typeof(B), typeof(K))).
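To make that concrete, here is a minimal sketch (with made-up helper names, not your original code) of an unstable ternary next to the stable version:

```julia
using ForwardDiff

# Type-UNSTABLE: one branch returns a Float64 literal, the other returns
# typeof(B / K), which is a Dual during differentiation.
ratio_unstable(B, K) = (K == 0.0) ? 0.0 : B / K

# Type-STABLE: both branches return the common promoted type, so the
# result is a Dual whenever B and K are Duals.
ratio_stable(B, K) =
    (K == zero(K)) ? zero(Base.promote_type(typeof(B), typeof(K))) : B / K

# Gradient of B/K with respect to (B, K) at (1.0, 2.0) is (1/K, -B/K^2).
g = ForwardDiff.gradient(x -> ratio_stable(x[1], x[2]), [1.0, 2.0])
```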


Bingo!

In my case, I can rely on K and B to be of the same type, so just zero(K) works.

In the full code, I have a few more statements like this so I just defined real_zero = zero(K) and am now using that everywhere instead of 0.0.
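As a sanity check (a quick sketch, not from the full code): zero(x) preserves the Dual type, so it keeps both ternary branches the same type while still comparing equal to zero.

```julia
using ForwardDiff

# Construct a Dual by hand and confirm zero(d) stays a Dual of the same type.
d = ForwardDiff.Dual(2.0, 1.0)
@assert typeof(zero(d)) == typeof(d)

# Comparisons on Duals use the value part, so the K == zero(K) check
# behaves like the original K == 0.0 check.
@assert zero(d) == 0
```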

Thanks!


As a side note, you should be careful about checking approximate equality with zero: isapprox uses a relative tolerance by default, and any relative tolerance scaled by zero vanishes, so x ≈ 0.0 is effectively x == 0.0 unless you pass an absolute tolerance.
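For example:

```julia
# With the default (relative) tolerance, tiny values are NOT ≈ 0.0,
# because rtol * max(|x|, |0.0|) is itself tiny.
@assert (1e-300 ≈ 0.0) == false

# Supplying an absolute tolerance gives the behavior people usually expect.
@assert isapprox(1e-300, 0.0; atol = 1e-12)

# Only an exact zero passes the default comparison.
@assert (0.0 ≈ 0.0) == true
```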
