Non-sorted SparseMatrixCSC

Well, not for this kind of use with ForwardDiff. In fact, the performance very much depends on how things are called internally in ForwardDiff, which I did not investigate.

Your function is pretty much the worst case for the linked-list internal format used in ExtendableSparse, as it generates a full row in the matrix (I will have to mention this caveat in the README: I assume that “sparse” means that all rows/columns have << n entries). But it seems that this is not the problem here.
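Just to make the caveat concrete, here is a toy sketch (my own illustration, the size is arbitrary) of the two extremes: one full row versus a pattern where every row/column has only a handful of entries:

using ExtendableSparse

n = 10_000

# one full row: n stored entries concentrated in a single row
A = ExtendableSparseMatrix(n, n)
@time begin
    for j = 1:n
        A[1, j] = 1.0
    end
    flush!(A)
end

# genuinely sparse pattern: tridiagonal, at most 3 entries per row/column
B = ExtendableSparseMatrix(n, n)
@time begin
    for i = 1:n
        B[i, i] = 2.0
        i > 1 && (B[i, i-1] = -1.0)
        i < n && (B[i, i+1] = -1.0)
    end
    flush!(B)
end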

However, this brings me back to my previous post: “atomic” assembly of the finite difference operator without too many assumptions about the inner workings of ForwardDiff. So let me extend your example a bit (VoronoiFVM is my package):

using ExtendableSparse
using ForwardDiff
using LinearAlgebra
using SparseArrays
using VoronoiFVM

#
# Finite difference operator for
# -\Delta u^2 +  u^2 = 1 
#
function ffd(y,x)
    n=length(x)
    h=1/(n-1)
    y[1]=(x[1]^2-1)*0.5*h
    for i=2:n-1
        y[i]=(x[i]^2-1.0)*h
    end
    y[end]=(x[end]^2-1)*0.5*h

    for i=1:n-1
        dx=(x[i+1]^2-x[i]^2)/h
        y[i+1]+=dx
        y[i]-=dx
    end
end

## Flux term
function flux!(f,u,edge)
    f[1]=u[1,1]^2-u[1,2]^2
end

## Storage term
function storage!(f,u,node)
    f[1]=u[1]
end

## Reaction term
function reaction!(f,u,node)
    f[1]=u[1]^2-1.0
end

function runtest(n)
    
    jac_ext = ExtendableSparseMatrix(n, n);
    @time begin
        ForwardDiff.jacobian!(jac_ext, ffd, ones(n), ones(n));
        flush!(jac_ext)
    end
    
    jac = spzeros(n, n);
    @time ForwardDiff.jacobian!(jac, ffd, ones(n), ones(n));
    all(jac .== jac_ext)

    # Set up stuff for VoronoiFVM which uses atomic assembly
    h=1.0/convert(Float64,n-1)
    X=collect(0:h:1)
    grid=VoronoiFVM.Grid(X)
    physics=VoronoiFVM.Physics(flux=flux!, storage=storage!,reaction=reaction!)
    sys=VoronoiFVM.System(grid,physics,unknown_storage=:dense)
    enable_species!(sys,1,[1])
    solution=unknowns(sys,inival=1.0)
    oldsol=unknowns(sys,inival=1.0)
    residual=unknowns(sys,inival=1.0)
    tstep=1.0e100 # large timestep approximates the stationary problem
    
    @time begin
        VoronoiFVM.eval_and_assemble(sys,solution,oldsol,residual,tstep)
        flush!(sys.matrix)
    end
    
    all(sys.matrix .≈ jac_ext)
end

Second run:

julia> runtest(20000)
  5.806884 seconds (32 allocations: 7.174 MiB)
  3.925591 seconds (41 allocations: 6.275 MiB)
  0.025753 seconds (117.03 k allocations: 3.849 MiB)
true

I intentionally use @time here, as a @btime measurement would probably not be dominated by
the initial structure buildup phase, which is the part I am interested in. This is the kind of thing
I also benchmarked before (see the benchmarks with fdrand!() in ExtendableSparse).
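To spell out what I mean, here is a sketch of the comparison (my own illustration, reusing ffd from above; the smaller n is just so that @btime finishes quickly). @btime reports a minimum over repeated runs, so only the very first call pays for building up the structure:

using BenchmarkTools
using ExtendableSparse
using ForwardDiff

n = 5000
A = ExtendableSparseMatrix(n, n)
y = ones(n)
x = ones(n)

@time ForwardDiff.jacobian!(A, ffd, y, x)     # first call: includes structure buildup
flush!(A)
@btime ForwardDiff.jacobian!($A, ffd, $y, $x) # structure already exists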

In VoronoiFVM, I call ForwardDiff on flux!, reaction!, and storage! and get local matrices, which I then assemble into the global matrix.
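Roughly, the edge part of that looks like the following sketch (not VoronoiFVM's actual internals; assemble_edge! and the hard-coded local flux expression are simplifications of mine):

using ExtendableSparse
using ForwardDiff

# Differentiate the local flux between the two nodes of an edge and add the
# resulting 1x2 local matrix into the global one.
function assemble_edge!(A, u, i, j)
    g(ul) = [ul[1]^2 - ul[2]^2]                  # same expression as flux! above
    jloc = ForwardDiff.jacobian(g, [u[i], u[j]]) # 1x2 local matrix
    A[i, i] += jloc[1, 1];  A[i, j] += jloc[1, 2]  # contribution to node i
    A[j, i] -= jloc[1, 1];  A[j, j] -= jloc[1, 2]  # contribution to node j
end

Looping this over all edges (plus analogous per-node loops for storage! and reaction!) touches the global matrix only through small local blocks, so no assumption about how ForwardDiff orders its work internally is needed.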