Hello!
I noticed that there is a performance difference when feeding a Vector{SVector{2,Float64}} into the reduction below, depending on whether it is passed "raw" or packed into a StructArray (the .V field of the DimensionalData struct further down). My findings are here:
using Base.Threads
using Bumper
using StrideArrays # Not necessary, but can make operations like broadcasting with Bumper.jl faster.
using Polyester
using BenchmarkTools
using ChunkSplitters
using StaticArrays
using StructArrays
struct DimensionalData{D, T <: AbstractFloat}
    vectors::Tuple{Vararg{Vector{T}, D}}
    V::StructArray{SVector{D, T}, 1, Tuple{Vararg{Vector{T}, D}}}

    # General constructor for vectors
    function DimensionalData(vectors::Vector{T}...) where {T}
        D = length(vectors)
        V = StructArray{SVector{D, T}}(vectors)
        new{D, T}(Tuple(vectors), V)
    end

    # Constructor for initializing with all zeros, adapting to dimension D
    function DimensionalData{D, T}(len::Int) where {D, T}
        vectors = ntuple(d -> zeros(T, len), D) # Create D vectors of zeros
        V = StructArray{SVector{D, T}}(vectors)
        new{D, T}(vectors, V)
    end
end
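A quick usage sketch of the struct (my own example, not part of the benchmark below): as far as I understand StructArrays' tuple constructor, the V field wraps the component vectors rather than copying them, so both fields view the same numbers.

data = DimensionalData(rand(10), rand(10))  # D = 2, T = Float64
data.V[1]                                   # SVector{2,Float64} assembled from the two component vectors
data.V[1][1] == data.vectors[1][1]          # true: same underlying storage
data.V[1][2] == data.vectors[2][1]          # true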
function ReductionFunctionChunk!(dρdtI, I, J, drhopLp, drhopLn)
    XT = eltype(dρdtI); XL = length(dρdtI); X0 = zero(XT)
    nchunks = nthreads() # Assuming nchunks is defined somewhere as nthreads()

    @inbounds @no_escape begin
        local_X = @alloc(XT, XL, nchunks)
        fill!(local_X, X0)

        # Directly iterate over the chunks
        @batch for ichunk in 1:nchunks
            chunk_inds = getchunk(I, ichunk; n=nchunks)
            for idx in chunk_inds
                i = I[idx]
                j = J[idx]
                # Accumulate the contributions into the correct place
                local_X[i, ichunk] += drhopLp[idx]
                local_X[j, ichunk] += drhopLn[idx]
            end
        end

        # Reduction step
        # Using @tturbo here is slightly faster (28 μs) than @batch (30 μs), but @batch works for SVector, so we prefer it.
        @batch for ix in 1:XL
            for chunk in 1:nchunks
                dρdtI[ix] += local_X[ix, chunk]
            end
        end
    end

    return nothing
end
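For context, getchunk from ChunkSplitters (as used above) hands each task a contiguous range of positions into I, so every task accumulates into its own column of local_X and the second loop just sums the columns; there are no write races. A small illustration (the exact ranges depend on the ChunkSplitters version and split strategy):

idx = collect(1:10)
getchunk(idx, 1; n=3)  # e.g. 1:4
getchunk(idx, 2; n=3)  # e.g. 5:7
getchunk(idx, 3; n=3)  # e.g. 8:10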
begin
    ProblemScaleFactor  = 1
    NumberOfPoints      = 6195 * ProblemScaleFactor
    NumberOfInterations = 50000 * ProblemScaleFactor

    I = rand(1:NumberOfPoints, NumberOfInterations)
    J = I; #rand(1:NumberOfPoints, NumberOfInterations)

    V  = zeros(SVector{2,Float64}, NumberOfPoints)
    VL = rand(eltype(V), NumberOfInterations)

    VD  = DimensionalData{2, Float64}(NumberOfPoints)
    VDL = DimensionalData{2, Float64}(NumberOfInterations)
    VDL.V .= VL

    V .*= 0
    ReductionFunctionChunk!(V, I, J, VL, VL)
    println("Value when doing chunk svector reduction: ", sum(V))

    VD.V .*= 0
    ReductionFunctionChunk!(VD.V, I, J, VDL.V, VDL.V)
    println("Value when doing chunk svector reduction with dimensional data: ", sum(VD.V))

    println("Chunk function:")
    display(@benchmark ReductionFunctionChunk!($V, $I, $J, $VL, $VL))

    println("Chunk function with struct:")
    K  = VD.V
    KL = VDL.V
    @benchmark ReductionFunctionChunk!($K, $I, $J, $KL, $KL)
end
Value when doing chunk svector reduction: [50079.66957585603, 50105.60196762815]
Value when doing chunk svector reduction with dimensional data: [50079.66957585603, 50105.60196762815]
Chunk function:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 37.800 μs … 434.800 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 54.200 μs ┊ GC (median): 0.00%
Time (mean ± σ): 51.479 μs ± 9.509 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁ █
▁█▇▃▁▁▂▃▁▃▃▂▄▄▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▅▆██▃▁▂▆▇▆▄▆▂▃▂▂▂▂▂▂▂▂▁▁▁▁▁ ▂
37.8 μs Histogram: frequency by time 65 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
Chunk function with struct:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 47.900 μs … 355.600 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 77.800 μs ┊ GC (median): 0.00%
Time (mean ± σ): 68.857 μs ± 15.964 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▅█▁▃▂▁ ▄▆█▇▆▅▃▂▂▁▁ ▂
███████▇███▇█▆▇▆▅▄▆▄▄▅▄▅▃▄▄▇▆▆█▇████▆▇████████████▇▇▆▆▅▆▅▄▄▂ █
47.9 μs Histogram: log(frequency) by time 92.7 μs <
Memory estimate: 176 bytes, allocs estimate: 2.
My findings are as follows:
The 3 * 64 bytes seem to come from indexing operations: by commenting out an indexing operation inside of ReductionFunctionChunk!, the reported memory usage goes down.
More importantly, when removing @batch everything goes to zero allocations, as expected (see the sketch below).
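Concretely, removing @batch means something like the following: the same function with the two @batch annotations dropped, so the chunk loops run serially. This is the variant that reports zero allocations for both the raw and the StructArray inputs on my machine.

function ReductionFunctionChunkSerial!(dρdtI, I, J, drhopLp, drhopLn)
    XT = eltype(dρdtI); XL = length(dρdtI); X0 = zero(XT)
    nchunks = nthreads()
    @inbounds @no_escape begin
        local_X = @alloc(XT, XL, nchunks)
        fill!(local_X, X0)
        for ichunk in 1:nchunks # was: @batch for ichunk in 1:nchunks
            chunk_inds = getchunk(I, ichunk; n=nchunks)
            for idx in chunk_inds
                i = I[idx]
                j = J[idx]
                local_X[i, ichunk] += drhopLp[idx]
                local_X[j, ichunk] += drhopLn[idx]
            end
        end
        for ix in 1:XL # was: @batch for ix in 1:XL
            for chunk in 1:nchunks
                dρdtI[ix] += local_X[ix, chunk]
            end
        end
    end
    return nothing
end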
Kind regards