This reply to another topic mentions that reduce(hcat, A) and reduce(vcat, A) are optimized to perform better than hcat(A...) and vcat(A...). Is there such a thing for cat()?
You’ll probably get the best possible performance by doing this in the simplest way: Construct a new array of the appropriate size and then write a loop that fills in each slice of the stacked array. One-liner solutions are convenient, but a less-clever approach often offers the best performance.
Yeah, the loop is more than 100X faster than most of the options above and allocates vastly less memory:
julia> function assemble(A)
           stacked = Array{Int, 3}(undef, size(first(A))..., length(A))
           for i in 1:length(A)
               stacked[:, :, i] = A[i]
           end
           stacked
       end
assemble (generic function with 1 method)
julia> using BenchmarkTools
julia> A = [rand(Int, (28,28)) for _ ∈ 1:10000];
julia> @btime assemble($A);
24.793 ms (2 allocations: 59.81 MiB)
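The function above hard-codes Int elements and 2-D inputs. A minimal sketch of the same loop for any element type and any number of dimensions might look like the following. The name assemble_generic and the size check are my own additions, not part of the original answer, and it assumes every array in A has the same size.

```julia
# Hypothetical generalization of `assemble` (name and checks are illustrative).
# Assumes all arrays in A share the same element type T, dimension N, and size.
function assemble_generic(A::AbstractVector{<:AbstractArray{T,N}}) where {T,N}
    isempty(A) && throw(ArgumentError("need at least one array to stack"))
    sz = size(first(A))
    # Preallocate one dense array with an extra trailing dimension.
    stacked = Array{T,N + 1}(undef, sz..., length(A))
    for (i, a) in enumerate(A)
        size(a) == sz ||
            throw(DimensionMismatch("array $i has size $(size(a)), expected $sz"))
        # selectdim returns a view of the i-th slice along the last dimension,
        # so the broadcast assignment writes in place without a temporary.
        selectdim(stacked, N + 1, i) .= a
    end
    return stacked
end

A = [rand(Int, 28, 28) for _ in 1:100]
B = assemble_generic(A)
```

Here B has size (28, 28, 100) and B[:, :, i] equals A[i]. I have not benchmarked this against the hard-coded version, so treat any performance expectations as untested.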
Alternatively, if you are willing to use a package, you can make a lazy view of the arrays instead of a new dense array. Such views should be fast to construct but possibly slower to use in whatever the next step is.
julia> import JuliennedArrays, LazyStack, RecursiveArrayTools
julia> B = reduce((x, y) -> cat(x, y, dims=3), A);
julia> B ≈ JuliennedArrays.Align(A, 1,2)
true
julia> B ≈ LazyStack.stack(A)
true
julia> B ≈ RecursiveArrayTools.VectorOfArray(A)
true
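To come back to the original question without a package: one option is to lean on the optimized reduce(hcat, A) and then reshape the result, which works because Julia arrays are column-major. The sketch below assumes all matrices in A have the same size. As far as I know, Julia 1.9 and later also ship a stack function in Base that builds the same dense array directly, so the comparison is guarded by a version check.

```julia
A = [rand(Int, 28, 28) for _ in 1:100]

# reduce(hcat, A) places the matrices side by side in one 28×2800 matrix.
# Reshaping reinterprets that column-major data as 28×28×100, so that
# slice i along the third dimension is exactly A[i].
B = reshape(reduce(hcat, A), size(first(A))..., :)

# On Julia 1.9 or newer, Base.stack should give the same result.
@static if VERSION >= v"1.9"
    @assert B == stack(A)
end
```

The reshape trick only relies on Base, so it should also work on older Julia versions where stack is not available.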