-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
WIP: Use calloc for zeros #19852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Use calloc for zeros #19852
Conversation
b0f306e
to
04a6f84
Compare
b280137
to
984051a
Compare
It seemed like they were related, but in fact jl_malloc works with the GC and jl_malloc_aligned does not. So jl_malloc is now jl_gc_malloc. Also renamed jl_gc_free and jl_gc_realloc to match.
I know I need to update this branch to fix the conflicts, but I have also been meaning to post some benchmark results. Let me do the latter now. I'll find time soon to fix the conflicts. I also need to figure out why the tests fail on systems other than macOS. I benchmarked both I'm comparing master to the point in my PR after I've switched array creation to use
Did you all expect the gc malloc change to so drastically improve the times for large arrays? Is the slight degradation for small malloced arrays (size = 2049) OK? The good news is that using |
984051a
to
d512d83
Compare
Another good thing to check on is the performance of the shootout and the micro benchmarks. The slight degradation for 2049 appears ok to me for all the other gains. |
Here are the benchmark results: (Any["micro", "quicksort"], TrialJudgement(+0.17% => invariant))
(Any["micro", "parseint"], TrialJudgement(-1.27% => invariant))
(Any["micro", "randmatstat"], TrialJudgement(+1.40% => invariant))
(Any["micro", "pisum"], TrialJudgement(+0.02% => invariant))
(Any["micro", "fib"], TrialJudgement(-0.73% => invariant))
(Any["micro", "randmatmul"], TrialJudgement(+9.96% => invariant))
(Any["micro", "mandel"], TrialJudgement(+0.42% => invariant))
(Any["shootout", "k_nucleotide"], TrialJudgement(+4.76% => invariant))
(Any["shootout", "fasta"], TrialJudgement(+1.26% => invariant))
(Any["shootout", "spectralnorm"], TrialJudgement(-0.00% => invariant))
(Any["shootout", "revcomp"], TrialJudgement(+1.61% => invariant))
(Any["shootout", "pidigits"], TrialJudgement(+49.25% => regression))
(Any["shootout", "binary_trees"], TrialJudgement(+11.79% => invariant))
(Any["shootout", "nbody"], TrialJudgement(+0.01% => invariant))
(Any["shootout", "regex_dna"], TrialJudgement(+8.55% => invariant))
(Any["shootout", "meteor_contest"], TrialJudgement(+7.98% => invariant))
(Any["shootout", "mandelbrot"], TrialJudgement(+0.28% => invariant))
(Any["shootout", "fannkuch"], TrialJudgement(-0.63% => invariant))
(Any["shootout", "nbody_vec"], TrialJudgement(+0.10% => invariant)) The pidigits regression is caused by spending a lot more time in the |
Looks good. I'm going to rebase and take all but the last two commits, and implement #9147 instead, which will be simpler since we won't need the extra argument to request zeros. |
@danielmatz Please review #22953 if you get a chance. I think that part should be ok to merge and you can submit future PRs to do the follow-up items if you want. |
This resolves #130. The main goal is to use calloc when creating large arrays of zeros (only opting in to this behavior for appropriate eltypes). But I had to make a number of other changes along the way:
memset
jl_malloc
(and friends) tojl_gc_malloc
to disambiguate fromjl_malloc_aligned
, which allocates memory that isn't gc countedjl_gc_malloc_aligned
(and friends)jl_gc_managed_malloc
, the old gc counted, alignedmalloc
, which didn't have a matchingcalloc
jl_array_flags_t
by tracking whether the memory is aligned or not inmallocarray_t
insteadBefore I submit this, I'd still like to do the following:
jl_gc_malloc_aligned
andjl_gc_calloc_aligned
jl_gc_realloc_aligned
, be smarter and onlymemcpy
the memory being used by the arrayzeros
tests