WIP: Use calloc for zeros #19852

Closed
wants to merge 6 commits

Conversation

@danielmatz (Contributor) commented Jan 4, 2017

This resolves #130. The main goal is to use calloc when creating large arrays of zeros (only opting in to this behavior for appropriate eltypes). But I had to make a number of other changes along the way:

  • Small arrays of zeros are allocated as before, but zeroed using memset
  • Renamed jl_malloc (and friends) to jl_gc_malloc to disambiguate from jl_malloc_aligned, which allocates memory that isn't gc counted
  • Added jl_gc_malloc_aligned (and friends)
  • Removed jl_gc_managed_malloc, the old gc counted, aligned malloc, which didn't have a matching calloc
  • Freed up a flag bit in jl_array_flags_t by tracking whether the memory is aligned in mallocarray_t instead
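The core idea, sketched below in Julia purely for illustration: calloc hands back memory the operating system already guarantees to be zero (typically via lazily mapped zero pages), so a large zeros buffer never needs an explicit memset pass. The actual change lives in Julia's C runtime (the jl_gc_calloc family), not in Julia code, and the zeros_via_calloc helper and eltype restriction here are made up for the example.

```julia
# Sketch only, not the PR's implementation: the PR does this inside the C
# runtime. Restricted to eltypes whose all-zero bit pattern is a valid zero.
function zeros_via_calloc(::Type{T}, n::Integer) where {T<:Union{Float64,Float32,Int}}
    p = Libc.calloc(n, sizeof(T))        # OS returns already-zeroed memory
    p == C_NULL && throw(OutOfMemoryError())
    # own = true hands the buffer to Julia's GC, which frees it with Libc.free
    return unsafe_wrap(Array, Ptr{T}(p), n; own = true)
end

zeros_via_calloc(Float64, 10^6)          # large case: no explicit 8 MB memset
```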

Before I submit this, I'd still like to do the following:

  • Check that these changes do indeed make things faster! (need to check both large and small arrays)
  • Add special casing to jl_gc_malloc_aligned and jl_gc_calloc_aligned
  • Add checks on the alignment argument (?)
  • When using jl_gc_realloc_aligned, be smarter and only memcpy the memory being used by the array
  • Double check the zeros tests
  • Use calloc for more eltypes

@danielmatz force-pushed the use_calloc_for_zeros branch from b0f306e to 04a6f84 on January 5, 2017
@kshyatt added the performance (Must go faster) label on Jan 5, 2017
@danielmatz force-pushed the use_calloc_for_zeros branch 2 times, most recently from b280137 to 984051a on January 23, 2017
@danielmatz (Contributor, Author) commented:

I know I still need to update this branch to resolve the conflicts, but I have also been meaning to post some benchmark results, so let me do that now; I'll find time for the conflicts soon. I also need to figure out why the tests fail on systems other than macOS.

I benchmarked both zeros(Float64, s) and Array(Float64, s) for a variety of sizes. The 2048 and 2049 sizes sit right at the breakpoint in `_new_array_` where it switches from allocating the buffer inline with the array to allocating it separately with malloc or calloc.

I'm comparing three builds: master, the point in my PR where array creation has been switched to jl_gc_malloc, and the full PR, which uses calloc for zeros.
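A rough sketch of how such timings could be collected (assuming BenchmarkTools; the loop, sizes, and printing are illustrative, and `Array{Float64}(undef, s)` is the modern spelling of the `Array(Float64, s)` call used here):

```julia
using BenchmarkTools, Printf

for s in (10, 2048, 2049, 10_000, 100_000, 1_000_000)
    t_alloc = @benchmark Array{Float64}(undef, $s)  # uninitialized allocation
    t_zeros = @benchmark zeros(Float64, $s)         # zero-filled allocation
    @printf("%8d  Array: %10.1f / %10.1f ns   zeros: %10.1f / %10.1f ns\n",
            s, minimum(t_alloc).time, median(t_alloc).time,
            minimum(t_zeros).time, median(t_zeros).time)
end
```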

Array(Float64, s):

| size | master (min / median) | gc malloc change (min / median) | full PR (min / median) |
| ---: | --- | --- | --- |
| 10 | 110 ns / 120 ns | 110 ns / 130 ns | 110 ns / 120 ns |
| 2048 | 240 ns / 1.7 µs | 250 ns / 1.4 µs | 240 ns / 1.3 µs |
| 2049 | 240 ns / 1.5 µs | 270 ns / 2.1 µs | 390 ns / 2.1 µs |
| 10000 | 1.4 µs / 5.5 µs | 600 ns / 2.4 µs | 760 ns / 2.4 µs |
| 100000 | 7.3 µs / 8.7 µs | 3.5 µs / 3.9 µs | 3.6 µs / 3.9 µs |
| 1000000 | 300 µs / 340 µs | 1.8 µs / 4.1 µs | 2.4 µs / 4.1 µs |

zeros(Float64, s):

| size | master (min / median) | gc malloc change (min / median) | full PR (min / median) |
| ---: | --- | --- | --- |
| 10 | 42 ns / 56 ns | 43 ns / 110 ns | 38 ns / 100 ns |
| 2048 | 1.2 µs / 1.9 µs | 1.8 µs / 7.6 µs | 1.9 µs / 7.5 µs |
| 2049 | 1.5 µs / 2.8 µs | 2.2 µs / 7.5 µs | 2.3 µs / 7.5 µs |
| 10000 | 6.1 µs / 13 µs | 10 µs / 34 µs | 9.9 µs / 34 µs |
| 100000 | 91 µs / 230 µs | 100 µs / 370 µs | 3.4 µs / 3.6 µs |
| 1000000 | 1.0 ms / 2.2 ms | 1.1 ms / 3.6 ms | 3.6 µs / 4.0 µs |

Did you all expect the gc malloc change to so drastically improve the times for large arrays?

Is the slight degradation for small malloced arrays (size = 2049) OK?

The good news is that using calloc does help a lot for large arrays of zeros.

@danielmatz force-pushed the use_calloc_for_zeros branch from 984051a to d512d83 on March 11, 2017
@ViralBShah (Member) commented:

Another good thing to check is the performance of the shootout and micro benchmarks. The slight degradation for 2049 seems OK to me, given all the other gains.

@danielmatz (Contributor, Author) commented:

Here are the benchmark results:

(Any["micro", "quicksort"], TrialJudgement(+0.17% => invariant))
(Any["micro", "parseint"], TrialJudgement(-1.27% => invariant))
(Any["micro", "randmatstat"], TrialJudgement(+1.40% => invariant))
(Any["micro", "pisum"], TrialJudgement(+0.02% => invariant))
(Any["micro", "fib"], TrialJudgement(-0.73% => invariant))
(Any["micro", "randmatmul"], TrialJudgement(+9.96% => invariant))
(Any["micro", "mandel"], TrialJudgement(+0.42% => invariant))
(Any["shootout", "k_nucleotide"], TrialJudgement(+4.76% => invariant))
(Any["shootout", "fasta"], TrialJudgement(+1.26% => invariant))
(Any["shootout", "spectralnorm"], TrialJudgement(-0.00% => invariant))
(Any["shootout", "revcomp"], TrialJudgement(+1.61% => invariant))
(Any["shootout", "pidigits"], TrialJudgement(+49.25% => regression))
(Any["shootout", "binary_trees"], TrialJudgement(+11.79% => invariant))
(Any["shootout", "nbody"], TrialJudgement(+0.01% => invariant))
(Any["shootout", "regex_dna"], TrialJudgement(+8.55% => invariant))
(Any["shootout", "meteor_contest"], TrialJudgement(+7.98% => invariant))
(Any["shootout", "mandelbrot"], TrialJudgement(+0.28% => invariant))
(Any["shootout", "fannkuch"], TrialJudgement(-0.63% => invariant))
(Any["shootout", "nbody_vec"], TrialJudgement(+0.10% => invariant))

The pidigits regression is caused by spending a lot more time in the BigInt() constructor, which is getting called by <<(::BigInt, ::UInt64). I'm not sure yet why my changes would cause that...
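The judgements above are BenchmarkTools TrialJudgement values; as a minimal sketch of the mechanism, one can be produced by comparing two trial estimates (both trials below are run in the same session purely for illustration):

```julia
using BenchmarkTools

before = @benchmark zeros(Float64, 10^6)   # e.g. results collected on master
after  = @benchmark zeros(Float64, 10^6)   # e.g. results collected on this PR

# judge classifies the change between two trial estimates as
# :improvement, :regression, or :invariant relative to a tolerance
println(judge(minimum(after), minimum(before)))
```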

@vtjnash added this to the 1.0 milestone on May 1, 2017
@JeffBezanson self-assigned this on Jul 13, 2017
@JeffBezanson (Member) commented:

Looks good. I'm going to rebase and take all but the last two commits, and implement #9147 instead, which will be simpler since we won't need the extra argument to request zeros.

@JeffBezanson (Member) commented:

@danielmatz Please review #22953 if you get a chance. I think that part should be ok to merge and you can submit future PRs to do the follow-up items if you want.
