
faster rand(::MersenneTwister, UnitRange{<:BitInteger}) (fix #22600) #22612


Merged
1 commit merged on Aug 12, 2017

Conversation

rfourquet
Member

@rfourquet rfourquet commented Jun 29, 2017

This is a solution similar to #14772: generation with MersenneTwister is fast, so it's cheaper to waste entropy bits than to spend divisions saving as many of those bits as we can (the trade-off may have been different when MersenneTwister couldn't use the fast SIMD-optimized array functions).

I intend to extend this work (when applicable) before the next release, but this first step will probably be the most visible.

A few timings I get (using in fact @btime from BenchmarkTools, with m being a MersenneTwister object), shown as master/PR, or master/randi/PR when applicable (randi is from StatsBase):

julia> @time rand(m, false:true)
21ns / 9ns

julia> @time rand(m, 1:2)
33ns / 21ns / 8.5ns

julia> @time rand(m, 1:2^30)
33ns / 29ns / 8.5ns

julia> @time rand(m, 1:2^30+1)
37ns / 29ns / 19ns
 
julia> @time rand(m, 1:2^52)
38ns / 31ns / 8.5ns

julia> @time rand(m, 1:2^52+1)
38ns / 31ns / 28ns

julia> @time rand(m, 1:2^53)
38ns / 31ns / 13ns

julia> @time rand(m, 1:Int128(2)^104)
102ns / 19ns

julia> @time rand(m, 1:Int128(2)^104+1)
101ns / 39ns

julia> @time rand(m, 1:Int128(2)^105)
102ns / 24ns

Basically, with the rejection sampling in this PR, when the length of the interval is just above a power of 2, about one sample in two is rejected, which explains the change in performance around powers of 2. This also takes advantage of the fact that MersenneTwister natively generates 52 bits of entropy, hence the change in performance for lengths above 2^52 and 2^104.
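For reference, the power-of-two rejection idea can be sketched as below. This is only a hypothetical illustration using a full-width `UInt64` draw (the function name and structure are mine, not the PR's actual implementation, which works on MersenneTwister's native 52-bit output):

```julia
using Random

# Minimal sketch of power-of-two rejection sampling for rand(rng, r).
# The mask is the smallest all-ones pattern covering length(r) - 1, so
# when length(r) is just above a power of 2, about half the draws are
# rejected -- which matches the timing jumps in the table above.
function rand_in_range(rng::AbstractRNG, r::UnitRange{Int})
    n = length(r) % UInt64                    # number of possible values
    n <= 1 && return first(r)
    mask = typemax(UInt64) >> leading_zeros(n - 1)
    while true
        x = rand(rng, UInt64) & mask          # uniform on 0:mask, cheap
        x < n && return first(r) + Int(x)     # accept iff x lands in 0:n-1
    end
end
```

No division is needed on the sampling path; the only per-draw costs are the masked draw and the comparison.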

@rfourquet rfourquet added the randomness Random number generation and the Random stdlib label Jun 29, 2017
@rfourquet
Member Author

cc @nalimilan @mschauer

@rfourquet rfourquet added the performance Must go faster label Jun 29, 2017
@nalimilan
Member

Sorry, I'm really not qualified to comment on the RNG code.

@GregPlowman
Contributor

Would this function also benefit from using the new method for UnitRange, rather than using RangeGenerator?

function rand!(rng::AbstractRNG, A::AbstractArray, r::AbstractArray)
    g = RangeGenerator(1:length(r))
    for i in eachindex(A)
        @inbounds A[i] = r[rand(rng, g)]
    end
    return A
end

Or perhaps a new RangeGenerator appropriate for the new method?

@rfourquet
Member Author

Would this function also benefit from using the new method for UnitRange, rather than using RangeGenerator

According to my preliminary tests, mostly, but with less speed-up and even a slow-down in some cases, e.g. for the range 1:2^52+1 (when calling rand!(m, A, 1:2^52+1)). So I'm not yet sure what the best strategy is. Concerning modifying RangeGenerator, it would also require checking that the new version is always faster for all (or most of) the RNG types we can test, or introducing a trait for different RNGs and calling e.g. RangeGenerator(1:2, rng) to choose the best implementation depending on the rng type. That will take a bit more time, so I preferred to propose the simplest fix first (which is unambiguously faster).

@ViralBShah
Member

Perhaps @simonbyrne can also take a look.

@rfourquet rfourquet changed the title faster rand(::MersenneTwister, BitInteger) (fix #22600) faster rand(::MersenneTwister, UnitRange{<:BitInteger}) (fix #22600) Jul 4, 2017
@mschauer
Contributor

mschauer commented Jul 6, 2017

We probably only get one shot at changing this, so I think it is worthwhile to think a bit about the problem space.

Basically we have to keep track of three sources of cost:

  1. computation of the generator (the cost of mm(n,k) = div(n,k)*k)
  2. resampling (e.g. rand_ui52_raw)
  3. modulo operation

In your approach all costs come from source 2; previously the dominant costs were 1 and 3, and your way might well be the right solution.

On this machine, the costs of mm, rand_ui52_raw and % k are roughly in the proportion 5 : 1 : 1/4.

There is not much that can be done about 3, but (on this machine) it is also only 1/4 of the sampling cost, so it is worthwhile if it decreases the number of rejections by a large enough factor.

mm(n,k) is expensive, so it is worthwhile if more than, say, a handful of random numbers are needed. As a basic use case is sampling random integers from all ranges from 1:1 to 1:n, your pull request might be the right thing to do. I just wanted to point out that in this case the mm(n,k) computation also becomes cheaper. The basic observation is that mm(n,k) can be computed recursively.

I have a "proof of concept" implementation here: https://gist.github.com/mschauer/d6a7a091bada2a828e1ac74e7baa106b

It defines an iterator which counts downwards from i = k, k-1, k-2, ... and gives a pair (i, rand(1:i)) in each iteration. It is cheap if k is small and i is close to k.

So in practice one might divide k by a factor and restart the iterator when i becomes relatively small. I have not fully understood this yet but wanted to invite you to think about it; in the meanwhile I continue reading here.
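For concreteness, the three cost sources can be sketched with a plain div-based rejection sampler. This is an illustration only; the function names are mine, and the gist's recursive `mm` update and the real `rand_ui52_raw` are not reproduced here:

```julia
using Random

mm(n, k) = div(n, k) * k       # cost 1: computing the rejection threshold

# Unbiased sampling from 1:k out of a raw draw on 0:n-1, with resampling
# (cost 2) above the threshold and a final modulo (cost 3).
function sample_1_to_k(rng::AbstractRNG, k::Int; n::Int = 2^52)
    thresh = mm(n, k)          # largest multiple of k not exceeding n
    x = rand(rng, 0:n-1)       # stand-in for the raw generator output
    while x >= thresh          # reject to avoid modulo bias
        x = rand(rng, 0:n-1)
    end
    return 1 + x % k
end
```

Since `thresh` depends only on n and k, it is the part that could be updated cheaply across the iterator's downward steps instead of paying the division every time.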

@ggggggggg
Contributor

If this is PR/master, doesn't this show that master is faster than your PR?

julia> @time rand(m, false:true)
21ns / 9ns

@rfourquet
Member Author

Probably we get only one shot at changing this

What do you mean?

The basic observation is that the mm(n,k) can be computed recursivley.

Very interesting! I will take time to read your proof of concept, and compare efficiency. That said, I guess the simplicity of this PR may be preferable for the general case.

If this is PR/master, doesn't this show that master is faster than your PR?

My mistake, thanks for reporting (editing OP).

@mschauer
Contributor

mschauer commented Jul 6, 2017

one shot

There are some reasons not to change the random number generators too liberally: old versions (at least from, say, Julia 1.0 onwards) should stay available for reproducibility of numerical experiments. R still ships a normal variate generator called "Buggy Kinderman-Ramage" (kept for reproducibility...)

@rfourquet
Member Author

old versions (at least say from julia 1.0 onwards) should be available for reproducibility of numerical experiments

I was vaguely thinking of this reason, but didn't remember having seen it stated clearly in the forums. Do you think the generators could be changed (i.e. the reproducibility broken) between 1.0 and 2.0?

@StefanKarpinski
Member

I would really like the RNG to be explicitly set when srand is called, so that we can change RNGs and other algorithms in the future without breaking code. As a second best, if we could change them without breaking code silently, that would also be acceptable. The main problem with this is that we don't generally pass RNGs around explicitly – we often use the global default RNG, whose type cannot be changed. This seems like something we very much need a solution to.

@rfourquet
Member Author

@StefanKarpinski I'm afraid I don't understand your concern here: in which case code will be broken?

@StefanKarpinski
Member

What I was trying to explain (poorly) is that if everything one needs to know about how random values are generated is specified when seeding the RNG, then we're free to change anything we want. In this case, we could have one RNG type for using resampling and one RNG type for using a pre-computed generator. This not only allows the user to decide what they want based on their use case, but also means that if we change the default in the future, user code won't break since it will keep using whatever they chose when they seed their RNG.

@rfourquet
Member Author

I think I understand a bit better (this also relates to point 3 of #17918 (comment)), but not fully yet. By "breaking code", do you mean that the stream of random numbers is modified when we change the implementation (i.e. non-reproducibility)? Would you like to prevent the use of srand([seed]) and force something like srand(MersenneTwister, [seed]) to explicitly tell which RNG type to use as the implicit one? Then, if the implementation changes, the old one is kept with the old name, and the new updated RNG is named e.g. MersenneTwister2. We could still allow srand(), with proper documentation, meaning whatever best current RNG is available as a default, for users who don't care about this problem.
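A hypothetical sketch of that scheme (all names below are invented for illustration and are not Base API): the RNG type is pinned at seeding time, so a future change of the default cannot silently alter existing streams:

```julia
# Invented types standing in for "current" and "future" generators.
abstract type SketchRNG end
struct MT  <: SketchRNG; seed::UInt; end    # stands in for MersenneTwister
struct MT2 <: SketchRNG; seed::UInt; end    # a hypothetical successor

# Seeding through an explicit type pins the algorithm; the no-type form
# delegates to whatever the current default is.
seedrng(::Type{T}, seed) where {T<:SketchRNG} = T(UInt(seed))
seedrng(seed) = seedrng(MT2, seed)          # default = best current RNG

pinned  = seedrng(MT, 1234)   # explicit type: stream reproducible forever
default = seedrng(1234)       # follows whatever the default becomes
```

Under this scheme, changing the default only rebinds the one-argument method; code that named its RNG type keeps its stream.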

@rfourquet
Member Author

I feel inclined to merge this soon, as it's the best we have for now, while still exploring @mschauer's idea.

@tkelman
Contributor

tkelman commented Jul 12, 2017

are there benchmarks that would track this? @nanosoldier runbenchmarks(ALL, vs = ":master")

@rfourquet
Member Author

are there benchmarks that would track this?

under review, see JuliaCI/BaseBenchmarks.jl#82.

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@rfourquet
Member Author

@nanosoldier runbenchmarks("random", vs = ":master")

@nanosoldier
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @ararslan

@rfourquet rfourquet merged commit 143c9e2 into master Aug 12, 2017
@fredrikekre fredrikekre deleted the rf/MT-randrange branch August 12, 2017 13:26
Labels
performance Must go faster randomness Random number generation and the Random stdlib
9 participants