Using np.random.seed(number) has been a best practice when using NumPy in Python to create reproducible work. Setting the random seed means that your work is reproducible to others who use your code. But now when you look at the docs for np.random.seed, the description reads:
This is a convenient, legacy function.
The best practice is to not reseed a BitGenerator, but rather to recreate a new one. This method is here for legacy reasons only.
So what’s changed? I’ll explain the old method and the issues with it. Then I’ll demonstrate the new best practice and its benefits.
Why to Stop Using NumPy’s Global Random Seed
Using np.random.seed(number) sets what NumPy calls the global random seed, which affects all uses of the np.random.* module. If imported code or other scripts explicitly call np.random.seed(), they can overwrite the global random state, potentially breaking reproducibility.
How NumPy Random Seed Works in Python
If you look up tutorials using np.random, you'll see many of them use np.random.seed to set the seed for reproducible work. We can see how this works:
>>> import numpy as np
>>> np.random.rand(4)
array([0.96176779, 0.7088082 , 0.06416725, 0.82679036])
>>> np.random.rand(4)
array([0.15051909, 0.77788803, 0.67073372, 0.32134285])
As you can see, two calls to the function lead to two completely different answers. If you want somebody to be able to reproduce your projects, you can set the seed with the following code snippet:
>>> np.random.seed(2021)
>>> np.random.rand(4)
array([0.60597828, 0.73336936, 0.13894716, 0.31267308])
>>> np.random.seed(2021)
>>> np.random.rand(4)
array([0.60597828, 0.73336936, 0.13894716, 0.31267308])
The results are the same. To prove this to yourself, run the code above in your own Python environment.
Setting the seed does more than fix the next random call: it fixes the whole sequence of random numbers, so any code that draws random numbers from NumPy's global state will produce the same sequence every time the seed is reset. For example, look at the following:
>>> np.random.seed(2021)
>>> np.random.rand(4)
array([0.60597828, 0.73336936, 0.13894716, 0.31267308])
>>> np.random.rand(4)
array([0.99724328, 0.12816238, 0.17899311, 0.75292543])
>>> np.random.rand(4)
array([0.66216051, 0.78431013, 0.0968944 , 0.05857129])
>>> np.random.rand(4)
array([0.96239599, 0.61655744, 0.08662996, 0.56127236])
>>> np.random.seed(2021)
>>> np.random.rand(4)
array([0.60597828, 0.73336936, 0.13894716, 0.31267308])
>>> np.random.rand(4)
array([0.99724328, 0.12816238, 0.17899311, 0.75292543])
>>> np.random.rand(4)
array([0.66216051, 0.78431013, 0.0968944 , 0.05857129])
>>> np.random.rand(4)
array([0.96239599, 0.61655744, 0.08662996, 0.56127236])
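The seeded sequence is shared by every np.random.* function, not just rand(). As a small sketch (draw_mixed() is a hypothetical helper used only for illustration), mixing rand(), randint() and permutation() after the same seed gives identical results on each run:

```python
import numpy as np

def draw_mixed():
    # Several different np.random.* functions all pull from the same global state
    a = np.random.rand(2)
    b = np.random.randint(0, 10, size=3)
    c = np.random.permutation(5)
    return a, b, c

np.random.seed(2021)
first = draw_mixed()
np.random.seed(2021)
second = draw_mixed()

# Resetting the seed replays the whole mixed sequence
print(all(np.array_equal(x, y) for x, y in zip(first, second)))  # True
```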
The Problem With NumPy’s Global Random Seed
Global seeds work well in isolated scripts but fall short in modular, multi-script workflows. Setting the seed makes your calls reproducible, meaning all random numbers generated after the seed is set will be the same on any machine. For the most part this holds, and for many projects you may not need to worry any further.
The problem comes in larger projects, or in projects with imports that could also set the seed. Using np.random.seed(number) sets what NumPy calls the global random seed, which affects all uses of the np.random.* module. Imported packages or other scripts could reset the global random seed to a different value with np.random.seed(another_number), which may change your output and make your results unreproducible. In most cases, you only need to guarantee the same random numbers for specific parts of your code (like tests or functions).
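A minimal sketch of the problem, assuming a hypothetical imported function third_party_helper() that calls np.random.seed(0) internally: the draws made before the call follow your seed, but the draws after it no longer do, and if the helper's seed or behavior ever changes, so do your results.

```python
import numpy as np

def third_party_helper():
    # Hypothetical imported function that quietly reseeds the global state
    np.random.seed(0)
    return np.random.rand()

def my_analysis():
    # Our own code, relying on a seed set once at the top
    np.random.seed(2021)
    first = np.random.rand(4)
    third_party_helper()  # resets the global state behind our back
    second = np.random.rand(4)
    return first, second

# Baseline: the same draws without the helper call in between
np.random.seed(2021)
expected_first = np.random.rand(4)
expected_second = np.random.rand(4)

first, second = my_analysis()
print(np.allclose(first, expected_first))    # True: draws before the helper are unaffected
print(np.allclose(second, expected_second))  # False: the helper changed our sequence
```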
np.random.default_rng(): The Solution to NumPy's Global Random Seed
This is one of the reasons NumPy now advises users to create a random number generator for specific tasks (or even to pass one around when you need parts of your code to be reproducible).
“The preferred best practice for getting reproducible pseudorandom numbers is to instantiate a generator object with a seed and pass it around.” (Robert Kern, NEP 19)
Using this new best practice looks like this:
>>> import numpy as np
>>> rng = np.random.default_rng(2021)
>>> rng.random(4)
array([0.75694783, 0.94138187, 0.59246304, 0.31884171])
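As you can see, these numbers are different from the earlier example. That is because default_rng() returns a Generator backed by the PCG64 bit generator, whereas the legacy np.random.* functions use RandomState, backed by the Mersenne Twister (MT19937). NumPy introduced default_rng() in version 1.17 as the preferred generator interface, but np.random.* continues to use RandomState by default for backward compatibility. If you need the old numbers, you can create a RandomState instance with the same seed: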
>>> rng = np.random.RandomState(2021)
>>> rng.rand(4)
array([0.60597828, 0.73336936, 0.13894716, 0.31267308])
Use RandomState only when maintaining legacy code: it is kept for backward compatibility and does not get the improvements of the newer Generator API, such as its default PCG64 bit generator and faster sampling methods.
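The different numbers come from different underlying bit generators: default_rng() wraps PCG64, while RandomState uses the Mersenne Twister (MT19937). A quick sketch to check this yourself:

```python
import numpy as np

# Generator returned by default_rng(): backed by PCG64
rng = np.random.default_rng(2021)
print(type(rng.bit_generator).__name__)  # PCG64

# Legacy RandomState: the first field of its state names the algorithm
legacy = np.random.RandomState(2021)
print(legacy.get_state()[0])             # MT19937
```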
The Benefits of Using np.random.default_rng() vs. NumPy Random Seed
You can pass random number generators between functions and classes, so each individual function or class can have its own random state without resetting the global seed. In addition, each script can pass a random number generator to the functions that need to be reproducible. The benefit is that you know exactly which random number generator is used in each part of your project.
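import numpy as np

def f(x, rng):
    # Draw one number from the generator that was passed in
    return rng.random(1)

# Initialise a random number generator
rng = np.random.default_rng(2021)
x = 0.0  # placeholder input; f ignores x in this example
random_number = f(x, rng)  # pass the rng to the function that needs it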
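To make the benefit concrete, here is a sketch with two hypothetical functions, simulate() and noisy_import(). The generator passed to simulate() is unaffected when other code reseeds the global state, and recreating the generator from the same seed (rather than reseeding it, as the NumPy docs advise) reproduces the same numbers:

```python
import numpy as np

def simulate(n, rng):
    # Draw n standard-normal samples from the generator we were given
    return rng.standard_normal(n)

def noisy_import():
    # Hypothetical imported code that reseeds NumPy's global state
    np.random.seed(0)

rng = np.random.default_rng(2021)
a = simulate(3, rng)
noisy_import()  # touches only the global RandomState, not our rng
b = simulate(3, rng)

# Recreate the generator from the same seed instead of reseeding it
rng_again = np.random.default_rng(2021)
a2 = simulate(3, rng_again)
b2 = simulate(3, rng_again)

print(np.array_equal(a, a2), np.array_equal(b, b2))  # True True
```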
Other benefits arise with parallel processing, as Albert Thomas shows us.
Using independent random number generators can help improve the reproducibility of your results, because you no longer rely on the global random state, which other code can reset or draw from without your knowledge. Passing a random number generator around means you can keep track of when and how it was used and ensure your results are the same.
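For parallel work, one approach NumPy provides (this sketch is not taken from Albert Thomas's post) is SeedSequence.spawn(), which derives independent child seeds from a single parent seed. The task() and run_all() functions below are hypothetical, and a sequential loop stands in for real parallel workers:

```python
import numpy as np

def task(rng):
    # Each task draws only from the generator it was handed
    return rng.random(3)

def run_all(seed, n_tasks):
    # Derive independent child seeds from one parent seed, one per task
    children = np.random.SeedSequence(seed).spawn(n_tasks)
    rngs = [np.random.default_rng(child) for child in children]
    # Sequential stand-in for handing each generator to a worker process
    return [task(rng) for rng in rngs]

run1 = run_all(2021, n_tasks=4)
run2 = run_all(2021, n_tasks=4)

# Same parent seed: every task gets the same numbers on every run
print(all(np.array_equal(a, b) for a, b in zip(run1, run2)))  # True
# Different tasks get different streams
print(np.array_equal(run1[0], run1[1]))                       # False
```

In a real parallel setup, you would pass each child seed or generator to its own worker, so the results depend only on the parent seed and not on scheduling order.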
Frequently Asked Questions
What does np.random.seed() do in NumPy?
In Python, np.random.seed() sets a global seed that makes random number generation reproducible across runs. However, it affects all random calls made through np.random, so other scripts or packages can unintentionally alter it.
What’s the recommended way to generate reproducible random numbers in NumPy?
To generate reproducible random numbers in NumPy, use np.random.default_rng(). It creates a local Generator that isolates random number generation, which improves reproducibility and avoids interference from other code.