Closed
Bug report
Bug description:
I don't have a succinct reproducer for this bug yet, but I saw the following race in JAX CI:
WARNING: ThreadSanitizer: data race (pid=208275)
Write of size 8 at 0x555555d43b60 by thread T121:
#0 grow_thread_array /__w/jax/jax/cpython/Python/qsbr.c:101:19 (python3.13+0x4a3905) (BuildId: 8f8869b5f3143bd14dda26aa2bf37336b4902370)
#1 _Py_qsbr_reserve /__w/jax/jax/cpython/Python/qsbr.c:203:13 (python3.13+0x4a3905)
#2 new_threadstate /__w/jax/jax/cpython/Python/pystate.c:1569:27 (python3.13+0x497df2) (BuildId: 8f8869b5f3143bd14dda26aa2bf37336b4902370)
#3 PyGILState_Ensure /__w/jax/jax/cpython/Python/pystate.c:2766:16 (python3.13+0x49af78) (BuildId: 8f8869b5f3143bd14dda26aa2bf37336b4902370)
#4 nanobind::gil_scoped_acquire::gil_scoped_acquire() /proc/self/cwd/external/nanobind/include/nanobind/nb_misc.h:15:43 (xla_extension.so+0xa4fe551) (BuildId: 32eac14928efa68545d22a6013f16aa63a686fef)
#5 xla::CpuCallback::PrepareAndCall(void*, void**) /proc/self/cwd/external/xla/xla/python/callback.cc:67:26 (xla_extension.so+0xa4fe551)
#6 xla::XlaPythonCpuCallback(void*, void**, XlaCustomCallStatus_*) /proc/self/cwd/external/xla/xla/python/callback.cc:177:22 (xla_extension.so+0xa500c9a) (BuildId: 32eac14928efa68545d22a6013f16aa63a686fef)
...
Previous read of size 8 at 0x555555d43b60 by thread T124:
#0 _Py_qsbr_reserve /__w/jax/jax/cpython/Python/qsbr.c:216:47 (python3.13+0x4a3ad7) (BuildId: 8f8869b5f3143bd14dda26aa2bf37336b4902370)
#1 new_threadstate /__w/jax/jax/cpython/Python/pystate.c:1569:27 (python3.13+0x497df2) (BuildId: 8f8869b5f3143bd14dda26aa2bf37336b4902370)
#2 PyGILState_Ensure /__w/jax/jax/cpython/Python/pystate.c:2766:16 (python3.13+0x49af78) (BuildId: 8f8869b5f3143bd14dda26aa2bf37336b4902370)
#3 nanobind::gil_scoped_acquire::gil_scoped_acquire() /proc/self/cwd/external/nanobind/include/nanobind/nb_misc.h:15:43 (xla_extension.so+0xa4fe551) (BuildId: 32eac14928efa68545d22a6013f16aa63a686fef)
#4 xla::CpuCallback::PrepareAndCall(void*, void**) /proc/self/cwd/external/xla/xla/python/callback.cc:67:26 (xla_extension.so+0xa4fe551)
#5 xla::XlaPythonCpuCallback(void*, void**, XlaCustomCallStatus_*) /proc/self/cwd/external/xla/xla/python/callback.cc:177:22 (xla_extension.so+0xa500c9a) (BuildId: 32eac14928efa68545d22a6013f16aa63a686fef)
...
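I think what's happening here is that two threads that were not created by Python are calling PyGILState_Ensure concurrently so that they can call into CPython APIs. Each call goes through new_threadstate, which reserves a QSBR slot via _Py_qsbr_reserve.
This appears to be an unlocked read of shared->array in _Py_qsbr_reserve that races with the write in grow_thread_array, and it would probably be sufficient to move that read under the mutex in _Py_qsbr_reserve.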
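To illustrate the kind of fix I have in mind, here is a minimal standalone sketch of the pattern. It is not the CPython code: `shared_state`, `slot`, `grow_array`, `reserve_slot` and `run_demo` are hypothetical names, and the layout is simplified. The only point is that every read of the growable array pointer happens while the mutex that protects reallocation is still held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define MAX_THREADS 64

/* Hypothetical stand-in for the shared slot table; NOT CPython's real layout. */
struct slot { int in_use; };
struct shared_state {
    pthread_mutex_t mutex;
    struct slot *array;   /* may be moved by grow_array() */
    size_t size;
};

static void state_init(struct shared_state *s)
{
    pthread_mutex_init(&s->mutex, NULL);
    s->array = NULL;
    s->size = 0;
}

static void state_destroy(struct shared_state *s)
{
    pthread_mutex_destroy(&s->mutex);
    free(s->array);
}

/* Caller must hold s->mutex. Returns 0 on success, -1 if allocation fails. */
static int grow_array(struct shared_state *s)
{
    size_t new_size = s->size ? s->size * 2 : 4;
    struct slot *p = realloc(s->array, new_size * sizeof(*p));
    if (p == NULL) {
        return -1;
    }
    memset(p + s->size, 0, (new_size - s->size) * sizeof(*p));
    s->array = p;   /* the write an unlocked reader would race with */
    s->size = new_size;
    return 0;
}

/* Reserve the lowest free slot; returns its index, or -1 on failure. */
static long reserve_slot(struct shared_state *s)
{
    long index = -1;
    pthread_mutex_lock(&s->mutex);
    for (size_t i = 0; i < s->size; i++) {
        if (!s->array[i].in_use) { index = (long)i; break; }
    }
    if (index < 0) {
        size_t old_size = s->size;
        if (grow_array(s) == 0) index = (long)old_size;
    }
    if (index >= 0) {
        /* Every access to s->array happens before the unlock below. */
        s->array[index].in_use = 1;
    }
    pthread_mutex_unlock(&s->mutex);
    return index;
}

struct worker_arg { struct shared_state *state; long index; };

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    arg->index = reserve_slot(arg->state);
    return NULL;
}

/* Run up to MAX_THREADS workers; return 1 if each got a distinct slot. */
static int run_demo(int nthreads)
{
    struct shared_state s;
    pthread_t threads[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    char seen[MAX_THREADS] = {0};
    int started = 0, ok = 1;

    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    state_init(&s);
    for (int i = 0; i < nthreads; i++) {
        args[i].state = &s;
        args[i].index = -1;
        if (pthread_create(&threads[i], NULL, worker, &args[i]) != 0) break;
        started++;
    }
    for (int i = 0; i < started; i++) pthread_join(threads[i], NULL);
    if (started != nthreads) ok = 0;
    for (int i = 0; i < started; i++) {
        long idx = args[i].index;
        if (idx < 0 || idx >= started || seen[idx]) ok = 0;
        else seen[idx] = 1;
    }
    state_destroy(&s);
    return ok;
}
```

Calling `run_demo(16)` from a test program built with `-fsanitize=thread -pthread` should not report a race with this shape. Moving the `s->array[index]` access to after `pthread_mutex_unlock` would reintroduce the same read/write pattern as in the trace above. Returning an index instead of a pointer into the array is a simplification for this sketch, so callers never hold a pointer that a later grow could invalidate.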
CPython versions tested on:
3.13
Operating systems tested on:
Linux