Closed
Description
There are a few remaining scaling bottlenecks in the free-threaded build that we should fix.
I have been using the following benchmark to detect bottlenecks that were previously issues in older versions of the nogil forks:
https://gist.github.com/colesbury/429fe9f90036d43ad43576c3d357a12e
Note that for reliable results the above benchmark requires some setup:
- Adjust `NTHREADS` if necessary on your system
- Disable turbo boost or equivalent on your system
- Avoid running on hyper-threading siblings (i.e., use `taskset -c 0-<N>` to choose separate physical cores)
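As a rough illustration of how a scaling measurement of this kind can be structured (the gist linked above remains the authoritative benchmark), the sketch below runs the same workload in one thread and then in several, and reports the throughput ratio. The names `run_in_threads`, `scaling_factor`, and `count_loop` are hypothetical and are not taken from the gist.

```python
import threading
import time


def count_loop(iterations):
    """Placeholder workload: a plain loop with no shared state."""
    for _ in range(iterations):
        pass


def run_in_threads(work, nthreads, iterations):
    """Run work(iterations) in nthreads threads; return wall-clock seconds."""
    # The barrier releases all workers at once, so thread start-up
    # cost is kept out of the timed region.
    barrier = threading.Barrier(nthreads + 1)

    def target():
        barrier.wait()
        work(iterations)

    threads = [threading.Thread(target=target) for _ in range(nthreads)]
    for t in threads:
        t.start()
    barrier.wait()
    start = time.perf_counter()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def scaling_factor(work, nthreads, iterations=100_000):
    """Throughput with nthreads threads relative to a single thread.

    Every thread performs the same number of iterations, so total work
    grows with nthreads. Perfect scaling gives a value close to nthreads;
    a value near 1 means the threads are effectively serialized.
    """
    single = run_in_threads(work, 1, iterations)
    multi = run_in_threads(work, nthreads, iterations)
    return nthreads * single / multi


if __name__ == "__main__":
    print(f"count_loop scaling with 4 threads: {scaling_factor(count_loop, 4):.2f}x")
```

On a default (GIL-enabled) build the reported factor should stay close to 1 for pure-Python workloads. On the free-threaded build, a factor well below the thread count for a particular workload points at a contention bottleneck.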
Current bottlenecks
- cmodule_function
- load_string_const
- load_tuple_const
- create_closure
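For readers without the gist at hand, the following is a guess at what workloads with these names might exercise, inferred only from the names and from the underlying issues described in this report. The function bodies, the `iterations` parameter, and the return values are assumptions made for illustration; consult the gist for the real definitions.

```python
import math


def cmodule_function(iterations):
    """Repeatedly call a function implemented in a C module (math.floor)."""
    total = 0
    for i in range(iterations):
        total += math.floor(i * i)
    return total


def load_string_const(iterations):
    """Repeatedly load a string constant from the code object."""
    accum = 0
    for i in range(iterations):
        if i == "a string":  # never true: an int never equals a str
            accum += 7
        else:
            accum += 1
    return accum


def load_tuple_const(iterations):
    """Repeatedly load a tuple constant from the code object."""
    accum = 0
    for i in range(iterations):
        if i == (1, 2):  # never true: an int never equals a tuple
            accum += 7
        else:
            accum += 1
    return accum


def create_closure(iterations):
    """Create and call a new nested function object on every iteration."""
    total = 0
    for i in range(iterations):
        def inner(x):
            return x
        total += inner(i)
    return total
```

Each body touches one object that every thread shares (the C function, the constant, or the nested function's code object and qualified name), which is why these workloads are sensitive to reference count contention.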
Underlying issues
- Reference count contention on non-string constants. We will want to immortalize most constants in `PyCodeObject`.
- Reference count contention on `func.__qualname__` or `code.co_qualname` (when creating closure)
- Reference count contention on module-level `PyCFunctionObjects`
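The objects named above are contended because a single instance is shared by every thread that runs the same code. The sketch below shows that sharing through object identity. It relies on CPython implementation details (constants stored in `co_consts`, and new function objects reusing the qualified name string held by their code object), and the helper names `make_adder`, `returns_tuple_const`, and `shared_objects_report` are hypothetical.

```python
import math
import types


def make_adder(n):
    def add(x):
        return x + n
    return add


def returns_tuple_const():
    return (1, 2)


def shared_objects_report():
    """Report which objects are shared between calls (CPython-specific)."""
    f, g = make_adder(1), make_adder(2)
    const = returns_tuple_const()
    consts = returns_tuple_const.__code__.co_consts
    return {
        # The tuple literal is one object owned by the code object,
        # so every call returns the same tuple.
        "tuple_const_shared": const is returns_tuple_const(),
        "tuple_const_in_co_consts": any(c is const for c in consts),
        # Every closure created from the same def shares one code object...
        "code_shared": f.__code__ is g.__code__,
        # ...and starts out with the same qualified name string.
        "qualname_shared": f.__qualname__ is g.__qualname__,
        # math.floor is a single module-level C function object.
        "c_function": isinstance(math.floor, types.BuiltinFunctionType),
    }


if __name__ == "__main__":
    for name, value in shared_objects_report().items():
        print(f"{name}: {value}")
```

Whenever a thread loads one of these shared objects, it adjusts the same reference count field as every other thread, which is the contention that immortalization, interning, and deferred reference counting are meant to avoid.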
Linked PRs
- gh-118527: Use `_Py_ID(__main__)` for main module name #118528
- gh-118527: Use deferred reference counting for C functions on modules #118529
- gh-118527: Intern filename, name, and qualname in code objects. #118558
- gh-118527: Intern code name and filename on default build #118576
- gh-118527: Intern code consts in free-threaded build #118667