Description
The contract of `_PyExecutorObject` currently is that it executes zero or more bytecode instructions. We should change that so that it must execute at least one instruction.

The reason for this change is that we want to be able to use `ENTER_EXECUTOR` anywhere, which means we will need to replace arbitrary instructions with `ENTER_EXECUTOR` (see below for why).

If a `_PyExecutorObject` executes zero instructions, then `ENTER_EXECUTOR` is responsible for executing the original instruction. If it executes one or more instructions, the behavior of the first instruction is handled by the `_PyExecutorObject`, so `ENTER_EXECUTOR` is just a simple, and fast, (tail) call.
We also want to change the signature of the `execute` function pointer to take `_PyExecutorObject **` instead of `_PyExecutorObject *`. See faster-cpython/ideas#621 for details. We might as well make both changes at once.
I think our only "real" optimizer already executes at least three instructions, so it should be a fairly easy change.
Why do we need to insert executors at arbitrary instructions?
Consider a nested `if` with at least two balanced hot paths, and at least one cold path. At the join point, we want both hot paths to continue in optimized code, but as neither represents more than 50% of the flow, they will likely stop at the join point. Ideally, they will both jump into the same, following optimized code. But in order to find that code, it needs to be attached to the tier 1 instructions using `ENTER_EXECUTOR`, and the join point could be an arbitrary instruction, likely a `LOAD_FAST` or `LOAD_GLOBAL`.
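For a concrete (hypothetical) shape of the code described above, `dis` shows where the two hot branches rejoin; on current CPython the join point typically begins with a `LOAD_FAST`:

```python
import dis

def f(a, b, x):
    if a:
        if b:
            y = x + 1   # hot path 1
        else:
            y = x + 2   # hot path 2
    else:
        y = -1          # cold path
    return y * 2        # join point: likely starts with LOAD_FAST of y

# Inspect the bytecode to locate the instruction at the join point.
dis.dis(f)

print(f(True, True, 10))
print(f(True, False, 10))
```

The exact instruction at the join depends on the CPython version, which is precisely why an executor attached there must be able to overwrite an arbitrary instruction with `ENTER_EXECUTOR`.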
Note that this makes life harder for the optimizer, as it cannot simply exit the optimized code if a guard fails in the first instruction. It is obliged to fully execute that instruction.
Other optimizers might also want to overwrite instructions with `ENTER_EXECUTOR`; PyTorch Dynamo, for example.