bpo-46372: Try to mutate the LHS during fast float ops (#30594)
Conversation
I think …
A few general comments:
- Could you attach the benchmark results to the PR, not the issue? The speedup depends on the actual implementation, not the general idea.
- I'm a bit surprised that this offers much of a speedup given the complexity of the int operators. Maybe the current implementation is just really slow?
- Does the specialization of ints add much, or is it mainly the float specialization giving the speedup? I'd be curious to see how a float-only version compares.
- It would be worth adding stats to give us some idea what fraction of operations can use this optimization or something like it. In other words, what fraction of operations have the lhs with a ref-count of 1 (or 2 when assigning to the same variable)? That would be generally useful information beyond this particular optimization.
As a general rule, if something can be done at compile time, it should be. If not then it should be done at specialization time. Only do stuff at runtime, if we can't avoid it.
We can determine at compile time whether a binary operator is of the form `s += x` or `s = s + x` (where `s` is a "fast" local). We should do that.

I don't know if it is worth having two specializations, one for the `s = s + x` form and one for the general form, or embedding the information in the cache. I suspect that it won't matter.
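(Both shapes are easy to see in the bytecode; a quick `dis` sketch — function names here are just illustrative — shows that in either form the result of the binary op is stored straight back into the same fast local, which is the property the compiler could record:)

```python
import dis

def inplace_form(s, x):
    s += x
    return s

def rebind_form(s, x):
    s = s + x
    return s

# In both functions the binary operation is immediately followed by a
# STORE_FAST back into `s`, so "the result rebinds the LHS local" is
# visible at compile time.
for fn in (inplace_form, rebind_form):
    print(fn.__name__)
    dis.dis(fn)
```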
I like the float optimization, but I think our efforts to speed up int operations should focus on making int objects more efficient first. Otherwise the coupling of the interpreter to the int implementation is going to make improving the latter even more difficult.
When you're done making the requested changes, leave the comment: …
Yeah, that makes sense. They can also be updated as the implementation changes.
Looks like your intuition is correct. Here is a comparison of
I've updated the PR to use the adaptive entry for this purpose. I understand the benefits of a compile-time check, but doing the check at specialization time seems a lot simpler in this particular case.
Shall I drop the …

Looks good.

Do you have time for a quick Teams call @markshannon? Might be easier to chat about the stats I gathered in person. (If not, maybe tomorrow or Monday.)

Somewhat surprising benchmarking results.

The pidigits slowdown should be somewhat expected, since it's mostly huge-integer operations, which are specialized and handled by _PyLong_Add and such before this PR but not after.
(attached benchmark results: int/float ops, float ops)
Hm, looks like removing the … I'll go ahead and try the less-branchy version that we discussed this morning (an adaptive superinstruction based on the existing in-place string addition op).
LGTM. Feel free to merge once the merge conflicts are fixed.
Nice and simple!
Python/ceval.c (outdated):

```c
Py_DECREF(rhs); \
STACK_SHRINK(1); \
if (Py_REFCNT(lhs) == 1) { \
    PyFloat_AS_DOUBLE(lhs) = d; \
```
Presumably (see PEP 674) Victor won't like this use of a macro as a target. :-)
Nevertheless, this whole macro is so nice and straightforward compared to ints!
Python/ceval.c (outdated):

```c
double d = l OP r; \
Py_DECREF(rhs); \
STACK_SHRINK(1); \
if (Py_REFCNT(lhs) == 1) { \
```
Would there be any benefit in checking whether perhaps rhs has a refcount of 1, in case lhs doesn't? While people are probably more likely to write `return x+1` rather than `return 1+x` (though even I could sometimes see a reason to do that, from "rhyming" with some other operations or just how the textbook presents the formula), writing `return 2*x` is more likely than `return x*2`. And of course for asymmetric operations you may not have a choice, like `return 1/x`.

Of course, there's a cost (more code, more branches). There's also the fact that `+=` and friends steer this in the direction you're taking here.
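(One wrinkle with the constant-on-the-left cases like `1/x` or `2*x`: the literal operand lives in the code object's constants tuple, so the code object always holds a reference to it and it can never have a refcount of 1 — only the variable operand might be uniquely referenced. A small sketch, with an illustrative function name:)

```python
def double(x):
    return 2.0 * x

# The literal 2.0 is stored in the function's constants, so the code
# object itself keeps it alive; the lhs of `2.0 * x` can therefore
# never be a uniquely-referenced (refcount-1) object at runtime.
print(2.0 in double.__code__.co_consts)  # True
```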
> Of course, there's a cost (more code, more branches). There's also the fact that `+=` and friends steer this in the direction you're taking here.

Yeah, those are my main motivations for sticking with the LHS here.
Okay, I just pushed variants of …. This gets us back up to a 1% improvement:
Seems you have a merge conflict. Also, Mark should probably re-review this?
Not a fan of the big macro. Otherwise, looks good though.
```diff
@@ -857,6 +857,43 @@ static const binaryfunc binary_ops[] = {
     [NB_INPLACE_XOR] = PyNumber_InPlaceXor,
 };

+// NOTE: INPLACE must be a compile-time constant to avoid runtime branching!
+#define BINARY_OP_FAST_FLOAT(OP, INPLACE) \
```
I'm not a fan of giant macros and I don't like the hidden compile time control flow.
Given that the inplace and non-inplace specialized forms differ in a fundamental way, I'd like that to be explicit in the instructions themselves.
Maybe have two different macros, one for the inplace form and one for the non-inplace form.
You should be able to factor out some of the common code into helper macros.
That might be clearer. Up to you.
Makes sense. I'll go ahead and split it up.
This has merge conflicts now.
This modifies the existing `int` and `float` specializations to mutate the LHS operand in-place when possible. It also unifies the implementations of all of these operations.

https://bugs.python.org/issue46372
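(For context — not the pyperformance benchmarks referenced above — the shape of loop this specialization targets looks roughly like the sketch below: on every iteration the old value of `s` is about to become garbage as `s` is rebound, so nothing besides the fast local and the evaluation stack refers to it, which is the refcount-1-or-2 case discussed in the review.)

```python
import timeit

def float_loop(n=100_000):
    s = 0.0
    for _ in range(n):
        # `s` is a fast local; its old value has no other references
        # once STORE_FAST rebinds it, so in principle the float object
        # could be reused in place.
        s = s + 1.0
    return s

# Absolute numbers are machine-dependent; this just shows the shape.
print(timeit.timeit(float_loop, number=10))
```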