-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
ENH: Add engine="numba" to groupby mean #43731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
mroeschke
commented
Sep 24, 2021
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
result = result.ravel() | ||
else: | ||
result_kwargs = {"columns": data.columns} | ||
return data._constructor(result, index=index, **result_kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to be careful about 1) self.axis, 2) self.as_index, 3) self.observed. should be able to re-use one of the existing result-wrapping methods to handle all these appropriately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. Do you have any suggestions on which result-wrapping method I should use where this method passes a 1 or 2D numpy array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
none of them take ndarrays directly. my suggestion would be to make the numba function into a blk_func that gets passed to mgr.grouped_reduce, and then gets wrapped with _wrap_agged_manager
@@ -1781,6 +1818,23 @@ def mean(self, numeric_only: bool | lib.NoDefault = lib.no_default): | |||
Include only float, int, boolean columns. If None, will attempt to use | |||
everything, then use only numeric data. | |||
|
|||
engine : str, default None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not wild about adding to the API. is there a viable way to just detect if numba is available and if so always use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe_use_numba
can detect if the user has specified the numba option as a config
option.
Though, these keyword options are consistent with the reduction methods on windows ops and my plan to also expose this regular DataFrame/Series
reduction methods as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OT: should make this into a template that can append rather than copy-pasting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though, these keyword options are consistent with the reduction methods on windows ops and my plan to also expose this regular DataFrame/Series reduction methods as well.
mm ok i guess. longer-term it'd be nice to get back to the simpler API
@@ -1781,6 +1818,23 @@ def mean(self, numeric_only: bool | lib.NoDefault = lib.no_default): | |||
Include only float, int, boolean columns. If None, will attempt to use | |||
everything, then use only numeric data. | |||
|
|||
engine : str, default None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OT: should make this into a template that can append rather than copy-pasting
expected = ser.groupby(level=0, sort=sort).mean() | ||
tm.assert_series_equal(result, expected) | ||
|
||
def test_as_index_false_unsupported(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback will make follow up issues about supporting these options
Ready for review
rebase to make sure typing is ok (merged the typing numba PR). |
thanks @mroeschke |