Skip to content

ENH: Add engine="numba" to groupby mean #43731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
Nov 5, 2021

Conversation

mroeschke
Copy link
Member

  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

@mroeschke mroeschke changed the title ENH: Add engine="numba" to groupby mean WIP: ENH: Add engine="numba" to groupby mean Sep 24, 2021
result = result.ravel()
else:
result_kwargs = {"columns": data.columns}
return data._constructor(result, index=index, **result_kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to be careful about 1) self.axis, 2) self.as_index, 3) self.observed. should be able to re-use one of the existing result-wrapping methods to handle all these appropriately

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. Do you have any suggestions on which result-wrapping method I should use where this method passes a 1 or 2D numpy array?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none of them take ndarrays directly. my suggestion would be to make the numba function into a blk_func that gets passed to mgr.grouped_reduce, and then gets wrapped with _wrap_agged_manager

@@ -1781,6 +1818,23 @@ def mean(self, numeric_only: bool | lib.NoDefault = lib.no_default):
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data.

engine : str, default None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not wild about adding to the API. is there a viable way to just detect if numba is available and if so always use it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe_use_numba can detect if the user has specified the numba option as a config option.

Though, these keyword options are consistent with the reduction methods on windows ops and my plan to also expose this regular DataFrame/Series reduction methods as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OT: should make this into a template that can append rather than copy-pasting

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, these keyword options are consistent with the reduction methods on windows ops and my plan to also expose this regular DataFrame/Series reduction methods as well.

mm ok i guess. longer-term it'd be nice to get back to the simpler API

@jreback jreback added this to the 1.4 milestone Sep 25, 2021
@@ -1781,6 +1818,23 @@ def mean(self, numeric_only: bool | lib.NoDefault = lib.no_default):
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data.

engine : str, default None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OT: should make this into a template that can append rather than copy-pasting

@mroeschke mroeschke changed the title WIP: ENH: Add engine="numba" to groupby mean ENH: Add engine="numba" to groupby mean Oct 19, 2021
expected = ser.groupby(level=0, sort=sort).mean()
tm.assert_series_equal(result, expected)

def test_as_index_false_unsupported(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback will make follow up issues about supporting these options

Ready for review

@jreback
Copy link
Contributor

jreback commented Oct 31, 2021

rebase to make sure typing is ok (merged the typing numba PR).

@jreback jreback merged commit e87ad05 into pandas-dev:master Nov 5, 2021
@jreback
Copy link
Contributor

jreback commented Nov 5, 2021

thanks @mroeschke

@mroeschke mroeschke deleted the enh/groupby_numba_mean branch November 5, 2021 02:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants