ENH: Add engine="numba" to groupby mean #43731

mroeschke · 2021-09-24T05:49:53Z

- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry

jbrockmendel · 2021-09-24T15:52:22Z

pandas/core/groupby/groupby.py

+            result = result.ravel()
+        else:
+            result_kwargs = {"columns": data.columns}
+        return data._constructor(result, index=index, **result_kwargs)


Need to be careful about 1) self.axis, 2) self.as_index, and 3) self.observed. You should be able to re-use one of the existing result-wrapping methods to handle all of these appropriately.

mroeschke · 2021-09-24

Fair point. Do you have any suggestions on which result-wrapping method I should use, given that this method ends up with a 1D or 2D numpy array?

jbrockmendel · 2021-09-24

None of them take ndarrays directly. My suggestion would be to turn the numba function into a blk_func that gets passed to mgr.grouped_reduce, and then wrap the result with _wrap_agged_manager.
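For readers unfamiliar with the internals, the sketch below shows the general shape of a per-block grouped-mean kernel of the kind that could be wrapped as a blk_func. It is a minimal illustration under stated assumptions, not the kernel added in this PR: `grouped_mean` is a hypothetical name; group membership is given as integer codes in [0, ngroups), with -1 for rows whose key is missing (the convention `pandas.factorize` uses for missing values); NaN values are skipped as groupby mean does; and the array is laid out as (n_rows, n_cols) for readability, whereas a real blk_func would have to match whatever block layout mgr.grouped_reduce passes in. numba's `njit` is applied when numba is installed, otherwise the function runs as plain Python.

```python
import numpy as np

try:
    from numba import njit  # JIT-compile the kernel when numba is available
except ImportError:
    # Hypothetical fallback so the sketch still runs without numba.
    def njit(func):
        return func


@njit
def grouped_mean(values, codes, ngroups):
    """Mean of each column of ``values`` within each group (illustrative sketch).

    values : float64 array of shape (n_rows, n_cols)
    codes  : int array of length n_rows with group codes in [0, ngroups),
             or -1 for rows whose group key is missing
    """
    n_rows, n_cols = values.shape
    sums = np.zeros((ngroups, n_cols))
    counts = np.zeros((ngroups, n_cols))
    for i in range(n_rows):
        g = codes[i]
        if g < 0:  # row belongs to no group (missing key)
            continue
        for j in range(n_cols):
            val = values[i, j]
            if not np.isnan(val):  # skip NaN, as groupby mean does
                sums[g, j] += val
                counts[g, j] += 1.0
    result = np.empty((ngroups, n_cols))
    for g in range(ngroups):
        for j in range(n_cols):
            # Groups with no valid values (e.g. unobserved categories) get NaN
            result[g, j] = sums[g, j] / counts[g, j] if counts[g, j] > 0 else np.nan
    return result


values = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [4.0, 40.0]])
codes = np.array([0, 0, 1, -1])  # the last row has a missing key
result = grouped_mean(values, codes, 3)
# group 0 -> [1.5, 10.0], group 1 -> [3.0, 30.0], group 2 (empty) -> [nan, nan]
```

Returning a plain (ngroups, n_cols) array keeps the kernel free of index and column handling, which is exactly why the review suggests leaving axis, as_index and observed to the existing wrapping code instead of building the Series/DataFrame by hand.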

jbrockmendel · 2021-09-24T22:19:50Z

pandas/core/groupby/groupby.py

@@ -1781,6 +1818,23 @@ def mean(self, numeric_only: bool | lib.NoDefault = lib.no_default):
            Include only float, int, boolean columns. If None, will attempt to use
            everything, then use only numeric data.

+        engine : str, default None


I'm not wild about adding to the API. Is there a viable way to just detect whether numba is available and, if so, always use it?

mroeschke · 2021-09-25

maybe_use_numba can detect whether the user has enabled numba through the config option.

Though, these keyword options are consistent with the reduction methods on window ops, and my plan is to also expose them on the regular DataFrame/Series reduction methods.

jreback · 2021-09-25

OT: we should make this docstring section into a template that can be appended to, rather than copy-pasting it.

jbrockmendel · 2021-09-27

> Though, these keyword options are consistent with the reduction methods on window ops, and my plan is to also expose them on the regular DataFrame/Series reduction methods.

Mm, OK I guess. Longer-term it'd be nice to get back to the simpler API.
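The two designs discussed in this thread (an explicit engine keyword that falls back to a global config flag, versus always using numba whenever it happens to be importable) can be contrasted in a few lines. This is a hypothetical sketch: `resolve_engine`, `auto_detect_engine` and the `config` dict are stand-ins, not pandas internals, and the first function assumes that an explicit keyword takes precedence and the config flag only applies when engine is left as None.

```python
import importlib.util

# Hypothetical stand-in for a global "use numba" option (not the pandas option API).
config = {"use_numba": False}


def resolve_engine(engine=None):
    """Explicit-keyword policy: the keyword wins; None defers to the config flag."""
    if engine not in (None, "cython", "numba"):
        raise ValueError(f"engine must be 'cython', 'numba' or None, got {engine!r}")
    if engine is None:
        return "numba" if config["use_numba"] else "cython"
    return engine


def auto_detect_engine():
    """Auto-detect policy: use numba whenever it is importable, with no keyword."""
    return "numba" if importlib.util.find_spec("numba") is not None else "cython"


print(resolve_engine())         # depends only on config["use_numba"]
print(resolve_engine("numba"))  # explicit keyword always wins
print(auto_detect_engine())     # depends on what is installed
```

One trade-off the sketch makes visible: with the explicit policy the engine depends only on arguments and configuration the user controls, while the auto-detect policy keeps the API smaller but lets the chosen engine (and any first-call JIT compilation cost) vary with whatever happens to be installed.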

jreback · 2021-09-25T23:52:53Z

pandas/core/groupby/groupby.py

@@ -1781,6 +1818,23 @@ def mean(self, numeric_only: bool | lib.NoDefault = lib.no_default):
            Include only float, int, boolean columns. If None, will attempt to use
            everything, then use only numeric data.

+        engine : str, default None


OT: we should make this docstring section into a template that can be appended to, rather than copy-pasting it.

mroeschke · 2021-10-20T04:14:58Z

pandas/tests/groupby/test_numba.py

+        expected = ser.groupby(level=0, sort=sort).mean()
+        tm.assert_series_equal(result, expected)
+
+    def test_as_index_false_unsupported(self):


@jreback I will open follow-up issues about supporting these options.
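Until those follow-ups land, unsupported combinations need to fail loudly rather than return a wrongly shaped result; the commit log below records as_index and axis=1 as unsupported for the numba path. A minimal sketch of such an up-front check, assuming the engine string and the groupby options are available at the point of the call; `check_numba_supported` and its error messages are hypothetical, not the code merged here.

```python
def check_numba_supported(engine, as_index=True, axis=0):
    """Raise for groupby options this (hypothetical) numba path cannot handle."""
    if engine != "numba":
        return  # other engines are not restricted by this check
    if not as_index:
        raise NotImplementedError("as_index=False is not supported with engine='numba'")
    if axis in (1, "columns"):
        raise NotImplementedError("axis=1 is not supported with engine='numba'")


check_numba_supported("numba")                            # supported: no error
check_numba_supported("cython", as_index=False, axis=1)   # not checked for other engines
```

Against this sketch, a test in the spirit of test_as_index_false_unsupported would only need to assert that NotImplementedError is raised, for example with pytest.raises(NotImplementedError).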

Ready for review

jreback · 2021-10-31T14:55:44Z

Please rebase to make sure typing is OK (the numba typing PR has been merged).

jreback · 2021-11-05T00:59:17Z

thanks @mroeschke

mroeschke added 5 commits September 22, 2021 21:20

Add mean kernel

c63fc88

Start scaffolding

e365f97

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

2f203c0

Add tests, add whatsnew, finish internals

af92d6e

Add PR number

a6f0bce

mroeschke changed the title ~~ENH: Add engine="numba" to groupby mean~~ WIP: ENH: Add engine="numba" to groupby mean Sep 24, 2021

Fix cache string bug

59039e2

jbrockmendel reviewed Sep 24, 2021

jreback added the Groupby label Sep 25, 2021

jreback added this to the 1.4 milestone Sep 25, 2021

jreback requested changes Sep 25, 2021

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

7058ede

mroeschke mentioned this pull request Oct 5, 2021

API/DISC: engine="numba" (computational API) location in reductions functions or constructors #43886

Closed

mroeschke added 7 commits October 6, 2021 18:26

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

265a9a7

Refactor group_selection_context

48bb0c4

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

939a197

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

a635e5b

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

77f5fa5

Add test for sort

2ae20c2

Add unsupported as_index and axis=1

2c1b098

mroeschke changed the title ~~WIP: ENH: Add engine="numba" to groupby mean~~ ENH: Add engine="numba" to groupby mean Oct 19, 2021

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

a85a957

mroeschke commented Oct 20, 2021

mroeschke added 6 commits October 20, 2021 19:20

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

e1a2fc6

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

cc84742

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

8104446

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

f734723

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

e5f52b2

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

8d7e9f9

mroeschke added 4 commits October 26, 2021 18:03

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

dd67784

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

62af178

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

5c45732

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

39cb774

jreback approved these changes Oct 31, 2021

mroeschke added 5 commits October 31, 2021 12:08

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

0309be3

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

32f0b63

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

07d8f5e

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

f31a35f

Merge remote-tracking branch 'upstream/master' into enh/groupby_numba_mean

d51abf9

jreback merged commit e87ad05 into pandas-dev:master Nov 5, 2021

mroeschke deleted the enh/groupby_numba_mean branch November 5, 2021 02:47


