
ENH: Create nlargest and nsmallest methods for pandas.core.groupby.DataFrameGroupBy objects #23993

Closed
@mcarbajo


Code Sample, a copy-pastable example if possible

Current method

import pandas as pd

dataframe = pd.DataFrame({'col1': list('aaabbbcc'), 'col2': range(8), 'col3': list('whatever')})
dataframe.groupby('col1').apply(lambda df: df.nlargest(2, 'col2'))

Desired method

import pandas as pd

dataframe = pd.DataFrame({'col1': list('aaabbbcc'), 'col2': range(8), 'col3': list('whatever')})
dataframe.groupby('col1').nlargest(2, 'col2')
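For illustration only, the requested behaviour could be approximated today with a small helper. `group_nlargest` is a hypothetical name, not a pandas API, and this sketch assumes the DataFrame has a unique, single-level index.

```python
import pandas as pd


def group_nlargest(df, by, n, column):
    """Hypothetical helper (not part of pandas): keep the n rows with the
    largest `column` values in each group, with all columns retained.

    Assumes df has a unique, single-level index.
    """
    # SeriesGroupBy.nlargest returns a Series whose index is
    # (group key, original row label); the last level identifies the rows.
    picked = df.groupby(by)[column].nlargest(n)
    row_labels = picked.index.get_level_values(-1)
    return df.loc[row_labels]


dataframe = pd.DataFrame({'col1': list('aaabbbcc'), 'col2': range(8), 'col3': list('whatever')})
result = group_nlargest(dataframe, 'col1', 2, 'col2')
```

Unlike the `apply` version, the result keeps the original row labels instead of a (group, row) MultiIndex.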

Problem description

I want to select the top 2 rows of each group, ranked by a specific column. This is easy if I only need to keep the ranking column (see the code below), but it becomes less efficient and less clear when I want to keep all the columns of the DataFrame.

import pandas as pd

dataframe = pd.DataFrame({'col1': list('aaabbbcc'), 'col2': range(8), 'col3': list('whatever')})
dataframe.groupby('col1').col2.nlargest(2)
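One possible workaround that keeps every column without `apply` is to sort first and then take the head of each group. This is only a sketch of an alternative, not the requested method, and tie handling may differ from `nlargest`.

```python
import pandas as pd

dataframe = pd.DataFrame({'col1': list('aaabbbcc'), 'col2': range(8), 'col3': list('whatever')})

# Sort the whole frame by the ranking column (largest first), then keep
# the first 2 rows seen for each group. All columns are preserved.
top2 = (
    dataframe.sort_values('col2', ascending=False)
    .groupby('col1', sort=False)
    .head(2)
)
```

The rows come back in the sorted order of the whole frame rather than grouped by `col1`, so a final `sort_values` may be needed if group order matters.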


Labels: Groupby, Needs Info (Clarification about behavior needed to assess issue)
