groupby filtering is missing some groups #7870

Closed
@phobson

Description

I brought this up on the mailing list. @cpcloud condensed my example into a very concise sample showing the expected and the actual output:

In [20]: df = pd.DataFrame([
    ['best', 'a', 'x'],
    ['worst', 'b', 'y'],
    ['best', 'c', 'x'],
    ['best', 'd', 'y'],
    ['worst', 'd', 'y'],
    ['worst', 'd', 'y'],
    ['best', 'd', 'z'],
], columns=['a', 'b', 'c'])

In [21]: pd.concat(v[v.a == 'best'] for _, v in df.groupby('c'))
Out[21]:
      a  b  c
0  best  a  x
2  best  c  x
3  best  d  y # <--- missing from the next statement
6  best  d  z

In [22]: df.groupby('c').filter(lambda g: g.a == 'best')
Out[22]:
      a  b  c
0  best  a  x
2  best  c  x
6  best  d  z
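
For anyone who needs the Out[21] result in the meantime, here is a minimal sketch of two workarounds. It assumes the goal is row-level selection, and it relies on the documented contract that the function passed to filter returns a single True/False per group. The variable names (expected, row_level, group_level) are only illustrative and are not from the original report.

```python
import pandas as pd

df = pd.DataFrame([
    ['best', 'a', 'x'],
    ['worst', 'b', 'y'],
    ['best', 'c', 'x'],
    ['best', 'd', 'y'],
    ['worst', 'd', 'y'],
    ['worst', 'd', 'y'],
    ['best', 'd', 'z'],
], columns=['a', 'b', 'c'])

# Row-level selection inside each group, same as In [21] above.
expected = pd.concat([v[v['a'] == 'best'] for _, v in df.groupby('c')])

# The same rows via a plain boolean mask; no groupby is needed for this.
row_level = df[df['a'] == 'best']

# Group-level filtering: the lambda returns one scalar bool per group,
# so every row of a group with at least one 'best' is kept.
group_level = df.groupby('c').filter(lambda g: (g['a'] == 'best').any())
```

The boolean mask answers "which rows are 'best'", while filter answers "which groups should be kept whole", so the two are not interchangeable.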
