BUG: odd transform behaviour with integers #7972

Closed
@dsm054

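After grouping on an integer column, transform seems to forget that we have groups: the group means come back indexed by group key instead of being broadcast to each original row, and the remaining rows are NaN: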

>>> pd.__version__
'0.14.1-172-gab64d58'
>>> x = np.arange(6, dtype=np.int64)
>>> df = pd.DataFrame({"a": x//2, "b": 2.0*x, "c": 3.0*x})
>>> df
   a   b   c
0  0   0   0
1  0   2   3
2  1   4   6
3  1   6   9
4  2   8  12
5  2  10  15
>>> df.groupby("a").transform("mean")
    b     c
0   1   1.5
1   5   7.5
2   9  13.5
3 NaN   NaN
4 NaN   NaN
5 NaN   NaN
>>> df["a"] = df["a"]*1.0
>>> df.groupby("a").transform("mean")
   b     c
0  1   1.5
1  1   1.5
2  5   7.5
3  5   7.5
4  9  13.5
5  9  13.5

To make it even more obvious, use an index that does not overlap the group keys:

>>> df.index = range(20, 26)
>>> df.groupby("a").transform("mean")
     b     c
0    1   1.5
1    5   7.5
2    9  13.5
20 NaN   NaN
21 NaN   NaN
22 NaN   NaN
23 NaN   NaN
24 NaN   NaN
25 NaN   NaN

Switching to a float index seems to avoid the issues as well.
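
For reference, here is a minimal sketch of the behaviour I would expect, written as a check that could be run against a build. It assumes a recent pandas where `pd.testing.assert_frame_equal` and `DataFrame.to_numpy` are available (neither is part of the 0.14.1 build shown above), and the names `means` and `expected` are only illustrative. The idea is that transform("mean") should return each row's group mean on the original index, whatever the dtype of the grouping column.

```python
import numpy as np
import pandas as pd

# Same frame as in the report: integer group keys, and an index (20..25)
# that does not overlap the keys.
x = np.arange(6, dtype=np.int64)
df = pd.DataFrame({"a": x // 2, "b": 2.0 * x, "c": 3.0 * x},
                  index=range(20, 26))

result = df.groupby("a").transform("mean")

# Build the expected frame by hand: compute one mean per group, look the
# mean up for every row's key, then put the values back on df's own index.
means = df.groupby("a")[["b", "c"]].mean()
expected = pd.DataFrame(means.loc[df["a"]].to_numpy(),
                        index=df.index, columns=["b", "c"])

# Raises AssertionError if transform loses the groups or the index.
pd.testing.assert_frame_equal(result, expected)
```

If the bug is present, the comparison should fail on shape or index alone, since the broken output above has nine rows instead of six.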
