Description
I originally asked this question on StackOverflow, where it is still open (I am not sure whether this is a bug):
https://stackoverflow.com/q/68526846/4865723
I was asked there to open an issue about it here.
The result of .agg(list, axis=1) changed in pandas version 1.3.0. The goal of my question is to understand what changed and why, and of course how to solve this.
#!/usr/bin/env python3
import pandas as pd
import numpy as np
print(pd.__version__)
df = pd.DataFrame(
    {
        'PERSON': ['Maya', 'Maya', 'Jim', 'Jim'],
        'DAY': ['2016-01-14', '2016-01-14', '2016-02-21', '2016-02-21'],
        'FOO': [12, 12, 9, 7],
        'BAR': range(4)
    }
)
print(df)
res = df.loc[:, ['FOO', 'BAR']].agg(list, axis=1)
print(res)
This is the result with the last pre-1.3.0 version of pandas. The two selected columns are "joined" into one list per row.
1.2.5
  PERSON         DAY  FOO  BAR
0   Maya  2016-01-14   12    0
1   Maya  2016-01-14   12    1
2    Jim  2016-02-21    9    2
3    Jim  2016-02-21    7    3
0    [12, 0]
1    [12, 1]
2     [9, 2]
3     [7, 3]
dtype: object
But since pandas 1.3.0 the result is a DataFrame with the two columns unchanged:
   FOO  BAR
0   12    0
1   12    1
2    9    2
3    7    3
I looked into the changelog of pandas 1.3.0. There is nothing about agg(), but a lot about apply() and transform(). However, I do not understand the details, and I cannot see which of these points is related to my situation.
I am sure the pandas devs had a good reason to change this behavior. Once I understand the background of that decision, I may be able to find a solution for it.
Side note: in the final production code I would like to do things like this:
df['NEW_COLUMN'] = df.loc[:, ['FOO', 'BAR']].agg(list, axis=1)
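For reference, here is a minimal sketch of one way to build the same column without relying on agg() at all, so it should not depend on which pandas version is installed. It is only a possible workaround, not necessarily what the pandas devs would recommend; the variable name rows is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'PERSON': ['Maya', 'Maya', 'Jim', 'Jim'],
        'DAY': ['2016-01-14', '2016-01-14', '2016-02-21', '2016-02-21'],
        'FOO': [12, 12, 9, 7],
        'BAR': range(4)
    }
)

# Convert the two selected columns to a NumPy array and then to a
# plain list of lists: one inner list per row.
rows = df[['FOO', 'BAR']].to_numpy().tolist()

# Wrap the lists in a Series that shares the DataFrame's index, so the
# assignment aligns row by row.
df['NEW_COLUMN'] = pd.Series(rows, index=df.index)

print(df)
```

Note that this goes through a NumPy array, so columns of mixed dtypes would be converted to a common dtype first. With the two integer columns here that makes no difference.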