Description
Code Sample, a copy-pastable example if possible
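import pandas as pd

df = pd.DataFrame({'a': ['x','x','y'], 'b': ['a','b','a'], 'c': [7,8,9]})
# Convert both grouping keys to categoricals
df['a'] = df['a'].astype('category')
df['b'] = df['b'].astype('category')
# Aggregate a single selected column (SeriesGroupBy path)
result1 = df.groupby(['a','b']).c.agg('sum')
# Aggregate the whole frame (DataFrameGroupBy path)
result2 = df.groupby(['a','b']).agg('sum')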
Problem description
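The computed result1 (a Series) and result2 (a DataFrame) differ: result1 contains only the 3 observed combinations of a and b, while result2 contains all 4, with NaN for the unobserved combination (y, b).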
Result1:
a  b
x  a    7
   b    8
y  a    9
Name: c, dtype: int64
Result2:
       c
a b
x a  7.0
  b  8.0
y a  9.0
  b  NaN
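As a possible workaround sketch (assuming pandas 0.23 or later, where the observed keyword of groupby was introduced), passing observed=True explicitly to both calls should make the SeriesGroupBy and DataFrameGroupBy paths agree, because both then keep only the category combinations that actually occur in the data. This sidesteps the inconsistency rather than fixing it.

```python
import pandas as pd

df = pd.DataFrame({'a': ['x', 'x', 'y'], 'b': ['a', 'b', 'a'], 'c': [7, 8, 9]})
df['a'] = df['a'].astype('category')
df['b'] = df['b'].astype('category')

# Pass observed explicitly so both code paths keep only the
# (a, b) combinations that are present in the data.
result1 = df.groupby(['a', 'b'], observed=True)['c'].agg('sum')
result2 = df.groupby(['a', 'b'], observed=True).agg('sum')

print(result1)
print(result2)
```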
Expected Output
I expect both results to have 4 rows, since the observed option of groupby is False by default.
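One way to build the expected 4-row result explicitly, independent of which groupby path is used, is to aggregate only the observed groups and then reindex onto the Cartesian product of all categories. The helper sum_all_categories below is hypothetical (not a pandas API), and filling unobserved combinations with NaN is an assumption chosen to match result2 as shown above.

```python
import pandas as pd

df = pd.DataFrame({'a': ['x', 'x', 'y'], 'b': ['a', 'b', 'a'], 'c': [7, 8, 9]})
df['a'] = df['a'].astype('category')
df['b'] = df['b'].astype('category')


def sum_all_categories(frame, keys, col):
    """Hypothetical helper: sum `col` for every combination of the
    categories of `keys`, leaving NaN where a combination never occurs."""
    # Aggregate only the combinations that are present in the data
    observed = frame.groupby(keys, observed=True)[col].sum()
    # Full grid of category combinations, e.g. (x, a), (x, b), (y, a), (y, b)
    full_index = pd.MultiIndex.from_product(
        [frame[k].cat.categories for k in keys], names=keys
    )
    # Missing combinations are filled with NaN by reindex
    return observed.reindex(full_index)


full = sum_all_categories(df, ['a', 'b'], 'c')
print(full)
```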
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 40.6.3
Cython: None
numpy: 1.16.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None