Description
Code Sample, a copy-pastable example if possible
import pandas as pd

# Full category list; only the first three stations occur in the data.
stations = ['Kings Cross Station', 'Newtown Station', 'Parramatta Station',
            'Town Hall Station', 'Central Station', 'Circular Quay Station',
            'Martin Place Station', 'Museum Station', 'St James Station',
            'Bondi Junction Station', 'North Sydney Station']
df = pd.DataFrame({'Station': ['Kings Cross Station', 'Newtown Station', 'Parramatta Station',
                               'Kings Cross Station', 'Newtown Station', 'Parramatta Station',
                               'Kings Cross Station', 'Newtown Station', 'Parramatta Station'],
                   'Date': pd.DatetimeIndex(['1/1/2017', '1/1/2017', '1/1/2017',
                                             '2/1/2017', '2/1/2017', '2/1/2017',
                                             '3/1/2017', '3/1/2017', '3/1/2017']),
                   'Exit': range(0, 9)})
# Convert to an ordered categorical that also lists the unused stations
# (keyword form of astype as used with pandas 0.19.2).
df.Station = df.Station.astype('category', ordered=True, categories=stations)
# With dropna=True, all-NaN columns are expected to be dropped.
df.pivot_table(index='Date', columns='Station', values='Exit', dropna=True)
Problem description
When the column is a Categorical, the output of pivot_table:
- Includes columns that are all NaN, which should not happen because dropna=True.
- Includes columns for categories that do not appear in the input DataFrame, which is unexpected.
- Differs from the output produced before the column is converted to a categorical.
This was not the behaviour in earlier versions (0.18?).
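A possible workaround, sketched here as an assumption rather than a confirmed fix, is to drop the unused categories before pivoting so that only stations present in the data can become columns. The sketch builds the categorical with pd.Categorical instead of the astype keywords above so that it does not depend on the 0.19.2 signature; the variable name result is illustrative only.

```python
import pandas as pd

stations = ['Kings Cross Station', 'Newtown Station', 'Parramatta Station',
            'Town Hall Station', 'Central Station', 'Circular Quay Station',
            'Martin Place Station', 'Museum Station', 'St James Station',
            'Bondi Junction Station', 'North Sydney Station']

df = pd.DataFrame({
    'Station': ['Kings Cross Station', 'Newtown Station', 'Parramatta Station'] * 3,
    'Date': pd.to_datetime(['1/1/2017'] * 3 + ['2/1/2017'] * 3 + ['3/1/2017'] * 3),
    'Exit': list(range(9)),
})

# Ordered categorical that lists all eleven stations, as in the report.
df['Station'] = pd.Categorical(df['Station'], categories=stations, ordered=True)

# Workaround (assumption): discard categories that never occur in the data,
# keeping the relative order of the remaining ones.
df['Station'] = df['Station'].cat.remove_unused_categories()

result = df.pivot_table(index='Date', columns='Station', values='Exit', dropna=True)
print(result)
```

This keeps the column ordered and categorical, at the cost of losing the unused categories on the pivoted frame's column index.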
Expected Output
It should be the same as the output when the column is not a categorical:
Station     Kings Cross Station  Newtown Station  Parramatta Station
Date
2017-01-01                    0                1                   2
2017-02-01                    3                4                   5
2017-03-01                    6                7                   8
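The expectation can also be phrased as a small check. The helper below, unexpected_columns, is a hypothetical name introduced for illustration and not part of pandas; it lists pivoted columns whose labels never occur in the source column. Because the actual pivot_table behaviour depends on the pandas version, the sketch simulates the reported symptom by appending an all-NaN column to the non-categorical result.

```python
import pandas as pd


def unexpected_columns(table, source, column):
    """Return pivoted column labels that never occur in source[column]."""
    seen = set(source[column].astype(object))
    return [label for label in table.columns if label not in seen]


df = pd.DataFrame({
    'Station': ['Kings Cross Station', 'Newtown Station', 'Parramatta Station'] * 3,
    'Date': pd.to_datetime(['1/1/2017'] * 3 + ['2/1/2017'] * 3 + ['3/1/2017'] * 3),
    'Exit': list(range(9)),
})

# Reference result: Station is still a plain (non-categorical) column.
reference = df.pivot_table(index='Date', columns='Station', values='Exit', dropna=True)

# Simulated symptom: an extra all-NaN column for a station absent from the data.
simulated = reference.reindex(columns=list(reference.columns) + ['Town Hall Station'])

clean = unexpected_columns(reference, df, 'Station')
flagged = unexpected_columns(simulated, df, 'Station')
print(clean, flagged)
```

Running the same helper on the categorical pivot from the code sample would show whether a given pandas version still produces the extra columns.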
Output of pd.show_versions()
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None