pivot_table over Categorical columns #15193

Closed
@Kevin-McIsaac

Description

Code Sample, a copy-pastable example if possible

import pandas as pd

# Full category list; only the first three stations appear in the data.
stations = ['Kings Cross Station', 'Newtown Station', 'Parramatta Station',
            'Town Hall Station', 'Central Station', 'Circular Quay Station',
            'Martin Place Station', 'Museum Station', 'St James Station',
            'Bondi Junction Station', 'North Sydney Station']

df = pd.DataFrame({'Station': ['Kings Cross Station', 'Newtown Station', 'Parramatta Station',
                               'Kings Cross Station', 'Newtown Station', 'Parramatta Station',
                               'Kings Cross Station', 'Newtown Station', 'Parramatta Station'],
                   'Date': pd.DatetimeIndex(['1/1/2017', '1/1/2017', '1/1/2017',
                                             '2/1/2017', '2/1/2017', '2/1/2017',
                                             '3/1/2017', '3/1/2017', '3/1/2017']),
                   'Exit': range(0, 9)})

# Keyword form as reported against pandas 0.19.2; newer releases expect a CategoricalDtype instead.
df.Station = df.Station.astype('category', ordered=True, categories=stations)
df.pivot_table(index='Date', columns='Station', values='Exit', dropna=True)

Problem description

When the column is a Categorical, the output of pivot_table:

  1. Includes columns that are all NaN, which should not happen with dropna=True.
  2. Includes columns for categories that do not appear in the input DataFrame, which is unexpected.
  3. Differs from the output produced before the column is converted to a categorical.

This was not the behaviour in earlier versions (possibly 0.18).

Expected Output

The output should be the same as when the column is not a categorical:

Station     Kings Cross Station  Newtown Station  Parramatta Station
Date
2017-01-01                    0                1                   2
2017-02-01                    3                4                   5
2017-03-01                    6                7                   8
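
One possible workaround, which is not part of the original report, is to drop the unused categories before pivoting. The sketch below assumes that `Series.cat.remove_unused_categories()` is acceptable for the use case (it discards the extra station names from the dtype) and builds the categorical with `pd.Categorical`, which is a substitution for the `astype` call in the report. The variable name `result` is illustrative only.

```python
import pandas as pd

stations = ['Kings Cross Station', 'Newtown Station', 'Parramatta Station',
            'Town Hall Station', 'Central Station', 'Circular Quay Station',
            'Martin Place Station', 'Museum Station', 'St James Station',
            'Bondi Junction Station', 'North Sydney Station']

df = pd.DataFrame({
    'Station': ['Kings Cross Station', 'Newtown Station', 'Parramatta Station'] * 3,
    'Date': pd.to_datetime(['2017-01-01'] * 3 + ['2017-02-01'] * 3 + ['2017-03-01'] * 3),
    'Exit': list(range(9)),
})

# Build the ordered categorical with the full station list (assumed equivalent
# to the astype call in the report).
df['Station'] = pd.Categorical(df['Station'], categories=stations, ordered=True)

# Workaround sketch: keep only the categories that actually occur in the data,
# so pivot_table has no empty categories to expand into all-NaN columns.
df['Station'] = df['Station'].cat.remove_unused_categories()

result = df.pivot_table(index='Date', columns='Station', values='Exit')
print(result)
```

This only sidesteps the symptom; it does not explain or change how pivot_table treats unobserved categories.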

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.41-36.55.amzn1.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None


Labels: Bug, Categorical, Missing-data, Reshaping
