DOC: pivot_table - fix documentation of aggfunc parameter #18712

Closed
@stefansimik

Code Sample, a copy-pastable example if possible

In [30]: import random
    ...: import numpy as np
    ...: import pandas as pd
    ...: 
    ...: df = pd.DataFrame({'random1': [random.random() for i in range(10)],
    ...:                    'random2': [random.random() for i in range(10)],
    ...:                    'type': ['duck', 'bird']*5},
    ...:                   index=range(10,20))
    ...: 
    ...: # aggfunc is a dict: two functions for random1, one for random2
    ...: df.pivot_table(index='type',
    ...:                aggfunc={'random1': [np.median, np.mean],
    ...:                         'random2': np.sum})
    ...: 
Out[30]: 
       random1             random2
          mean    median       sum
type                              
bird  0.420249  0.428048  1.869603
duck  0.422977  0.518311  3.395530

import pandas as pd
import numpy as np

# Load data
df = pd.read_excel('http://pbpython.com/extras/sales-funnel.xlsx')
# Make Status an unordered categorical with a fixed set of categories
df["Status"] = pd.Categorical(df["Status"],
                              categories=["won", "pending", "presented", "declined"],
                              ordered=False)

# Create pivot table, passing a dict to aggfunc:
# count rows for Quantity, compute sum and mean for Price
pd.pivot_table(df, index=['Manager', 'Status'],
               values=['Quantity', 'Price'],
               aggfunc={'Quantity': len, 'Price': [np.sum, np.mean]},
               fill_value=0)

Problem description

The documentation for the aggfunc parameter of the pivot_table method
states that the valid inputs are:

  • a function, or
  • a list of functions

It omits a third option: a dictionary can also be passed, which is one of the most useful forms.

This gap in the documentation leads to misleading posts on Stack Overflow, such as this one:

https://stackoverflow.com/questions/34193862/pandas-pivot-table-list-of-aggfunc

The best answer there claims:

The aggfunc argument of pivot_table takes a function or list of functions but not dict

That is not true: a dict is a valid input for the aggfunc parameter, but because it is not documented,
people believe it is invalid. The incomplete documentation causes confusion here and should be updated.
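
For comparison, here is a minimal sketch that puts the three input forms side by side on a small deterministic frame. The column names `x` and `y` and the values are made up for illustration. The aggregations are given as string names ("mean", "median", "sum") rather than the NumPy functions used above; that is a choice made for this sketch, not something the issue depends on.

```python
import pandas as pd

# Hypothetical toy data: two groups, two numeric columns.
df = pd.DataFrame({
    "type": ["duck", "bird", "duck", "bird"],
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [10.0, 20.0, 30.0, 40.0],
})

# 1. A single function: one aggregation applied to the selected values.
single = df.pivot_table(index="type", values="x", aggfunc="mean")

# 2. A list of functions: hierarchical columns, function name on top.
several = df.pivot_table(index="type", values="x", aggfunc=["mean", "median"])

# 3. A dict: a different aggregation (or list of them) per column.
per_column = df.pivot_table(
    index="type",
    aggfunc={"x": ["median", "mean"], "y": "sum"},
)

print(per_column)
```

With the dict form, the resulting columns are keyed first by the source column and then by the function name, so a single cell can be read with `per_column.loc["bird", ("x", "mean")]`.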

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.21.0
pytest: 3.3.0
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.27.3
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
