
ER/DOC: Sorting in multi-index columns: misleading error message, unclear docs #4370

Closed
@jgehrcke


related #739

Have a look at this example:

import pandas as pd
import numpy as np
from StringIO import StringIO
print "Pandas version %s\n\n" % pd.__version__

data1 = """idx,metric
0,2.1
1,2.5
2,3"""

data2 = """idx,metric
0,2.7
1,2.2
2,2.8"""

df1 = pd.read_csv(StringIO(data1))
df2 = pd.read_csv(StringIO(data2))
concatenated = pd.concat([df1, df2], ignore_index=True)
merged = concatenated.groupby("idx").agg([np.mean, np.std])

print merged
print merged.sort('metric')

and its output:

$ python test.py 
Pandas version 0.11.0


     metric          
       mean       std
idx                  
0      2.40  0.424264
1      2.35  0.212132
2      2.90  0.141421
Traceback (most recent call last):
  File "test.py", line 22, in <module>
    print merged.sort('metric')
  File "/***/Python-2.7.3/lib/python2.7/site-packages/pandas/core/frame.py", line 3098, in sort
    inplace=inplace)
  File "/***/Python-2.7.3/lib/python2.7/site-packages/pandas/core/frame.py", line 3153, in sort_index
    % str(by))
ValueError: Cannot sort by duplicate column metric

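The problem here is not a duplicate column 'metric', as the error message claims. The problem is that 'metric' is only the top level of a two-level column index: it still has two sub-columns ('mean' and 'std'), so it does not identify a single column to sort by. The solution in this case is to use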

merged.sort([('metric', 'mean')])

for sorting by the mean of the metric. It took me quite a while to figure this out. First, the error message should be clearer in this case. Second, maybe I just missed it, but I could not find this solution in the docs; I found it in a thread on StackOverflow. The error message above appears to come from an over-generalized condition around https://github.com/pydata/pandas/blob/v0.12.0rc1/pandas/core/frame.py#L3269
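For readers on a current pandas release, here is a minimal sketch of the same fix. It assumes Python 3 and a pandas version where DataFrame.sort no longer exists (it was deprecated in 0.17 and removed in 0.20 in favour of sort_values). It rebuilds the aggregated frame from the example's data, shows that merged['metric'] is a two-column frame rather than a single column, and sorts by the full ('metric', 'mean') key. The check_sort_key function is hypothetical and not part of pandas; it only illustrates how a check could tell a multi-level label apart from a genuinely duplicated column and report each case with a clearer message.

```python
import pandas as pd

# Same numbers as data1 + data2 in the report, built directly for brevity.
concatenated = pd.DataFrame({
    "idx": [0, 1, 2, 0, 1, 2],
    "metric": [2.1, 2.5, 3.0, 2.7, 2.2, 2.8],
})
merged = concatenated.groupby("idx").agg(["mean", "std"])

# 'metric' is only the top level of a two-level column index, so selecting it
# yields a DataFrame with two columns, not a single Series.
print(merged["metric"].columns.tolist())  # ['mean', 'std']

# Sorting needs a key that names exactly one column: the full (level0, level1)
# tuple, passed in a list just like the merged.sort([...]) call in the report.
by_mean = merged.sort_values(by=[("metric", "mean")])
print(by_mean.index.tolist())  # [1, 0, 2]


def check_sort_key(df, key):
    """Hypothetical helper (not a pandas API): explain why `key` is ambiguous.

    Returns a human-readable message, or None if `key` looks unambiguous.
    """
    if isinstance(df.columns, pd.MultiIndex) and not isinstance(key, tuple):
        # A bare top-level label on MultiIndex columns: name its sub-columns.
        if key in df.columns.get_level_values(0):
            subcols = df[key].columns.tolist()
            return (f"{key!r} is a top-level label with sub-columns {subcols}; "
                    f"pass a full tuple such as {(key, subcols[0])!r}")
    elif list(df.columns).count(key) > 1:
        # A genuinely duplicated column name.
        return f"{key!r} appears more than once among the columns"
    return None


print(check_sort_key(merged, "metric"))
```

Keeping the two cases separate is the point of the sketch: only the second one is really a "duplicate column", while the first can tell the user which tuple to pass instead.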

Labels: Docs, Error Reporting (Incorrect or improved errors from pandas)