BUG: Passing multiple levels to stack when having mixed integer/string level names #8584

Closed
@jorisvandenbossche

Related #7770

Using the example from the docs (http://pandas.pydata.org/pandas-docs/stable/reshaping.html#multiple-levels):

from numpy.random import randn
from pandas import DataFrame, MultiIndex
columns = MultiIndex.from_tuples([('A', 'cat', 'long'), ('B', 'cat', 'long'), ('A', 'dog', 'short'), ('B', 'dog', 'short')],
                                 names=['exp', 'animal', 'hair_length'])
df = DataFrame(randn(4, 4), columns=columns)

CONTEXT: df.stack(level=['animal', 'hair_length']) and df.stack(level=[1, 2]) are equivalent (feature introduced in #7770). Mixing integer positions and string names (e.g. df.stack(level=['animal', 2])) raises a ValueError.

But if the level names themselves are of mixed types, several different (and wrong) things happen:

  • With a completely different number (here 10, which cannot be mistaken for a level position), it still works as it should:

    df.columns.names = ['exp', 'animal', 10]
    df.stack(level=['animal', 10])
    
  • With the number 1, it treats the 1 as a level number instead of a level name, leading to a wrong result (the same level is stacked twice):

    In [42]: df.columns.names = ['exp', 'animal', 1]
    
    In [43]: df.stack(level=['animal', 1])
    Out[43]: 
    exp                     A         B
      animal animal                    
    0 cat    cat    -1.006065  0.401136
      dog    dog     0.526734 -1.753478
    1 cat    cat    -0.718401 -0.400386
      dog    dog    -0.951336 -1.074323
    2 cat    cat     1.119843 -0.606982
      dog    dog     0.371467 -1.837341
    3 cat    cat    -1.467968  1.114524
      dog    dog    -0.040112  0.240026
    
  • With the number 0, it raises a strange, seemingly unrelated KeyError:

    In [46]: df.columns.names = ['exp', 'animal', 0]
    
    In [47]: df.stack(level=['animal', 0])
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    <ipython-input-47-4e9507e0708f> in <module>()
    ----> 1 df.stack(level=['animal', 0])
    
    /home/joris/scipy/pandas/pandas/core/frame.pyc in stack(self, level, dropna)
    3390 
    3391         if isinstance(level, (tuple, list)):
    -> 3392             return stack_multiple(self, level, dropna=dropna)
    3393         else:
    3394             return stack(self, level, dropna=dropna)
    
    ....
    
    /home/joris/scipy/pandas/pandas/core/index.pyc in _partial_tup_index(self, tup, side)
    3820             raise KeyError('Key length (%d) was greater than MultiIndex'
    3821                            ' lexsort depth (%d)' %
    -> 3822                            (len(tup), self.lexsort_depth))
    3823 
    3824         n = len(tup)
    
    KeyError: 'Key length (2) was greater than MultiIndex lexsort depth (0)'
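
The three cases above differ only in whether the integer is read as a level name or as a level position. As a rough illustration of one consistent rule (an exact name match wins, and a position is the fallback), here is a minimal pure-Python sketch. `resolve_levels` is a hypothetical helper written for this issue, not a pandas function, and it is not meant to describe how `stack` is implemented.

```python
def resolve_levels(names, levels):
    """Map each requested level to a position, preferring names over positions.

    Hypothetical helper for illustration only; not part of pandas.
    """
    positions = []
    for lev in levels:
        if lev in names:
            # An exact name match always wins, even when the name is an integer.
            positions.append(names.index(lev))
        elif isinstance(lev, int) and -len(names) <= lev < len(names):
            # Fall back to a positional lookup only when no name matches.
            positions.append(lev % len(names))
        else:
            raise KeyError(f"Level {lev!r} not found in {names!r}")
    if len(set(positions)) != len(positions):
        # Guards against the "same level stacked twice" outcome.
        raise ValueError("the same level was requested more than once")
    return positions


# The three level-name layouts from this report all resolve the same way.
print(resolve_levels(["exp", "animal", 10], ["animal", 10]))
print(resolve_levels(["exp", "animal", 1], ["animal", 1]))
print(resolve_levels(["exp", "animal", 0], ["animal", 0]))
```

Under this rule all three calls select positions 1 and 2. Note that the sketch deliberately accepts a mix of names and positions, so it does not reproduce the ValueError mentioned in the context above; whether that mix should stay an error is a separate design question.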
    

Labels: API Design, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
