BUG: iloc fails with non lex-sorted MultiIndex #13797

Closed
@ygriku

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
ind = [
        ['AA','AA','AA','BB','BB'],
        ['A' ,'B' ,'B' ,'a' ,'b']
] 
ind_nonLex = [
        ['CC','CC','CC','BB','BB'],
        ['A' ,'B' ,'B' ,'a' ,'b']
] 

strCol=pd.DataFrame([['fooA'],['fooB'],['fooC'],['fooD'],['fooE']])

dat=np.arange(1,26).reshape(5,5)
df=pd.concat([strCol, pd.DataFrame(dat)], axis=1)
df1=pd.DataFrame(df.values, index=ind_nonLex)
df2=pd.DataFrame(df.values, index=ind)

df1 (whose index is not lex sorted) fails with iloc access:

>>> df1.iloc[0,0]
C:\Users\rikuhiro\Anaconda3\envs\pd-check\lib\site-packages\ipykernel\__main__.py:1: PerformanceWarning: indexing past lexsort depth may impact performance.
  if __name__ == '__main__':
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-37-884ae2904642> in <module>()
----> 1 df1.iloc[0,0]

C:\Users\rikuhiro\Anaconda3\envs\pd-check\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   1292 
   1293         if type(key) is tuple:
-> 1294             return self._getitem_tuple(key)
   1295         else:
   1296             return self._getitem_axis(key, axis=0)

C:\Users\rikuhiro\Anaconda3\envs\pd-check\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
   1561 
   1562             # if the dim was reduced, then pass a lower-dim the next time
-> 1563             if retval.ndim < self.ndim:
   1564                 axis -= 1
   1565 

AttributeError: 'str' object has no attribute 'ndim'

Expected Output

df2 (whose index is lex sorted) works as expected:

>>> df2.iloc[0,0]
'fooA'

This AttributeError does not occur when the DataFrame's values are NumPy scalars (e.g. numpy.int32), because those objects have an ndim attribute. (The PerformanceWarning still appears in that case, but that may be a separate issue.)
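
A quick check of the difference described above, as a minimal sketch: it only inspects the two kinds of values and does not touch pandas. The variable names are illustrative.

```python
import numpy as np

np_value = np.int32(1)   # NumPy scalar, as found in an all-integer frame
py_value = "fooA"        # plain Python str, as found in an object-dtype frame

# NumPy scalars expose ndim (always 0), so `retval.ndim < self.ndim` can be evaluated.
print(hasattr(np_value, "ndim"), np_value.ndim)  # True 0

# A Python str has no ndim, so the same comparison raises AttributeError.
print(hasattr(py_value, "ndim"))                 # False
```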

I found that adding an if statement to _getitem_tuple(self, tup) in pandas/core/indexing.py remedies this. The check makes _getitem_tuple handle return values that lack the ndim attribute, as _getitem_nested_tuple(self, tup) already does. (I will prepare a pull request if that is helpful.)

 @@ -1569,6 +1569,10 @@ def _getitem_tuple(self, tup):
              retval = getattr(retval, self.name)._getitem_axis(key, axis=axis)

 +            # if we have a scalar, we are done
 +            if lib.isscalar(retval) or not hasattr(retval, 'ndim'):
 +                break
 +
              # if the dim was reduced, then pass a lower-dim the next time
              if retval.ndim < self.ndim:
                  axis -= 1
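
The effect of the proposed guard can be illustrated without pandas. The sketch below is not the pandas implementation: Table, Row, and getitem_tuple are hypothetical stand-ins that only mimic the shape of the loop in the diff (apply one key per axis, then compare ndim), and the check is reduced to the hasattr part.

```python
class Row:
    """Hypothetical 1-D stand-in (plays the role of a Series)."""
    ndim = 1

    def __init__(self, values):
        self.values = values

    def get(self, key, axis):
        # Returns a plain Python object, which may not have ``ndim``.
        return self.values[key]


class Table:
    """Hypothetical 2-D stand-in (plays the role of a DataFrame)."""
    ndim = 2

    def __init__(self, rows):
        self.rows = rows

    def get(self, key, axis):
        if axis == 0:
            return Row(self.rows[key])
        return Row([row[key] for row in self.rows])


def getitem_tuple(obj, tup, guarded=True):
    """Apply one key per axis, mirroring the loop structure in the diff."""
    retval = obj
    axis = 0
    for key in tup:
        retval = retval.get(key, axis)

        # Proposed guard: if we have a scalar-like result, we are done.
        if guarded and not hasattr(retval, "ndim"):
            break

        # If the dim was reduced, pass a lower-dim axis the next time.
        if retval.ndim < obj.ndim:
            axis -= 1
        axis += 1
    return retval


table = Table([["fooA", 1], ["fooB", 2]])

print(getitem_tuple(table, (0, 0)))  # fooA

try:
    getitem_tuple(table, (0, 0), guarded=False)
except AttributeError as exc:
    print(exc)  # 'str' object has no attribute 'ndim'
```

Without the guard, the stand-in loop fails on the str result in the same way as the traceback above; with it, the loop stops as soon as the result is no longer a dimensioned object.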

output of pd.show_versions()

>>> pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 23.0.0
Cython: None
numpy: 1.11.1
scipy: None
statsmodels: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex
