Inconsistent output when using integer labels in multiindex on both column and index #14969

Open
@relativistic

Description of problem

Forgive me if I'm missing a subtlety when using integers for multiindexing, but I seem to be getting inconsistent behavior. Using loc to index both column and index simultaneously doesn't always give the same kind of result. This seems to depend on the datatype of the innermost index level.

Example of the expected behavior

The following example works as I'd expect, giving me a dataframe representing the (0,0) label for the outermost index level:

>>> import numpy as np
>>> import pandas as pd
>>> ind = pd.MultiIndex.from_product([[0, 1], ['A', 'B', 'C', 'D', 'E']])
>>> df = pd.DataFrame(np.random.rand(10, 10), index=ind, columns=ind)
>>> print(df.loc[0, 0])

          A         B         C         D         E
A  0.392093  0.167340  0.292854  0.138955  0.575715
B  0.495728  0.062870  0.733270  0.889761  0.141171
C  0.973444  0.518498  0.648546  0.448096  0.383729
D  0.987809  0.697177  0.601228  0.094184  0.986927
E  0.950939  0.109866  0.151390  0.173802  0.855105

Example of the unexpected behavior

However, if I change the second index level datatype to, for example, floats or ints, loc appears to use positional indexing rather than label-based indexing for the second label. Thus, the same syntax returns a series of a single column, rather than a dataframe.

>>> ind = pd.MultiIndex.from_product([[0, 1], np.linspace(0, 1, 5)])
>>> df = pd.DataFrame(np.random.rand(10, 10), index=ind, columns=ind)
>>> print(df.loc[0, 0])
0  0.00    0.666874
   0.25    0.023773
   0.50    0.799715
   0.75    0.752675
   1.00    0.935531
1  0.00    0.510080
   0.25    0.845125
   0.50    0.410635
   0.75    0.067144
   1.00    0.658522

Problem description

The problem is that the output is inconsistent. My code breaks in a non-obvious way depending on the datatypes used for the index levels. I would expect things to work as in my first example, with the str dtype used for the second index level. At a minimum, I'd prefer it if the behavior were consistent, regardless of the datatype of the second index level.
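
For reference, a possible workaround is to name the axis and level explicitly so that the key cannot be read in more than one way. The sketch below uses `DataFrame.xs` and `pd.IndexSlice`, with variable names (`block_xs`, `block_slice`) chosen for illustration. It is not a fix for the behavior reported here, only a way to request the (0, 0) block that should not depend on the dtype of the inner level.

```python
import numpy as np
import pandas as pd

ind = pd.MultiIndex.from_product([[0, 1], np.linspace(0, 1, 5)])
df = pd.DataFrame(np.random.rand(10, 10), index=ind, columns=ind)

# Option 1: take a cross-section at label 0 on the outer level of each axis.
block_xs = df.xs(0, axis=0, level=0).xs(0, axis=1, level=0)

# Option 2: slice every axis explicitly, then drop the outer level.
idx = pd.IndexSlice
block_slice = (
    df.loc[idx[0, :], idx[0, :]]
    .droplevel(0, axis=0)
    .droplevel(0, axis=1)
)

print(block_xs.shape)
```

Both options spell out which axis each key applies to, at the cost of being more verbose than `df.loc[0, 0]`.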

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 23.1.0
Cython: 0.24
numpy: 1.10.4
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.7.2
IPython: 4.1.2
sphinx: 1.4
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: None
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Labels: Docs, Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex
