get_loc() returns integer or slice or KeyError nondeterministic in multiindex data frame #6501

Closed
@colinfang

See the example below: if n is large, get_loc returns a slice; otherwise it returns an integer. The threshold for n changes from run to run (but is frequently 25 or 50).
http://stackoverflow.com/questions/22067205/when-does-pandas-xs-drop-dimensions-and-how-can-i-force-it-to-not-to

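import numpy as np
import pandas as pd

n = 23
# n random rows plus one sentinel row whose index key is (-1, -1, -1)
df = pd.DataFrame({'a': np.append(np.random.randint(0, 10, n), -1),
                   'b': np.append(np.random.randint(0, 10, n), -1),
                   'c': np.append(np.random.randint(0, 10, n), -1),
                   'value': np.random.randint(0, 100, n + 1)})

df.set_index(['a', 'b', 'c'], inplace=True)
# sortlevel() was removed in pandas 1.0; on newer versions use df.sort_index(inplace=True)
df.sortlevel(inplace=True)

#display(df.xs((-1,-1,-1)))
df.index.get_loc((-1, -1, -1))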

The direct consequence is that xs returns either a Series or a DataFrame (even if there is only one match) nondeterministically, depending on whether get_loc returns an integer or a slice.

What's more, if the key is not in the index, get_loc sometimes raises a KeyError and sometimes returns slice(0, 0, None).

Try df.index.get_loc((-2,-1,-1)) several times and you will see. I suspect it depends on whether there are duplicate values in the MultiIndex.
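Until the return type is made consistent, one possible workaround is to normalize whatever get_loc returns into an array of integer positions. The sketch below is only an illustration: the helper name get_loc_positions is hypothetical (not part of pandas), and it assumes get_loc returns an integer, a slice, or a boolean mask, or raises KeyError for a missing key.

```python
import numpy as np
import pandas as pd


def get_loc_positions(index, key):
    """Hypothetical helper: return the positions matching `key` as an
    integer array, regardless of what get_loc itself returns."""
    try:
        loc = index.get_loc(key)
    except KeyError:
        # Missing key: report "no matches" instead of raising
        return np.array([], dtype=np.intp)
    if isinstance(loc, slice):
        # Covers slice(0, 0, None) as well: it expands to an empty array
        return np.arange(len(index))[loc]
    if isinstance(loc, np.ndarray) and loc.dtype == bool:
        # Boolean mask over the whole index
        return np.flatnonzero(loc)
    # Single integer position
    return np.array([loc], dtype=np.intp)


# Small sorted MultiIndex with one duplicated key
index = pd.MultiIndex.from_tuples(
    [(-1, -1, -1), (0, 1, 2), (0, 1, 2), (3, 4, 5)], names=['a', 'b', 'c'])
df = pd.DataFrame({'value': [10, 20, 30, 40]}, index=index)

single = get_loc_positions(df.index, (-1, -1, -1))
dupes = get_loc_positions(df.index, (0, 1, 2))
missing = get_loc_positions(df.index, (-2, -1, -1))

# Indexing with an integer array always yields a DataFrame
rows = df.iloc[single]
```

Because df.iloc is given an integer array rather than a scalar, the selection stays a DataFrame whether the key matches zero, one, or several rows.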
