
PERF: Index.__getitem__ performance issue #6370

Closed
@immerrr


Once again, this turned up while investigating #6328.

There's something very strange with how Index objects handle slices:

In [1]: import pandas.util.testing as tm

In [2]: idx = tm.makeStringIndex(1000000)

In [3]: timeit idx[:-1]
100000 loops, best of 3: 2 µs per loop

In [4]: timeit idx[slice(None,-1)]
100 loops, best of 3: 6.5 ms per loop

This happens because Index does not override the __getslice__ method it inherits from ndarray. Under Python 2, simple slice syntax such as idx[:-1] is dispatched to __getslice__, so it runs ndarray.__getslice__ -> Index.__array_finalize__, while idx[slice(None, -1)] goes through Index.__getitem__ -> Index.__new__.
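For contrast, a small sketch (Python 3, where __getslice__ no longer exists) showing that both spellings then reach __getitem__ with an identical slice object. The SliceRecorder class is a hypothetical helper, not part of pandas or NumPy. The split described above only arises under Python 2, where a[i:j] syntax is routed to __getslice__ whenever a type defines it.

```python
class SliceRecorder:
    """Hypothetical helper: records every key passed to __getitem__."""

    def __init__(self):
        self.keys = []

    def __getitem__(self, key):
        # In Python 3, slice syntax always arrives here as a slice object.
        self.keys.append(key)
        return key


rec = SliceRecorder()
rec[:-1]               # slice syntax
rec[slice(None, -1)]   # explicit slice object
print(rec.keys)        # [slice(None, -1, None), slice(None, -1, None)]
```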

Index.__getitem__ is more than 1000x slower here because it tries to infer the data type of the slice and convert the result to a different Index subclass. The irony is that interactive calls like idx[:-1], where milliseconds versus microseconds hardly matters, are likely to miss this feature because they are dispatched via __getslice__. Programmatic calls like idx[slice(None, -1)], where the cost does matter, always hit this slow path, and I'd argue that the type-conversion magic is not necessary there at all.
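A minimal sketch of the shortcut argued for above, using a hypothetical ToyIndex ndarray subclass rather than pandas' real Index. Slice keys take the cheap view path (ndarray.__getitem__ followed by __array_finalize__), and only other array-valued keys go back through the constructor. The "inference" in __new__ is a stand-in that merely casts string data to object dtype; it is an assumption for illustration, not pandas' actual logic.

```python
import numpy as np


class ToyIndex(np.ndarray):
    """Hypothetical Index-like ndarray subclass (not pandas' Index)."""

    def __new__(cls, data, name=None):
        # Slow-path stand-in: convert the input and "infer" an object dtype
        # for string data, which usually costs a copy.
        arr = np.asarray(data)
        if arr.dtype.kind in "OUS":
            arr = arr.astype(object)
        obj = arr.view(cls)
        obj.name = name
        return obj

    def __array_finalize__(self, obj):
        # Runs for views and slices: copy metadata, no dtype inference.
        self.name = getattr(obj, "name", None)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Fast path: basic slicing returns a view of the same buffer.
            return np.ndarray.__getitem__(self, key)
        result = np.ndarray.__getitem__(self, key)
        if np.ndim(result) == 0:
            # Scalar lookup: return the element itself.
            return result
        # Fancy or boolean indexing: rebuild through the constructor.
        return type(self)(result, name=self.name)


idx = ToyIndex(["a", "b", "c", "d"], name="x")
head = idx[slice(None, -1)]
print(head.tolist(), head.name)  # ['a', 'b', 'c'] x
```

Because the slice result is a view, it shares memory with the original and skips any re-inference, which is the same behaviour idx[:-1] already got through __getslice__ under Python 2.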

Is there a rationale behind this?

Metadata

Assignees

No one assigned

Labels: Indexing (related to indexing on series/frames, not to indexes themselves), Performance (memory or execution speed performance)
