PERF/API: fast paths for product MultiIndex?

#### Feature Proposal

At the moment, we have a few different methods for storing indexed higher-dimensional arrays:
- DataFrame/Series with a product MultiIndex (**PMI**), i.e. a MultiIndex containing every combination of its level values, which can be slow
- Panel, which is fast, but less flexible, and may be deprecated soon (#13563)
- xarray, which is designed for this, but has a smaller API

For some datasets, I've found the PMI to be the best option, together with occasional workarounds for performance bottlenecks. Operations which are slow for a general MultiIndex, like `unstack()` or `swaplevel().sortlevel()`, can be sped up for PMIs, because the underlying values can be reshaped into an N-dimensional array and rearranged with a cheap axis swap (see below).

It would be great if we could do something like this more generally, with fast paths for PMIs. We could maybe have `MultiIndex.from_product()` return a PMI object, which would upcast to MultiIndex when necessary. We could also have `stack()` and `unstack()` create PMI objects where possible, and perhaps add an argument to `concat()` and `set_index()` to create PMIs. Slow MultiIndex operations could then have a fast path for PMI objects.
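Any fast path would first need to know that an index really is a PMI. As a rough sketch of what such a check might look like, the helper below (`is_product_index` is a hypothetical name, not an existing pandas function) compares the integer codes of a MultiIndex against the codes a sorted full product would have. It assumes a pandas version that exposes `MultiIndex.codes` (older releases called this `labels`), and it only accepts indexes whose rows are already in level order.

```python
import numpy as np
import pandas as pd


def is_product_index(mi: pd.MultiIndex) -> bool:
    """Hypothetical helper: True if `mi` is the full Cartesian product
    of its levels, with rows in level (lexsorted) order."""
    sizes = [len(level) for level in mi.levels]
    # A full product has exactly prod(sizes) rows.
    if len(mi) != int(np.prod(sizes)):
        return False
    # Codes of a sorted product: row-major grid coordinates, one row per level.
    expected = np.indices(sizes).reshape(len(sizes), -1)
    return all(
        np.array_equal(codes, exp) for codes, exp in zip(mi.codes, expected)
    )


product = pd.MultiIndex.from_product([range(3), list("ab")])
partial = pd.MultiIndex.from_tuples([(0, "a"), (1, "b")])
swapped = pd.MultiIndex.from_product([range(2), range(2)]).swaplevel()

product_ok = is_product_index(product)   # full, sorted product
partial_ok = is_product_index(partial)   # missing combinations
swapped_ok = is_product_index(swapped)   # full product, but not in level order
```

A dedicated PMI class would not need this scan at all, since the product structure would be known at construction time; the check is only meant to illustrate the property a fast path relies on.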

#### Code Sample

```python
import numpy as np
import pandas as pd

m = 100
n = 1000

levels = np.arange(m)
index = pd.MultiIndex.from_product([levels]*2)
columns = np.arange(n)
values = np.arange(m*m*n).reshape(m*m, n)
df = pd.DataFrame(values, index, columns)

# >>> timeit slow_unstack()
# 1 loop, best of 3: 363 ms per loop
def slow_unstack():
    return df.unstack()

# >>> timeit fast_unstack()
# 10 loops, best of 3: 55 ms per loop
def fast_unstack():
    columns = pd.MultiIndex.from_product([df.columns, levels])
    values = df.values.reshape(m, m, n).swapaxes(1, 2).reshape(m, m*n)
    return pd.DataFrame(values, levels, columns)

# >>> timeit slow_swaplevel_sortlevel()
# 1 loop, best of 3: 213 ms per loop
def slow_swaplevel_sortlevel():
    return df.swaplevel().sortlevel()

# >>> timeit fast_swaplevel_sortlevel()
# 10 loops, best of 3: 38.7 ms per loop
def fast_swaplevel_sortlevel():
    values = df.values.reshape(m, m, n).swapaxes(0, 1).reshape(m*m, n)
    index = df.index.swaplevel().sortlevel()[0]
    return pd.DataFrame(values, index, df.columns)
```
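To make the reshape trick easier to verify, here is a small self-contained sketch that generalizes the two fast functions to levels of different sizes and compares them against the regular pandas results. The helper names `unstack_product` and `swap_sort_product` are made up for illustration, the frame is assumed to have a sorted two-level product index, and `sort_index()` is used for the reference result because `DataFrame.sortlevel()` was removed in later pandas versions.

```python
import numpy as np
import pandas as pd


def unstack_product(df, levels0, levels1):
    """Hypothetical helper: unstack the last level of a sorted
    two-level product index by reshaping the values."""
    m0, m1, n = len(levels0), len(levels1), df.shape[1]
    # (m0*m1, n) -> (m0, m1, n) -> (m0, n, m1) -> (m0, n*m1)
    values = df.to_numpy().reshape(m0, m1, n).swapaxes(1, 2).reshape(m0, n * m1)
    columns = pd.MultiIndex.from_product([df.columns, levels1])
    return pd.DataFrame(values, index=levels0, columns=columns)


def swap_sort_product(df, m0, m1):
    """Hypothetical helper: swap the two index levels and re-sort,
    again by reshaping instead of reindexing."""
    n = df.shape[1]
    # (m0*m1, n) -> (m0, m1, n) -> (m1, m0, n) -> (m1*m0, n)
    values = df.to_numpy().reshape(m0, m1, n).swapaxes(0, 1).reshape(m0 * m1, n)
    index = df.index.swaplevel().sortlevel()[0]
    return pd.DataFrame(values, index=index, columns=df.columns)


# Small example with levels of different sizes (3 x 4 rows, 2 columns).
levels0, levels1 = np.arange(3), np.arange(4)
small = pd.DataFrame(
    np.arange(3 * 4 * 2).reshape(3 * 4, 2),
    index=pd.MultiIndex.from_product([levels0, levels1]),
    columns=np.arange(2),
)

fast_u, slow_u = unstack_product(small, levels0, levels1), small.unstack()
unstack_matches = (
    np.array_equal(fast_u.to_numpy(), slow_u.to_numpy())
    and fast_u.columns.equals(slow_u.columns)
)

fast_s, slow_s = swap_sort_product(small, 3, 4), small.swaplevel().sort_index()
swap_matches = (
    np.array_equal(fast_s.to_numpy(), slow_s.to_numpy())
    and fast_s.index.equals(slow_s.index)
)
```

Both helpers silently produce wrong results if the index is not a sorted full product, which is one argument for tracking the PMI property on the index object itself rather than leaving it to the caller.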

PERF/API: fast paths for product MultiIndex? #15503
