ENH: sparse_series_to_coo performance #42880

Closed
@TLouf

Description

Is your feature request related to a problem?

Converting a sparse Series to a scipy.sparse.coo_matrix could be much faster. I think the get_indexer function defined in _to_ijv adds unnecessary complexity.

Describe the solution you'd like

The conversion can be much faster if it reads the codes attribute of the MultiIndex directly, as follows:

i_coord, j_coord = ss.index.codes
i_labels, j_labels = ss.index.levels

This is for a two-level MultiIndex. I think it should be straightforward to extend to more levels.
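
To make the idea concrete, here is a minimal sketch of how the codes and levels could be fed to scipy.sparse.coo_matrix. It is not the pandas implementation: the function name two_level_series_to_coo is hypothetical, and it assumes a two-level MultiIndex with no missing index labels, no duplicate index entries (coo_matrix sums duplicates) and a NaN fill value.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def two_level_series_to_coo(ss):
    """Hypothetical helper: build a COO matrix from a two-level MultiIndexed Series."""
    if ss.index.nlevels != 2:
        raise ValueError("expected a two-level MultiIndex")

    # Drop level values that no row refers to, so the codes span
    # exactly the range of the remaining labels.
    index = ss.index.remove_unused_levels()
    i_coord, j_coord = index.codes    # integer position of each row's label
    i_labels, j_labels = index.levels  # unique labels per level

    # Densify the 1-D values (cheap compared to a 2-D matrix) and keep
    # only the entries that are actually set (assumes NaN as fill value).
    values = np.asarray(ss)
    keep = ~pd.isna(values)

    matrix = sparse.coo_matrix(
        (values[keep], (i_coord[keep], j_coord[keep])),
        shape=(len(i_labels), len(j_labels)),
    )
    return matrix, list(i_labels), list(j_labels)


idx = pd.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("a", "y")])
ss = pd.Series([1.0, np.nan, 3.0], index=idx, dtype="Sparse[float64]")
matrix, rows, cols = two_level_series_to_coo(ss)
print(matrix.toarray())
print(rows, cols)
```

The row and column labels are returned alongside the matrix so that matrix positions can be mapped back to index labels, similar in spirit to what to_coo() returns.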

API breaking implications

None

Describe alternatives you've considered

None

Additional context

To give an example, I started digging into this problem because I had a Series with a 2-level MultiIndex and 61M rows that needed to be converted to a 1M x 1500 sparse matrix. The conversion took 10 min using to_coo(), and half a second using the approach described above.
