Skip to content

ENH: improved dtype inference for Index.map #44609

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 5, 2021

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

Broken off from #43930

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. labels Nov 25, 2021
@jreback jreback added this to the 1.4 milestone Nov 25, 2021
@jreback
Copy link
Contributor

jreback commented Nov 25, 2021

looks good. is this a user visible thing? if so can you add a whatsnew

@jorisvandenbossche
Copy link
Member

An example to illustrate, assume we have an uint32 index:

In [1]: idx = pd.NumericIndex([1, 2, 3], dtype="uint32")

In [2]: idx
Out[2]: NumericIndex([1, 2, 3], dtype='uint32')

In [3]: idx.map(lambda x: x)
Out[3]: NumericIndex([1, 2, 3], dtype='int64')

On master we always get an int64 index back, while with this PR it tries to preserve the original dtype:

In [3]: idx.map(lambda x: x)
Out[3]: NumericIndex([1, 2, 3], dtype='uint32')

Now, I don't know if this is actually a user visible change compared to the released version, since the above example is with a non-int64 dtype, which is only in master (so in that sense you could see it as a bug fix in the new not-yet-released feature)

@jorisvandenbossche
Copy link
Member

Now, I don't know if this is actually a user visible change compared to the released version, since the above example is with a non-int64 dtype, which is only in master (so in that sense you could see it as a bug fix in the new not-yet-released feature)

Ah, we of course already had UInt64Index, for example, and that will now also preserve the dtype instead of returning Int64Index in the example above.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current tests you edited are (all?) simple lambda x: x checks that ensure the dtype is now preseved. Can you ensure we also test the case where a maybe_cast_pointwise_result needs to fallback? Eg idx.map(lambda x: x*1000) with an uint8 index, where the resulting values thus don't fit in the original dtype.

@jbrockmendel
Copy link
Member Author

is this a user visible thing? if so can you add a whatsnew

yes, will add whatsnew.

@jbrockmendel
Copy link
Member Author

The current tests you edited are (all?) simple lambda x: x checks that ensure the dtype is now preseved. Can you ensure we also test the case where a maybe_cast_pointwise_result needs to fallback? Eg idx.map(lambda x: x*1000) with an uint8 index, where the resulting values thus don't fit in the original dtype.

I do intend to do this (https://github.com/pandas-dev/pandas/pull/44609/files#diff-9090cca4ac914e526dfb58dd84ce7c75e2935e41cdeb3ee1618bb5068f05e2dbR549), the question is when. This PR in its current state is just splitting the Index.map changes off from #43930. I'm happy to take the time to do this The Right Way, but that does mean marginally slowing down the ExtensionIndex PR.

@jreback jreback merged commit bb50531 into pandas-dev:master Dec 5, 2021
@jbrockmendel jbrockmendel deleted the bug-index-map branch December 5, 2021 02:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants