Non-monotonic-increasing DatetimeIndex claims not to __contain__ duplicate entries #9512

Closed
@ischwabacher

This was fun to debug.

In [1]: import pandas as pd

In [2]: 0 in pd.Int64Index([0, 0, 1])
Out[2]: True

In [3]: 0 in pd.Int64Index([0, 1, 0])
Out[3]: True

In [4]: 0 in pd.Int64Index([0, 0, -1])
Out[4]: True

In [5]: pd.Timestamp(0) in pd.DatetimeIndex([0, 1, -1])
Out[5]: True

In [6]: pd.Timestamp(0) in pd.DatetimeIndex([0, 1, 0])
Out[6]: False   # BAD

In [7]: pd.Timestamp(0) in pd.DatetimeIndex([0, 0, 1])
Out[7]: True

In [8]: pd.Timestamp(0) in pd.DatetimeIndex([0, 0, -1])
Out[8]: False   # BAD

TimedeltaIndex is also broken.

The problem is in DatetimeIndexOpsMixin.__contains__, which checks the type of idx.get_loc(key) to determine whether the key was found in the index. If the index contains duplicate entries and is not monotonic increasing (for some reason, monotonic decreasing doesn't cut it), get_loc eventually falls back to Int64Engine._maybe_get_bool_indexer, which returns an ndarray of bools if the key is duplicated. Since __contains__ only treats a scalar or a slice as a hit, it reports that the duplicated entry is not present.
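To make the failure mode concrete, here is a rough pure-Python model of the logic described above. It is a sketch, not the pandas source: `toy_get_loc`, `contains_buggy`, and `contains_fixed` are hypothetical names, plain lists stand in for the index and the boolean ndarray, and the `any(res)` check is only one possible way to handle the mask case.

```python
def toy_get_loc(values, key):
    """Toy stand-in for get_loc: returns an int, a slice, or a boolean mask."""
    positions = [i for i, v in enumerate(values) if v == key]
    if not positions:
        raise KeyError(key)
    if len(positions) == 1:
        # Key occurs exactly once: a scalar position.
        return positions[0]
    if all(a <= b for a, b in zip(values, values[1:])):
        # Duplicated key in a monotonic increasing index: a contiguous slice.
        return slice(positions[0], positions[-1] + 1)
    # Duplicated key, not monotonic increasing: a boolean mask
    # (the model's analogue of the ndarray of bools).
    return [v == key for v in values]


def contains_buggy(values, key):
    """Models a __contains__ that only accepts scalars or slices as a hit."""
    try:
        res = toy_get_loc(values, key)
    except KeyError:
        return False
    return isinstance(res, (int, slice))


def contains_fixed(values, key):
    """Assumed fix: also treat a mask with at least one True as a hit."""
    try:
        res = toy_get_loc(values, key)
    except KeyError:
        return False
    return isinstance(res, (int, slice)) or any(res)


# The four index layouts from the session above, using plain ints.
for values in ([0, 1, -1], [0, 1, 0], [0, 0, 1], [0, 0, -1]):
    print(values, contains_buggy(values, 0), contains_fixed(values, 0))
```

In this model, `contains_buggy` returns False for `[0, 1, 0]` and `[0, 0, -1]`, the same two layouts marked BAD above, while `contains_fixed` returns True for all four.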
