drop_duplicates destroys non-duplicated data under 0.17 #11376

With pandas 0.17 on Python 3, `DataFrame.drop_duplicates()` removes rows that have no duplicate. Take the following example snippet:

```
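import pandas as pd

# Rows 2 and 3 are both (3, 5); every other row is unique.
raw_data = {'x': [7, 6, 3, 3, 4, 8, 0], 'y': [0, 6, 5, 5, 9, 1, 2]}
df = pd.DataFrame(raw_data, columns=['x', 'y'])

print("Before:", df)
# Expected: only row 3 (the second (3, 5)) is dropped.
df = df.drop_duplicates()
print("After:", df)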
```

When run under Python 2, the result is correct: only row 3, which repeats the `(3, 5)` of row 2, is dropped. When run under Python 3, however, pandas removes the row `6,6` (index 1), which is completely unique. On large CSV files this causes thousands of unique lines to be lost.
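To check a pandas result against what `drop_duplicates()` ought to return, a plain-Python reference can help. The sketch below assumes the default `keep='first'` behaviour and uses a hypothetical helper, `expected_kept_rows`, which is not part of pandas: it walks the rows in order and keeps each row whose full tuple of values has not been seen yet.

```python
# Hypothetical helper, not part of pandas: a pure-Python reference for
# which rows drop_duplicates() should keep with its default keep='first'.
def expected_kept_rows(columns):
    """Return the positions of rows a correct drop_duplicates() keeps.

    `columns` maps column name -> list of values, like `raw_data` above.
    A row is kept only if its full tuple of values has not appeared before.
    """
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    seen = set()
    kept = []
    for i in range(n_rows):
        # Build the row as a tuple so it can be stored in a set.
        row = tuple(columns[name][i] for name in names)
        if row not in seen:
            seen.add(row)
            kept.append(i)
    return kept


raw_data = {'x': [7, 6, 3, 3, 4, 8, 0], 'y': [0, 6, 5, 5, 9, 1, 2]}
print(expected_kept_rows(raw_data))  # [0, 1, 2, 4, 5, 6]
```

With pandas installed, this list can be compared with `list(df.drop_duplicates().index)`; since the frame above has the default `RangeIndex`, a correct build should produce the same `[0, 1, 2, 4, 5, 6]`, and a missing `1` reproduces the bug described here.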

See:
http://stackoverflow.com/questions/33224356/why-is-pandas-dropping-unique-rows


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

drop_duplicates destroys non-duplicated data under 0.17 #11376

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

drop_duplicates destroys non-duplicated data under 0.17 #11376

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions