Series replace, unexpected fill behavior #19998

Closed
@rasmuse

Many thanks for the excellent software. This report is about behavior I did not expect; I am not sure whether it is a bug.

>>> import pandas as pd
>>> s = pd.Series([10, 20, 30, 'a', 'a', 'b', 'a'])
>>> print(s)
0    10
1    20
2    30
3     a
4     a
5     b
6     a
dtype: object
>>> print(s.replace('a', None))
0    10
1    20
2    30
3    30
4    30
5     b
6     b
dtype: object
>>> print(s.replace({'a': None}))
0      10
1      20
2      30
3    None
4    None
5       b
6    None
dtype: object

Problem description

This behavior was unexpected for me. I would have assumed that these two lines would produce the same output:

s.replace('a', None)
s.replace({'a': None})
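
The first output in this report looks like what a forward fill would give: each 'a' takes the last preceding value that is not 'a'. The sketch below reproduces those values by hand under that assumption. The mask-then-`ffill` route is only an illustration of the observed result, not a claim about how `replace` is implemented internally.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 'a', 'a', 'b', 'a'])

# Blank out every 'a'; where() leaves NaN in the positions that fail the test.
masked = s.where(s != 'a')

# Carry the last valid value forward into the blanked positions.
filled = masked.ffill()

print(filled.tolist())
```

The resulting values match the ones printed for `s.replace('a', None)` above, which is why the positional form reads as "fill" rather than "replace with None".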

In my particular use case, I just wanted to replace 'a' with None and therefore used s.replace('a', None). I did not check the output carefully and ended up with some very strange behavior further down in my data analysis.
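
For the use case described here, the dict form from this report states the replacement explicitly, and a cheap check right after the call would have caught the surprise early. A minimal sketch, assuming the goal is simply "every 'a' becomes missing and nothing else changes" (the variable name `replaced` and the checks are illustrative):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 'a', 'a', 'b', 'a'])

# Dict form: the replacement is given per key, so None is taken literally.
replaced = s.replace({'a': None})

# Sanity checks: as many missing entries as there were 'a' values,
# and no 'a' left behind.
assert int(replaced.isna().sum()) == int((s == 'a').sum())
assert not (replaced == 'a').any()

print(replaced.isna().tolist())
```

A check of this kind fails loudly if the call fills instead of replacing, because the filled result contains no missing entries at all.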

I am not sure whether this should be considered a bug. The docs are not entirely clear on what the intended behavior is. Possible solutions could include:

  • Describe behavior in docs (the filling behavior is barely described at all).
  • Hint that something like s.replace('a', numpy.nan) might be a better option.
  • Change API to require a more explicit opt-in for filling.
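
As an illustration of the second suggestion, here is a small sketch of the `numpy.nan` variant. It assumes a NaN marker is acceptable in place of None; the names `with_nan`, `n_missing` and `remaining` are only for this example.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30, 'a', 'a', 'b', 'a'])

# With an explicit non-None value there is nothing to fill:
# each 'a' simply becomes NaN.
with_nan = s.replace('a', np.nan)

# NaN is what the missing-data tools look for, so the usual follow-ups work.
n_missing = int(with_nan.isna().sum())
remaining = with_nan.dropna().tolist()

print(n_missing)
print(remaining)
```

The positional call stays short this way, and the result can go straight into `isna`, `dropna` or `fillna` if filling is actually wanted later.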

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-116-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.5.1
Cython: None
numpy: 1.14.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Labels: API Design; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); Needs Discussion (requires discussion from core team before further action); replace (replace method)