Description
Many thanks for the excellent software. This report is about behavior I did not expect. Not sure if it is a bug or not.
>>> import pandas as pd
>>> s = pd.Series([10, 20, 30, 'a', 'a', 'b', 'a'])
>>> print(s)
0 10
1 20
2 30
3 a
4 a
5 b
6 a
dtype: object
>>> print(s.replace('a', None))
0 10
1 20
2 30
3 30
4 30
5 b
6 b
dtype: object
>>> print(s.replace({'a': None}))
0 10
1 20
2 30
3 None
4 None
5 b
6 None
dtype: object
Problem description
This behavior was unexpected for me. I would have assumed that these two lines would produce the same output:
s.replace('a', None)
s.replace({'a': None})
In my particular use case, I just wanted to replace 'a' with None and therefore used s.replace('a', None). I did not check the output carefully and ended up with some very strange behavior further down the line in my data analysis.
I am not sure whether this should be considered a bug. The docs are not entirely clear on what the intended behavior is. Possible solutions could include:
- Describe behavior in docs (the filling behavior is barely described at all).
- Hint that something like s.replace('a', numpy.nan) might be a better option.
- Change API to require a more explicit opt-in for filling.
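For reference, here is a minimal sketch of how each intent (marking 'a' as missing versus filling from the previous value) could be written explicitly, so that the result does not depend on how a scalar None is interpreted. The variable names are only illustrative, and the sketch deliberately avoids s.replace('a', None) itself, since its behavior may differ between pandas releases.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30, 'a', 'a', 'b', 'a'])

# Intent 1: mark 'a' as missing. A non-None scalar is unambiguous.
as_nan = s.replace('a', np.nan)

# Same intent with None, using the dict form from the report above.
as_none = s.replace({'a': None})

# Intent 2: fill 'a' from the previous value, spelled out in two steps.
masked = s.mask(s == 'a')   # 'a' becomes NaN
filled = masked.ffill()     # NaN takes the previous valid value

print(as_nan.isna().tolist())
print(as_none.isna().tolist())
print(filled.tolist())
```

Splitting the fill into mask and ffill makes the opt-in visible at the call site, which is roughly what the last suggestion above asks of the API.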
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-116-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.5.1
Cython: None
numpy: 1.14.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None