Description
I have a Series of lists that I need to work with, and discovered that some calls crash without raising an exception. Here's an overview of the behaviour (which seems inconsistent to me).
import pandas as pd
import numpy as np
ser = pd.Series([['a', 'b'], np.nan, [1]])
ser.replace({np.nan : []}) # crashes w/o exception
ser.replace({np.nan : 'dummy'}) # works
# 0 [a, b]
# 1 dummy
# 2 [1]
# dtype: object
ser.replace({np.nan : ['dummy']}) # why does this unwrap?
# 0 [a, b]
# 1 dummy
# 2 [1]
# dtype: object
ser.replace({np.nan : ['dummy', 'alt']}) # crashes w/o exception
ser.fillna([]) # raises
ser.fillna({1 : []}) # this works!
# 0 [a, b]
# 1 []
# 2 [1]
# dtype: object
ser.fillna({1 : ['dummy', 'alt']}) # works as well!
# 0 [a, b]
# 1 [dummy, alt]
# 2 [1]
# dtype: object
# DataFrame has exactly the same behaviour as the Series above
df = pd.DataFrame({'col' : ser})
df.replace({np.nan : []}) # crashes w/o exception
...
I agree that interpreting a list as the argument to .replace makes no sense. But I don't understand why it's not possible to fillna with a list (cf. other people asking this question https://stackoverflow.com/q/33199193/2965879). There's no reason in my opinion why .replace({np.nan : ['dummy', 'alt']}) or .replace({np.nan : []}) couldn't work in principle - it's very clear what the intent is. Furthermore, it already works like that in fillna (of course with different interpretation of the dict-key). But even if forbidding lists is a design decision, the call shouldn't just crash, but raise an exception, at least.
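To make the intended result concrete, here is a minimal sketch of what I'd expect .replace({np.nan : []}) to produce, written without .replace or .fillna. The helper name fill_missing_with is hypothetical (it is not part of pandas), and taking a factory instead of a value is my own assumption, so that every filled row gets its own list.

```python
import numpy as np
import pandas as pd


def fill_missing_with(ser, factory):
    """Return a copy of `ser` with each missing entry replaced by `factory()`.

    Hypothetical helper, not part of pandas. A factory (e.g. `list`) is
    used so that filled rows do not all share one mutable list object.
    """
    # isna() works element-wise on object data; lists count as non-missing
    missing = ser.isna().values
    cells = [factory() if is_na else v for v, is_na in zip(ser, missing)]
    # dtype=object keeps every list as a single cell
    return pd.Series(cells, index=ser.index, name=ser.name, dtype=object)


ser = pd.Series([['a', 'b'], np.nan, [1]])
filled = fill_missing_with(ser, list)                          # NaN -> []
filled_alt = fill_missing_with(ser, lambda: ['dummy', 'alt'])  # NaN -> ['dummy', 'alt']
```

This only sidesteps the problem for a single Series; it says nothing about why .replace itself fails.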
In my case, I have to design an API that does replacements (as pre-/post-processing around actual work) with pandas in the background, and I'd like to be able to just pass through legal dicts to Series.replace / DataFrame.replace - e.g. {r'\s*' : np.nan} or {np.nan : []}. Otherwise I have to inspect every passed replacement parameter (with all the overhead that comes with allowing both {search : replace} and {column : {search : replace}}), extract special cases, and build complicated wrappers like in the answers of the above SO question.
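As an illustration of the kind of wrapper I'd rather not have to write, here is a rough sketch that splits a flat {search : replace} dict into the part Series.replace handles and the NaN-to-list special case. The name replace_with_lists is hypothetical, and the sketch assumes a Series without regex keys; the nested {column : {search : replace}} form is not covered.

```python
import numpy as np
import pandas as pd


def _is_nan(key):
    # NaN is the only float that is not equal to itself
    return isinstance(key, float) and key != key


def replace_with_lists(ser, mapping):
    """Apply a flat {search: replace} dict, special-casing {nan: <list>}.

    Hypothetical wrapper, not part of pandas.
    """
    nan_fill = None
    rest = {}
    for key, value in mapping.items():
        if _is_nan(key) and isinstance(value, list):
            nan_fill = value          # handled manually below
        else:
            rest[key] = value         # safe to hand to Series.replace

    out = ser.replace(rest) if rest else ser.copy()

    if nan_fill is not None:
        missing = out.isna().values
        # copy the list per row so the cells do not share one object
        cells = [list(nan_fill) if is_na else v for v, is_na in zip(out, missing)]
        out = pd.Series(cells, index=out.index, name=out.name, dtype=object)
    return out


ser = pd.Series(['x', np.nan, 'z'])
result = replace_with_lists(ser, {'x': 'y', np.nan: []})
```

Note that the ordinary replacements run first here, so any value they turn into NaN would also be filled with the list; whether that ordering is desirable is exactly the kind of decision I'd prefer to leave to pandas.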
Versions are the most recent on conda, details below.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.26.1
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None