
Method dropna does not work on SparseDataFrames #21172

Closed
@babky

Description


The dropna method can return a wrong result on a SparseDataFrame. The following code

import pandas as pd

# First batch: dropna called directly on the SparseDataFrame
pd.SparseDataFrame({"F1": [None, None], "F2": [0, 1], "F3": [float('nan'), 0]}).dropna(axis=1, inplace=False, how='all')
pd.SparseDataFrame({"F1": [None, None], "F2": [0, 1], "F3": [None, 0]}).dropna(axis=1, inplace=False, how='all')
pd.SparseDataFrame({"F1": [float('nan'), float('nan')], "F2": [0, 1], "F3": [float('nan'), 0]}).dropna(axis=1, inplace=False, how='all')
pd.SparseDataFrame({"F1": [None, None], "F2": [0, 1]}).dropna(axis=1, inplace=False, how='all')
pd.SparseDataFrame({"F1": [float('nan'), float('nan')], "F2": [0, 1]}).dropna(axis=1, inplace=False, how='all')

# Second batch: the same frames, converted with to_dense() before dropna
pd.SparseDataFrame({"F1": [None, None], "F2": [0, 1], "F3": [float('nan'), 0]}).to_dense().dropna(axis=1, inplace=False, how='all')
pd.SparseDataFrame({"F1": [None, None], "F2": [0, 1], "F3": [None, 0]}).to_dense().dropna(axis=1, inplace=False, how='all')
pd.SparseDataFrame({"F1": [float('nan'), float('nan')], "F2": [0, 1], "F3": [float('nan'), 0]}).to_dense().dropna(axis=1, inplace=False, how='all')
pd.SparseDataFrame({"F1": [None, None], "F2": [0, 1]}).to_dense().dropna(axis=1, inplace=False, how='all')
pd.SparseDataFrame({"F1": [float('nan'), float('nan')], "F2": [0, 1]}).to_dense().dropna(axis=1, inplace=False, how='all')

produces the following output with pandas 0.23.0:

import pandas as pd

print(pd.SparseDataFrame({"F1": [None, None], "F2": [0, 1], "F3": [float('nan'), 0]}).dropna(axis=1, inplace=False, how='all'))
    F1  F2
0  NaN   0
1  NaN   1

print(pd.SparseDataFrame({"F1": [None, None], "F2": [0, 1], "F3": [None, 0]}).dropna(axis=1, inplace=False, how='all'))
    F1  F2
0  NaN   0
1  NaN   1

print(pd.SparseDataFrame({"F1": [float('nan'), float('nan')], "F2": [0, 1], "F3": [float('nan'), 0]}).dropna(axis=1, inplace=False, how='all'))
   F1  F2
0 NaN   0
1 NaN   1

print(pd.SparseDataFrame({"F1": [None, None], "F2": [0, 1]}).dropna(axis=1, inplace=False, how='all'))
    F1
0  NaN
1  NaN

print(pd.SparseDataFrame({"F1": [float('nan'), float('nan')], "F2": [0, 1]}).dropna(axis=1, inplace=False, how='all'))
   F1
0 NaN
1 NaN

print(pd.SparseDataFrame({"F1": [None, None], "F2": [0, 1], "F3": [float('nan'), 0]}).to_dense().dropna(axis=1, inplace=False, how='all'))
   F2   F3
0   0  NaN
1   1  0.0

print(pd.SparseDataFrame({"F1": [None, None], "F2": [0, 1], "F3": [None, 0]}).to_dense().dropna(axis=1, inplace=False, how='all'))
   F2   F3
0   0  NaN
1   1  0.0

print(pd.SparseDataFrame({"F1": [float('nan'), float('nan')], "F2": [0, 1], "F3": [float('nan'), 0]}).to_dense().dropna(axis=1, inplace=False, how='all'))
   F2   F3
0   0  NaN
1   1  0.0

print(pd.SparseDataFrame({"F1": [None, None], "F2": [0, 1]}).to_dense().dropna(axis=1, inplace=False, how='all'))
   F2
0   0
1   1

print(pd.SparseDataFrame({"F1": [float('nan'), float('nan')], "F2": [0, 1]}).to_dense().dropna(axis=1, inplace=False, how='all'))
   F2
0   0
1   1

Problem description

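The dropna method behaves differently on a SparseDataFrame than on the equivalent dense DataFrame. In the first batch, the all-NaN column F1 is never dropped. Instead, dropna drops F3 (which is only partially NaN) in the first three examples, and F2 (which contains no missing values at all) in the last two. The second batch, where each frame is converted with to_dense() before calling dropna, shows the correct behaviour.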

Expected Output

   F2   F3
0   0  NaN
1   1  0.0

   F2   F3
0   0  NaN
1   1  0.0

   F2   F3
0   0  NaN
1   1  0.0

   F2
0   0
1   1

   F2
0   0
1   1
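To make the expected semantics concrete, here is a minimal stdlib-only sketch of what dropna(axis=1, how='all') should do: drop a column only when every value in it is missing. It assumes a frame can be modelled as a plain dict of column lists, and that None and float('nan') both count as missing, as they do in the dense results above. The helpers is_missing and drop_all_missing_columns are hypothetical illustrations, not pandas API.

```python
import math
from typing import Any, Dict, List


def is_missing(value: Any) -> bool:
    # Hypothetical helper: treat None and float NaN as missing,
    # matching how the dense frames above treat both values.
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_all_missing_columns(columns: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    # Hypothetical model of dropna(axis=1, how='all'):
    # keep a column unless every one of its values is missing.
    return {
        name: values
        for name, values in columns.items()
        if not all(is_missing(v) for v in values)
    }


# The five frames from the report, as plain dicts of columns.
examples = [
    {"F1": [None, None], "F2": [0, 1], "F3": [float("nan"), 0]},
    {"F1": [None, None], "F2": [0, 1], "F3": [None, 0]},
    {"F1": [float("nan"), float("nan")], "F2": [0, 1], "F3": [float("nan"), 0]},
    {"F1": [None, None], "F2": [0, 1]},
    {"F1": [float("nan"), float("nan")], "F2": [0, 1]},
]

for frame in examples:
    # Print the names of the columns that survive.
    print(sorted(drop_all_missing_columns(frame)))
```

Applied to the five frames from the report, this keeps F2 and F3 in the first three cases and only F2 in the last two, which matches the expected output listed above and the dense results from the second batch.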

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-20-generic
machine: x86_64
processor:
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: 3.5.0
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.3.1
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


Labels: Missing-data, Sparse
