
DataFrame.clip_upper does not preserve dtype per column #24162

Closed
@joneugster


Code Sample

import pandas as pd

# One integer column and one float column on a daily DatetimeIndex
data = pd.DataFrame({'INT': [-1, 0, 10, 9],
                     'FLOAT': [-0.148, 0.2347, 38.237, 12.2233]},
                    index=pd.date_range("20180101 00:00", periods=4))

print('Original data:')
print(data.head())

print('\n(A) This is probably not a bug but my misunderstanding:')
print('(So how would I apply "clip_upper" inplace on parts of the dataframe?)')
# Intended to clip INT in the first three rows in place, but `data` is left unchanged (see (A))
data.loc[[True, True, True, False], ['INT']].clip_upper(8, inplace=True)
print(data.head())
# Workaround I used instead (assign the clipped selection back):
# data.loc[[True, True, True, False], ['INT']] = data.loc[[True, True, True, False], ['INT']].clip_upper(8)

print('\n(B) It seems that clip_upper does not preserve the dtypes:')
print(data.clip_upper(8).head())

print('\n(C) Same for inplace:')
data.clip_upper(8, inplace=True)
print(data.head())
Output of this code:
Original data:
            INT    FLOAT
2018-01-01   -1  -0.1480
2018-01-02    0   0.2347
2018-01-03   10  38.2370
2018-01-04    9  12.2233

(A) This is probably not a bug but my misunderstanding:
(So how would I apply "clip_upper" inplace on parts of the dataframe?)
            INT    FLOAT
2018-01-01   -1  -0.1480
2018-01-02    0   0.2347
2018-01-03   10  38.2370
2018-01-04    9  12.2233

(B) It seems that clip_upper does not preserve the dtypes:
            INT   FLOAT
2018-01-01 -1.0 -0.1480
2018-01-02  0.0  0.2347
2018-01-03  8.0  8.0000
2018-01-04  8.0  8.0000

(C) Same for inplace:
            INT   FLOAT
2018-01-01 -1.0 -0.1480
2018-01-02  0.0  0.2347
2018-01-03  8.0  8.0000
2018-01-04  8.0  8.0000

Problem description

On a DataFrame with both an int column and a float column, clip_upper converts the int column to float.

When calling data.clip_upper(8) with an integer threshold, I would expect the int column to stay integer and the float column to stay float. However, it converts everything to float (see (B) and (C)).
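One way to get the expected per-column dtypes is to clip each column as its own Series and rebuild the frame, so the int and float columns are never combined into a single array. This is a minimal sketch of a workaround, not the library's fix. It uses `Series.clip(upper=...)` instead of `clip_upper` and assumes a recent pandas version in which clipping an integer Series with an integer bound keeps it integer.

```python
import pandas as pd

data = pd.DataFrame(
    {"INT": [-1, 0, 10, 9], "FLOAT": [-0.148, 0.2347, 38.237, 12.2233]},
    index=pd.date_range("20180101 00:00", periods=4),
)

# Clip column by column: each result Series keeps its own dtype and the
# shared DatetimeIndex, and the dict preserves the original column order.
clipped = pd.DataFrame({name: col.clip(upper=8) for name, col in data.items()})

print(clipped)
print(clipped.dtypes)
```

Because the clipping is done per Series, the result does not depend on how the frame-level method handles mixed dtypes.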

Moreover, clip_upper with inplace=True has no effect when applied to a selection made with .loc, but this may well be my misunderstanding of the concept (see (A)).
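For (A), the most likely explanation is that `.loc` with a boolean list builds a new object rather than a view, so `inplace=True` only modifies that temporary object. Below is a sketch of the assign-back workaround from the code sample's comment, written with `clip(upper=...)` for newer pandas. The name `rows` is only illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {"INT": [-1, 0, 10, 9], "FLOAT": [-0.148, 0.2347, 38.237, 12.2233]},
    index=pd.date_range("20180101 00:00", periods=4),
)

# Boolean row selector from the report: only the first three rows.
rows = [True, True, True, False]

# data.loc[rows, "INT"] returns a new Series, so an in-place method called on
# it would not write back to `data`. Assigning through .loc does.
data.loc[rows, "INT"] = data.loc[rows, "INT"].clip(upper=8)

print(data)
print(data.dtypes)
```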

The same happens with clip_lower.
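For readers on newer pandas: `clip_lower` and `clip_upper` were deprecated in pandas 0.24 and later removed in favor of `DataFrame.clip(lower=..., upper=...)`. The sketch below is a quick check on whatever pandas version is installed, not a statement about any specific release. It tests whether frame-level clipping with both bounds keeps each column's dtype.

```python
import pandas as pd

data = pd.DataFrame(
    {"INT": [-1, 0, 10, 9], "FLOAT": [-0.148, 0.2347, 38.237, 12.2233]},
    index=pd.date_range("20180101 00:00", periods=4),
)

# One call covering both bounds: values below 0 become 0, values above 8 become 8.
clipped = data.clip(lower=0, upper=8)

print(clipped)
# True if every column kept its original dtype.
print(data.dtypes.equals(clipped.dtypes))
```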

Expected Output

For (A):

            INT    FLOAT
2018-01-01   -1  -0.1480
2018-01-02    0   0.2347
2018-01-03    8  38.2370
2018-01-04    9  12.2233

For (B) and (C):

            INT   FLOAT
2018-01-01   -1 -0.1480
2018-01-02    0  0.2347
2018-01-03    8  8.0000
2018-01-04    8  8.0000

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None

pandas: 0.23.4
pytest: 4.0.1
pip: 18.1
setuptools: 40.6.2
Cython: 0.29
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.2
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.1
openpyxl: 2.5.11
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Labels: Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Bug
