Issues with numeric_only for DataFrame.std() #9201

Closed
@mortada

Description

The docstring shows a numeric_only option for DataFrame.std() but it does not seem to actually be implemented. I'm happy to take a crack at fixing it but I'm not sure whether it's the doc or the implementation that needs fixing.

To see this, consider a mixed-type DataFrame in which one entry is set to the str '100' while all other entries are floats. For std() it does not matter whether numeric_only is True or False, but for max() it clearly makes a difference.

In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame(np.random.randn(5, 2), columns=['foo', 'bar'])
In [4]: df.ix[0, 'foo'] = '100'

In [5]: df
Out[5]:
         foo       bar
0        100 -1.958036
1   0.221049  0.309971
2   1.200093 -0.103244
3  -2.475388 -2.279483
4  0.1623936 -1.185682

In [6]: df.std(numeric_only=True)
Out[6]:
foo    44.841828
bar     1.129182
dtype: float64

In [7]: df.std(numeric_only=False)
Out[7]:
foo    44.841828
bar     1.129182
dtype: float64

In [8]: df.max(numeric_only=False)
Out[8]:
foo    100.000000
bar      0.309971
dtype: float64

In [9]: df.max(numeric_only=True)
Out[9]:
bar    0.309971
dtype: float64
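
For comparison, here is a minimal sketch of what I would expect a numeric-only std() to return, obtained by selecting the numeric columns explicitly with select_dtypes() instead of relying on the numeric_only flag. The values are typed in by hand from the session above (the original frame came from np.random.randn), and the name numeric_std is just for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the session above: 'foo' becomes object dtype because
# one entry is the string '100'; 'bar' stays float64.
df = pd.DataFrame({
    "foo": ["100", 0.221049, 1.200093, -2.475388, 0.1623936],
    "bar": [-1.958036, 0.309971, -0.103244, -2.279483, -1.185682],
})

# Keep only numeric-dtype columns, then take the standard deviation.
# This does not depend on how std() treats numeric_only.
numeric_std = df.select_dtypes(include=[np.number]).std()
print(numeric_std)
```

With this approach only 'bar' is left in the result, which mirrors what max(numeric_only=True) does above.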

Labels: Bug, Numeric Operations (Arithmetic, Comparison, and Logical operations)
