
DOC: pd.DataFrame(dtype) arg cannot be list, dict, Series. And None will infer wider type than necessary. #14764

Closed
@smcinerney

Description


Code Sample, a copy-pastable example if possible

# This is a doc bug, filed simply to document what pd.DataFrame(dtype) currently does.
# The behavior is non-obvious to new users, gives non-obvious error messages, and also differs from read_csv(dtype)

# a) Leaving dtype=None in the constructor infers a wider type than necessary
import numpy as np
import pandas as pd

df_cols = {'year': np.int32, 'month': np.int8}
df = pd.DataFrame(columns=df_cols.keys(), dtype=None, index=range(10), data=-1)
>>> df.dtypes
month    int64
year     int64

# b) The doc doesn't explicitly say that dtype must be a single type: a list/dict/Series/array-like
# is not allowed (and if you pass one in, the error is not very friendly).
# This also differs from read_csv(dtype), which accepts a dict of column -> type.
df = pd.DataFrame(columns=df_cols.keys(), dtype=np.int32, index=range(10), data=-1)
# df.dtypes shows that every column is int32
# Fix up the dtypes after construction by casting each column
for col, coltype in df_cols.items():
    df[col] = df[col].astype(coltype)
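
The fix-up loop above can probably be collapsed into a single call. A minimal sketch, assuming a pandas version whose DataFrame.astype accepts a column-to-dtype mapping (the names df and df_cols are reused from the sample; list(df_cols) is used here only to pass the column names as a plain list):

```python
import numpy as np
import pandas as pd

df_cols = {"year": np.int32, "month": np.int8}

# Build the frame with dtype=None, so the scalar fill value gets an inferred dtype.
df = pd.DataFrame(data=-1, index=range(10), columns=list(df_cols))

# Cast every column to its intended dtype in one call instead of looping.
df = df.astype(df_cols)

print(df.dtypes)
```

This still creates the wider intermediate frame first, so it only shortens the code; it does not avoid the initial allocation.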

Problem description

The DataFrame() doc doesn't explicitly say that dtype must be a single type, so a list/dict/Series/array-like is not allowed (and if you pass one in, the error is not very friendly). This also differs from read_csv(dtype), which accepts a dict mapping column names to types.
Leaving dtype=None in the constructor infers a wider type than necessary (int64 in the example above).
So in general you either set dtype=widest_necessary_type or leave dtype=None, and then manually fix up the columns after construction by casting with astype().
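
To make the contrast concrete, here is a hedged sketch of the two behaviors side by side. It assumes a small inline CSV (csv_text is made up for illustration) and shows one possible way to get per-column dtypes out of the constructor, by building each column as its own Series; this is an illustration, not necessarily the recommended approach.

```python
import io

import numpy as np
import pandas as pd

df_cols = {"year": np.int32, "month": np.int8}

# read_csv accepts a column -> dtype mapping directly.
csv_text = "year,month\n2016,11\n2016,12\n"  # hypothetical sample data
from_csv = pd.read_csv(io.StringIO(csv_text), dtype=df_cols)

# The DataFrame constructor takes only a single dtype, so one option is to
# build each column as a Series that carries its own dtype.
built = pd.DataFrame(
    {
        col: pd.Series(-1, index=range(10), dtype=col_type)
        for col, col_type in df_cols.items()
    }
)

print(from_csv.dtypes)
print(built.dtypes)
```

Building from per-column Series avoids the wider intermediate columns, at the cost of slightly more verbose construction.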

Expected Output

Output of pd.show_versions()

python: 2.7.10.final.0
python-bits: 64
machine: x86_64
processor: i386
byteorder: little
pandas: 0.19.1
numpy: 1.11.2
scipy: 0.18.1

Labels: Docs; Dtype Conversions (unexpected or buggy dtype conversions); Reshaping (concat, merge/join, stack/unstack, explode)
