Description
Code Sample, a copy-pastable example if possible
# This is a doc bug, filed simply to document what pd.DataFrame(dtype=...) currently does.
# The behavior is non-obvious to new users, produces unhelpful error messages, and differs from read_csv(dtype=...).
import numpy as np
import pandas as pd
# a) Leaving dtype=None in the constructor infers a wider type than necessary
df_cols = {'year': np.int32, 'month': np.int8}
df = pd.DataFrame(columns=list(df_cols), dtype=None, index=range(10), data=-1)
print(df.dtypes)
# Reported output (pandas 0.19.1, Python 2.7; column order may differ on other versions):
# month    int64
# year     int64
# b) The docs don't explicitly say that a list/dict/Series/array-like dtype is not allowed
# (passing one gives an unfriendly error). This also differs from read_csv(dtype=...),
# which accepts a dict of column name -> dtype.
df = pd.DataFrame(columns=list(df_cols), dtype=np.int32, index=range(10), data=-1)
# df.dtypes now shows every column as int32, including 'month', which only needs int8
# Workaround: fix up the dtypes after construction
for col, coltype in df_cols.items():
    df[col] = df[col].astype(coltype)
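Assuming pandas 0.19.0 or later (the version reported below), `DataFrame.astype` also accepts a dict mapping column names to dtypes, so the per-column fix-up loop can probably be collapsed into a single call. A minimal sketch of that variant of the same workaround:

```python
import numpy as np
import pandas as pd

# Target dtype for each column
df_cols = {'year': np.int32, 'month': np.int8}

# Construct with the inferred (wider) dtype, then cast all columns in one call;
# astype() with a dict casts only the columns named in the mapping
df = pd.DataFrame(data=-1, index=range(10), columns=list(df_cols)).astype(df_cols)

print(df.dtypes)
```

This still allocates the wide int64 frame first and then copies it; it only removes the explicit loop.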
Problem description
The DataFrame() docs don't explicitly say that a list/dict/Series/array-like dtype is not allowed, and passing one produces an unfriendly error message. The constructor also behaves differently from read_csv(dtype=...), which accepts a dict mapping column names to dtypes.
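To make the contrast concrete, here is a small sketch that passes the same column-to-dtype dict to both `read_csv` and the `DataFrame` constructor. The CSV text is made up for illustration, and the exact exception type and message raised by the constructor are assumptions that vary across pandas/NumPy versions, so the sketch catches a few likely candidates:

```python
import io

import numpy as np
import pandas as pd

df_cols = {'year': np.int32, 'month': np.int8}

# read_csv accepts a per-column mapping of dtypes
csv_df = pd.read_csv(io.StringIO("year,month\n2016,12\n"), dtype=df_cols)
print(csv_df.dtypes)  # year int32, month int8

# The DataFrame constructor takes a single dtype, so the same dict is rejected
try:
    pd.DataFrame(data=-1, index=range(10), columns=list(df_cols), dtype=df_cols)
except (TypeError, ValueError, NotImplementedError) as exc:
    ctor_error = exc
else:
    ctor_error = None

print(type(ctor_error).__name__, ctor_error)
```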
Leaving dtype=None in the constructor infers a wider type than necessary (int64 for every column in the example above).
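Both constructor behaviours can be checked directly with the same scalar fill and index as the code sample (the column list is spelled out here instead of being taken from a dict). A quick sketch, assuming a Python int fill value, which pandas infers as int64 regardless of platform:

```python
import numpy as np
import pandas as pd

columns = ['year', 'month']

# dtype=None: the scalar fill value -1 is inferred as int64 for every column
inferred = pd.DataFrame(data=-1, index=range(10), columns=columns, dtype=None)

# A single dtype is applied uniformly to every column
uniform = pd.DataFrame(data=-1, index=range(10), columns=columns, dtype=np.int32)

print(inferred.dtypes.tolist())
print(uniform.dtypes.tolist())
```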
So in general you must either set dtype to the widest type any column needs (it is applied to every column), or leave dtype=None and then fix the columns up after construction by casting with astype().
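A third option, sketched here as an alternative rather than anything the constructor docs prescribe, is to build each column as a `Series` at its own dtype and pass a dict of those Series to the constructor, which keeps each column's dtype and avoids the post-construction cast. `make_typed_frame` is a hypothetical helper name used only for illustration:

```python
import numpy as np
import pandas as pd


def make_typed_frame(dtypes, index, fill):
    """Hypothetical helper: one column per entry of `dtypes`, each filled with
    `fill` and created directly at its own dtype (no astype() pass afterwards)."""
    index = pd.Index(index)
    return pd.DataFrame(
        {col: pd.Series(fill, index=index, dtype=dt) for col, dt in dtypes.items()},
        columns=list(dtypes),  # keep the column order of `dtypes`
    )


df = make_typed_frame({'year': np.int32, 'month': np.int8}, index=range(10), fill=-1)
print(df.dtypes)
```

Since every Series shares the same index, no alignment gaps (and hence no NaN-driven upcasting) occur when the frame is assembled.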
Expected Output
Output of pd.show_versions()
python: 2.7.10.final.0
python-bits: 64
machine: x86_64
processor: i386
byteorder: little
pandas: 0.19.1
numpy: 1.11.2
scipy: 0.18.1