Closed
Description
I just noticed that the DataFrame constructor ignores the copy=True argument if dtype is set. In the code snippet below, the orig dataframe should stay unmodified after any modification of new1 and new2. Instead, the columns of new2 (or at least the first one, as shown in the snippet) are references to the same data as orig, as highlighted by the modification in statement 13 and onwards.
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-25-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.15.2
nose: 1.3.4
Cython: 0.21.1
numpy: 1.9.1
scipy: 0.14.0
statsmodels: 0.4.3
IPython: 2.3.1
sphinx: 1.1.2
patsy: 0.3.0
dateutil: 2.3
pytz: 2014.10
bottleneck: 0.6.0
tables: None
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.7.4
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None
In [4]: orig_data = {
   ...:     'col1': [1.],
   ...:     'col2': [2.],
   ...:     'col3': [3.],}
In [5]: orig = pd.DataFrame(orig_data)
In [6]: new1 = pd.DataFrame(orig, copy=True)
In [7]: new2 = pd.DataFrame(orig, dtype=float, copy=True)
In [8]: new1
Out[8]:
   col1  col2  col3
0     1     2     3
In [9]: new2
Out[9]:
   col1  col2  col3
0     1     2     3
In [10]: new1['col1'] = 100.
In [11]: new1
Out[11]:
   col1  col2  col3
0   100     2     3
In [12]: orig
Out[12]:
   col1  col2  col3
0     1     2     3
In [13]: new2['col1'] = 200.
In [14]: new2
Out[14]:
   col1  col2  col3
0   200     2     3
In [15]: orig
Out[15]:
   col1  col2  col3
0   200     2     3
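For anyone who wants to check their own pandas version, here is a minimal sketch that tests for the aliasing directly instead of relying on a visible modification. The helper name shares_any_column is hypothetical (it is not part of pandas), and the explicit copy at the end is only a possible workaround, assuming that an extra copy is acceptable. The result for new2 depends on the installed version, so no expected output is given for it.

```python
import numpy as np
import pandas as pd


def shares_any_column(a, b):
    """Return True if any same-named column of a and b shares memory.

    Hypothetical helper for illustration; not part of the pandas API.
    """
    return any(
        np.shares_memory(a[col].values, b[col].values)
        for col in a.columns
    )


orig = pd.DataFrame({'col1': [1.], 'col2': [2.], 'col3': [3.]})

# The two constructor calls from the report above.
new1 = pd.DataFrame(orig, copy=True)
new2 = pd.DataFrame(orig, dtype=float, copy=True)

# False means the new frame owns its data; True means it aliases orig.
print("new1 shares memory with orig:", shares_any_column(orig, new1))
print("new2 shares memory with orig:", shares_any_column(orig, new2))

# Possible workaround (an assumption, not an official recommendation):
# copy explicitly first, then convert the dtype on the copy.
safe = orig.copy().astype(float)
print("safe shares memory with orig:", shares_any_column(orig, safe))
```

If the second line prints True, the installed version shows the behaviour reported here. Note that np.shares_memory requires a NumPy release newer than the 1.9.1 listed above.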