Description
Code Sample, a copy-pastable example if possible
Unfortunately, I'm unable to share the underlying data, and I have not yet been able to produce a minimal reproduction.
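This is a synthetic sketch of the situation, not a confirmed reproduction. It assumes the only trigger is two unordered categoricals that hold the same categories in different orders. The labels `a`, `b`, `c` and the helper `concat_matches_object_concat` are hypothetical. The helper compares the categorical concatenation with the concatenation of object-dtype copies, as In [209] does below. On a fixed pandas the second print should show `True`. On an affected version it would be expected to show `False` if category order alone is the trigger, but this has not been checked against 0.22.0.

```python
import pandas as pd

# Hypothetical data: same three categories, listed in a different order.
s1 = pd.Series(pd.Categorical(["a", "b", "c"], categories=["a", "b", "c"]), name="symbol")
s2 = pd.Series(pd.Categorical(["a", "a", "c"], categories=["c", "b", "a"]), name="symbol")


def concat_matches_object_concat(left, right):
    """Return True if concatenating the categoricals yields the same values
    as concatenating their object-dtype copies (cf. In [209])."""
    as_cat = pd.concat([left, right], ignore_index=True).astype("object")
    as_obj = pd.concat(
        [left.astype("object"), right.astype("object")], ignore_index=True
    )
    return bool((as_cat == as_obj).all())


# Unordered categorical dtypes with the same set of categories compare equal,
# even though the category order differs.
print(s1.dtype == s2.dtype)
print(concat_matches_object_concat(s1, s2))
```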
In [202]: s1 = df1.symbol
In [203]: s2 = df2.symbol
In [204]: s1.dtype
Out[204]: CategoricalDtype(categories=['RE00012ME6MA', 'RE00002YE6MA', 'RE00018ME6MA', 'RE00012YE6MA', 'RE00013YE6MA', 'RE00010YE6MA', 'RE00014YE6MA', 'RE00015YE6MA', 'RE00016YE6MA', 'RE00017YE6MA', 'RE00018YE6MA', 'RE00019YE6MA', 'RE00020YE6MA', 'RE00025YE6MA', 'RE00011YE6MA', 'RE00003YE6MA', 'RE00005YE6MA', 'RE00009YE6MA', 'RE00004YE6MA', 'RE00008YE6MA', 'RE00006YE6MA', 'RE00007YE6MA', 'RE00030YE6MA'], ordered=False)
In [205]: s1.shape
Out[205]: (2084,)
In [206]: s2.dtype
Out[206]: CategoricalDtype(categories=['RE00030YE6MA', 'RE00008YE6MA', 'RE00016YE6MA', 'RE00015YE6MA', 'RE00018YE6MA', 'RE00017YE6MA', 'RE00020YE6MA', 'RE00006YE6MA', 'RE00005YE6MA', 'RE00004YE6MA', 'RE00014YE6MA', 'RE00025YE6MA', 'RE00003YE6MA', 'RE00013YE6MA', 'RE00002YE6MA', 'RE00009YE6MA', 'RE00018ME6MA', 'RE00011YE6MA', 'RE00019YE6MA', 'RE00010YE6MA', 'RE00007YE6MA', 'RE00012YE6MA', 'RE00012ME6MA'], ordered=False)
In [207]: s2.shape
Out[207]: (1030,)
In [208]: pd.concat([s1, s2]).astype('object') == pd.concat([s1.astype('object'), s2.astype('object')])
Out[208]:
0 True
1 True
2 True
3 True
4 True
...
1025 False
1026 False
1027 False
1028 False
1029 False
Name: symbol, Length: 3114, dtype: bool
In [209]: pd.concat([s1, s2], ignore_index=True).astype('object') == pd.concat([s1.astype('object'), s2.astype('object')], ignore_index=True)
Out[209]:
0 True
1 True
2 True
3 True
4 True
...
3109 False
3110 False
3111 False
3112 False
3113 False
Name: symbol, Length: 3114, dtype: bool
In [210]: pd.concat([s1.astype('object'), s2.astype('object')], ignore_index=True).iloc[-5:]
Out[210]:
3109 RE00012ME6MA
3110 RE00012ME6MA
3111 RE00005YE6MA
3112 RE00015YE6MA
3113 RE00015YE6MA
Name: symbol, dtype: object
In [211]: pd.concat([s1, s2], ignore_index=True).astype('object').iloc[-5:]
Out[211]:
3109 RE00030YE6MA
3110 RE00030YE6MA
3111 RE00016YE6MA
3112 RE00012YE6MA
3113 RE00012YE6MA
Name: symbol, dtype: object
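The tails of Out[210] and Out[211] are consistent with `s2`'s integer codes being read against `s1`'s category order instead of `s2`'s own. The pure-Python check below uses only the category lists from Out[204] and Out[206] and the last five correct values from Out[210]. It encodes each value as its position in `s2`'s categories and then decodes that position with `s1`'s categories. The helper name `misdecode` is hypothetical. This shows consistency for these five rows only. It is not a confirmed diagnosis of the pandas internals.

```python
# Category lists as printed in Out[204] (s1) and Out[206] (s2).
S1_CATEGORIES = [
    "RE00012ME6MA", "RE00002YE6MA", "RE00018ME6MA", "RE00012YE6MA", "RE00013YE6MA",
    "RE00010YE6MA", "RE00014YE6MA", "RE00015YE6MA", "RE00016YE6MA", "RE00017YE6MA",
    "RE00018YE6MA", "RE00019YE6MA", "RE00020YE6MA", "RE00025YE6MA", "RE00011YE6MA",
    "RE00003YE6MA", "RE00005YE6MA", "RE00009YE6MA", "RE00004YE6MA", "RE00008YE6MA",
    "RE00006YE6MA", "RE00007YE6MA", "RE00030YE6MA",
]
S2_CATEGORIES = [
    "RE00030YE6MA", "RE00008YE6MA", "RE00016YE6MA", "RE00015YE6MA", "RE00018YE6MA",
    "RE00017YE6MA", "RE00020YE6MA", "RE00006YE6MA", "RE00005YE6MA", "RE00004YE6MA",
    "RE00014YE6MA", "RE00025YE6MA", "RE00003YE6MA", "RE00013YE6MA", "RE00002YE6MA",
    "RE00009YE6MA", "RE00018ME6MA", "RE00011YE6MA", "RE00019YE6MA", "RE00010YE6MA",
    "RE00007YE6MA", "RE00012YE6MA", "RE00012ME6MA",
]

# Both dtypes hold the same 23 labels, only in a different order.
assert set(S1_CATEGORIES) == set(S2_CATEGORIES)
assert S1_CATEGORIES != S2_CATEGORIES

# Last five values of the object-dtype concatenation (Out[210]).
correct_tail = ["RE00012ME6MA", "RE00012ME6MA", "RE00005YE6MA",
                "RE00015YE6MA", "RE00015YE6MA"]


def misdecode(values, own_categories, other_categories):
    """Encode each value as its index in own_categories, then decode that
    integer code using other_categories."""
    codes = [own_categories.index(v) for v in values]
    return [other_categories[c] for c in codes]


decoded = misdecode(correct_tail, S2_CATEGORIES, S1_CATEGORIES)
print(decoded)
# → ['RE00030YE6MA', 'RE00030YE6MA', 'RE00016YE6MA', 'RE00012YE6MA', 'RE00012YE6MA']
```

The decoded list matches the tail of Out[211] exactly.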
Problem description
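The row values change silently: the last rows of the categorical concatenation (Out[211]) hold different labels from the same rows of the object-dtype concatenation (Out[210]), and no warning or error is raised. This seems to be extremely surprising behaviour!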
Expected Output
Concatenating two series whose categories contain the same values in a different order should not change any row values.
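A possible workaround, sketched under the assumption that the problem only arises when the category orders differ: reorder the second series' categories to match the first before concatenating. `Series.cat.reorder_categories` requires both series to hold exactly the same set of categories, which is the case in this report. The data below is hypothetical, and this has not been tested on pandas 0.22.0.

```python
import pandas as pd

# Hypothetical data: same categories, different order.
s1 = pd.Series(pd.Categorical(["a", "b", "c"], categories=["a", "b", "c"]))
s2 = pd.Series(pd.Categorical(["a", "a", "c"], categories=["c", "b", "a"]))

# Give s2 exactly s1's category order, so both series map the same integer
# code to the same label before they are concatenated.
s2_aligned = s2.cat.reorder_categories(s1.cat.categories)

combined = pd.concat([s1, s2_aligned], ignore_index=True)
expected = pd.concat([s1.astype("object"), s2.astype("object")], ignore_index=True)

print(list(combined.cat.categories))
print(bool((combined.astype("object") == expected).all()))
```

Aligning the order keeps the result categorical, whereas converting both inputs to `object` first, as In [209] does, avoids the problem at the cost of losing the categorical dtype.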
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: None
pip: 10.0.0.subpip_fix
setuptools: 36.5.0
Cython: None
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 4.1.0
bs4: 4.6.0
html5lib: 1.0b10
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0