Skip to content

BUG: merge_asof only merges by first column of 'by=' list #15676

Closed
@seibs

Description

@seibs

Code Sample, a copy-pastable example if possible

left = pd.DataFrame([
    [pd.to_datetime('20160602'), 1, pd.to_datetime('20150430')],
    [pd.to_datetime('20160602'), 2, pd.to_datetime('20150430')],
    [pd.to_datetime('20160603'), 1, pd.to_datetime('20150501')],
    [pd.to_datetime('20160603'), 2, pd.to_datetime('20150501')],
], columns=['time', 'key_1', 'key_2']).set_index('time')

right = pd.DataFrame([
    [pd.to_datetime('20160502'), 1, pd.to_datetime('20150430'), 1.0],
    [pd.to_datetime('20160502'), 2, pd.to_datetime('20150430'), 2.0],
    [pd.to_datetime('20160503'), 1, pd.to_datetime('20150501'), 3.0],
    [pd.to_datetime('20160503'), 2, pd.to_datetime('20150501'), 4.0],
], columns=['time', 'key_1', 'key_2', 'value']).set_index('time')

expected = pd.DataFrame([
    [pd.to_datetime('20160602'), 1, pd.to_datetime('20150430'), 1.0],
    [pd.to_datetime('20160602'), 2, pd.to_datetime('20150430'), 2.0],
    [pd.to_datetime('20160603'), 1, pd.to_datetime('20150501'), 3.0],
    [pd.to_datetime('20160603'), 2, pd.to_datetime('20150501'), 4.0],
], columns=['time', 'key_1', 'key_2', 'value']).set_index('time')

result = pd.merge_asof(left,
                       right,
                       left_index=True,
                       right_index=True,
                       by=['key_1', 'key_2'],
                       tolerance=pd.Timedelta('31d'))

assert_frame_equal(expected, result)

Problem description

merge_asof does not seem to be working with multiple "by" keys. The output looks like it only takes into account the first key in the list - with key_1 first in the list, the output is the same as by=['key_1'] and with key_2 first in the list, the output is the same as by=['key_2'].

Expected Output

            key_1      key_2  value
time                               
2016-06-02      1 2015-04-30    1.0
2016-06-02      2 2015-04-30    2.0
2016-06-03      1 2015-05-01    3.0
2016-06-03      2 2015-05-01    4.0

Actual Output

            key_1      key_2  value
time                               
2016-06-02      1 2015-04-30    3.0
2016-06-02      2 2015-04-30    4.0
2016-06-03      1 2015-05-01    3.0
2016-06-03      2 2015-05-01    4.0

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.5.2.final.0 python-bits: 64 OS: Linux OS-release: 4.1.35-pv-ts2 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 pandas: 0.19.2 nose: None pip: 9.0.1 setuptools: 27.2.0 Cython: None numpy: 1.11.3 scipy: None statsmodels: None xarray: None IPython: 5.1.0 sphinx: None patsy: None dateutil: 2.6.0 pytz: 2016.10 blosc: None bottleneck: None tables: None numexpr: None matplotlib: None openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml: None bs4: 4.5.3 html5lib: None httplib2: None apiclient: None sqlalchemy: None pymysql: None psycopg2: None jinja2: 2.9.4 boto: None pandas_datareader: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugReshapingConcat, Merge/Join, Stack/Unstack, Explode

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions