BUG: {expanding,rolling}_{cov,corr} functions between objects with different index sets

related #7514

There appears to be a bug in the expanding_{cov,corr} functions (and, as shown further down, in rolling_{cov,corr}) when they are given two objects with different index sets.

First, there is a problem with Series. In the example below, I would expect expanding_corr(s1, s2) to produce the same result as expanding_corr(s1, s2a).

The problem is that expanding_corr is implemented in terms of rolling_corr with window = max(len(arg1), len(arg2)), but rolling_corr then resets the window with window = min(window, len(arg1), len(arg2)). The net effect is window = min(len(arg1), len(arg2)), computed from the raw, unaligned arg1 and arg2. So in the expanding_corr(s1, s2) example below, window=2, and when calculating the third row (index=2) it computes the correlation between [2, 3] and [NaN, 3], producing NaN, rather than the correlation between [1, 2, 3] and [1, NaN, 3], which would produce 1.
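
To make the window arithmetic concrete, here is a small pure-Python sketch that does not call pandas. It aligns `s1` and `s2` on the union of their index labels, then computes a Pearson correlation over a trailing window using only rows where both values are present. The helper names `pearson_complete` and `corr_at` are hypothetical, and requiring "at least two complete pairs" is a simplification of `min_periods`, not pandas' exact rule.

```python
import math

nan = math.nan

def pearson_complete(xs, ys):
    """Pearson correlation over positions where neither value is NaN (NaN if < 2 pairs)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    if n < 2:
        return nan
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return nan
    return sxy / math.sqrt(sxx * syy)

s1 = {0: 1.0, 1: 2.0, 2: 3.0}
s2 = {0: 1.0, 2: 3.0}

# Window arithmetic as described above, using the raw (unaligned) lengths.
window = max(len(s1), len(s2))           # expanding_corr passes 3
window = min(window, len(s1), len(s2))   # rolling_corr then clamps it to 2

# Align on the union of index labels; labels missing from a series become NaN.
index = sorted(set(s1) | set(s2))
x = [s1.get(i, nan) for i in index]      # [1.0, 2.0, 3.0]
y = [s2.get(i, nan) for i in index]      # [1.0, nan, 3.0]

def corr_at(pos, win):
    """Correlation over the trailing `win` aligned rows ending at position `pos`."""
    lo = max(0, pos - win + 1)
    return pearson_complete(x[lo:pos + 1], y[lo:pos + 1])

print(corr_at(2, window))   # nan: [2, 3] vs [nan, 3] leaves a single complete pair
print(corr_at(2, 3))        # 1.0: [1, 2, 3] vs [1, nan, 3] leaves (1, 1) and (3, 3)
```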

The fix would appear to be simply deleting the window = min(window, len(arg1), len(arg2)) line from rolling_cov and rolling_corr, since I believe the rolling_\* functions run fine with a window larger than the data. Failing that, the line could at least be replaced with window = min(window, max(len(arg1), len(arg2))).
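
The two proposed fixes differ only in how the window is clamped. The plain-Python sketch below (helper names `clamp_current`, `clamp_alternative` and `trailing_rows` are hypothetical) compares them on the example lengths, and uses a simple slice to show that a trailing window longer than the available data just starts at the first row, which is the property the first fix relies on.

```python
def clamp_current(window, n1, n2):
    # Current behaviour: clamp to the shorter of the two *unaligned* inputs.
    return min(window, n1, n2)

def clamp_alternative(window, n1, n2):
    # Second proposed fix: clamp to the longer input instead.
    return min(window, max(n1, n2))

def trailing_rows(values, pos, window):
    # Rows pos-window+1 .. pos; an oversized window simply starts at row 0.
    return values[max(0, pos - window + 1):pos + 1]

n1, n2 = 3, 2                  # len(s1), len(s2) in the Series example
window = max(n1, n2)           # what expanding_corr passes to rolling_corr

x = [1.0, 2.0, 3.0]            # s1 after alignment
print(trailing_rows(x, 2, clamp_current(window, n1, n2)))      # [2.0, 3.0]
print(trailing_rows(x, 2, clamp_alternative(window, n1, n2)))  # [1.0, 2.0, 3.0]
print(trailing_rows(x, 2, window))                             # [1.0, 2.0, 3.0] (clamp deleted)
print(trailing_rows(x, 2, 100))                                # [1.0, 2.0, 3.0]
```

One caveat, assuming the data are aligned to the union of the two indexes: both `max(n1, n2)` and the unclamped window are still derived from the unaligned lengths, and the union index can be longer than either input (for example, indexes `[0, 1, 2]` and `[0, 2, 3]` align to four rows). Deriving the window from the aligned length may therefore be the more robust choice.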

```
In [1]: from pandas import Series, expanding_corr

In [2]: s1 = Series([1, 2, 3], index=[0, 1, 2])

In [3]: s2 = Series([1, 3], index=[0, 2])

In [4]: expanding_corr(s1, s2)
Out[4]:
0   NaN
1   NaN
2   NaN
dtype: float64

In [5]: s2a = Series([1, None, 3], index=[0, 1, 2])

In [6]: expanding_corr(s1, s2a)
Out[6]:
0   NaN
1   NaN
2     1
dtype: float64
```

Next, there is a problem with data frames. [This was originally reported separately in https://github.com/pydata/pandas/issues/7512, but I've merged it into this issue.]

The problem is with _flex_binary_moment(). When pairwise=True, it doesn't properly handle two DataFrames with different index sets. In the following example, I believe [6], [7], and [8] should all produce the result in [9].

```
In [1]: from pandas import DataFrame, expanding_corr

In [2]: df1 = DataFrame([[1,2], [3, 2], [3,4]], columns=['A','B'])

In [3]: df1a = DataFrame([[1,2], [3,4]], columns=['A','B'], index=[0,2])

In [4]: df2 = DataFrame([[5,6], [None,None], [2,1]], columns=['X','Y'])

In [5]: df2a = DataFrame([[5,6], [2,1]], columns=['X','Y'], index=[0,2])

In [6]: expanding_corr(df1, df2, pairwise=True)[2]
Out[6]:
          X         Y
A -1.224745 -1.224745
B -1.224745 -1.224745

In [7]: expanding_corr(df1, df2a, pairwise=True)[2]
Out[7]:
    X   Y
A NaN NaN
B NaN NaN

In [8]: expanding_corr(df1a, df2, pairwise=True)[2]
Out[8]:
    X   Y
A NaN NaN
B NaN NaN

In [9]: expanding_corr(df1a, df2a, pairwise=True)[2]
Out[9]:
   X  Y
A -1 -1
B -1 -1
```
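
A sketch of the behaviour expected for pairwise=True, assuming both frames are first aligned to the union of their index labels. Each frame is modelled as a plain dict of columns (a hypothetical stand-in for a DataFrame, not the pandas API), and the result at a label is a matrix of correlations over all rows up to that label, using only rows where both values are present. With that alignment step, all four combinations above give -1 in every cell, matching [9].

```python
import math

nan = math.nan

def pearson_complete(xs, ys):
    """Pearson correlation over positions where neither value is NaN (NaN if < 2 pairs)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    if n < 2:
        return nan
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return nan if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def pairwise_expanding_corr_at(frame1, frame2, label):
    """{col1: {col2: corr}} over all aligned rows with index <= label (hypothetical helper)."""
    # Align first: union of every index label in either frame; gaps become NaN.
    index = sorted({i for frame in (frame1, frame2) for col in frame.values() for i in col})
    rows = [i for i in index if i <= label]
    return {
        c1: {c2: pearson_complete([col1.get(i, nan) for i in rows],
                                  [col2.get(i, nan) for i in rows])
             for c2, col2 in frame2.items()}
        for c1, col1 in frame1.items()
    }

df1 = {'A': {0: 1, 1: 3, 2: 3}, 'B': {0: 2, 1: 2, 2: 4}}
df1a = {'A': {0: 1, 2: 3}, 'B': {0: 2, 2: 4}}
df2 = {'X': {0: 5, 1: nan, 2: 2}, 'Y': {0: 6, 1: nan, 2: 1}}
df2a = {'X': {0: 5, 2: 2}, 'Y': {0: 6, 2: 1}}

for left, right in [(df1, df2), (df1, df2a), (df1a, df2), (df1a, df2a)]:
    # each line: {'A': {'X': -1.0, 'Y': -1.0}, 'B': {'X': -1.0, 'Y': -1.0}}
    print(pairwise_expanding_corr_at(left, right, 2))
```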

There are similar problems with rolling_cov and rolling_corr. For example, continuing with the previous example, [77], [78], and [79] should all give the same result as [80].

```
In [77]: rolling_corr(df1, df2, window=3, pairwise=True, min_periods=2)[2]
Out[77]:
          X         Y
A -1.224745 -1.224745
B -1.224745 -1.224745

In [78]: rolling_corr(df1, df2a, window=3, pairwise=True, min_periods=2)[2]
Out[78]:
    X   Y
A NaN NaN
B NaN NaN

In [79]: rolling_corr(df1a, df2, window=3, pairwise=True, min_periods=2)[2]
Out[79]:
    X   Y
A NaN NaN
B NaN NaN

In [80]: rolling_corr(df1a, df2a, window=3, pairwise=True, min_periods=2)[2]
Out[80]:
   X  Y
A -1 -1
B -1 -1
```
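
The rolling case can be sketched the same way, again in plain Python with dict-based frames (hypothetical helpers, not the pandas API). The frames are aligned first, the trailing window of `window` rows is taken on the aligned index, and both a sample covariance and a correlation are computed from the complete pairs in that window. Two assumptions are worth flagging: the covariance uses ddof=1, and `min_periods` is interpreted as the number of complete pairs, which may not match pandas' exact rule.

```python
import math

nan = math.nan

def window_cov_corr(xs, ys, min_periods):
    """(sample cov with ddof=1, Pearson corr) over complete pairs, or (nan, nan)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    if n < max(min_periods, 2):   # assumption: min_periods counts complete pairs
        return nan, nan
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    corr = nan if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)
    return sxy / (n - 1), corr

def pairwise_rolling_at(frame1, frame2, label, window, min_periods):
    """{(col1, col2): (cov, corr)} over the last `window` aligned rows ending at `label`."""
    index = sorted({i for frame in (frame1, frame2) for col in frame.values() for i in col})
    pos = index.index(label)
    rows = index[max(0, pos - window + 1):pos + 1]   # trailing window on the aligned index
    return {
        (c1, c2): window_cov_corr([col1.get(i, nan) for i in rows],
                                  [col2.get(i, nan) for i in rows], min_periods)
        for c1, col1 in frame1.items()
        for c2, col2 in frame2.items()
    }

df1 = {'A': {0: 1, 1: 3, 2: 3}, 'B': {0: 2, 1: 2, 2: 4}}
df1a = {'A': {0: 1, 2: 3}, 'B': {0: 2, 2: 4}}
df2 = {'X': {0: 5, 1: nan, 2: 2}, 'Y': {0: 6, 1: nan, 2: 1}}
df2a = {'X': {0: 5, 2: 2}, 'Y': {0: 6, 2: 1}}

for left, right in [(df1, df2), (df1, df2a), (df1a, df2), (df1a, df2a)]:
    # each line: {('A', 'X'): (-3.0, -1.0), ('A', 'Y'): (-5.0, -1.0),
    #             ('B', 'X'): (-3.0, -1.0), ('B', 'Y'): (-5.0, -1.0)}
    print(pairwise_rolling_at(left, right, 2, window=3, min_periods=2))
```

Under these assumptions the correlations at label 2 are -1 for every combination, as in [80], and the result does not depend on which inputs carry the missing row.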


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!
