Skip to content

Center rolling window for time offset #38780

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 61 commits into from
Apr 9, 2021
Merged

Conversation

luholgit
Copy link
Contributor

@luholgit luholgit commented Dec 29, 2020

Added center functionality for VariableWindowIndexer. Note - I am unsure if the NotImplementedError in lines 1966-1969 in rolling.py still correctly raises an error for offset based windows.

Finalizes previous PR #36097

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs tests

@jreback jreback added the Window rolling, ewma, expanding label Dec 29, 2020
@luholgit
Copy link
Contributor Author

take

@@ -11,7 +11,7 @@ def calculate_variable_window_bounds(
int64_t num_values,
int64_t window_size,
object min_periods, # unused but here to match get_window_bounds signature
object center, # unused but here to match get_window_bounds signature
object center,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be bint center now if supported

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your quick response, this is done now. :)

@@ -450,7 +457,36 @@ def homogeneous_func(values: np.ndarray):
if values.size == 0:
return values.copy()

def _calculate_center_offset(window) -> int:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of this code below should be needed.

The center calculation should just adjust the indexers return in calculate_variable_window_bounds and everything else should work correctly.

@@ -334,6 +334,13 @@ def _get_window_indexer(self) -> BaseIndexer:
"""
if isinstance(self.window, BaseIndexer):
return self.window

if self.is_freq_type:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clause shouldn't be needed either. self,center should just be passed to VariableWindowIndexer in the clause below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is corrected too

or isinstance(
self._on, (ABCDatetimeIndex, ABCTimedeltaIndex, ABCPeriodIndex)
)
) and isinstance(self.window, (str, BaseOffset, timedelta)):

self._validate_monotonic()

# we don't allow center
if self.center:
# we don't allow center for offset based windows
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire clause should be removed once center is supported

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed this now, too! :)

_calculate_center_offset(self.window)
if self.center
and not isinstance(
self._get_window_indexer(self.window), VariableWindowIndexer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you should be passing an argument here, just self._get_window_indexer()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the argument here, too. Thanks for the feedback!

@jreback jreback changed the title Center window Center rolling window for time offset Dec 29, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need tests to validate each path that is added.

@@ -74,14 +78,27 @@ def calculate_variable_window_bounds(
# right endpoint is open
else:
end[0] = 0
if center_window:
for j in range(0, num_values+1):
if (index[j] == index[0] + index_growth_sign*window_size/2 and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use spaces around operators & parens to format this (we don't lint these files automatically)

Comment on lines 418 to 430
def _center_window(self, result: np.ndarray, offset: int) -> np.ndarray:
"""
Center the result in the window for weighted rolling aggregations.
"""
if self.axis > result.ndim - 1:
raise ValueError("Requested axis is larger then no. of argument dimensions")

if offset > 0:
lead_indexer = [slice(None)] * result.ndim
lead_indexer[self.axis] = slice(offset, None)
result = np.copy(result[tuple(lead_indexer)])
return result

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an effort to make the tests run again before introducing new ones, I encountered an error with "'Rolling' object has no attribute '_center_window'".
I tried to solve the error by adding the _center_window() method to the BaseWindow class.
However, this resulted in new errors.

I am not sure whether it is correct to follow this debugging path. Could you quickly comment @jreback , @mroeschke or @arw2019?

Meanwhile, I am working on adding the changes you asked for. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in #38780 (comment), I think you shouldn't need any additional code in this _apply method.

The changes should just need to occur in pandas/_libs/window/indexers.pyx in the calculate_variable_window_bounds function to make it return indexes that "center" the window.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, for non time offests, center is handled when calculating the indexes of each window

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in #38780 (comment), I think you shouldn't need any additional code in this _apply method.

The changes should just need to occur in pandas/_libs/window/indexers.pyx in the calculate_variable_window_bounds function to make it return indexes that "center" the window.

Just removed the unnecessary lines from rolling.py and that made the tests work again, thanks for that hint! At first I understood I had to migrate those lines to the pandas/_libs/window/indexers.pyx instead of ditching them all together.

@@ -60,6 +62,8 @@ def calculate_variable_window_bounds(

if index[num_values - 1] < index[0]:
index_growth_sign = -1
if center:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sine we typed center as bint. This isnt needed anymore

@@ -334,9 +334,12 @@ def _get_window_indexer(self) -> BaseIndexer:
"""
if isinstance(self.window, BaseIndexer):
return self.window

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you undo this white space?

@@ -451,6 +454,7 @@ def homogeneous_func(values: np.ndarray):
return values.copy()

def calc(x):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you undo this white space?

@@ -1880,20 +1884,14 @@ def validate(self):
# we allow rolling on a datetimelike index
if (
self.obj.empty
# TODO: add "or self.is_datetimelike"?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need this TODO

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luholgit we need actual tests that lock down whether the bounds are correct including variable size windows. There are some center tests for fixed windows, follow those examples.

@pep8speaks
Copy link

pep8speaks commented Mar 29, 2021

Hello @luholgit! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-04-09 12:15:28 UTC

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls also add a whatsnew note (enh is ok), as well as see if any updates are needed in the https://pandas.pydata.org/pandas-docs/dev/user_guide/window.html, also the doc-string of rolling might need a small update as well


tm.assert_frame_equal(result, expected)


def test_closed_fixed_binary_col():
def test_datetimelike_centered_selections(closed, arithmetic_win_operators):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you paramterize on closed (e.g. directly put your window selection up there with the closed parameter), eg.. don't use the fixture

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, this is done, too.

+ [77.0] * 2
)

else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if doing this pls paramterize outside the function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@jreback jreback added this to the 1.3 milestone Apr 7, 2021
used (:issue:`38780`).
For example:

.. code-block:: ipython
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use ipython:: python so the code runs (don't need the prompts or output just the code)

@@ -134,6 +134,30 @@ a copy will no longer be made (:issue:`32960`)
The default behavior when not passing ``copy`` will remain unchanged, i.e.
a copy will be made.

Centered Datetime-Like Rolling Windows
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would not object to an example in window.rst as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the same example also to the window.rst now

index=date_range("2020", periods=5, freq="1D")
)

In [2]: df.rolling("2D", center=True).mean()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need the In Out annotations when using .. ipython:: python (the code will execute line by line

Comment on lines +219 to +230
def test_even_number_window_alignment():
# see discussion in GH 38780
s = Series(range(3), index=date_range(start="2020-01-01", freq="D", periods=3))

# behavior of index- and datetime-based windows differs here!
# s.rolling(window=2, min_periods=1, center=True).mean()

result = s.rolling(window="2D", min_periods=1, center=True).mean()

expected = Series([0.5, 1.5, 2], index=s.index)

tm.assert_series_equal(result, expected)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke I added this test to check for the behavior when operating with even sized windows. As mentioned by @sevberg this currently does behave differently than the classic index-based windows.

Is that what you had in mind here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test would also work with N instead of D as frequency, therefore I am not sure if this is what you meant when referring to nano seconds?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I believe this test exercises the difference in behavior which is good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a minor doc comment, ping on green (I think you addressed all of @mroeschke comments but if not pls do)


df = pd.DataFrame(
{"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -156,6 +156,16 @@ By default the labels are set to the right edge of the window, but a
s.rolling(window=5).mean()
s.rolling(window=5, center=True).mean()

This can also be applied to datetime-like indices.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a versionadded 1.3 tag here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, too!

@jreback jreback merged commit 2e341ab into pandas-dev:master Apr 9, 2021
@jreback
Copy link
Contributor

jreback commented Apr 9, 2021

thanks @luholgit very nice!

@luholgit
Copy link
Contributor Author

Big thanks to you, @jreback and @mroeschke! I've learned a lot while working on this. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Center rolling window with date NotImplementedError
9 participants