Closed
Labels: Enhancement, Needs Discussion (requires discussion from core team before further action), Window (rolling, ewma, expanding)
Description
Problem
Currently, calculating a windowing aggregation over several pandas objects requires concatenating them together and returning the entire result:
pd.concat([df1, df2, df3, ...], axis=1).rolling(10).mean()
By holding the latest state of a windowing aggregation, it is possible to calculate the above more efficiently (and flexibly) in an online fashion, e.g.
roll = df1.rolling(10)
roll.mean()
roll.mean(update=df2)
roll.mean(update=df3)
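As a rough illustration of what "holding the latest state" could mean, here is a minimal pure-Python sketch of a rolling mean that keeps its window between calls. The class `OnlineRollingMean` and the helper `batch_rolling_mean` are hypothetical names invented for this example, not pandas API. The sketch assumes new data is appended row-wise and contains no NaN values.

```python
from collections import deque
from math import nan


class OnlineRollingMean:
    """Hypothetical helper: keeps the last `window` values between calls."""

    def __init__(self, window):
        self.window = window
        self._buf = deque(maxlen=window)  # retained state: the current window
        self._total = 0.0                 # retained state: sum of the window

    def update(self, values):
        """Consume new values and return one rolling mean per new value."""
        out = []
        for x in values:
            if len(self._buf) == self.window:
                self._total -= self._buf[0]  # value about to be evicted
            self._buf.append(x)
            self._total += x
            # Emit NaN until the window is full, like rolling(window).mean().
            full = len(self._buf) == self.window
            out.append(self._total / self.window if full else nan)
        return out


def batch_rolling_mean(values, window):
    """Reference: recompute every window from the full history."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(nan)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


roll = OnlineRollingMean(3)
online = roll.update([1.0, 2.0, 3.0]) + roll.update([4.0, 5.0]) + roll.update([6.0])
batch = batch_rolling_mean([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)
print(online[2:] == batch[2:])  # the non-NaN results agree
```

Each `update` call touches only the new values, whereas the batch version revisits the whole history every time.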
Aiming to eventually target:
rolling/expanding/ewm
groupby.rolling/groupby.expanding/groupby.ewm
As windowing operations are adopting numba, this would be a numba-only feature.
Important API features
- Independent state sharing (e.g. an update provided to rolling mean should not affect the state of rolling variance)
- Independent numba engine configuration (e.g. a user should be able to have rolling mean use nogil=True while also having rolling variance use nogil=False)
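The first requirement above might be sketched as follows: each aggregation keeps its own private state slot, so feeding an update to one never changes the other. `OnlineExpandingSketch` is a hypothetical class written for this illustration. It uses expanding (not rolling) aggregations to keep the code short, and Welford's algorithm for the sample variance. It does not model the numba engine configuration.

```python
from math import nan


class OnlineExpandingSketch:
    """Hypothetical container: one independent state slot per aggregation."""

    def __init__(self):
        self._state = {}  # aggregation name -> that aggregation's own state

    def mean(self, update):
        st = self._state.setdefault("mean", {"n": 0, "total": 0.0})
        out = []
        for x in update:
            st["n"] += 1
            st["total"] += x
            out.append(st["total"] / st["n"])
        return out

    def var(self, update):
        # Welford's algorithm, sample variance (ddof=1).
        st = self._state.setdefault("var", {"n": 0, "mean": 0.0, "m2": 0.0})
        out = []
        for x in update:
            st["n"] += 1
            delta = x - st["mean"]
            st["mean"] += delta / st["n"]
            st["m2"] += delta * (x - st["mean"])
            out.append(st["m2"] / (st["n"] - 1) if st["n"] > 1 else nan)
        return out


exp = OnlineExpandingSketch()
print(exp.mean([1.0, 2.0, 3.0]))  # only the "mean" slot advances
print(exp.var([2.0, 4.0]))        # "var" starts from its own empty state
print(exp.mean([4.0]))            # continues from the three values seen by mean
```

Keying the state by aggregation name is only one possible design; separate objects per aggregation would achieve the same isolation.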
Proposed API options (example with ExponentialMovingWindow)
- ExponentialMovingWindow automatically holds the state of an aggregation call, and an update keyword exists on the aggregation methods to pass in updated data
In [1]: s = pd.Series([0, 1, 2, np.nan, 4])
In [2]: ewm = s.head(1).ewm(com=0.5)
# Prime the first resulting state
In [3]: ewm.mean(engine="numba")
Out[3]:
0 0.0
dtype: float64
In [4]: ewm.mean(engine="numba", update=s.iloc[1:4])
Out[4]:
1 0.750000
2 1.615385
3 1.615385
dtype: float64
In [5]: ewm.mean(engine="numba", update=s.tail(1))
Out[5]:
4 3.670213
dtype: float64
# Clear the latest ewm state
In [6]: ewm.reset_update()
# Re-prime the first resulting state
In [7]: ewm.mean(engine="numba")
Out[7]:
0 0.0
dtype: float64
In [8]: ewm.mean(engine="numba", update=s.tail(4))
Out[8]:
1 0.750000
2 1.615385
3 1.615385
4 3.670213
dtype: float64
- A new online method in ewm that makes a new OnlineExponentialMovingWindow object. The aggregation methods on this object then support an update keyword.
In [6]: s = pd.Series([0, 1, 2, np.nan, 4])
# This will raise if numba is not installed
In [7]: ewm_online = s.head(1).ewm(com=0.5).online()
In [8]: ewm_online
Out[8]: OnlineExponentialMovingWindow [com=0.5,min_periods=1,adjust=True,ignore_na=False,axis=0]
# Prime the first resulting state
In [10]: ewm_online.mean()
Out[10]:
0 0.0
dtype: float64
In [11]: ewm_online.mean(update=s.iloc[1:4])
Out[11]:
1 0.750000
2 1.615385
3 1.615385
dtype: float64
# Clear the latest ewm state
In [12]: ewm_online.reset()
# Re-prime the first resulting state
In [13]: ewm_online.mean()
Out[13]:
0 0.0
dtype: float64
In [14]: ewm_online.mean(update=s.tail(4))
Out[14]:
1 0.750000
2 1.615385
3 1.615385
4 3.670213
dtype: float64
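Both options above are expected to yield the same numbers. The arithmetic behind them can be sketched in plain Python by carrying a decayed weighted sum and a decayed weight total between calls. `OnlineEWMMeanSketch` is a hypothetical stand-in, not the proposed implementation. It assumes adjust=True and ignore_na=False, works on plain lists rather than Series, and assumes that calling mean() without update re-primes the state from the original data.

```python
from math import isnan, nan


class OnlineEWMMeanSketch:
    """Hypothetical stateful EWM mean (adjust=True, ignore_na=False)."""

    def __init__(self, data, com):
        self._data = list(data)
        self._decay = 1.0 - 1.0 / (1.0 + com)  # 1 - alpha
        self.reset_update()

    def reset_update(self):
        """Forget everything seen so far."""
        self._num = 0.0  # decayed weighted sum of observations
        self._den = 0.0  # decayed sum of weights

    def _consume(self, values):
        out = []
        for x in values:
            # Every step ages the old weights, including NaN steps
            # (this is what ignore_na=False means).
            self._num *= self._decay
            self._den *= self._decay
            if not isnan(x):
                self._num += x
                self._den += 1.0
            out.append(self._num / self._den if self._den > 0 else nan)
        return out

    def mean(self, update=None):
        if update is None:
            # Assumption: no `update` means "prime from the original data".
            self.reset_update()
            return self._consume(self._data)
        return self._consume(update)


s = [0.0, 1.0, 2.0, nan, 4.0]
ewm = OnlineEWMMeanSketch(s[:1], com=0.5)
print(ewm.mean())                                       # [0.0]
print([round(v, 6) for v in ewm.mean(update=s[1:4])])   # [0.75, 1.615385, 1.615385]
print([round(v, 6) for v in ewm.mean(update=s[4:])])    # [3.670213]
```

Only two floats of state are needed per series for the mean, which is why the update cost does not depend on how much history has already been seen.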
Considered API Options
- Create a new object per windowing method + aggregation, but this would lead to too many individual classes
- Use streamz as an optional dependency for online calculations. However, streamz has a full API workflow adopted for this operation, which may be difficult to reuse. Also, we would ideally want this to start as a numba-accelerated operation. cc @mrocklin
@pandas-dev/pandas-core