PERF: Series.transform speedups (GH6496) #7421

Merged
merged 1 commit into from
Jun 11, 2014

jreback
Contributor

@jreback jreback commented Jun 10, 2014

closes #6496

Turns out that writing each group's result into a preallocated array by index
is faster than building the result up as a list of pieces and concatenating them
(but we have to be careful of dtype changes).
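
To make the two patterns concrete, here is a minimal sketch of the idea. It is not the code from this PR or pandas' internal implementation; the helper names `transform_by_concat` and `transform_by_indexing` are hypothetical, and the `float64` output dtype is an assumption that only holds when `func` returns floats.

```python
import numpy as np
import pandas as pd


def transform_by_concat(s, labels, func):
    """Hypothetical helper: one small Series per group, then a single concat."""
    pieces = []
    for _, group in s.groupby(labels):
        # Broadcast the scalar result of func across the group's index.
        pieces.append(pd.Series(func(group.values), index=group.index))
    # Concatenate the pieces, then restore the original row order.
    return pd.concat(pieces).reindex(s.index)


def transform_by_indexing(s, labels, func):
    """Hypothetical helper: write each group's result into a preallocated array."""
    # Assumption: func returns floats. A fixed dtype chosen up front is where
    # the "careful of dtype changes" caveat bites: other result types would be
    # coerced or rejected on assignment.
    out = np.empty(len(s), dtype=np.float64)
    values = s.values
    # .indices maps each group label to the integer positions of its rows.
    for _, positions in s.groupby(labels).indices.items():
        out[positions] = func(values[positions])
    return pd.Series(out, index=s.index)


s = pd.Series([1.0, 2.0, 3.0, 4.0])
labels = np.array([0, 0, 1, 1])
via_concat = transform_by_concat(s, labels, np.mean)
via_indexing = transform_by_indexing(s, labels, np.mean)
assert np.allclose(via_concat.values, via_indexing.values)
```

The indexing variant avoids allocating one Series per group and the final concatenation, which is plausibly where the time goes when there are many small groups.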

# this PR
In [11]: %timeit df['signal'].groupby(g).transform(np.mean)
10 loops, best of 3: 158 ms per loop

# master
In [11]: %timeit df['signal'].groupby(g).transform(np.mean)
1 loops, best of 3: 601 ms per loop

# setup for the timings above
In [1]: np.random.seed(0)

In [2]: N = 120000

In [3]: N_TRANSITIONS = 1400

In [5]: transition_points = np.random.permutation(np.arange(N))[:N_TRANSITIONS]

In [6]: transition_points.sort()

In [7]: transitions = np.zeros((N,), dtype=bool)

In [8]: transitions[transition_points] = True

In [9]: g = transitions.cumsum()

In [10]: df = DataFrame({ 'signal' : np.random.rand(N)})
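
For anyone who wants to rerun this outside IPython, the session above can be collected into a standalone script along these lines. This is a sketch rather than the benchmark that ships with pandas: the `run` helper, the cross-check against `np.bincount`, and the use of `timeit` in place of `%timeit` are additions, and timings will differ by machine and pandas version (newer pandas versions may also emit a warning when `np.mean` is passed to `transform`).

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(0)
N = 120000
N_TRANSITIONS = 1400

# Pick N_TRANSITIONS distinct row positions where a new group starts.
transition_points = np.random.permutation(np.arange(N))[:N_TRANSITIONS]
transition_points.sort()

transitions = np.zeros((N,), dtype=bool)
transitions[transition_points] = True

# The cumulative sum turns the boolean markers into contiguous group labels.
g = transitions.cumsum()

df = pd.DataFrame({"signal": np.random.rand(N)})


def run():
    # The operation timed in the PR description.
    return df["signal"].groupby(g).transform(np.mean)


result = run()

# Independent check: per-group mean computed with bincount, broadcast to rows.
_, inverse = np.unique(g, return_inverse=True)
sums = np.bincount(inverse, weights=df["signal"].values)
counts = np.bincount(inverse)
expected = (sums / counts)[inverse]
assert np.allclose(result.values, expected)

elapsed = timeit.timeit(run, number=3) / 3
print(f"{elapsed * 1e3:.1f} ms per call")
```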

@jreback jreback added this to the 0.14.1 milestone Jun 10, 2014
jreback added a commit that referenced this pull request Jun 11, 2014
PERF: Series.transform speedups (GH6496)
@jreback jreback merged commit 53c6b08 into pandas-dev:master Jun 11, 2014
Labels
Groupby, Performance (memory or execution speed performance)
Development

Successfully merging this pull request may close these issues.

PERF: transform speedup
1 participant