Add benchmarks #1829


Merged
merged 2 commits from the add_benchmarks branch into pymc-devs:master on Oct 16, 2017

Conversation

ColCarroll
Member

Benchmarks would be useful for a number of reasons, the two most prominent being detecting performance regressions and serving as a sort of end-to-end test. I have been mucking about with airspeed velocity (asv) this weekend, and have the start of a benchmark suite written for pymc3.

There isn't great documentation on actually deploying this. The configuration file is largely copied from numpy's benchmarks. It looks like it is somewhat standard to keep the benchmarks with the project and push the results to a separate repo that hosts the static site.

From what I have read, we would still need:

  1. A separate repo to push the benchmark results to (pymc-devs/pymc3-benchmarks would make sense). asv's output is a nice static HTML page.
  2. A dedicated(ish) machine to run benchmarks. See the discussions below. We (I?) could certainly run things manually for a while, but you don't want too many other tasks running at the same time.
  3. A separate script to run a daily cron task, benchmarking the latest commits.

The most helpful GitHub discussions I found were one about deploying SymPy benchmarks on a Raspberry Pi (!), and another about getting a machine to run benchmarks for dask.

I'm running these particular benchmarks on the last 10 commits right now, and will try to get a sample site up soon.
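
For orientation, here is a rough sketch of what such an asv.conf.json can look like -- the field names are real asv config keys, but the values are illustrative rather than the exact file in this PR:

    {
        // asv.conf.json accepts JavaScript-style comments.
        "version": 1,
        "project": "pymc3",
        "project_url": "https://github.com/pymc-devs/pymc3",
        "repo": "..",
        "branches": ["master"],
        "environment_type": "virtualenv",
        // Where the benchmark modules live, relative to this file.
        "benchmark_dir": "benchmarks",
        // Results and the generated HTML would be pushed to a separate repo
        // (e.g. the suggested pymc-devs/pymc3-benchmarks) to host the static site.
        "results_dir": "results",
        "html_dir": "html"
    }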

@ColCarroll
Member Author

Two benchmarks so far: one is a parametrized example just testing the overhead of sampling from a normal distribution with different samplers; the other recreates this example.

https://colcarroll.github.io/pymc3-benchmarks/
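
Roughly, the parametrized overhead benchmark looks something like this in asv's conventions (a sketch -- class, method, and argument names may differ from the file in the PR):

    import pymc3 as pm


    class OverheadSuite:
        # Parametrize over step methods to measure sampling overhead
        # on a trivial model (a standard normal).
        params = [pm.NUTS, pm.HamiltonianMC, pm.Metropolis, pm.Slice]
        param_names = ['step']

        def setup(self, step):
            self.n_steps = 10000
            with pm.Model() as self.model:
                pm.Normal('x', mu=0, sd=1)

        def time_overhead_sample(self, step):
            with self.model:
                pm.sample(self.n_steps, step=step(),
                          random_seed=1, progressbar=False)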

@twiecki
Member

twiecki commented Feb 27, 2017

This looks incredible. Can't wait to see where the performance regressions and speed-ups happened.

@springcoil
Contributor

While it's not completely true that this is beginner friendly, it's a task that someone with experience adding benchmarks could help with. It's OK to remove the "beginner friendly" label if you think it's wrong, though, @ColCarroll.

@ColCarroll
Member Author

Spent much of the weekend on this and couldn't quite get things to work. I had a bash script that would:

  • spin up a sizeable spot instance on AWS,
  • set up a GitHub deploy key for the benchmarks repo (which is separate from the main pymc3 repo),
  • clone and sync the benchmarks that have been run in the past,
  • clone and install pymc3,
  • run all new benchmarks,
  • push the new benchmark results to the benchmarks repo, and
  • shut down the spot instance.

My plan was to run this script once daily on as small a machine as possible, so the $5/month machine is on all the time and spawns the $120/month box for about an hour a day.

I'm going to take some time away from this, but happy to share the work I've already done if someone is excited or has better AWS knowledge. One thing is that I think I will try to use a dedicated instance, since bidding on a spot instance was a bit... spotty.
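
In case someone does pick this up: a rough sketch of the on-instance part of that script, assuming the suggested pymc-devs/pymc3-benchmarks results repo exists, a deploy key for it is installed, and asv is on the PATH (repo names, paths, and layout are illustrative):

    #!/usr/bin/env bash
    set -euo pipefail

    # Clone the (hypothetical) results repo and the project itself.
    git clone [email protected]:pymc-devs/pymc3-benchmarks.git results
    git clone https://github.com/pymc-devs/pymc3.git
    cd pymc3/benchmarks

    # Sync previously published results so asv only has to run new commits.
    cp -r ../../results/results . 2>/dev/null || true

    asv machine --yes   # record machine details non-interactively
    asv run NEW         # benchmark commits that have not been run yet
    asv publish         # regenerate the static HTML report

    # Push the new results back to the benchmarks repo, then shut the box down.
    cp -r results ../../results/
    cd ../../results
    git add -A && git commit -m "benchmark run $(date -u +%F)" && git push
    sudo shutdown -h now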

@ColCarroll
Member Author

Removed "beginner friendly", if only for my pride.

@kyleabeauchamp
Contributor

We could just do things the old-fashioned way and write a command-line benchmark suite that users run and post to the GitHub wiki or an issue. This would have the benefit of letting people benchmark various hardware configurations, and the extra boilerplate for CI might not be necessary.
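
For what it's worth, asv already supports that kind of manual, local workflow; roughly (asv CLI commands, run from the directory containing asv.conf.json):

    asv machine --yes            # record this machine's hardware details
    asv run                      # run the suite against the configured branches
    asv publish && asv preview   # build the static HTML report and serve it locally
    asv compare master HEAD      # or print a comparison of two revisions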

@twiecki
Member

twiecki commented Mar 22, 2017

@ColCarroll Sounds like at least some progress. Could you add your code to this PR if someone else wants to help with this?

As @kyleabeauchamp says, we could always run this manually from time to time if AWS is too much trouble.

@ColCarroll
Member Author

ColCarroll commented Mar 22, 2017 via email

@TomAugspurger

FYI, I'm working to make the benchmark running and publishing easier.

The basic idea is that NumFOCUS will pay for a machine, and projects will share time on it. The benchmarks will be run daily and the results pushed to GitHub Pages. I'll have more information in the next couple of weeks (once it's built 😄 ). For now I'd say write the basic set of benchmarks you would like to have run, and we'll sort out the running and publishing later.

@ColCarroll
Member Author

Hey @TomAugspurger what's the best place to continue this conversation? Benchmarking will likely be more important for us while figuring out the right path with respect to theano...

@TomAugspurger

@ColCarroll once they're ready let me know. Then I'll add pymc3 to https://github.com/TomAugspurger/asv-runner/blob/master/tests/full.yml and get them running on our machine.

@ColCarroll
Member Author

Sounds great! We'll probably land something similar to this soon, so that others can contribute test cases.

It looks as though our measurements will be somewhat coarser-grained than other projects' (O(60 s) instead of O(1 ms)) -- I will try to keep the total suite time reasonable, but any thoughts you have would be welcome.
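
One knob for that, for what it's worth: asv benchmarks can set per-benchmark attributes to bound how much work gets done. The attribute names below are real asv knobs; the values and the class are illustrative, not what this PR uses:

    class CaseStudySuite:
        timeout = 360   # seconds before asv gives up on the benchmark
        repeat = 1      # timed samples per benchmark
        number = 1      # calls per sample -- 1 avoids re-running a long model fit

        def time_fit_model(self):
            ...  # e.g. build a model and call pm.sample(...) here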

@TomAugspurger

The machine is still idle for the majority of the day (pandas has the longest suite). I'll check in on it for the first few days and see how things are going.

// If missing or the empty string, the tool will be automatically
// determined by looking for tools on the PATH environment
// variable.
"environment_type": "virtualenv",

@TomAugspurger

Do the PyMC devs have a strong preference here? I haven't set up the benchmark server with virtualenv, so conda would be slightly easier for me, though it probably isn't much work to get virtualenv working too.

@ColCarroll
Member Author

I think virtualenv is my preference, but everyone else uses conda, and I'm happy to go there as well!
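
(The second commit in this PR does switch the config to conda, i.e. the line quoted above becomes:)

    "environment_type": "conda",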

@ColCarroll
Member Author

@TomAugspurger I think this is ready with at least a "starter" set of benchmarks -- let me know if it looks good and I can merge. I included a screenshot of the current set of benchmarks: two "case study" examples, and one just sampling from a normal distribution, parametrized across NUTS, HMC, Slice, and Metropolis. I only ran on one commit, so the line chart isn't much to see!

[screenshot: initial benchmark results, 2017-10-15]

@TomAugspurger

@ColCarroll cool. Feel free to merge whenever.

FYI, the runner is a bit backlogged at the moment, since there was a power outage last week and my auto-restart script had a bug. But I can still get it added to the queue whenever, and I'll let you know when things start running (probably tomorrow or Wednesday).

@ColCarroll ColCarroll merged commit c97a0b5 into pymc-devs:master Oct 16, 2017
@ColCarroll ColCarroll deleted the add_benchmarks branch October 16, 2017 13:41
@ColCarroll
Member Author

Thanks -- let me know if I can help with this further (or when we can start seeing benchmarks!)

@TomAugspurger

Great. I added it to the runner repo in asv-runner/asv-runner@9ce6c1a and am redeploying now.

I'll let you know when they're running. The results will be published to http://pandas.pydata.org/speed/pymc3. The HTML files generated by asv are uploaded to https://github.com/tomaugspurger/asv-collection, if you want to clone that and host them yourselves somewhere.

@TomAugspurger

TomAugspurger commented Oct 17, 2017 via email

@junpenglao junpenglao mentioned this pull request Oct 18, 2017
ColCarroll added a commit that referenced this pull request Nov 9, 2017
* Add benchmark skeleton

* Add another benchmark, switch to conda environment
@ColCarroll
Member Author

@TomAugspurger it looks like the site hasn't updated for any project since mid-February, and most of our benchmark history is gone. Do you know the right place to talk about this, and perhaps to volunteer to help fix it? It has been a super useful service!

@TomAugspurger

Sorry about that, it was my fault. I was attempting to add email notifications on failures and (ironically) broke the runner, and didn't notice until earlier this week. I thought I had fixed it, but apparently not. Looking into it now.

More generally, I'm not really sure what the best way to manage this is, given everyone's time constraints and how poorly documented the setup is. I'm hoping that with the email notifications working properly, I'll be able to respond more quickly. It'd be good to get a more formal setup / documentation / access for the projects using the build machine.

@ColCarroll
Member Author

No problem at all! It seems like you're donating compute and devops time for no personal gain. Please follow up if there's some way you can offload some of that work.

@TomAugspurger

Hopefully back up and running now.
