Histogramming
Henry Schreiner
October 17, 2019
Overview
• Part 1: Overview of histograms
▶ Components of a Histogram
▶ Histograms in Python
▶ Boost.Histogram in C++14
▶ Introducing: boost-histogram for Python
▶ Outlook, with hist and aghast
• Part 2: Hands-on with boost-histogram
1/26Henry Schreiner Histogramming October 17, 2019
What is a histogram?
• A histogram is a set of accumulators over data in ranges
▶ Usually continuous in Physics, could also be categories
▶ Accumulators often are a sum of values - can contain other components
• Input values are digitized by axes (AKA binnings)
▶ Categories
▶ Real values
▶ Variable sized bins (usually give edges)
▶ Regular binning (#bins, start, stop)
▶ May have special features (overflow, circular, etc.)
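The idea of "accumulators over data in ranges" can be sketched in a few lines of plain Python. This is only an illustration: the helper names (regular_index, fill) and the storage layout with underflow first and overflow last are assumptions made for the example, not the API of any library discussed in these slides.

```python
import math

def regular_index(x, nbins, start, stop):
    """Digitize x on a regular axis; 0 is underflow, nbins + 1 is overflow."""
    if math.isnan(x) or x >= stop:
        return nbins + 1  # NaN is sent to overflow here, an arbitrary choice
    if x < start:
        return 0
    # Clamp guards against floating-point rounding right below `stop`
    return 1 + min(int((x - start) / (stop - start) * nbins), nbins - 1)

def fill(counts, values, nbins, start, stop):
    """Increment one simple counting accumulator per input value."""
    for x in values:
        counts[regular_index(x, nbins, start, stop)] += 1
    return counts

# 4 regular bins on [0, 1), plus one underflow and one overflow accumulator
counts = fill([0] * (4 + 2), [-0.5, 0.1, 0.3, 0.3, 0.99, 1.0, 2.5], 4, 0.0, 1.0)
print(counts)  # [1, 1, 2, 0, 1, 2]
```

Because the accumulators persist between calls, fill can be called again with more data, which is the "iterative fill" behaviour discussed later.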
Histogram components
A histogram is a collection of 1+ axes and an accumulator.
Performance
• Variable axis: a list of edges is the most general form, but it requires a sorted search.
• Regular axis: regular spacing, so the bin index can be computed directly.
Figure: a regular axis and a variable axis (the axes), each with optional underflow and optional overflow bins, feeding an accumulator
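The performance difference between the two axis types comes from how a value is turned into a bin index. A minimal sketch, assuming hypothetical helpers variable_index and regular_index and the convention that a negative index means underflow while an index of nbins or more means overflow:

```python
import math
from bisect import bisect_right

def variable_index(x, edges):
    """Sorted (binary) search over the edges: O(log n) per value."""
    return bisect_right(edges, x) - 1

def regular_index(x, nbins, start, stop):
    """Direct arithmetic on a regularly spaced axis: O(1) per value."""
    return math.floor((x - start) / (stop - start) * nbins)

# The same binning described both ways: 4 bins on [0, 1)
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
samples = [-0.1, 0.1, 0.3, 0.74, 0.75, 1.0]

by_search = [variable_index(x, edges) for x in samples]
by_arithmetic = [regular_index(x, 4, 0.0, 1.0) for x in samples]
print(by_search)      # [-1, 0, 1, 2, 3, 4]
print(by_arithmetic)  # [-1, 0, 1, 2, 3, 4]
```

Both give the same answer here; the regular axis simply skips the search, which matters when filling many values.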
Histograms in (classic) PyROOT
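# arr, x, y: float64 NumPy arrays (import ROOT and numpy as np)
• 1D Regular
h = ROOT.TH1D("", "", 10, 0, 1)
h.FillN(len(arr), arr, np.ones_like(arr))  # count, values, weights
• 1D Variable
edges = np.array([1, 2, 3, 4, 5, 6], dtype="float64")
h = ROOT.TH1D("", "", len(edges) - 1, edges)
h.FillN(len(arr), arr, np.ones_like(arr))
• 2D Regular
h = ROOT.TH2D("", "", 10, 0, 1, 20, 0, 2)
h.FillN(len(x), x, y, np.ones_like(x))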
Histogram in Numpy
• 1D Regular
bins, edges = np.histogram(arr, bins=10, range=(0,1))
• 1D Variable
bins, edges = np.histogram(arr, bins=(1,2,3,4,5,6))
• 2D regular
b, e1, e2 = np.histogram2d(x, y, bins=(10,20), range=((0,1),(0,2)))
Numpy Pros and Cons
Pros
• Comes with Numpy
• Good for interactive operations (auto binning)
• Reasonably fast
• Density option, weight support too
Cons
• Manipulation of plain arrays
• One time fill
• 2D+ not optimized for regular binning
• 1D, 2D, and ND syntax variations
• MPL had to mimic: plt.hist
PyROOT Pros and Cons
Pros
• Full histogram object
• Iterative fill option
• Weights option
• Can track sum of weights too
Cons
• ROOT requirement (Conda-forge helps)
• Can be slow in Python (and C++)
• Poor interactive exploration
• Odd syntax, odd memory model
• Max 3D
Histogram Libraries
• Narrow focus: speed, plotting, or language
• Many are abandoned
• Often issues with design, backends, distribution
• No/little interaction
HistBook
Histogrammar
pygram11
rootplotlib
PyROOT
YODA
physt
fast-histogram
qhist
Vaex
hdrhistogram
multihist
matplotlib-hep
pyhistogram
histogram
SimpleHist
paida
theodoregoetz
numpy
Physt
• Histograms as objects
• Pure Python - Dropped Python 2 this year :)
• Very slow fills (slower than numpy)
hist = histogram(heights)
hist.plot(show_values=True)
• Powerful plotting
• Easy conversion to Pandas and many more (ROOT through uproot)
• Special histograms, like polar histograms
Figure 1: Physt example default plot
Fast-Histogram
• Exactly like numpy, but faster
▶ C kernel
▶ Takes advantage of regular binning
▶ Can be 20-25x faster for 2D histograms
▶ Missing some features / combinations
Figure 2: Fast Histogram 2D comparison with Numpy
HistBook (archived)
The first Scikit-HEP library for histograms
• Designed for shared axis histogram collections
• Plotting with Vega-Lite
Now deprecated and in archive mode; functionality may return in Hist (see next slides).
>>> array = np.random.normal(0, 1, 1000000)
>>> histogram = Hist(bin("data", 10, -5, 5))
>>> histogram.fill(data=array)
>>> histogram.step("data").to(canvas)
Scikit-HEP Histogramming plan
• boost-histogram: Fast filling and manipulation (core library)
• hist: Simple analysis frontend
• aghast: Conversions between histogram libraries
• UHI: Unified Histogram Indexing: A way for histograms to be indexed cross-library (boost-histogram and hist to begin with)
Core histogramming libraries: boost-histogram, ROOT
Universal adaptor: Aghast
Front ends (plotting, etc): hist, mpl-hep, physt, others
Boost.Histogram C++14
• Multidimensional templated header-only histogram library: /boostorg/histogram
• Designed by Hans Dembinski, inspired by ROOT and GSL
Histogram
• Axes
• Storage
Axes types
• Regular, Circular
• Variable
• Integer
• Category
Storage (Static or Dynamic)
• Accumulator: int, double, unlimited, ...
Figure: a regular axis and a regular axis with log transform (the axes), each with optional underflow and optional overflow bins, feeding an accumulator
Boost.Histogram example
#include <boost/histogram.hpp>
#include <boost/histogram/ostream.hpp>
#include <iostream>
#include <random>
int main() {
    namespace bh = boost::histogram;
    // One regular axis: 20 bins from -3 to 3
    auto hist = bh::make_histogram(bh::axis::regular<>{20, -3, 3});
    std::default_random_engine eng;
    std::normal_distribution<double> dist(0, 1);
    // Fill with 10,000 samples from a unit normal distribution
    for(int n = 0; n < 10'000; ++n)
        hist(dist(eng));
    std::cout << hist << std::endl;
    return 0;
}
Boost.Histogram example (output)
histogram(regular(20, -3, 3, options=underflow | overflow))
+----------------------------------------------------------+
[-inf, -3) 9 | |
[ -3, -2.7) 19 |= |
[-2.7, -2.4) 36 |== |
[-2.4, -2.1) 110 |===== |
[-2.1, -1.8) 191 |========= |
[-1.8, -1.5) 275 |============= |
[-1.5, -1.2) 518 |========================= |
[-1.2, -0.9) 644 |=============================== |
[-0.9, -0.6) 914 |============================================ |
[-0.6, -0.3) 1107 |===================================================== |
[-0.3, 0) 1183 |========================================================= |
[ 0, 0.3) 1185 |========================================================= |
[ 0.3, 0.6) 1120 |====================================================== |
[ 0.6, 0.9) 874 |========================================== |
[ 0.9, 1.2) 663 |================================ |
[ 1.2, 1.5) 491 |======================== |
[ 1.5, 1.8) 322 |=============== |
[ 1.8, 2.1) 172 |======== |
[ 2.1, 2.4) 79 |==== |
[ 2.4, 2.7) 38 |== |
[ 2.7, 3) 28 |= |
[ 3, inf) 22 |= |
+----------------------------------------------------------+
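A text rendering like the one above can be approximated in plain Python. The sketch below is not how Boost.Histogram's ostream support is implemented; ascii_bars is a hypothetical helper, and the random numbers come from Python's own generator, so the counts will differ from the C++ output.

```python
import random

def ascii_bars(labels, counts, width=40):
    """Render one text line per bin, scaling the longest bar to `width` characters."""
    peak = max(counts) or 1  # avoid dividing by zero for an empty histogram
    lines = []
    for label, count in zip(labels, counts):
        bar = "=" * round(count / peak * width)
        lines.append(f"{label:>14} {count:>5} |{bar:<{width}}|")
    return lines

nbins, start, stop = 20, -3.0, 3.0
edges = [round(start + (stop - start) * k / nbins, 10) for k in range(nbins + 1)]

# counts[0] is underflow, counts[nbins + 1] is overflow
counts = [0] * (nbins + 2)
rng = random.Random(0)  # fixed seed so repeated runs give the same picture
for _ in range(10_000):
    x = rng.gauss(0, 1)
    if x < start:
        counts[0] += 1
    elif x >= stop:
        counts[nbins + 1] += 1
    else:
        counts[1 + min(int((x - start) / (stop - start) * nbins), nbins - 1)] += 1

# Half-open interval labels, including the two flow bins
bounds = [float("-inf")] + edges + [float("inf")]
labels = [f"[{lo:g}, {hi:g})" for lo, hi in zip(bounds, bounds[1:])]
lines = ascii_bars(labels, counts)
print("\n".join(lines))
```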
boost-histogram: Python bindings
Design
• A histogram should be an object
• Manipulation and plotting should be easy
Performance
• Fast filling
• Compiled composable manipulations
Flexibility
• Axes options: sparse, growing, labels
• Storage: integers, weights, errors…
Distribution
• Easy to use anywhere, pip or conda
• Should have wheels, be easy to build, etc.
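To make the "Storage: integers, weights, errors" point concrete, here is a rough sketch of an accumulator that tracks more than a plain count. The class name WeightedSum and its fields are chosen for this illustration only; it keeps the sum of weights and the sum of squared weights, from which a per-bin uncertainty can be estimated.

```python
import math
from dataclasses import dataclass

@dataclass
class WeightedSum:
    value: float = 0.0     # running sum of weights
    variance: float = 0.0  # running sum of squared weights

    def fill(self, weight=1.0):
        """Accumulate one weighted entry."""
        self.value += weight
        self.variance += weight * weight

    def __add__(self, other):
        """Merging two accumulators adds both running sums."""
        return WeightedSum(self.value + other.value, self.variance + other.variance)

    @property
    def error(self):
        """Estimated uncertainty on the summed weights."""
        return math.sqrt(self.variance)

acc = WeightedSum()
for w in (1.0, 2.0, 0.5):
    acc.fill(w)
print(acc)  # WeightedSum(value=3.5, variance=5.25)
```

With unit weights the variance equals the count, so the error reduces to the familiar square root of the number of entries.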
Intro to the Python bindings
• Boost.Histogram developed with Python in mind
• Original bindings based on Boost.Python
▶ Hard to build and distribute
▶ Somewhat limited
• New bindings: /scikit-hep/boost-histogram
▶ 0-dependency build (C++14 only)
▶ State-of-the-art PyBind11
Design • Flexibility • Speed • Distribution
Design
• 500+ unit tests run on Azure on Linux, macOS, and Windows
Resembles the original Boost.Histogram where possible, with changes where needed for Python performance and idioms.
C++14
#include <boost/histogram.hpp>
namespace bh = boost::histogram;
auto hist = bh::make_histogram(
bh::axis::regular<>{2, 0, 1, "x"},
bh::axis::regular<>{4, 0, 1, "y"});
hist(.2, .3); // Fill will also be
hist(.4, .5); // available in Boost 1.72
hist(.3, .2);
Python
import boost.histogram as bh
hist = bh.histogram(
bh.axis.regular(2, 0, 1, metadata="x"),
bh.axis.regular(4, 0, 1, metadata="y"))
hist.fill(
[.2, .4, .3],
[.3, .5, .2])
Design: Manipulations
Combine two histograms
hist1 + hist2
Scale a histogram
hist * 2.0
Sum a histogram's contents
hist.sum()
Access an axis
ax = hist.axis(0)
ax.edges # The edges array
ax.centers # Centers of bins
ax.widths # Width of each bin
Fill 2D histogram with values or arrays
hist.fill(x, y)
Convert contents to Numpy array
hist.view()
Convert to Numpy style histogram tuple
hist.to_numpy()
Pickle supported (multiprocessing)
pickle.dumps(hist, -1)
Copy/deepcopy supported
hist2 = copy.deepcopy(hist)
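How such manipulations can hang off a histogram object is sketched below with a toy 1D class. ToyHist is a stand-in written for this example and is not boost-histogram's implementation; it only mirrors the operations listed above (adding, scaling, summing, axis properties, deep copies).

```python
import copy

class ToyHist:
    """A toy 1D histogram: a list of edges plus one count per bin."""

    def __init__(self, edges, counts=None):
        self.edges = list(edges)
        nbins = len(self.edges) - 1
        self.counts = list(counts) if counts is not None else [0.0] * nbins

    def __add__(self, other):
        # Combining only makes sense when both histograms share a binning
        if self.edges != other.edges:
            raise ValueError("axes must match to add histograms")
        return ToyHist(self.edges, [a + b for a, b in zip(self.counts, other.counts)])

    def __mul__(self, factor):
        return ToyHist(self.edges, [c * factor for c in self.counts])

    def sum(self):
        return sum(self.counts)

    @property
    def centers(self):
        return [(lo + hi) / 2 for lo, hi in zip(self.edges, self.edges[1:])]

    @property
    def widths(self):
        return [hi - lo for lo, hi in zip(self.edges, self.edges[1:])]

hist1 = ToyHist([0, 1, 2, 4], [1, 2, 3])
hist2 = ToyHist([0, 1, 2, 4], [10, 20, 30])

print((hist1 + hist2).counts)  # [11, 22, 33]
print((hist1 * 2.0).sum())     # 12.0
print(hist1.centers)           # [0.5, 1.5, 3.0]
print(hist1.widths)            # [1, 1, 2]

# A deep copy is independent of the original
clone = copy.deepcopy(hist1)
clone.counts[0] = 99
print(hist1.counts)            # [1, 2, 3]
```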
Unified Histogram Indexing (UHI)
The language here (bh.loc, etc.) is defined in such a way that any library can provide it, hence “Unified”.
Access
v = h[b] # Returns bin contents, indexed by bin number
v = h[bh.loc(b)] # Returns the bin containing the value
v = h[bh.underflow] # Underflow and overflow can be accessed with special tags
Setting
h[b] = v
h[bh.loc(b)] = v
h[bh.underflow] = v
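One way a library could support this kind of access is to dispatch on the type of the index inside `__getitem__` and `__setitem__`. The sketch below is a toy, not the boost-histogram implementation: the names loc, underflow, overflow, and ToyHist are defined locally for the example, and the storage layout (underflow first, overflow last) is an assumption.

```python
from bisect import bisect_right

class loc:
    """Tag: index by data coordinate instead of by bin number."""
    def __init__(self, value):
        self.value = value

class _Flow:
    """Sentinel type for the two flow bins."""
    def __init__(self, name):
        self.name = name

underflow = _Flow("underflow")
overflow = _Flow("overflow")

class ToyHist:
    def __init__(self, edges):
        self.edges = list(edges)
        # Layout: [underflow, bin 0, ..., bin n-1, overflow]
        self._data = [0] * (len(self.edges) + 1)

    def _storage_index(self, key):
        nbins = len(self.edges) - 1
        if key is underflow:
            return 0
        if key is overflow:
            return nbins + 1
        if isinstance(key, loc):
            # 0 below the first edge, nbins + 1 at or above the last edge
            return bisect_right(self.edges, key.value)
        if not 0 <= key < nbins:
            raise IndexError("bin number out of range")
        return key + 1

    def __getitem__(self, key):
        return self._data[self._storage_index(key)]

    def __setitem__(self, key, value):
        self._data[self._storage_index(key)] = value

h = ToyHist([0.0, 1.0, 2.0, 3.0])
h[1] = 5               # set by bin number
print(h[loc(1.5)])     # 5, the same bin reached by data coordinate
h[loc(-2.0)] = 7       # a value below the axis lands in underflow
print(h[underflow])    # 7
h[overflow] = 2
print(h[loc(3.0)])     # 2, the upper edge belongs to overflow
```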
Unified Histogram Indexing (UHI) (2)
h == h[:] # Slice over everything
h2 = h[a:b] # Slice of histogram (includes flow bins)
h2 = h[:b] # Leaving out endpoints is okay
h2 = h[bh.loc(v):] # Slices can be in data coordinates, too
h2 = h[::bh.project] # Sum an axis (name may change)
h2 = h[::bh.rebin(2)] # Modification operations (rebin)
h2 = h[a:b:bh.rebin(2)] # Modifications can combine with slices
h2 = h[a:b, ...] # Ellipsis works just like in NumPy
• Docs are here
• Description may move to a new repository
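The slicing syntax above works because Python passes whatever appears in the step position of a slice through to `__getitem__` unchanged, so the step can be an action object rather than a number. A minimal sketch, using locally defined stand-ins (rebin, project, ToyHist) and ignoring flow bins for brevity, which the real slicing does not:

```python
class rebin:
    """Action tag: merge `factor` adjacent bins into one."""
    def __init__(self, factor):
        self.factor = factor

# Sentinel action: sum over the selected range (the slide notes the name may change)
project = object()

class ToyHist:
    def __init__(self, counts):
        self.counts = list(counts)

    def __getitem__(self, key):
        if not isinstance(key, slice):
            return self.counts[key]
        # Resolve missing or negative endpoints exactly like a list would
        start, stop, _ = slice(key.start, key.stop).indices(len(self.counts))
        selected = self.counts[start:stop]
        if key.step is None:
            return ToyHist(selected)
        if key.step is project:
            return sum(selected)
        if isinstance(key.step, rebin):
            f = key.step.factor
            if len(selected) % f:
                raise ValueError("number of bins must be divisible by the rebin factor")
            return ToyHist([sum(selected[i:i + f]) for i in range(0, len(selected), f)])
        raise TypeError("unsupported slice action")

h = ToyHist([1, 2, 3, 4, 5, 6])
print(h[1:5].counts)           # [2, 3, 4, 5]
print(h[::rebin(2)].counts)    # [3, 7, 11]
print(h[0:4:rebin(2)].counts)  # [3, 7]
print(h[::project])            # 21
```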
Performance
• Factor of 2 faster than 1D regular binning in Numpy 1.17
▶ Currently no specialization, just a 1D regular fill
▶ Could be optimized further
• Factor of 6-10 faster than 2D regular binning in Numpy
Distribution
• We must provide excellent distribution.
▶ If anyone writes pip install boost-histogram and it fails, we have failed.
• Docker ManyLinux1 GCC 9.2: /scikit-hep/manylinuxgcc
• Used in /scikit-hep/iMinuit, see /scikit-hep/azure-wheel-helpers
Wheels
• manylinux1 32 and 64 bit, Py 2.7 & 3.5–3.7
• manylinux2010 64 bit, Py 2.7 & 3.5–3.8
• macOS 10.9+ 64 bit, Py 2.7 & 3.6–3.8
• Windows 32 and 64 bit, Py 2.7 & 3.6–3.7
Source
• SDist
• Build directly from GitHub
Conda
• conda-forge package planned
python -m pip install boost-histogram
# OR git+https://github.com/scikit-hep/boost-histogram.git@develop
Hist
hist is the ‘wrapper’ piece that does plotting and interacts with the rest of the ecosystem.
Plans
• Easy plotting adaptors (mpl-hep)
• Serialization formats via Aghast (ROOT, HDF5)
• Auto-multithreading
• Statistical functions (Like TEfficiency)
• Multihistograms (HistBook)
• Interaction with fitters (ZFit, GooFit, etc)
• Bayesian Blocks algorithm from Scikit-HEP
• Command line histograms for stream of numbers
Call for contributions
• What do you need?
• What do you want?
• What would you like?
Join in the development! This should combine the best features of other packages.
Aghast
Aghast is a histogramming library that does not fill histograms and does not plot them.
• A memory format for histograms, like Apache Arrow
• Converts to and from other libraries
• Uses flatbuffers to hold histograms
• Indexing ideas inspired the UHI
Binnings
IntegerBinning • RegularBinning • HexagonalBinning • EdgesBinning • IrregularBinning •
CategoryBinning • SparseRegularBinning • FractionBinning • PredicateBinning •
VariationBinning
End of part 1
Now, we will go hands-on with the first beta of boost-histogram!
Support
• Supported by IRIS-HEP, NSF OAC-1836650

PyHEP 2019: Python Histogramming Packages
