Python and HDF5
Andrew Collette
University of Colorado
What makes scientific data special?

It’s meant to be shared - collaborative
Ad-hoc or changing structure - flexible
Archived and preserved - robust

Python and HDF5 together address all three
High-level language
Fully object-oriented
Almost no “boilerplate” code

Readable
Free
(the language)
“Exception” error handling

Self-documenting

First-class module/namespace support
(the platform)

Mature numerical, plotting and scientific modules
Hundreds of specialized science packages
Thousands more general-purpose
Python itself is “batteries included”
Core analysis packages
NumPy - Array objects and basic operations
SciPy - Advanced science & engineering library
Matplotlib - Publication-quality plots

(both rendered and interactive)
Thousands of others
Distribution - distutils/pip single-command installs
Unit testing - unittest module in stdlib
Interface: F2PY (Fortran), Cython (C), ctypes, others
Web servers and development - literally hundreds
Only need to write code for your problem
Python highlights
Readable
Iteration
C

IDL

Python
Speed
FFTs and optimized routines built into NumPy/SciPy
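
As a minimal sketch of the built-in FFT routines, the snippet below finds the dominant frequency of a synthetic signal with `numpy.fft`. The sample rate and the 50 Hz test tone are illustrative values, not taken from the talk.

```python
import numpy as np

# Sample a 50 Hz sine wave for one second at 1 kHz (illustrative values).
sample_rate = 1000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 50 * t)

# Real-input FFT and the matching frequency axis; both run in compiled code.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1 / sample_rate)

# The strongest spectral component should sit at the tone frequency.
peak_freq = freqs[np.argmax(np.abs(spectrum))]
print(peak_freq)
```

No explicit Python loop touches the samples; the heavy lifting happens inside NumPy.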
ctypes and Cython
ctypes
Advanced foreign function interface
Call C libraries from pure Python code
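
A small sketch of calling a C library from pure Python with ctypes. It assumes a POSIX system where the C standard library can be loaded; `strlen` is used only because it is available everywhere, and it is not an example from the talk.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library. On POSIX systems, if find_library
# returns None, CDLL(None) falls back to symbols already loaded in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"HDF5")
print(length)
```

Declaring `argtypes` and `restype` lets ctypes check and convert arguments instead of guessing.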
Cython
Example from the HDF5 C Library:
HDF5
Hierarchical Data Format
3 things:
File specification and object model
C library
Ecosystem of users and developers
Objects
Datasets: homogeneous arrays of data
Groups: containers holding datasets and groups
Attributes: arbitrary metadata on groups & datasets

Standard constructs using these, or make your own!
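
The three object types can be sketched with h5py as follows. The group, dataset, and attribute names are hypothetical, and the file uses h5py's in-memory "core" driver so nothing is written to disk.

```python
import h5py
import numpy as np

# In-memory file (core driver, no backing store); the file name is a placeholder.
with h5py.File("objects_demo.h5", "w", driver="core", backing_store=False) as f:
    # Group: a container for datasets and other groups.
    group = f.create_group("experiment")

    # Dataset: a homogeneous array stored inside the group.
    dset = group.create_dataset("temperature", data=np.linspace(20.0, 25.0, 6))

    # Attributes: small pieces of metadata attached to a dataset or group.
    dset.attrs["units"] = "degC"
    group.attrs["operator"] = "hypothetical name"

    names = list(f["experiment"].keys())
    units = f["experiment/temperature"].attrs["units"]
    shape = f["experiment/temperature"].shape

print(names, units, shape)
```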
Dataset features
Partial I/O: read and write just what you want
(In Python, we even use the array-access syntax!)
Automatic type conversion
On-the-fly compression
Parallel reads & writes with MPI
(Directly from Python!)
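
A hedged sketch of three of these features in h5py: partial reads and writes through array-style slicing, type conversion on read, and gzip compression. The dataset name, shape, and chunk size are arbitrary choices for illustration; the parallel MPI mode is not shown because it needs an MPI-enabled build.

```python
import h5py
import numpy as np

with h5py.File("features_demo.h5", "w", driver="core", backing_store=False) as f:
    data = np.arange(1000 * 100, dtype="int32").reshape(1000, 100)

    # On-the-fly compression: chunks are gzip-compressed as they are written.
    dset = f.create_dataset("samples", data=data,
                            compression="gzip", chunks=(100, 100))

    # Partial read: only rows 10..19 and every 50th column are fetched.
    block = dset[10:20, ::50]

    # Partial write: update five elements in place.
    dset[0, :5] = [-1, -2, -3, -4, -5]
    first_row = dset[0, :8]

    # Type conversion: int32 on disk is converted to float64 while reading.
    out = np.empty((10, 100), dtype="float64")
    dset.read_direct(out, source_sel=np.s_[10:20, :])

print(block.shape, first_row, out.dtype)
```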
Metadata & Organization
Groups form a POSIX-style “filesystem” in the file
Attributes can store arbitrary data on arbitrary objects
How should the file be organized?
You decide!

Thousands of domain-specific “application formats”
Anyone can read them because HDF5 is self-describing!
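
Because the file describes itself, a reader can discover its layout without knowing the application format in advance. The sketch below walks a file with h5py's `visititems`; the helper `describe` and the sample layout (`run1/probe/voltage`, the timestamp attribute) are hypothetical.

```python
import h5py
import numpy as np


def describe(h5file):
    """Return {path: description} for every group and dataset in the file."""
    layout = {}

    def visitor(name, obj):
        if isinstance(obj, h5py.Dataset):
            layout[name] = f"dataset shape={obj.shape} dtype={obj.dtype}"
        else:
            layout[name] = "group"

    h5file.visititems(visitor)
    return layout


with h5py.File("layout_demo.h5", "w", driver="core", backing_store=False) as f:
    # Intermediate groups in the POSIX-style path are created automatically.
    f.create_dataset("run1/probe/voltage", data=np.zeros((4, 3), dtype="float32"))
    f["run1"].attrs["timestamp"] = "2010-01-01T00:00:00"  # made-up value

    layout = describe(f)
    run_attrs = dict(f["run1"].attrs)

for path, description in sorted(layout.items()):
    print(path, "->", description)
```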
Example
Open an HDF5 file
Extract a particular dataset
Read the data
Make an interactive plot
Close the file
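
The five steps above might look like this with h5py and Matplotlib. This is a sketch, not the code shown in the talk: the file name and the dataset name `waveform` are made up, a small sample file is written first so the example is self-contained, and the non-interactive "Agg" backend is selected so it also runs without a display.

```python
import os
import tempfile

import h5py
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

# Setup only: write a small sample file to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(path, "w") as sample:
    sample.create_dataset("waveform", data=np.sin(np.linspace(0, 2 * np.pi, 200)))

f = h5py.File(path, "r")    # 1. Open an HDF5 file
dset = f["waveform"]        # 2. Extract a particular dataset
data = dset[...]            # 3. Read the data into a NumPy array
fig, ax = plt.subplots()    # 4. Make a plot
(line,) = ax.plot(data)
ax.set_title("waveform")
# plt.show()                #    uncomment when using an interactive backend
f.close()                   # 5. Close the file
```

Because the data is copied into a NumPy array in step 3, the plot stays valid after the file is closed.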
Demo
Real-world use
UCLA Large Plasma Device

Image credit: Basic Plasma Science Facility
Laser Experiment

Image credit: Basic Plasma Science Facility
LAPD Data Products
Acquisition file - “Planes” of data in HDF5
Metadata: timestamps, digitizer settings, probe positions, background plasma conditions…
Packaged into HDF5 following “lab layout”
Users take their data back home and analyze
Visualization
Python 2D plotting

A. Collette et al. Phys. Rev. Lett. 105, 195003 (2010)
Only 160 lines of code!

A. Collette et al. Phys. Rev. Lett. 105, 195003 (2010)
Python does 3D too!
“MayaVi” 3D visualizer
Development sponsored by Enthought
Both offline (scripted) and interactive modes

A. Collette et al. Phys. Plasmas 18, 055705 (2011)
CU Accelerator
Raw data

HDF5 Shot file
Automated speed/mass calculation

Data search
HDF5 file for user

MySQL
Where to get Python
Distributions are the best way to get started
(they include HDF5/h5py!)
Anaconda (Windows, Mac, Linux):
http://continuum.io
PythonXY (Windows)
http://pythonxy.googlecode.com
Questions?
