Python Data Science Handbook
Jake VanderPlas
This website contains the full text of the Python Data Science Handbook
(http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is
available on GitHub (https://github.com/jakevdp/PythonDataScienceHandbook) in the form of
Jupyter notebooks.
The text is released under the CC-BY-NC-ND license
(https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under
the MIT license (https://opensource.org/licenses/MIT).
If you find this content useful, please consider supporting the work by buying the book
(http://shop.oreilly.com/product/0636920034919.do)!
# Table of Contents
## Preface (00.00-preface.html)
## 1. IPython: Beyond Normal Python (01.00-ipython-beyond-normal-python.html)
Help and Documentation in IPython (01.01-help-and-documentation.html)
Keyboard Shortcuts in the IPython Shell (01.02-shell-keyboard-shortcuts.html)
IPython Magic Commands (01.03-magic-commands.html)
Input and Output History (01.04-input-output-history.html)
IPython and Shell Commands (01.05-ipython-and-shell-commands.html)
Errors and Debugging (01.06-errors-and-debugging.html)
Profiling and Timing Code (01.07-timing-and-profiling.html)
More IPython Resources (01.08-more-ipython-resources.html)
## 2. Introduction to NumPy (02.00-introduction-to-numpy.html)
Understanding Data Types in Python (02.01-understanding-data-types.html)
The Basics of NumPy Arrays (02.02-the-basics-of-numpy-arrays.html)
Computation on NumPy Arrays: Universal Functions (02.03-computation-on-arrays-ufuncs.html)
Aggregations: Min, Max, and Everything In Between (02.04-computation-on-arrays-aggregates.html)
Computation on Arrays: Broadcasting (02.05-computation-on-arrays-broadcasting.html)
Comparisons, Masks, and Boolean Logic (02.06-boolean-arrays-and-masks.html)
Fancy Indexing (02.07-fancy-indexing.html)
Sorting Arrays (02.08-sorting.html)
Structured Data: NumPy's Structured Arrays (02.09-structured-data-numpy.html)
## 3. Data Manipulation with Pandas (03.00-introduction-to-pandas.html)
Introducing Pandas Objects (03.01-introducing-pandas-objects.html)
Data Indexing and Selection (03.02-data-indexing-and-selection.html)
Operating on Data in Pandas (03.03-operations-in-pandas.html)
Handling Missing Data (03.04-missing-values.html)
Hierarchical Indexing (03.05-hierarchical-indexing.html)
Combining Datasets: Concat and Append (03.06-concat-and-append.html)
Combining Datasets: Merge and Join (03.07-merge-and-join.html)
Aggregation and Grouping (03.08-aggregation-and-grouping.html)
Pivot Tables (03.09-pivot-tables.html)
Vectorized String Operations (03.10-working-with-strings.html)
Working with Time Series (03.11-working-with-time-series.html)
High-Performance Pandas: eval() and query() (03.12-performance-eval-and-query.html)
Further Resources (03.13-further-resources.html)
## 4. Visualization with Matplotlib (04.00-introduction-to-matplotlib.html)
Simple Line Plots (04.01-simple-line-plots.html)
Simple Scatter Plots (04.02-simple-scatter-plots.html)
Visualizing Errors (04.03-errorbars.html)
Density and Contour Plots (04.04-density-and-contour-plots.html)
Histograms, Binnings, and Density (04.05-histograms-and-binnings.html)
Customizing Plot Legends (04.06-customizing-legends.html)
Customizing Colorbars (04.07-customizing-colorbars.html)
Multiple Subplots (04.08-multiple-subplots.html)
Text and Annotation (04.09-text-and-annotation.html)
Customizing Ticks (04.10-customizing-ticks.html)
Customizing Matplotlib: Configurations and Stylesheets (04.11-settings-and-stylesheets.html)
Three-Dimensional Plotting in Matplotlib (04.12-three-dimensional-plotting.html)
Geographic Data with Basemap (04.13-geographic-data-with-basemap.html)
Visualization with Seaborn (04.14-visualization-with-seaborn.html)
Further Resources (04.15-further-resources.html)
## 5. Machine Learning (05.00-machine-learning.html)
What Is Machine Learning? (05.01-what-is-machine-learning.html)
Introducing Scikit-Learn (05.02-introducing-scikit-learn.html)
Hyperparameters and Model Validation (05.03-hyperparameters-and-model-validation.html)
Feature Engineering (05.04-feature-engineering.html)
In Depth: Naive Bayes Classification (05.05-naive-bayes.html)
In Depth: Linear Regression (05.06-linear-regression.html)
In-Depth: Support Vector Machines (05.07-support-vector-machines.html)
In-Depth: Decision Trees and Random Forests (05.08-random-forests.html)
In Depth: Principal Component Analysis (05.09-principal-component-analysis.html)
In-Depth: Manifold Learning (05.10-manifold-learning.html)
In Depth: k-Means Clustering (05.11-k-means.html)
In Depth: Gaussian Mixture Models (05.12-gaussian-mixtures.html)
In-Depth: Kernel Density Estimation (05.13-kernel-density-estimation.html)
Application: A Face Detection Pipeline (05.14-image-features.html)
Further Machine Learning Resources (05.15-learning-more.html)
## Appendix: Figure Code (06.00-figure-code.html)