edgetier
COMPLEX DECISIONS SIMPLIFIED
Data Visualisation in Python
Quick and easy routes to plotting magic
Shane Lynn Ph.D.
@shane_a_lynn
www.edgetier.com | [email protected] | @TeamEdgeTier
Outline
• Data Visualisation Basics
• Basic Python Setup & Core Libraries
• Code examples and comparisons
• What to avoid
EdgeTier
EdgeTier specialise in data and artificial intelligence products for customer
contact centres.
• Commercially focused SaaS to increase revenue and reduce costs
• Focus on data science, machine learning, and automation
• AI system works alongside customer service agents to increase efficiency by 100%
Data Visualisation
Data visualisation is a general term that describes any effort
to help people understand the significance of data by placing
it in a visual context.
The choice of data visualisation tool is important:
• Iteration speed
• Unobtrusive
• Flexible
• Aesthetically pleasing
Chart Choice
Source: www.extremepresentation.com
Chart Choice – Fearsome Foursome
BARPLOT: Represents the value of entities using bars of various lengths.
HISTOGRAM: An accurate graphical representation of the distribution of numeric data.
SCATTER PLOT: Shows the relationship between 2 numeric variables.
LINE CHART: Shows the evolution of numeric variables.
Icons: www.data-to-viz.com
Chart Choice – Special Mentions
BOXPLOT: Summarises the distribution of numeric variables.
SANKEY DIAGRAM: Shows flows with links.
CHOROPLETH MAP: Displays an aggregated value for each region of a map.
Icons: www.data-to-viz.com
Data Visualisation in Python
[Diagram: interactive environment, data manipulation, visualisation library]
Python Visualisation Library
- Lots of choice of libraries
- Many tools, with varied APIs & outputs
- Best to conquer and become familiar with one / two
Matplotlib
Granddaddy of Python plotting.
Low-level plotting library with a Matlab-like API.
+ Very flexible, complete control
- Verbose plots, aesthetically lacking, sometimes difficult with Pandas
...need to know enough to debug…
Pandas / Seaborn / Altair
Higher level plotting
Pandas – Visualisation API built into
DataFrame & Series objects, interface
to Matplotlib.
Seaborn – extends and provides high-
level API on Matplotlib with
improved styling.
Altair – Built on “Vega-Lite”
visualisation grammar. Allows some
interactive plots in Jupyter
Notebooks.
Basic Notebook Setup
Imports and Matplotlib setup go at the top of the notebook: choose inline
vs notebook style.
The theme can also be chosen here.
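A first notebook cell along these lines might look like the sketch below. The magics are shown as comments so the cell also runs as plain Python; the `whitegrid` style and the figure size are illustrative choices, not taken from the slides.

```python
# Typical first cell of a plotting notebook (a sketch, not the slide's exact code).
# In Jupyter, uncomment ONE of these magics:
#   %matplotlib inline     -> static images embedded in the notebook
#   %matplotlib notebook   -> interactive figures (classic Jupyter Notebook)

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Choose a theme once; it applies to every Matplotlib-based plot that follows.
sns.set_theme(style="whitegrid")

# Default figure size for the session (width, height in inches).
plt.rcParams["figure.figsize"] = (10, 5)
```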
Sample Data
EdgeTier-relevant sample dataset on chat system performance.
Agents answering customer chats from different websites and in different
languages – 5477 chats across 100 agents.
The Bar Plot
The Bar Plot - Matplotlib
Bar plot of chats per user
Python visualisation libraries often require that the
data for plotting is pre-formatted for visualisation.
With Pandas and Matplotlib, the visualisation library
often only presents the values, and does not do
calculations.
The .bar() function does the work; the ‘x’ tick positions
and labels are set manually.
Most code here is formatting and display.
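A minimal sketch of the Matplotlib route, assuming a hypothetical `agent` column and a toy stand-in for the chat dataset (the slide's own code is not in the text). The counting happens before plotting, and tick positions and labels are set by hand.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the chat dataset: one row per chat (hypothetical column name).
chats = pd.DataFrame(
    {"agent": ["ann", "bob", "ann", "cat", "bob", "ann", "cat", "ann"]}
)

# Matplotlib only draws values, so count the chats per agent first.
chats_per_agent = chats["agent"].value_counts()  # sorted, largest first

fig, ax = plt.subplots(figsize=(8, 4))
positions = list(range(len(chats_per_agent)))

# .bar() draws the bars; tick positions and labels are set manually.
ax.bar(positions, chats_per_agent.values, color="steelblue")
ax.set_xticks(positions)
ax.set_xticklabels(chats_per_agent.index, rotation=45)

# Everything below is formatting and display.
ax.set_xlabel("Agent")
ax.set_ylabel("Number of chats")
ax.set_title("Chats per agent")
fig.tight_layout()
```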
The Bar Plot - Pandas
Plot output is Matplotlib – same manipulation.
Slightly simpler API / data access.
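The same chart through the Pandas plotting API might look like this sketch. The `agent` column and the toy data are assumptions; the returned object is a Matplotlib axes, so the usual formatting calls still apply.

```python
import pandas as pd

# Toy stand-in for the chat dataset (hypothetical column name).
chats = pd.DataFrame(
    {"agent": ["ann", "bob", "ann", "cat", "bob", "ann", "cat", "ann"]}
)

# The counting is still done up front; .plot() then returns a Matplotlib axes.
ax = chats["agent"].value_counts().plot(
    kind="bar", figsize=(8, 4), title="Chats per agent"
)

# Same Matplotlib formatting functions as before.
ax.set_xlabel("Agent")
ax.set_ylabel("Number of chats")
```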
The Bar Plot - Seaborn
Simpler data access again.
Same Matplotlib formatting functions
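With Seaborn the raw rows can be passed directly, as in the sketch below. `countplot` is one way to get chats per agent; whether the slide used it is not recoverable from the text, and the `agent` column is again hypothetical.

```python
import pandas as pd
import seaborn as sns

# Toy stand-in for the chat dataset (hypothetical column name).
chats = pd.DataFrame(
    {"agent": ["ann", "bob", "ann", "cat", "bob", "ann", "cat", "ann"]}
)

# Seaborn counts the rows itself; only the bar order is computed by hand.
order = chats["agent"].value_counts().index
ax = sns.countplot(data=chats, x="agent", order=order, color="steelblue")

# The result is a Matplotlib axes, so formatting works as usual.
ax.set_xlabel("Agent")
ax.set_ylabel("Number of chats")
ax.set_title("Chats per agent")
```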
The Bar Plot - Altair
Not Matplotlib-based – very different syntax and formatting.
Ordering was difficult here.
Only one command for everything. JSON format behind.
Prettier Pandas Plots
Seaborn styles are applied to all matplotlib plots –
Cheat your way to nicer looking Pandas Plots!
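A short sketch of the trick, with a hypothetical `agent` column: calling `sns.set_theme()` changes Matplotlib's defaults, so a plain Pandas plot picks up the Seaborn look.

```python
import pandas as pd
import seaborn as sns

# Switch Matplotlib's global defaults to the Seaborn theme.
sns.set_theme()

# Toy stand-in for the chat dataset (hypothetical column name).
chats = pd.DataFrame(
    {"agent": ["ann", "bob", "ann", "cat", "bob", "ann", "cat", "ann"]}
)

# An ordinary Pandas plot, now drawn with Seaborn styling.
ax = chats["agent"].value_counts().plot(kind="bar", title="Chats per agent")
```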
More Challenging Bar Plot
For the top 20 agents, what was the split of the
top websites?
We want a ‘stacked bar’ for this visualisation.
Stacked Bar - Matplotlib
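The Matplotlib slides here carried only code images. One common way to build a stacked bar by hand is to draw one `.bar()` call per website and pass the running total as `bottom`. The sketch below assumes hypothetical `agent` and `website` columns and toy data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy stand-in for the chat dataset (hypothetical column names).
chats = pd.DataFrame({
    "agent": ["ann", "ann", "ann", "bob", "bob", "cat", "cat", "cat"],
    "website": ["a.com", "b.com", "a.com", "a.com", "b.com",
                "b.com", "b.com", "a.com"],
})

# Rows = agents, columns = websites, values = number of chats.
counts = chats.groupby(["agent", "website"]).size().unstack(fill_value=0)

fig, ax = plt.subplots(figsize=(8, 4))
bottom = np.zeros(len(counts))  # running height of each agent's stack

for website in counts.columns:
    # Each layer starts where the previous layers ended.
    ax.bar(counts.index, counts[website].values, bottom=bottom, label=website)
    bottom += counts[website].values

ax.set_xlabel("Agent")
ax.set_ylabel("Number of chats")
ax.legend(title="Website")
```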
Stacked Bar - Pandas
Plotting code is simple, but data manipulation
required.
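A sketch of that split between manipulation and plotting, assuming hypothetical `agent` and `website` columns. The toy data keeps the top 2 agents where the slides use the top 20.

```python
import pandas as pd

# Toy stand-in for the chat dataset (hypothetical column names).
chats = pd.DataFrame({
    "agent": ["ann", "ann", "ann", "bob", "bob", "cat", "cat", "cat"],
    "website": ["a.com", "b.com", "a.com", "a.com", "b.com",
                "b.com", "b.com", "a.com"],
})

# Data manipulation: keep the busiest agents, then count chats per website.
top_agents = chats["agent"].value_counts().head(2).index  # slides: top 20
subset = chats[chats["agent"].isin(top_agents)]
counts = pd.crosstab(subset["agent"], subset["website"])

# Plotting: a single call once the table has one column per website.
ax = counts.plot(kind="bar", stacked=True, figsize=(8, 4))
ax.set_ylabel("Number of chats")
```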
Stacked Bar - Seaborn
Elegant API, simple code structure,
but …
…embarrassingly…
no stacked-bar chart support!
Stacked Bar - Altair
Simple output, short code.
Some issues around data storage,
JSON formats, and sorting is difficult.
Seaborn - Estimators
Calculations done as part of plotting – no
previous data manipulations.
Separation of data and visualisation code.
Very simple to change estimator function to
calculate different statistics.
Similar functionality available in Altair
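To illustrate, the sketch below draws the same raw rows twice, once with Seaborn's default estimator (the mean) and once with `np.median`. The `language` and `handling_time` columns and their values are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Toy stand-in for the chat dataset (hypothetical column names).
chats = pd.DataFrame({
    "language": ["en", "en", "en", "de", "de", "de"],
    "handling_time": [60, 80, 400, 100, 120, 140],
})

fig, (ax_mean, ax_median) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Default estimator: the mean of handling_time per language.
sns.barplot(data=chats, x="language", y="handling_time", ax=ax_mean)
ax_mean.set_title("Mean handling time")

# Swap in another statistic by changing one argument.
sns.barplot(data=chats, x="language", y="handling_time",
            estimator=np.median, ax=ax_median)
ax_median.set_title("Median handling time")
```

No aggregation is written by hand in either call; the raw DataFrame goes straight into the plotting function.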
Histograms
All libraries are good at univariate distribution visualisations.
Layering / comparison is, unfortunately, achieved
by building up the histograms in place.
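Building up layered histograms in place could look like the sketch below: one `hist` call per group on shared axes, with shared bin edges so the layers are comparable. The column names and the synthetic data are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the chat dataset (hypothetical column names).
rng = np.random.default_rng(0)
chats = pd.DataFrame({
    "language": ["en"] * 200 + ["de"] * 200,
    "handling_time": np.concatenate(
        [rng.normal(300, 60, 200), rng.normal(420, 80, 200)]
    ),
})

fig, ax = plt.subplots(figsize=(8, 4))

# Shared bin edges keep the layers comparable (31 edges = 30 bins).
bins = np.linspace(
    chats["handling_time"].min(), chats["handling_time"].max(), 31
)

# One hist() call per category, layered on the same axes.
for language, group in chats.groupby("language"):
    ax.hist(group["handling_time"], bins=bins, alpha=0.5, label=language)

ax.set_xlabel("Handling time (s)")
ax.set_ylabel("Number of chats")
ax.legend(title="Language")
```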
Histograms - Seaborn
Some really nice options for
impressive and informative hints on
Seaborn graphs.
Scatter Plots - Pandas
Pandas: Good for quick single-coloured scatter visualisations.
Messy with multiple categories.
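A sketch of both cases in Pandas, with hypothetical `wait_time`, `handling_time`, and `language` columns. The single-colour plot is one call; colouring by category needs a loop, a colour map, and shared axes.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the chat dataset (hypothetical column names).
chats = pd.DataFrame({
    "language": ["en", "en", "en", "de", "de", "de"],
    "wait_time": [10, 25, 40, 15, 30, 55],
    "handling_time": [200, 260, 310, 240, 330, 420],
})

# Quick single-coloured scatter: one call.
ax = chats.plot.scatter(x="wait_time", y="handling_time")

# Multiple categories: one call per group, all drawn on the same axes.
colours = {"en": "tab:blue", "de": "tab:orange"}
fig, ax_multi = plt.subplots()
for language, group in chats.groupby("language"):
    group.plot.scatter(
        x="wait_time", y="handling_time",
        color=colours[language], label=language, ax=ax_multi,
    )
```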
Scatter Plots - Seaborn
Seaborn / Altair: Better higher level representation, and better
for multi-category scatters.
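For comparison, a multi-category scatter in Seaborn can be sketched in one call by mapping a column to `hue`. The column names and values are hypothetical.

```python
import pandas as pd
import seaborn as sns

# Toy stand-in for the chat dataset (hypothetical column names).
chats = pd.DataFrame({
    "language": ["en", "en", "en", "de", "de", "de"],
    "wait_time": [10, 25, 40, 15, 30, 55],
    "handling_time": [200, 260, 310, 240, 330, 420],
})

# hue= assigns one colour per category and builds the legend.
ax = sns.scatterplot(
    data=chats, x="wait_time", y="handling_time", hue="language"
)
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```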
Scatter Plots - Altair
Seaborn / Altair: Better higher level representation, and better
for multi-category scatters.
Line Plots
Plot chats per language over time.
Pandas: Needs data manipulation, simple thereafter.
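The manipulation step might look like the sketch below: count chats per day and language, pivot the languages into columns, then plot. The `started_at` and `language` columns and the dates are hypothetical.

```python
import pandas as pd

# Toy stand-in for the chat dataset (hypothetical column names).
chats = pd.DataFrame({
    "started_at": pd.to_datetime([
        "2019-01-01 09:00", "2019-01-01 10:00", "2019-01-01 11:00",
        "2019-01-02 09:00", "2019-01-02 12:00", "2019-01-03 09:00",
    ]),
    "language": ["en", "en", "de", "de", "de", "en"],
})

# Data manipulation: chats per day and language, one column per language.
chats["day"] = chats["started_at"].dt.floor("D")
daily = chats.groupby(["day", "language"]).size().unstack(fill_value=0)

# Plotting: one line per column.
ax = daily.plot(figsize=(8, 4))
ax.set_ylabel("Number of chats")
```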
Seaborn/Altair: Operate directly on raw data
More Options!
Geospatial Viz
Folium: Generate interactive
maps using leaflet.js
Matplotlib: Basemap plugin
Interactive Plots
Bokeh: Makes visualisations for
web browser interaction.
Plotly: Online visualisations –
runs by default in cloud
What to Avoid – Angles?
Pie Charts: Radial angle for comparison. Humans are very bad at
accurate radial comparisons – we’ve evolved for speedy length /
distance comparisons.
https://blog.funnel.io/why-we-dont-use-pie-charts-and-some-tips-on-better-data-visualizations
What to Avoid – Area?
Area: We’re bad at area – rank these bubbles by area, and compare
them relative to each other.
https://www.data-to-viz.com/caveat/area_hard.html
What to Avoid – 3D?
3d: In general, 3D is “fake fancy”. Impractical but gee-whizz – avoid!
Caveat: Interactive Scatters?
Conclusions
Wide variety of tools available in Python.
Get familiar with Pandas syntax for quick & simple
exploration, and use with Seaborn themes.
Learn one more high-level library in detail – Seaborn or Altair
for publication of output and more flexibility
“Simplicity is the ultimate sophistication”
Leonardo Da Vinci
More?
Resources
Tour of Python’s Data Landscape
https://dsaber.com/2016/10/02/a-dramatic-tour-through-pythons-data-visualization-landscape-including-ggplot-and-altair/
Python Graph Gallery
https://python-graph-gallery.com/
From Data to Viz
https://www.data-to-viz.com/