DATA VISUALIZATION WITH MATPLOTLIB & SEABORN
With Expert Python Instructor Chris Bruehl
*Copyright Maven Analytics, LLC
COURSE STRUCTURE
This is a project-based course for students looking for a practical, hands-on approach to learning data visualization with Python using the Matplotlib and Seaborn libraries
Additional resources include:
• Downloadable PDF to serve as a helpful reference when you’re offline or on the go
• Quizzes & Assignments to test and reinforce key concepts, with step-by-step solutions
• Interactive demos to keep you engaged and apply your skills throughout the course
COURSE OUTLINE
1. Intro to Data Visualization: Cover key data visualization best practices for clear communication, with tips for choosing the right chart, formatting it effectively, and using it to tell a story
2. Matplotlib Fundamentals: Introduce the Matplotlib library and use it to build & customize several chart types, including line charts, bar charts, pie charts, scatterplots, and histograms
PROJECT: Visualizing Coffee Industry Data
3. Advanced Customization: Apply advanced customization techniques in Matplotlib, including multi-chart figures, custom layouts & colors, style sheets, and more
PROJECT: Consolidating Coffee Industry Data into a Report
4. Data Viz with Seaborn: Visualize data with Seaborn, another Python library that introduces new chart types and layouts, and works well with Matplotlib
PROJECT: Highlighting Insights from the Automotive Auction Industry
WELCOME TO MAVEN CONSULTING GROUP
THE SITUATION
You’ve just been hired as an Associate Consultant for Maven Consulting Group (MCG), a multinational firm that provides strategic advice to companies across different industries. Your new role will see you take on projects in the hotel, coffee, automotive, and diamond industries.
THE ASSIGNMENT
Your task is to effectively visualize data from these industries to deliver key insights to MCG’s clients. This will range from analyzing hotel customer demographics to understanding the major players in the global coffee industry.
THE OBJECTIVES
• Use Pandas to read & manipulate multiple datasets
• Use Matplotlib to visualize data & communicate insights, and then build reports to consolidate your findings
• Use Seaborn to conduct advanced exploratory analysis and aid the decision-making process
SETTING EXPECTATIONS
This course covers the core functionality for Matplotlib & Seaborn
• We’ll cover chart types, common customization options, and best practices for visualizing and analyzing data
• We’ll give you the tools to use the official documentation to apply any customization option not covered in the course
We’ll focus on creating static visuals & dashboards
• Interactive data visualization with Python will be covered in a separate course
We’ll use Jupyter Notebooks as our primary coding environment
• Jupyter Notebooks are free to use, and the industry standard for conducting data analysis with Python (we’ll introduce Google Colab as an alternative, cloud-based environment as well)
You do NOT need to be a Python expert to take this course
• It is strongly recommended that you complete our Python Foundations and Data Analysis with Pandas courses, or have a solid understanding of basic Python syntax and DataFrame manipulation with the Pandas library
INSTALLING ANACONDA (MAC)
1) Go to anaconda.com/products/distribution and click the download button
2) Click X on the Anaconda Nucleus pop-up (no need to launch)
3) Launch the downloaded Anaconda pkg file
4) Follow the installation steps (default settings are OK)
INSTALLING ANACONDA (PC)
1) Go to anaconda.com/products/distribution and click the download button
2) Click X on the Anaconda Nucleus pop-up (no need to launch)
3) Launch the downloaded Anaconda exe file
4) Follow the installation steps (default settings are OK)
LAUNCHING JUPYTER
1) Launch Anaconda Navigator
2) Find Jupyter Notebook and click Launch
YOUR FIRST JUPYTER NOTEBOOK
1) Once inside the Jupyter interface, create a folder to store your notebooks for the course
NOTE: You can rename your folder by clicking “Rename” in the top left corner
2) Open your new coursework folder and launch your first Jupyter notebook!
NOTE: You can rename your notebook by clicking on the title at the top of the screen
THE NOTEBOOK SERVER
NOTE: When you launch a Jupyter notebook, a terminal window may pop up as well; this is called a notebook server, and it powers the notebook interface
If you close the server window, your notebooks will not run!
Depending on your OS and method of launching Jupyter, a server window may not open. As long as you can run your notebooks, don’t worry!
ALTERNATIVE: GOOGLE COLAB
Google Colab is Google’s cloud-based version of Jupyter Notebooks
To create a Colab notebook:
1. Log in to a Gmail account
2. Go to colab.research.google.com
3. Click “new notebook”
Colab is very similar to Jupyter Notebooks (they even share the same file extension); the main difference is that you are connecting to Google Drive rather than your machine, so files will be stored in Google’s cloud
DATA VISUALIZATION
In this section we’ll cover key data visualization best practices for clear communication, with tips for choosing the right chart, formatting it effectively, and using it to tell a story
GOALS FOR THIS SECTION:
• Understand the purpose behind visualizing data
• Learn the common chart types and their use cases
• Apply data visualization best practices to create clear and compelling charts
• Address common errors and how to avoid them
WHY VISUALIZE DATA?
Data visualization allows you to bring your data to life
• The human brain tends to perceive raw data as meaningless numbers and noise
• We need clear patterns and visual cues to help us quickly make sense of complex information
Prefrontal Cortex
• Located in the frontal lobe
• Responsible for cognitive functioning & problem solving
• Helps us make sense of non-visual information (like raw data)
• Slow & conscious
Visual Cortex
• Located in the occipital lobe
• Responsible for visual perception & understanding
• Helps us make sense of colors, patterns, shapes, sizes, etc.
• Instantaneous & subconscious
Data visualization puts both our prefrontal and visual cortex to work, combining the power of cognition (slow and conscious) and perception (instantaneous)
THE TEN SECOND RULE
In 10 seconds, what can you learn from the data below?
What if you were given the averages?
What if you visualize it?
This is a slight twist on Anscombe’s Quartet: despite sharing nearly identical descriptive stats, each series tells a very different visual story
THE 3 KEY QUESTIONS
The 3 key questions are a great way to help choose the right visual
1. What type of data are you working with?
• Time-series: Data that spans across continuous time periods
• Categorical: Data that can be split up into groups or categories
• Numeric: Data with quantitative values, either discrete or continuous
• Hierarchical: Data with natural groups and sub-groups
2. What do you want to communicate?
• Comparison: Compares values over time or across categories
• Composition: Breaks down the component parts of a whole
• Distribution: Shows the frequency of values within a series
• Relationship: Shows the correlation between multiple variables
3. Who is the end user and what do they need?
• Analyst: Likes to see details and understand what’s happening at a granular level
• Manager: Wants summarized information with clear, actionable insights
• Executive: Needs high-level, clear KPIs to track business health and performance
• General Public: Requires engaging visuals and a clear story to follow
ESSENTIAL VISUALS
• KPI CARD: Sometimes simple text works best
• PIE CHART: Sort the slices, keep them under ~5, and focus on one
• TABLE: Add a color scale to highlight patterns in the data
• LINE CHART: The dates must be continuous
• BAR CHART: Baseline must start at zero
• SCATTER PLOT: Remember that correlation does not imply causation
• AREA CHART and 100% STACKED: Comparison & composition
• HISTOGRAM: Avoid using too many bins!
CHART FORMATTING
Chart formatting should be used to eliminate noise & facilitate understanding
BEFORE: Cluttered chart
This is the right chart type… so why is it so hard to understand the visual?
× The chart border and gridlines are more distracting than useful
× The vertical axis labels are hard to read and lack context – it’s using scientific notation and doesn’t start at 0
× Data labels can help add context, but they just add noise here
× It’s not clear what each line represents
PRO TIP: Be intentional about the formatting you apply – don’t just use the default settings!
CHART FORMATTING
AFTER: Clear chart
PRO TIPS:
• Remove the chart border & gridlines
• Format the axis labels clearly
• Add context with the chart title
• Create a visual order
• Make sure the story is clear
“Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away”
Antoine de Saint-Exupéry
STORYTELLING
Descriptive titles and data labels can be used to tell a clear story within your visuals
AFTER: Compelling chart
PRO TIPS:
• Leverage the title to guide the audience toward specific insights
• Insert text & shapes directly inside the chart
• Use data labels and annotations to draw attention to the main data points
• Use color strategically
COMMON ERRORS
Choosing the wrong visual to represent the type of data
• Using a line chart, which is meant for time series data, with categorical data gives the false sense of a trend
• Bar charts are great for showing comparison with categorical data
• While a tree map can work, comparisons and compositions are harder to make than with a bar or pie chart; tree maps are best used with hierarchical data
PRO TIP: Don’t prioritize variety over effectiveness; use the right chart for the job!
COMMON ERRORS
Including too many series in a single visual
• It’s hard to focus or extract any valuable information
• Try highlighting the series you want, or aggregating other categories
• You can also group the other categories into a single series
COMMON ERRORS
Providing little to no context with text and labels
• What does each line represent?
• What are these values?
• What does each period represent?
When removing elements from a chart to reduce clutter and noise, remember to keep all the elements that add understanding
COMMON ERRORS
Using inconsistent colors between related visuals
• Using different colors for the same series makes it difficult to associate them visually
• Consistency gains more importance as the number of visuals increases, making it critical for dashboards
• Using the same colors consistently makes them easier to understand, and in some cases allows you to remove the legend
KEY TAKEAWAYS
Always answer the 3 key questions to choose the right visual
• What type of data are you working with? What do you want to communicate? Who is the end user?
Do NOT prioritize variety over effectiveness
• Choose chart types based on how clearly they communicate the data underneath – you can customize later!
Eliminate noise and distractions to facilitate understanding
• “Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away”
Tell a story with the data to guide the user to the insights
• Use titles, strategic labels, and callouts to create a clear narrative
INTRO TO MATPLOTLIB
In this section we’ll introduce the Matplotlib library and use it to build & customize several chart types, including line charts, bar charts, pie charts, scatterplots, and histograms
GOALS FOR THIS SECTION:
• Understand the difference between the two primary Matplotlib plotting frameworks
• Identify the key components of an object-oriented plot
• Build different variations of line, bar and pie charts, as well as scatterplots and histograms
• Customize your charts by adding custom titles, labels, legends, annotations and much more!
MEET MATPLOTLIB
Matplotlib is an open-source Python library built for data visualization that lets you produce a wide variety of highly customizable charts & graphs
• ‘plt’ is the standard alias for Matplotlib
• The plot() function creates a line chart by default, using the index as the x-values and the list elements as the y-values
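A minimal sketch of the idea above, using hypothetical values: importing pyplot under the ‘plt’ alias and passing a plain list to plot() draws a line chart with the list positions as the x-values.

```python
import matplotlib.pyplot as plt

# Hypothetical values chosen for illustration
y = [10, 20, 15, 30, 25]

# With a single sequence, plot() uses the positions 0..4 as x-values
# and the list elements as y-values
lines = plt.plot(y)
```

In a Jupyter notebook the chart renders when the cell finishes running.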
COMPATIBLE DATA TYPES
Matplotlib can plot many data types, including base Python sequences, NumPy Arrays, and Pandas Series & DataFrames
• Python List
• Pandas Series
• Pandas DataFrame
PLOTTING METHODS
Matplotlib has two plotting methods, or interfaces:
• PyPlot API: Charts are created with the plot() function, and modified with additional functions
• Object-Oriented: Charts are created by defining a plot object, and modified using figure & axis methods
1. Create the figure object and assign it to the ‘fig’ variable
2. Add a chart, or axis, object to the figure and assign it to the ‘ax’ variable
3. Call the axis plot() method to draw the chart
We’ll mostly focus on the Object-Oriented approach, as it provides clearer control over customization
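The two interfaces can be compared side by side. This is a sketch with hypothetical values; the object-oriented half follows the three steps listed above.

```python
import matplotlib.pyplot as plt

y = [10, 20, 15, 30, 25]  # hypothetical values

# PyPlot interface: module-level functions act on the "current" figure
plt.figure()
plt.plot(y)
plt.title("PyPlot interface")

# Object-oriented interface
fig = plt.figure()        # 1. create the figure object
ax = fig.add_subplot()    # 2. add a chart (axis) object to the figure
ax.plot(y)                # 3. call the axis plot() method to draw the chart
ax.set_title("Object-oriented interface")
```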
OBJECT-ORIENTED PLOTTING
Object-Oriented plots are built by adding axes, or charts, to a figure
• The subplots() function lets you create the figure and axes in a single line of code
• You can then use figure & axis methods to customize the different elements in the plot
Code callouts: creates the figure and axis; plots “y”; adds a title to the figure and axis
We’ll start by adding a single subplot to each figure for now, but will dive deeper into subplots later in the course!
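A short sketch of the subplots() pattern described above. The data and title strings are hypothetical.

```python
import matplotlib.pyplot as plt

y = [10, 20, 15, 30, 25]  # hypothetical values

# Create the figure and a single axis in one line
fig, ax = plt.subplots()

# Plot "y" on the axis
ax.plot(y)

# Add a title to the figure and to the axis
fig.suptitle("Figure title")
ax.set_title("Axis title")
```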
PLOTTING DATAFRAMES
When plotting DataFrames using the Object-Oriented interface, Matplotlib will use the index as the x-axis and plot each column as a separate series by default
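As an illustration of this default behavior, the sketch below passes a small DataFrame straight to ax.plot(). The column names and values are hypothetical and are not taken from the course datasets.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical monthly revenue data with dates as the index
df = pd.DataFrame(
    {"lodging": [100, 120, 140], "other": [30, 35, 50]},
    index=pd.to_datetime(["2022-01-01", "2022-02-01", "2022-03-01"]),
)

fig, ax = plt.subplots()

# The index becomes the x-axis and each column becomes its own line
lines = ax.plot(df)
```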
PLOTTING DATAFRAMES
Plotting each series independently allows for improved customization
• ax.plot(x-axis series, y-series values)
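A sketch of plotting each series with its own ax.plot() call, which lets each line carry its own label and style. The DataFrame is hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical monthly revenue data with dates as the index
df = pd.DataFrame(
    {"lodging": [100, 120, 140], "other": [30, 35, 50]},
    index=pd.to_datetime(["2022-01-01", "2022-02-01", "2022-03-01"]),
)

fig, ax = plt.subplots()

# One call per series: ax.plot(x-axis series, y-series values)
ax.plot(df.index, df["lodging"], label="Lodging")
ax.plot(df.index, df["other"], label="Other", linestyle="--")
```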
ASSIGNMENT: PLOTTING DATAFRAMES
NEW MESSAGE
August 29, 2022
From: Ian Intern (Summer Consultant)
Subject: Do you know Matplotlib?
Hi!
I need someone who knows Matplotlib for help with some client work.
Can you plot Lodging Revenue and Other Revenue over time for our hotel client?
Thanks!
section02_assignments.ipynb
SOLUTION: PLOTTING DATAFRAMES
Solution code (two approaches: plot each Series, or plot the DataFrame): section02_solutions.ipynb
FORMATTING OPTIONS
Matplotlib has these formatting options for PyPlot and Object-Oriented plots:
Element            Object-Oriented      PyPlot
Figure Title       fig.suptitle()       plt.suptitle()
Chart Title        ax.set_title()       plt.title()
X-Axis Label       ax.set_xlabel()      plt.xlabel()
Y-Axis Label       ax.set_ylabel()      plt.ylabel()
Legend             ax.legend()          plt.legend()
X-Axis Limit       ax.set_xlim()        plt.xlim()
Y-Axis Limit       ax.set_ylim()        plt.ylim()
X-Axis Ticks       ax.set_xticks()      plt.xticks()
Y-Axis Ticks       ax.set_yticks()      plt.yticks()
Vertical Line      ax.axvline()         plt.axvline()
Horizontal Line    ax.axhline()         plt.axhline()
Text               ax.text()            plt.text()
Spines (borders)   ax.spines['side']    plt.gca().spines['side']
[Figure: anatomy of a Matplotlib figure, labeling the figure, axes, figure title, axis title, axis labels, ticks, tick labels, legend, text, vertical line, and spines]
CHART TITLES
The set_title(), set_xlabel(), and set_ylabel() methods let you add chart titles and axis labels
• fig.suptitle() serves as an overall figure title
FONT SIZES
You can modify chart font sizes with the “fontsize” argument
• You can specify the size in points (10, 12, etc.) or relative size (“smaller”, “x-large”, etc.)
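A brief sketch combining titles, axis labels, and the “fontsize” argument. The data and text strings are hypothetical.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [10, 20, 15])  # hypothetical values

# Overall figure title, sized in points
fig.suptitle("Hotel Revenue", fontsize=16)

# Chart title with a relative size, axis labels sized in points
ax.set_title("Monthly totals", fontsize="large")
ax.set_xlabel("Month", fontsize=12)
ax.set_ylabel("Revenue", fontsize=12)
```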
CHART LEGENDS
The legend() method lets you add a chart legend to identify each series
• The series labels are used by default, but custom values can also be passed through
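The sketch below shows both behaviors with hypothetical data: legend() first picks up the label given to each series, then a second call replaces it with custom text.

```python
import matplotlib.pyplot as plt

months = [1, 2, 3]  # hypothetical values
fig, ax = plt.subplots()
ax.plot(months, [100, 120, 140], label="Lodging Revenue")
ax.plot(months, [30, 35, 50], label="Other Revenue")

# Default: the legend uses the series labels
default_legend = ax.legend()
default_labels = [t.get_text() for t in default_legend.get_texts()]

# Custom values: pass a list of strings, one per series (replaces the legend above)
ax.legend(["Rooms", "Extras"])
```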
LEGEND LOCATION
You can change the legend location with the “loc” or “bbox_to_anchor” arguments
• “loc” lets you set a predetermined location option
• “bbox_to_anchor” lets you set specific (x, y) coordinates
“loc” options: best (default), upper right, upper left, upper center, lower right, lower left, lower center, center right, center left, center
“bbox_to_anchor” coordinates run from 0 to 1 along each axis of the chart area
Setting coordinates beyond 1 will push the legend outside the chart area (useful when there is no whitespace!)
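A sketch of both placement options on side-by-side charts, with hypothetical data. When “bbox_to_anchor” is given, “loc” names the corner of the legend box that is pinned to those coordinates.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]          # hypothetical values
y = [10, 14, 12, 18]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
for axis in (ax1, ax2):
    axis.plot(x, y, label="Revenue")

# Predetermined location inside the chart area
ax1.legend(loc="upper left")

# Specific coordinates: an x-value beyond 1 pushes the legend outside the chart
ax2.legend(loc="upper left", bbox_to_anchor=(1.02, 1))
```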
LINE STYLE
You can change the line style with the “linestyle”, “linewidth”, and “color” arguments
• Common line styles are “solid”, “dashed”, or “dotted” (you can also use “-”, “--”, or “:”)
We will dive into colors in depth later, including changing the default color palette and using hex color codes!
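A small sketch of the three styling arguments, using hypothetical data.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]  # hypothetical values
fig, ax = plt.subplots()

# Dashed, thicker black line using the "--" shorthand
ax.plot(x, [10, 14, 12, 18], linestyle="--", linewidth=2, color="black")

# Dotted, thinner orange line using the style name
ax.plot(x, [5, 7, 6, 9], linestyle="dotted", linewidth=1, color="orange")
```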
AXIS LIMITS
The set_ylim() and set_xlim() methods let you modify the axis limits
• ax.set_xlim(lower limit, upper limit)
Your date x-axis ticks may change interval size!
PRO TIP: Keeping the base of the y-axis at 0 highlights the true magnitude of change across periods and the differences between series
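A sketch of setting both axis limits, with the y-axis based at 0 as the tip suggests. The values are hypothetical.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [120, 135, 150, 160])  # hypothetical values

# (lower limit, upper limit) for each axis
ax.set_xlim(1, 4)
ax.set_ylim(0, 200)  # base the y-axis at 0 to show the true magnitude of change
```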
FIGURE SIZE
You can adjust the figure size with the “figsize” argument
• figsize=(width, height) – the default is 6.4 x 4.8 inches
PRO TIP: Increasing figure size lets you add whitespace to your visual, which can reduce clutter and add space to crowded axes
CUSTOM X-TICKS
You can apply custom x-ticks with the ax.set_xticks() method or the plt.xticks() function
• ax.set_xticks(iterable)
This sets the x-ticks at every 2nd date from the index and rotates them by 45 degrees
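One way to reproduce the described behavior with the object-oriented interface is sketched below. The dates and values are hypothetical, and rotating the labels with tick_params() is an assumption about how to do it here, not necessarily the approach shown on the slide.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical monthly series
dates = pd.date_range("2022-01-01", periods=8, freq="MS")
values = [10, 12, 9, 14, 15, 13, 17, 16]

fig, ax = plt.subplots()
ax.plot(dates, values)

# Place a tick at every 2nd date, then rotate the tick labels by 45 degrees
ax.set_xticks(dates[::2])
ax.tick_params(axis="x", labelrotation=45)
```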
ADDING VERTICAL LINES
You can add vertical lines to mark key points with the axvline() function
Set the coordinate (in this case days since Jan 1, 1970) and an optional color and style
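A sketch of marking a date on a time-series chart. It assumes Matplotlib's default date epoch (Jan 1, 1970) and uses matplotlib.dates.date2num() to convert a date into the numeric coordinate; the data and the chosen date are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical monthly series
dates = pd.date_range("2022-01-01", periods=6, freq="MS")
values = [10, 12, 9, 14, 15, 13]

fig, ax = plt.subplots()
ax.plot(dates, values)

# Convert the date to Matplotlib's numeric date (days since the epoch)
x_coord = mdates.date2num(np.datetime64("2022-03-01"))

# Draw the vertical line with an optional color and style
vline = ax.axvline(x_coord, color="black", linestyle="--")
```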
TEXT
You can add text at specific coordinates with the text() function
• ax.text(x-coordinate, y-coordinate, string, additional text formatting)
PRO TIP: ANNOTATIONS
Annotations are a great way to call out and label important datapoints
• ax.annotate(string, datapoint coordinate, text coordinate, arrow style dictionary, text formatting)
Annotations have many more options that we won’t cover in depth, but the documentation has great examples worth looking into!
For more info on annotations, visit: https://matplotlib.org/stable/tutorials/text/annotations.html#sphx-glr-tutorials-text-annotations-
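A minimal annotation sketch with hypothetical data. The keyword names (xy, xytext, arrowprops) map onto the datapoint coordinate, text coordinate, and arrow style dictionary in the signature above.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 18, 30, 22])  # hypothetical values

# Label the peak at (3, 30) with text placed at (1, 28) and an arrow between them
note = ax.annotate(
    "Peak",
    xy=(3, 30),
    xytext=(1, 28),
    arrowprops={"arrowstyle": "->"},
    fontsize=12,
)
```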
REMOVING CHART BORDERS
You can remove specific chart borders with ax.spines[].set_visible(False)
This removes the right and top borders
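A sketch of hiding the right and top borders on a chart with hypothetical data.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [10, 20, 15])  # hypothetical values

# Each spine is looked up by side name, then hidden
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
```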
ASSIGNMENT: CHART FORMATTING
NEW MESSAGE
August 30, 2022
From: Ian Intern (Summer Consultant)
Subject: RE: Final Charts for Client
Hi there!
The data you plotted earlier looks good, but can you clean up the chart a little bit? I want it to look polished for our client. This is my last day in my summer internship and I want to get hired back!
Thanks!
section02_assignments.ipynb
SOLUTION: CHART FORMATTING
Solution code: section02_solutions.ipynb
LINE CHARTS
Line charts are used for showing trends over time
• ax.plot(x-axis series, series values, formatting options)
Data layout: dates as the index, with a column for each series
PRO TIPS
• Pivot tabular data to turn each unique series into a DataFrame column, and set the datetime as the index
• Divide your series by the appropriate units while plotting to simplify the y-axis scale
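Both tips can be sketched together. The table, its column names, and the city names below are hypothetical; pivot_table() is one reasonable way to do the pivot.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical tabular ("long") data: one row per date and city
tidy = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-01", "2022-01-01", "2022-02-01", "2022-02-01"]),
    "city": ["Austin", "Boston", "Austin", "Boston"],
    "units": [12000, 8000, 15000, 9000],
})

# Pivot: dates become the index and each city becomes a column
wide = tidy.pivot_table(index="date", columns="city", values="units", aggfunc="sum")

fig, ax = plt.subplots()
for city in wide.columns:
    # Divide by 1,000 while plotting to simplify the y-axis scale
    ax.plot(wide.index, wide[city] / 1000, label=city)
ax.set_ylabel("Units (thousands)")
ax.legend()
```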
LINE CHARTS
EXAMPLE: Available Housing Units by Week
STACKED LINE CHARTS
Use stackplot() to create a stacked line chart, which lets you visualize the overall trend over time, as well as its composition by series
PRO TIP: Use the bottom series in the stacked line chart to draw focus to its individual trend – it’s the most visible!
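A stackplot() sketch with hypothetical weekly values. The first series passed in is drawn at the bottom of the stack.

```python
import matplotlib.pyplot as plt

weeks = [1, 2, 3, 4]      # hypothetical values
city_a = [5, 6, 7, 8]
city_b = [3, 3, 4, 5]

fig, ax = plt.subplots()

# city_a forms the bottom layer; city_b is stacked on top of it
layers = ax.stackplot(weeks, city_a, city_b, labels=["City A", "City B"])
ax.legend(loc="upper left")
```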
PRO TIP: DUAL AXIS CHARTS
Use twinx() to create a dual axis chart, which lets you plot series with values on significantly different scales inside a single visual
The “Inventory” values are so small compared to “Price” that they appear to be 0 when plotted on the same y-axis
Create a second axis (ax2) with ax.twinx(), then create the desired plot on ax2
Note that using the figure level legend picks up both series
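A dual axis sketch using hypothetical “Price” and “Inventory” values on very different scales.

```python
import matplotlib.pyplot as plt

weeks = [1, 2, 3, 4]                      # hypothetical values
price = [500000, 510000, 525000, 530000]
inventory = [120, 110, 95, 90]

fig, ax = plt.subplots()
ax.plot(weeks, price, label="Price")

# Second y-axis that shares the same x-axis
ax2 = ax.twinx()
ax2.plot(weeks, inventory, color="orange", label="Inventory")

# A figure-level legend collects the series from both axes
fig.legend()
```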
ASSIGNMENT: LINE CHARTS
NEW MESSAGE
August 30, 2022
From: Ian Intern (Summer Consultant)
Subject: Re: Re: Final Charts for Client
Hey again,
Great work on those charts!
Final request - we want to compare room nights booked vs cancellations over time; we might need a dual axis chart to do this effectively. I’m totally checked out, so can you do this? You’ll be put in contact with the client soon.
Thanks!
section02_assignments.ipynb
SOLUTION: LINE CHARTS
Solution code: section02_solutions.ipynb
BAR CHARTS
Bar charts are used to compare values across different categories
• ax.bar(category labels, bar heights, formatting options)
Data layout: categories as the index, values in a single column
PRO TIPS
• Use .groupby() and .agg() to aggregate your data by category and push the labels into the index
• Use Seaborn or the Pandas plot API for grouped bar charts
BAR CHARTS
EXAMPLE: Median Home Price by City
PRO TIP: HORIZONTAL LINES
Use axhline() to add a horizontal line at a specified y-value on a bar chart
• This will typically be something to benchmark against, like a mean or target
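A sketch of a bar chart with a mean benchmark line. The city names and prices are hypothetical.

```python
import matplotlib.pyplot as plt

cities = ["Austin", "Boston", "Chicago"]  # hypothetical values
prices = [300, 450, 600]

fig, ax = plt.subplots()
ax.bar(cities, prices)

# Benchmark line at the mean of the plotted values
mean_price = sum(prices) / len(prices)
ref_line = ax.axhline(mean_price, color="black", linestyle="--")
```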
HORIZONTAL BAR CHARTS
Use barh() to create a horizontal bar chart
Note that the Series in a horizontal bar chart are plotted in the opposite order from a vertical bar chart
PRO TIP: HIGHLIGHTS
Use the “color” argument to highlight the series you’d like to focus on
Use a list to specify the color for each Series
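A sketch of highlighting one bar by passing a list of colors, one per bar. The data and color choices are hypothetical.

```python
import matplotlib.pyplot as plt

cities = ["Austin", "Boston", "Chicago"]  # hypothetical values
prices = [300, 450, 600]

# One color per bar: mute the first two and highlight the last
colors = ["lightgray", "lightgray", "orange"]

fig, ax = plt.subplots()
ax.bar(cities, prices, color=colors)
```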
ASSIGNMENT: BAR CHARTS
NEW MESSAGE
September 1, 2022
From: Sarah Shark (Managing Director)
Subject: CHARTS NEEDED ASAP
Hello,
Our hotel client is concerned about our intern’s departure. I need YOU to step up and make sure they’re happy with us.
Start by taking a quick look at room nights and lodging by country for our top 10 countries by total nights booked.
I expect the results in my inbox by morning (more details in the notebook attached).
-S
section02_assignments.ipynb
SOLUTION: BAR CHARTS
Solution code: section02_solutions.ipynb
STACKED BAR CHARTS
You can create a stacked bar chart by setting the “bottom” argument for the second “stacked” series as the values from the bars below it
• This will use those values as the baseline for the stacked bars instead of the x-axis
The Oregon bars are plotted by using the California values as their “bottom”
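A stacked bar sketch following the California and Oregon description above. The year labels and values are hypothetical.

```python
import matplotlib.pyplot as plt

years = ["2020", "2021", "2022"]  # hypothetical values
california = [100, 120, 130]
oregon = [40, 45, 55]

fig, ax = plt.subplots()
ax.bar(years, california, label="California")

# The Oregon bars start where the California bars end
ax.bar(years, oregon, bottom=california, label="Oregon")
ax.legend()
```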
100% STACKED BAR CHARTS
To create a 100% stacked bar chart, convert your DataFrame to row-level percentages before plotting
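A sketch of the conversion and the plot. The DataFrame, its column names, and the country names are hypothetical; dividing each row by its row total is one straightforward way to get row-level percentages.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical revenue by country
df = pd.DataFrame(
    {"lodging": [80, 60, 90], "other": [20, 40, 30]},
    index=["France", "Germany", "Spain"],
)

# Row-level percentages: each row now sums to 100
pct = df.div(df.sum(axis=1), axis=0) * 100

fig, ax = plt.subplots()
bottom = pd.Series(0.0, index=pct.index)
for col in pct.columns:
    # Stack each column on top of the running total of the columns before it
    ax.bar(pct.index, pct[col], bottom=bottom, label=col)
    bottom = bottom + pct[col]
ax.legend()
```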
PRO TIP: GROUPED BAR CHARTS
You can create a grouped bar chart by reducing the width of each series and shifting them evenly around their corresponding label
• The first series is shifted to the left across the x-axis by half the bar width
• The second series is shifted to the right
Grouped bar charts are much easier to create by using Seaborn or Pandas’ Matplotlib API
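A sketch of the width-and-shift approach with two hypothetical series. Numeric x positions are used so the bars can be offset, and the category labels are applied to the ticks afterwards.

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ["France", "Germany", "Spain"]  # hypothetical values
lodging = [80, 60, 90]
other = [20, 40, 30]

x = np.arange(len(labels))  # numeric position for each label
width = 0.4                 # reduced bar width

fig, ax = plt.subplots()
ax.bar(x - width / 2, lodging, width=width, label="Lodging")  # shifted left
ax.bar(x + width / 2, other, width=width, label="Other")      # shifted right

# Put the category labels back at the center of each group
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
```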
PRO TIP: COMBO CHARTS
You can create a combo chart by specifying different chart types in a dual axis plot
PRO TIP: Use the “alpha” argument to modify the transparency of each plot (0 is invisible and 1 is solid)
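A combo chart sketch: bars on the first axis and a line on a twin axis, with “alpha” making the bars semi-transparent. All values are hypothetical.

```python
import matplotlib.pyplot as plt

quarters = [1, 2, 3, 4]        # hypothetical values
units_sold = [120, 150, 90, 160]
avg_price = [300, 310, 325, 330]

fig, ax = plt.subplots()

# Bars on the first y-axis, partly transparent
ax.bar(quarters, units_sold, alpha=0.4, label="Units sold")

# Line on a second y-axis that shares the x-axis
ax2 = ax.twinx()
ax2.plot(quarters, avg_price, color="black", label="Average price")

fig.legend()
```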
ASSIGNMENT: ADVANCED BAR CHARTS
NEW MESSAGE
September 2, 2022
From: Sarah Shark (Managing Director)
Subject: RE: RE: CHARTS NEEDED ASAP
Hello,
Nice work…so far. I need some more detailed views on the breakdown of lodging revenue vs. other revenue by country.
Build a grouped bar chart with the lodging revenue and other revenue for each country. Then, build a 100% stacked bar chart showing how much each revenue category contributes to overall country revenue. Add a reference line at 80% to help illustrate which countries get less than 80% of their revenue from lodging.
-S
section02_assignments.ipynb
SOLUTION: ADVANCED BAR CHARTS
Solution code: section02_solutions.ipynb
PIE CHARTS
Pie charts are used to compare proportions totaling 100%
• ax.pie(series values, labels= , startangle= , autopct= , pctdistance= , explode= )
Data layout: labels as the index, values in a single column
PRO TIPS
• Keep the number of slices low (<7) to enhance readability – you can group “others” into a single slice
• Use bar charts if you want to compare the categories – pies are for showing how they make up a whole
• Donut charts make great KPI progress trackers
PIE CHARTS
EXAMPLE: Homes Sold by City
PRO TIP: DONUT CHARTS
You can create a donut chart by adding a “hole” to a pie chart and shifting the labels
How does this code work?
• It pushes the data labels 85% of the way towards the edge of the pie chart
• Then adds a white circle that covers the center of the pie chart to the figure
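The slide's code is not reproduced in this text, so the sketch below is one way to implement the two steps described. The values, labels, and the 0.70 hole radius are hypothetical.

```python
import matplotlib.pyplot as plt

labels = ["Austin", "Boston", "Chicago"]  # hypothetical values
homes_sold = [50, 30, 20]

fig, ax = plt.subplots()

# pctdistance=0.85 pushes the percentage labels 85% of the way to the edge
wedges, texts, autotexts = ax.pie(
    homes_sold,
    labels=labels,
    startangle=90,
    autopct="%.0f%%",
    pctdistance=0.85,
)

# A white circle over the center creates the "hole"
hole = plt.Circle((0, 0), 0.70, fc="white")
ax.add_artist(hole)
```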
ASSIGNMENT: PIE & DONUT CHARTS
NEW MESSAGE
September 3, 2022
From: Sarah Shark (Managing Director)
Subject: UPDATED CHARTS
Hello,
Our hotel client is looking for a pie/donut chart to represent the share of revenue by country.
Create a pie chart with slices for the top 5 countries by revenue, and a single “other” slice for the rest of the countries.
Need it ASAP.
Thx
section02_assignments.ipynb
SOLUTION: PIE & DONUT CHARTS
Solution code: section02_solutions.ipynb
SCATTERPLOTS
Scatterplots are used to visualize the relationship between numerical variables
• ax.scatter(x-axis series, y-axis series, s= , alpha= )
Data layout: one row per point, with one column for the x-series and one for the y-series
PRO TIPS
• Modify the alpha (transparency) level to make overlapping points more visible
• Bubble charts can be useful in some cases, but they often add confusion rather than clarity
SCATTERPLOTS
EXAMPLE: Months of Supply vs. Median List Price
BUBBLE CHARTS
To create a bubble chart, specify a third series in the “s” (size) argument of .scatter()
• You may need to apply some arithmetic to adjust the bubble sizes
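A bubble chart sketch with hypothetical values. In Matplotlib the size argument of scatter() is named “s”, and the multiplier of 5 is an arbitrary scaling chosen to make the bubbles readable.

```python
import matplotlib.pyplot as plt

avg_nights = [2, 3, 5, 4]            # hypothetical values
avg_revenue = [400, 550, 900, 700]
nightly_revenue = [200, 183, 180, 175]

fig, ax = plt.subplots()

# The third series drives the bubble size, scaled with simple arithmetic
bubbles = ax.scatter(
    avg_nights,
    avg_revenue,
    s=[v * 5 for v in nightly_revenue],
    alpha=0.5,
)
```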
HISTOGRAMS
Histograms are used to visualize the distribution of a numeric variable
• ax.hist(series, density= , alpha= , bins= )
Data layout: a single numerical series
PRO TIPS
• Modify the alpha (transparency) level to plot multiple distributions on the same axis
• Set density=True to normalize the y-axis so the bar areas sum to 1 (relative frequencies instead of raw counts)
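A sketch of overlaying two distributions with transparency and density normalization. The data is randomly generated for illustration, and the group names are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical age samples for two groups
rng = np.random.default_rng(0)
ages_a = rng.normal(40, 10, 500)
ages_b = rng.normal(50, 12, 500)

fig, ax = plt.subplots()

# density=True rescales the bars so their total area equals 1
counts_a, edges_a, _ = ax.hist(ages_a, bins=20, density=True, alpha=0.5, label="Group A")
ax.hist(ages_b, bins=20, density=True, alpha=0.5, label="Group B")
ax.legend()
```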
HISTOGRAMS
EXAMPLE: Distribution of Y-o-Y Growth in Home Price for Calendar Weeks
ASSIGNMENT: SCATTERPLOTS & HISTOGRAMS
NEW MESSAGE
September 4, 2022
From: Sarah Shark (Managing Director)
Subject: Additional Customer Profiling
Not bad, rookie – thanks for the quick turnaround.
I need two more charts to help finalize a marketing strategy targeting overseas guests:
1. A chart comparing average revenue per customer and average nights stayed, with average nightly revenue as the size of the bubbles (you’ll need to aggregate the data by country)
2. The distribution of customer ages in France & Germany
-sent from my yPhone
section02_assignments.ipynb
SOLUTION: SCATTERPLOTS &
HISTOGRAMS
Solution
section02_solutions.ipynb
*Copyright Maven Analytics, LLC
KEY TAKEAWAYS
Matplotlib has two methods for plotting data: PyPlot API & Object
Oriented
• Both can visualize many data types (lists, DataFrames, etc.), but object-oriented plots are easier to fully
customize
Object Oriented plots are built by adding axes to a figure
• You can layer on different elements to these objects to modify the chart formatting
You can create common chart types by using Matplotlib functions
• Each chart type can be customized further to create more advanced variations
Matplotlib's extreme customizability also adds complexity
• Understanding the anatomy of a Matplotlib figure helps pinpoint how to change each component
*Copyright Maven Analytics, LLC
PROJECT DATA: COFFEE
PRODUCTION
*Copyright Maven Analytics, LLC
PROJECT DATA: COFFEE
IMPORTS
*Copyright Maven Analytics, LLC
PROJECT DATA: COFFEE
PRICES
*Copyright Maven Analytics, LLC
ASSIGNMENT: MID-COURSE
PROJECT
Key Objectives
1. Read in data from multiple csv files
2. Reshape the data to prepare it for visualization
3. Build & customize charts to communicate the key insights to the client
NEW MESSAGE
September 7, 2022
From: Sarah Shark (Managing Director)
Subject: Coffee Industry Deep Dive
Hi there,
I’m starting to trust you… which is rare. We just got an
inquiry from a major coffee trader looking to get an
outside view on the coffee industry. They’re particularly
interested in Brazil’s production relative to other nations.
We’ll also look at a comparison of importer volume vs
the prices they pay to understand if we can unlock
margin by diversifying into new markets.
Do well on this and you’ll be on promotion track.
section03_coffee_project_part1.ipynb
*Copyright Maven Analytics, LLC
ADVANCED CUSTOMIZATION
In this section we’ll cover advanced customization techniques in Matplotlib,
including multi-chart figures, custom layouts & colors, style sheets, and more
GOALS FOR THIS SECTION:
• Understand how to build multi-chart figures both
with subplots and GridSpec layouts
• Learn how to customize chart colors, by leveraging
custom colormaps and creating your own!
• Take a look at pre-built stylesheets, and dive into
the settings behind them that allow for extreme
chart customization
*Copyright Maven Analytics, LLC
SUBPLOTS
Subplots let you create a grid of equally sized charts in a single figure
• fig, ax = plt.subplots(rows, columns) creates a grid with the specified rows & columns
         Column 0   Column 1
Row 0    (0, 0)     (0, 1)
Row 1    (1, 0)     (1, 1)
This creates a 2 row, 2 column grid that can be populated with individual charts
*Copyright Maven Analytics, LLC
SUBPLOTS
Subplots let you create a grid of equally sized charts in a single figure
• fig, ax = plt.subplots(rows, columns) creates a grid with the specified rows & columns
(0, 0)   (0, 1)
(1, 0)   (1, 1)
Specify ax[row][column] to create and modify individual subplots
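A small sketch of the ax[row][column] indexing, using made-up values. Each cell gets a different chart type, since the subplots in a grid do not have to match:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

# 2 rows x 2 columns; ax is a 2-D array of Axes objects
fig, ax = plt.subplots(2, 2)

ax[0][0].plot(x, [1, 4, 9, 16])         # top left: line chart
ax[0][1].bar(x, [3, 1, 4, 2])           # top right: bar chart
ax[1][0].scatter(x, [2, 3, 1, 4])       # bottom left: scatterplot
ax[1][1].hist([1, 2, 2, 3, 3, 3, 4])    # bottom right: histogram
```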
*Copyright Maven Analytics, LLC
SUBPLOTS
Use the “sharex” & “sharey” arguments to set the same axis limits on all the plots
• These are False (equivalent to “none”) by default, but can be set to True (equivalent to “all”), “row”, or “col”
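For instance, the sketch below (with made-up data on very different scales) shares the y-axis across two subplots so both end up with identical limits:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# sharey="all" links the y-axis of every subplot in the grid
fig, ax = plt.subplots(1, 2, sharey="all")
ax[0].plot([0, 1], [0, 10])
ax[1].plot([0, 1], [0, 100])

limits_left = ax[0].get_ylim()
limits_right = ax[1].get_ylim()
```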
*Copyright Maven Analytics, LLC
SUBPLOTS
Subplots can be any chart type, and do not have to be the same
type
*Copyright Maven Analytics, LLC
ASSIGNMENT:
SUBPLOTS
Results
NEW MESSAGE Preview
September 10, 2022
From: Wendy Whiz (Data Scientist)
Subject: Deeper Exploration
Hey there,
I want to get a quick read on the distribution of revenue
by customer for our top 5 countries – I’m working on a
model for a similar client and want to see if the
distributions are similar.
Doesn’t need to be polished, just need the 5 histograms in a
single figure.
Thanks, and looking forward to working with you more!
Wendy
Section04_assignments.ipynb
*Copyright Maven Analytics, LLC
SOLUTION:
SUBPLOTS
Solution
Section04_solutions.ipynb
*Copyright Maven Analytics, LLC
GRIDSPEC
You can build layouts with charts of varying sizes by setting a gridspec
object
• This creates a grid with a specified number of rows & columns
Figure: an empty grid with 8 rows (Row 0 to Row 7) and 4 columns (Column 0 to Column 3)
*Copyright Maven Analytics, LLC
GRIDSPEC
You can build layouts with charts of varying sizes by setting a gridspec
object
• This creates a grid with a specified number of rows & columns
• Each axis, or chart, can then occupy a group of squares in the grid
Use a slice to specify the ranges of rows and columns for each axis
Figure: the 8 row, 4 column grid with ax1 occupying a block of cells
*Copyright Maven Analytics, LLC
GRIDSPEC
You can build layouts with charts of varying sizes by setting a gridspec
object
• This creates a grid with a specified number of rows & columns
• Each axis, or chart, can then occupy a group of squares in the grid
Figure: the same grid with ax1 and ax2 each occupying a block of cells
*Copyright Maven Analytics, LLC
GRIDSPEC
You can build layouts with charts of varying sizes by setting a gridspec
object
• This creates a grid with a specified number of rows & columns
• Each axis, or chart, can then occupy a group of squares in the grid
Figure: the same grid with ax1 and ax2 in the upper rows and ax3 in the lower rows
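One way to sketch this kind of layout in code is shown below. The grid is 8 rows by 4 columns as in the figure, but the exact row and column ranges chosen for ax1, ax2, and ax3 are assumptions, since the original placement is not recoverable from the slide text:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

fig = plt.figure(figsize=(8, 8))
gs = gridspec.GridSpec(nrows=8, ncols=4, figure=fig)

# Slices select ranges of rows and columns, just like list slicing
ax1 = fig.add_subplot(gs[0:4, 0:2])  # rows 0-3, columns 0-1
ax2 = fig.add_subplot(gs[0:4, 2:4])  # rows 0-3, columns 2-3
ax3 = fig.add_subplot(gs[4:8, :])    # rows 4-7, all columns
```

Because ax3 spans every column, it is roughly twice as wide as ax1 or ax2.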
*Copyright Maven Analytics, LLC
ASSIGNMENT:
GRIDSPEC
Results
NEW MESSAGE Preview
September 12, 2022
From: Sarah Shark (Managing Director)
Subject: Revenue Report Format
Hi there,
Big meeting with our hotel client coming up – we want to
propose a report format that will help track their revenue,
specifically with respect to their goal to get French
customers to surpass German customers.
Can you create a figure with a line chart tracking revenue
by category, a bar chart with revenue for the top 5
countries, and a chart indicating progress towards our
French revenue goal?
Thanks!
section04_assignments.ipynb
*Copyright Maven Analytics, LLC
SOLUTION:
GRIDSPEC
Solution Code
GridSpec Layout (see notebook for chart code):
section04_solutions.ipynb
*Copyright Maven Analytics, LLC
COLORS
You can set the colors in a plot by passing a list of colors to the “color” argument
This assigns each color in the list to the corresponding bar in the plot
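A short sketch of passing a color list to a bar chart. The countries and values are placeholders, and the list mixes named colors with a hex code:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

countries = ["Country A", "Country B", "Country C"]  # placeholder labels
revenue = [30, 20, 10]
colors = ["red", "gold", "#2ca02c"]  # one color per bar, in order

fig, ax = plt.subplots()
bars = ax.bar(countries, revenue, color=colors)
```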
*Copyright Maven Analytics, LLC
COLORS
You can also loop through a list of colors to pass them to separate series in a
plot
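The loop might look like the sketch below, which pairs each series with a color using zip(). The series names and values are invented for the example:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

series = {
    "Series 1": [1, 2, 3],
    "Series 2": [2, 2, 4],
    "Series 3": [1, 3, 2],
}
colors = ["green", "red", "gold"]

fig, ax = plt.subplots()
# Walk through the series and the color list together
for (name, values), color in zip(series.items(), colors):
    ax.plot(values, color=color, label=name)
ax.legend()
```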
*Copyright Maven Analytics, LLC
COLORS
Hex codes can be used to supply specific color shades
PRO TIP: Sites like Google have helpful hexadecimal color pickers
*Copyright Maven Analytics, LLC
PRO TIP: COLOR
PALETTES
You can also modify the entire color palette for the series in a plot
Default Color Map:
Series colors are applied in this sequential order (the default cycle has 10 colors, so it repeats from the 11th series onward)
The “Set2” color map is applied here
rcParams are the underlying settings for Matplotlib charts and can be modified to gain a high level of customization (more on these soon!)
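One possible way to apply “Set2” to every series is to replace the default color cycle in rcParams, as sketched here. This is an assumed approach for illustration; the course notebook may set the palette differently:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from cycler import cycler

# Swap the default color cycle for the 8 colors of the "Set2" colormap
set2_colors = plt.get_cmap("Set2").colors
plt.rcParams["axes.prop_cycle"] = cycler(color=set2_colors)

fig, ax = plt.subplots()
for i in range(3):
    ax.plot([0, 1], [i, i + 1])  # each new line takes the next color in the cycle
```

Note that this changes the setting for every chart created afterwards in the same session.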
For more on color palettes, visit:
ASSIGNMENT:
COLORS
Results
NEW MESSAGE Preview
September 13, 2022
From: Sarah Shark (Managing Director)
Subject: Re: Revenue Report Format
Hi again,
Love the layout, HATE the colors! Let’s show some polish by
getting away from the defaults.
Apply the “Set2” colormap to the line chart and look up
the national color hex codes for the top 5 countries to
use them for the rest of the charts.
Thanks,
Sarah
section04_assignments.ipynb
*Copyright Maven Analytics, LLC
SOLUTION:
COLORS
Solution Code
Apply Set2 (see notebook for chart code):
Country Colors:
Donut Chart
section04_solutions.ipynb
*Copyright Maven Analytics, LLC
STYLE
SHEETS
Matplotlib (and Seaborn) have style sheets that can be used instead of the
default
The style is set in
advance
The “fivethirtyeight”
style has larger font
sizing, and adds
gridlines and a
background color
*Copyright Maven Analytics, LLC
STYLE
SHEETS
Matplotlib (and Seaborn) have style sheets that can be used instead of the
default
• You can still customize individual formatting options after setting a style
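A minimal sketch of setting a style in advance and then overriding one formatting option on the chart itself. The data and title are placeholders:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Set the style before creating any figures
plt.style.use("fivethirtyeight")

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 12, 9, 15])

# Individual options can still be customized after the style is applied
ax.set_title("Revenue by Month", fontsize=12)
```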
*Copyright Maven Analytics, LLC
STYLE
SHEETS
Matplotlib (and Seaborn) have style sheets that can be used instead of the
default
• You can still customize individual formatting options after setting a style
The Seaborn library
has additional styles
that can be used with
Matplotlib charts, like
“darkgrid”
*Copyright Maven Analytics, LLC
ADDITIONAL
STYLES
These are some of the additional styles available in both
libraries:
*Copyright Maven Analytics, LLC
ASSIGNMENT: STYLE
SHEETS
Results
NEW MESSAGE Preview
September 14, 2022
From: Sarah Shark (Managing Director)
Subject: Re: Re: Revenue Report Format
Hi,
Layout and colors look great now, but can we spruce up
the chart styling?
Use a style sheet of your choice.
Once we’ve done that it should be ready to ship.
Thx
-S
section04_assignments.ipynb
*Copyright Maven Analytics, LLC
SOLUTION: STYLE
SHEETS
Solution Code
Style Setting Only (see notebook for chart code):
section04_solutions.ipynb
*Copyright Maven Analytics, LLC
STYLE PARAMETERS
Viewing the parameters of a style sheet can help format charts properly and
provide inspiration for your own formatting changes
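As a sketch of how a style's parameters can be inspected, plt.style.library maps each style name to the settings it overrides. The choice of “fivethirtyeight” and of printing only five entries is arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Each entry in plt.style.library holds the rcParams that the style changes
params = plt.style.library["fivethirtyeight"]

# Print a handful of the settings behind the style
for name in list(params)[:5]:
    print(name, "=", params[name])
```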
*Copyright Maven Analytics, LLC
PARAMETER
GROUPS
There are 300+ parameters that can be modified, which fall into parameter
groups:
Group         Description               Example parameters
axes          Chart-level formatting    axes.spines.top = False, axes.titlesize = 'large'
date          Date formatting options   date.autoformatter.month = %Y-%m
figure        Figure-level formatting   figure.figsize = (8.5, 11), figure.facecolor = 'grey'
font          Font settings             font.size = 16, font.family = 'helvetica', font.weight = 'bold'
grid          Gridline settings         grid.linestyle = ':', grid.linewidth = 2
legend        Legend settings           legend.loc = 'lower right', legend.frameon = False
savefig       Saved figure settings     savefig.dpi = 1000, savefig.format = 'png'
text          Text settings             text.color = 'grey', text.usetex = True
xtick/ytick   X and Y tick settings     xtick.labelcolor = 'green', ytick.minor.visible = True
boxplot       Settings for boxplots     boxplot.whiskerprops.color = 'orange'
hist          Settings for histograms   hist.bins = 20
lines         Settings for line charts  lines.linewidth = 2, lines.color = 'red'
scatter       Settings for scatterplots scatter.marker = '+'
For more on rcParams, visit:
MODIFYING
PARAMETERS
There are two ways to modify parameters:
1. You can change individual parameters via assignment
2. You can change multiple parameters from the same group with the rc() function
Examples shown: turn off the top and right spines, change the default axes title size to 20, and modify the figure size to 8” x 6”
PRO TIP: Modify parameters to avoid having to repeat the same formatting options on each chart
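The two approaches can be sketched as follows, covering the three changes named on the slide (spines off, title size 20, figure size 8 x 6 inches):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# 1. Change individual parameters via assignment
plt.rcParams["axes.spines.top"] = False
plt.rcParams["axes.spines.right"] = False

# 2. Change parameters within a group using rc()
plt.rc("axes", titlesize=20)
plt.rc("figure", figsize=(8, 6))

# Charts created from here on pick up the new defaults
fig, ax = plt.subplots()
ax.set_title("Uses the new defaults")
```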
*Copyright Maven Analytics, LLC
SAVING
FIGURES
The savefig() function will save figures as an image
file
• Simply specify the desired filename and format
Screenshotting the images with your operating system’s snipping tool will often be sufficient for
building plots into presentations like this course ;)
*Copyright Maven Analytics, LLC
SAVING
FIGURES
The savefig() function will save figures as an image
file
• Simply specify the desired filename and format
If no extension in the filename is specified, the file will be saved as a .png.
Most systems support .jpg, .jpeg, .svg, and .pdf, among others.
The default resolution is 100 dpi (dots per inch)
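A brief sketch of saving the same figure in two formats. The filenames are placeholders, and a temporary folder is used so the example does not clutter the working directory:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 8])

out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "chart.png")
pdf_path = os.path.join(out_dir, "chart.pdf")

# The file extension determines the format; dpi raises the image resolution
fig.savefig(png_path, dpi=300)
fig.savefig(pdf_path)
```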
*Copyright Maven Analytics, LLC
KEY TAKEAWAYS
Subplots and GridSpec allow us to create multi-chart
figures
• Subplots are equally sized grids, GridSpec allows for custom
layouts
Colors can be set by specifying a colormap or by assigning colors to the
data of interest
• Common color names and hex codes can be used to assign colors to your
data
Set a style to spruce up the default aesthetics, or use rcParams to completely customize your charts
• Pre-built styles can add some nice aesthetic polish compared to the Matplotlib defaults
• Understanding how to modify rcParams will allow you full control over chart customization, and reduce
the need for manual formatting
*Copyright Maven Analytics, LLC
PROJECT DATA:
OVERVIEW
Coffee
Production
*Copyright Maven Analytics, LLC
PROJECT DATA:
OVERVIEW
Prices Paid To
Growers
*Copyright Maven Analytics, LLC
ASSIGNMENT: MID-COURSE
PROJECT
Key Objectives
1. Read in data from multiple csv files
2. Reshape the data with Pandas to set up charts
3. Build and customize line charts, bar charts, histograms and more to communicate key insights to our client
4. Modify chart colors to represent national flags
5. Combine modified charts into a single report by leveraging GridSpec and subplots
NEW MESSAGE
September 18, 2022
From: Clarissa Café (Coffee Client)
Subject: Summary Report
Hi there,
Sarah told me to reach out directly to you – we loved the
work you did on breaking down the industry, but we want
to summarize your findings on Brazil into a single figure
we can pass around.
Can you combine your findings into a single figure report?
We’ll also want to modify colors. There are more details in
the attached notebook.
Thanks!
Clarissa
section05_coffee_project_part2.ipynb
*Copyright Maven Analytics, LLC
DATA VISUALIZATION WITH
SEABORN
In this section we’ll cover data visualization with Seaborn, another Python library that
introduces new chart types and layouts, and interacts well with Matplotlib
GOALS FOR THIS SECTION:
• Introduce the basics of plotting data with
Seaborn
• Build variations of Matplotlib charts like bar charts
and histograms, as well as new visuals like boxplots,
violin plots, and linear relationship plots
• Create FacetGrid layouts as an alternative to
subplots
• Integrate Seaborn plots with Matplotlib objects to
get the best of both worlds
*Copyright Maven Analytics, LLC
MEET
SEABORN
Seaborn is a Python library built for easily visualizing Pandas DataFrames,
taking away some of the “drawing” required when using Matplotlib
‘sns’ is the standard alias for
Seaborn
You simply need to
specify a DataFrame as
the “data” argument
and set columns as the
“x” and “y” axes
Seaborn will
automatically aggregate
the results!
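A minimal sketch of the basic Seaborn call, using a tiny hand-built DataFrame (the column names are assumptions). With no other arguments, each bar shows the mean of “revenue” for its country:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "country": ["FR", "FR", "DE", "DE"],
    "revenue": [100, 300, 200, 400],
})

# Seaborn groups by "country" and plots the mean "revenue" by default
ax = sns.barplot(data=df, x="country", y="revenue")

bar_heights = sorted(round(p.get_height()) for p in ax.patches if p.get_height() > 0)
```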
*Copyright Maven Analytics, LLC
MEET
SEABORN
Seaborn is a Python library built for easily visualizing Pandas DataFrames,
taking away some of the “drawing” required when using Matplotlib
You can change the aggregation
method and suppress the
confidence intervals
*Copyright Maven Analytics, LLC
CHART FORMATTING
You can apply chart formatting to Seaborn plots using Matplotlib
arguments
• These are passed to the Matplotlib object that Seaborn creates internally
We’ll cover integration with Matplotlib later, which is where you’ll be able
to leverage the chart formatting skills you’ve learned throughout the
course
*Copyright Maven Analytics, LLC
CHART FORMATTING
Seaborn still has some useful chart formatting functions like
despine()
*Copyright Maven Analytics, LLC
BAR
CHARTS
Bar charts can be created in Seaborn with sns.barplot()
• Simply specify the desired category labels and series values as “x” & “y”
arguments
Note that Seaborn automatically aggregates the data for the plot, using unique category values as the
labels for the bars, the mean of each category for the bar length, and the column headers as the axis
labels
*Copyright Maven Analytics, LLC
BAR
CHARTS
Bar charts can be created in Seaborn with sns.barplot()
• Simply specify the desired category labels and series values as “x” & “y”
arguments
To create a horizontal bar chart, specify “x” as the data
and “y” as the labels. ci=None will suppress error
bars (in Seaborn 0.12 and later, use errorbar=None instead).
*Copyright Maven Analytics, LLC
GROUPED BAR
CHARTS
Grouped bar charts can be created by specifying a categorical column as
“hue”
You can also sort the bars by one of
the columns, and apply a different
color map
*Copyright Maven Analytics, LLC
HISTOGRAMS
Histograms can be created with sns.histplot() and a single “x”
argument
*Copyright Maven Analytics, LLC
HISTOGRAMS
Histograms can be created with sns.histplot() and a single “x”
argument
• You can also specify the number of “bins” and add the kernel density
(kde=True)
The default style for Seaborn plots can
be nicer than their Matplotlib
counterparts, and vice versa, so choose
the library that works best for each
chart!
*Copyright Maven Analytics, LLC
ASSIGNMENT: BASIC
CHARTS
Results
NEW MESSAGE Preview
September 20, 2022
From: Sarah Shark (Managing Director)
Subject: New Charts
Hi,
Need a few more views on the hotel data using Seaborn.
Can we look at the distribution of lodging revenue for
each booking? Only plot customers with less than 1,500
dollars to weed out longer term stays.
Then, build a bar chart with the average room nights stayed
for our top 5 countries.
Thanks
section06_assignments.ipynb
*Copyright Maven Analytics, LLC
SOLUTION: BASIC
CHARTS
Solution
section06_solutions.ipynb
*Copyright Maven Analytics, LLC
BOXPLOTS
Boxplots can be created with sns.boxplot()
• They visualize the distribution of a variable by plotting key
statistics
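Boxplot statistics:
• Median (50th percentile)
• 1st & 3rd Quartiles (25th & 75th percentiles)
• Interquartile Range (IQR = Q3 - Q1)
• Min & Max Values (the whiskers extend to the most extreme points within 1.5x the IQR of the quartiles)
• Outliers (points beyond the whiskers)
Figure: a boxplot labeled with Min, Q1, Median, Q3, Max (Q3 + 1.5*IQR), the IQR, and Outliers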
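The statistics behind a boxplot can be worked through by hand, as in this sketch with nine made-up values. It uses the common 1.5 x IQR rule to find the whisker ends and flag outliers:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100])

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The whiskers end at the most extreme points that are still inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = values[(values < lower_fence) | (values > upper_fence)]
```

Here the quartiles are 3 and 7, so the IQR is 4, the fences sit at -3 and 13, the whiskers end at 1 and 8, and 100 is the only outlier.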
*Copyright Maven Analytics, LLC
BOXPLOTS
Boxplots can be created with sns.boxplot()
• They visualize the distribution of a variable by plotting key
statistics
Specify a second axis to
create separate boxplots
by category
*Copyright Maven Analytics, LLC
VIOLIN
PLOTS
Violin plots can be created with sns.violinplot()
• They are boxplots with symmetrical kernel densities along their
sides
*Copyright Maven Analytics, LLC
ASSIGNMENT: BOX & VIOLIN
PLOTS
Results
NEW MESSAGE Preview
September 24, 2022
From: Sarah Shark (Managing Director)
Subject: Re: New Charts
Hi,
Let’s view the distribution of lodging revenue using a boxplot
instead, once again capping the revenue at 1500.
Then filter the data to the top 5 countries and build a
violin plot of their lodging revenue, as well as their age
distribution.
Sarah
section06_assignments.ipynb
*Copyright Maven Analytics, LLC
SOLUTION: BOX & VIOLIN
PLOTS
Solution
section06_solutions.ipynb
*Copyright Maven Analytics, LLC
LINEAR RELATIONSHIP
PLOTS
Seaborn has several plots to explore linear
relationships:
sns.scatterplot(x, y, data): creates a scatterplot
sns.regplot(x, y, data): creates a scatterplot with a fitted regression line
sns.lmplot(x, y, hue, row, col, data): creates a scatterplot with a fitted regression line, and can visualize multiple categories using color, or splitting into rows & columns
sns.jointplot(x, y, kind, data): creates a scatterplot and adds the distribution for each variable
sns.pairplot(data[cols]): creates a matrix of scatterplots comparing multiple variables, and shows the distribution for each one
*Copyright Maven Analytics, LLC
REGPLOT()
sns.regplot() creates a scatterplot with a fitted regression
line
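A minimal regplot sketch on generated data. The column names are placeholders, and ci=None is passed only to skip the confidence band so the example runs quickly:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)
nights = np.arange(1, 21)
df = pd.DataFrame({
    "room_nights": nights,
    "lodging_revenue": 120 * nights + rng.normal(0, 50, 20),  # made-up values
})

# Draws the scatter points plus a fitted linear regression line
ax = sns.regplot(data=df, x="room_nights", y="lodging_revenue", ci=None)
```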
*Copyright Maven Analytics, LLC
LMPLOT()
sns.lmplot() lets you explore the impact of other variables on the
relationship
Specify the ‘hue’ to
create a line for each
category in the
specified column and
set a different color
for each category
*Copyright Maven Analytics, LLC
LMPLOT()
sns.lmplot() lets you explore the impact of other variables on the
relationship
Specify the ‘row’ and
‘col’ arguments to create regression
plots for each combination
of variables
PRO TIP: This type of visual is
great for exploring your data, but
way too complex for a
presentation!
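The “hue” and “col” options can be combined as in the sketch below. The diamond-style columns and values are generated for illustration and are not the course data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(4)
n = 40
carat = rng.uniform(0.3, 2.0, n)
df = pd.DataFrame({
    "carat": carat,
    "price": 4000 * carat + rng.normal(0, 300, n),
    "cut": np.tile(["Ideal", "Good"], n // 2),    # alternates row by row
    "color": np.repeat(["D", "J"], n // 2),       # first half D, second half J
})

# hue draws one colored regression line per cut; col makes one chart per color
g = sns.lmplot(data=df, x="carat", y="price", hue="cut", col="color", ci=None)
```

lmplot returns a FacetGrid, so g.axes holds one Matplotlib Axes per facet.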
*Copyright Maven Analytics, LLC
JOINTPLOT()
sns.jointplot() creates a scatterplot and adds the distribution of each
variable
The ‘kind’ argument
has several options
like ‘kde’, which
plots the kernel
densities, and ‘reg’,
which plots the
regression line
*Copyright Maven Analytics, LLC
PAIRPLOT()
sns.pairplot() creates a matrix of scatterplots comparing multiple variables,
and shows the distribution for each one along the diagonal
This lets you see the relationship between a diamond’s
weight (carat) and its length (x), width (y), and depth (z)
You can see that the weight of the diamond has a
positive relationship with length, width, and depth,
with the relationships being VERY strong for width
and depth
*Copyright Maven Analytics, LLC
ASSIGNMENT: LINEAR RELATIONSHIP
PLOTS
Results
NEW MESSAGE Preview
September 26, 2022
From: Wendy Whiz (Data Scientist)
Subject: More Exploration
Hi there,
Can you produce charts to explore the relationship
between room nights and lodging revenue?
First for all the data and then for each top 5 country.
Can you also produce a pairplot comparing lodging revenue
to several key variables? (more details in the notebook)
Best,
Wendy
section06_assignments.ipynb
*Copyright Maven Analytics, LLC
SOLUTION: LINEAR RELATIONSHIP
PLOTS
Solution
section06_solutions.ipynb
*Copyright Maven Analytics, LLC
HEATMAPS
Create a heatmap to visualize a table of data with
sns.heatmap()
PRO TIP: Pandas’
pivot_table method is a great
way to set up the data
needed for a heat map!
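A sketch of the pivot_table plus heatmap pattern on a tiny hand-built DataFrame. The column names are assumptions; annot=True prints each value inside its cell:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "country": ["FR", "FR", "DE", "DE"],
    "segment": ["Direct", "Agency", "Direct", "Agency"],
    "revenue": [100, 200, 300, 400],
})

# Rows = countries, columns = segments, cells = mean revenue
pivot = df.pivot_table(index="country", columns="segment", values="revenue", aggfunc="mean")

ax = sns.heatmap(pivot, annot=True, fmt=".0f")
```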
*Copyright Maven Analytics, LLC
HEATMAPS
Create a heatmap to visualize a table of data with
sns.heatmap()
You can modify rcParams
with sns.set(), but we’ll show
the syntax for combining
Matplotlib and Seaborn
shortly!
*Copyright Maven Analytics, LLC
ASSIGNMENT:
HEATMAPS
Results
NEW MESSAGE Preview
September 26, 2022
From: Wendy Whiz (Data Scientist)
Subject: RE: More Exploration
Hi there,
Last piece to help me look at features for my modeling work.
Can you build a heatmap with countries as rows and
market segment as columns with the mean lodging
revenue for each?
Then build a heatmap for a correlation matrix.
Thanks,
Wendy
section06_assignments.ipynb
*Copyright Maven Analytics, LLC
SOLUTION:
HEATMAPS
Solution
section06_solutions.ipynb
*Copyright Maven Analytics, LLC
FACETGRID
Seaborn’s FacetGrid is a convenient alternative to Matplotlib’s subplot
grids
• sns.FacetGrid(DataFrame, col= , col_wrap= )
This creates 7 charts, one for each “color”, in a grid with 3 columns
*Copyright Maven Analytics, LLC
FACETGRID
Seaborn’s FacetGrid is a convenient alternative to Matplotlib’s subplot
grids
• sns.FacetGrid(DataFrame, col= , col_wrap= )
This plots a
histogram of “price”
for each “color” in
the DataFrame
*Copyright Maven Analytics, LLC
MATPLOTLIB
INTEGRATION
You can build Seaborn plots in Matplotlib objects, which lets you customize
and integrate Seaborn charts as if they were built using Matplotlib
This creates a Matplotlib figure and axis, sets a Seaborn
style, creates a Seaborn bar chart, and then adds
Matplotlib labels
*Copyright Maven Analytics, LLC
MATPLOTLIB
INTEGRATION
You can build Seaborn plots in Matplotlib objects, which lets you customize
and integrate Seaborn charts as if they were built using Matplotlib
This lets you specify
which
axes to plot the chart
on
*Copyright Maven Analytics, LLC
KEY TAKEAWAYS
Seaborn is a user-friendly extension of Matplotlib
• It has a simple interface, nice aesthetics, and works well with Pandas DataFrames
Seaborn adds new chart types that are useful in exploring data
• Boxplots, violin plots, and linear model plots help profile data and identify relationships between
variables
Seaborn is very compatible with
Matplotlib
• Seaborn charts are extensions of Matplotlib objects, so they can be placed in Matplotlib
figures
• Matplotlib formatting arguments can be passed to corresponding Seaborn plotting functions
*Copyright Maven Analytics, LLC
PROJECT DATA: USED CARS
DATA
*Copyright Maven Analytics, LLC
ASSIGNMENT: FINAL
PROJECT
Key Objectives
1. Read in and manipulate data with Pandas
2. Build summary charts with Matplotlib and Seaborn
3. Leverage Seaborn’s advanced chart types to mine insights from the data and make a decision
NEW MESSAGE
October 10, 2022
From: Aaron Auto (VP of Fleet Management)
Subject: Optimal Fleet Truck Purchase
Hello,
We need an outside analysis on auto procurement for
our fleet of service vehicles. We lease trucks to
contractors and other businesses, but a recent spike in
demand has meant we’re unable to get cars from
traditional suppliers.
I want to see an overview of the automotive auction
industry, before diving into where we can get Ford F150s
for the most affordable price on the market (more details
in the notebook).
Thanks
section07_final_project.ipynb
*Copyright Maven Analytics, LLC