DATA VISUALIZATION WITH
With Expert Python Instructor Chris Bruehl
*Copyright Maven Analytics, LLC
COURSE STRUCTURE
This is a project-based course for students looking for a practical, hands-on approach to
learning data visualization with Python using the Matplotlib and Seaborn libraries
Additional resources include:
Downloadable PDF to serve as a helpful reference when you’re offline or on the go
Quizzes & Assignments to test and reinforce key concepts, with step-by-step solutions
Interactive demos to keep you engaged and apply your skills throughout the course
COURSE OUTLINE
1 Intro to Data Visualization
Cover key data visualization best practices for clear communication, with tips for choosing the right chart, formatting it effectively, and using it to tell a story
2 Matplotlib Fundamentals
Introduce the Matplotlib library and use it to build & customize several chart types, including line charts, bar charts, pie charts, scatterplots, and histograms
PROJECT: Visualizing Coffee Industry Data
3 Advanced Customization
Apply advanced customization techniques in Matplotlib, including multi-chart figures, custom layouts & colors, style sheets, and more
PROJECT: Consolidating Coffee Industry Data into a Report
4 Data Viz with Seaborn
Visualize data with Seaborn, another Python library that introduces new chart types and layouts, and interacts well with Matplotlib
PROJECT: Highlighting Insights from the Automotive Auction Industry
WELCOME TO MAVEN CONSULTING GROUP
THE SITUATION
You’ve just been hired as an Associate Consultant for Maven Consulting Group (MCG), a multinational firm that provides strategic advice to companies across different industries. Your new role will see you take on projects in the hotel, coffee, automotive, and diamond industries.
THE ASSIGNMENT
Your task is to effectively visualize data from these industries to deliver key insights to MCG’s clients. This will range from analyzing hotel customer demographics to understanding the major players in the global coffee industry.
THE OBJECTIVES
• Use Pandas to read & manipulate multiple datasets
• Use Matplotlib to visualize data & communicate insights, and then build reports to consolidate your findings
• Use Seaborn to conduct advanced exploratory analysis and aid the decision-making process
SETTING EXPECTATIONS
This course covers the core functionality for Matplotlib & Seaborn
• We’ll cover chart types, common customization options, and best practices for visualizing and analyzing data
• We’ll give you the tools to use the official documentation to apply any customization option not covered in the course
We’ll focus on creating static visuals & dashboards
• Interactive data visualization with Python will be covered in a separate course
We’ll use Jupyter Notebooks as our primary coding environment
• Jupyter Notebooks are free to use and are the industry standard for conducting data analysis with Python
(we’ll introduce Google Colab as an alternative, cloud-based environment as well)
You do NOT need to be a Python expert to take this course
• It is strongly recommended that you complete our Python Foundations and Data Analysis with Pandas courses, or
have a solid understanding of basic Python syntax and DataFrame manipulation with the Pandas library
INSTALLING ANACONDA (MAC)
1) Go to anaconda.com/products/distribution and click the download button
2) Click X on the Anaconda Nucleus pop-up
(no need to launch)
3) Launch the downloaded Anaconda pkg file
4) Follow the installation steps
(default settings are OK)
INSTALLING ANACONDA (PC)
1) Go to anaconda.com/products/distribution and click the download button
2) Click X on the Anaconda Nucleus pop-up
(no need to launch)
3) Launch the downloaded Anaconda exe file
4) Follow the installation steps
(default settings are OK)
LAUNCHING JUPYTER
1) Launch Anaconda Navigator
2) Find Jupyter Notebook and click Launch
YOUR FIRST JUPYTER NOTEBOOK
1) Once inside the Jupyter interface, create a folder to store your notebooks for the course
NOTE: You can rename your folder by clicking “Rename” in the top left corner
2) Open your new coursework folder and launch your first Jupyter notebook!
NOTE: You can rename your notebook by clicking on the title at the top of the screen
THE NOTEBOOK SERVER
NOTE: When you launch a Jupyter notebook, a terminal window may pop up as
well; this is called a notebook server, and it powers the notebook interface
If you close the server window, your notebooks will not run!
Depending on your OS and how you launch Jupyter, the server window may not open. As long as you can run your notebooks, don’t worry!
ALTERNATIVE: GOOGLE COLAB
Google Colab is Google’s cloud-based version of Jupyter Notebooks
To create a Colab notebook:
1. Log in to a Gmail account
2. Go to colab.research.google.com
3. Click “new notebook”
Colab is very similar to Jupyter Notebooks
(they even share the same file extension); the main
difference is that you are connecting to Google
Drive rather than your machine, so files will be
stored in Google’s cloud
DATA VISUALIZATION
In this section we’ll cover key data visualization best practices for clear communication,
with tips for choosing the right chart, formatting it effectively, and using it to tell a story
GOALS FOR THIS SECTION:
• Understand the purpose behind visualizing data
• Learn the common chart types and their use cases
• Apply data visualization best practices to create clear and compelling charts
• Address common errors and how to avoid them
WHY VISUALIZE DATA?
Data visualization allows you to bring your data to life
• The human brain isn’t built to interpret raw data; on its own, raw data reads as meaningless numbers and noise
• We need clear patterns and visual cues to help us quickly make sense of complex information
Prefrontal Cortex
• Located in the frontal lobe
• Responsible for cognitive functioning & problem solving
• Helps us make sense of non-visual information (like raw data)
• Slow & conscious
Visual Cortex
• Located in the occipital lobe
• Responsible for visual perception & understanding
• Helps us make sense of colors, patterns, shapes, sizes, etc.
• Instantaneous & subconscious
Data visualization puts both our prefrontal and visual cortex to work, combining
the power of cognition (slow and conscious) and perception (instantaneous)
THE TEN SECOND RULE
In 10 seconds, what can you learn from the data below?
THE TEN SECOND RULE
What if you were given the averages?
THE TEN SECOND RULE
What if you visualize it?
This is a slight twist on Anscombe’s Quartet
Despite sharing nearly identical descriptive stats, each series tells a very different visual story
THE 3 KEY QUESTIONS
The 3 key questions are a great way to help choose the right visual
What type of data are you working with?
• Time-series: Data that spans across continuous time periods
• Categorical: Data that can be split up into groups or categories
• Numeric: Data with quantitative values, either discrete or continuous
• Hierarchical: Data with natural groups and sub-groups
What do you want to communicate?
• Comparison: Compares values over time or across categories
• Composition: Breaks down the component parts of a whole
• Distribution: Shows the frequency of values within a series
• Relationship: Shows the correlation between multiple variables
Who is the end user and what do they need?
• Analyst: Likes to see details and understand what’s happening at a granular level
• Manager: Wants summarized information with clear, actionable insights
• Executive: Needs high-level, clear KPIs to track business health and performance
• General Public: Requires engaging visuals and a clear story to follow
ESSENTIAL VISUALS
KPI CARD: Sometimes simple text works best
PIE CHART: Sort the slices, keep them under ~5, and focus on one
TABLE: Add a color scale to highlight patterns in the data
LINE CHART: The dates must be continuous
BAR CHART: Baseline must start at zero
SCATTER PLOT: Remember that correlation does not imply causation
AREA CHART
100% STACKED: Comparison & composition
HISTOGRAM: Avoid using too many bins!
CHART FORMATTING
Chart formatting should be used to eliminate noise & facilitate understanding
BEFORE: Cluttered chart
This is the right chart type… so why is it so hard to understand the visual?
× The chart border and gridlines are more distracting than useful
× The vertical axis labels are hard to read and lack context – they use scientific notation and the axis doesn’t start at 0
× Data labels can help add context, but they just add noise here
× It’s not clear what each line represents
PRO TIP: Be intentional about the formatting you apply – don’t just use the default settings!
CHART FORMATTING
Chart formatting should be used to eliminate noise & facilitate understanding
AFTER: Clear chart
PRO TIPS:
✓ Remove the chart border & gridlines
✓ Format the axis labels clearly
✓ Add context with the chart title
✓ Create a visual order
✓ Make sure the story is clear
“Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away”
Antoine de Saint-Exupery
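A minimal Matplotlib sketch of the "after" tips: no top/right border or gridlines, a y-axis that starts at 0 with readable labels instead of scientific notation, a descriptive title, and a legend. The two revenue series are hypothetical values made up for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Hypothetical monthly revenue for two products (illustrative values only)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
product_a = [1.2e6, 1.4e6, 1.3e6, 1.7e6, 1.9e6, 2.1e6]
product_b = [0.8e6, 0.9e6, 1.1e6, 1.0e6, 1.2e6, 1.3e6]

fig, ax = plt.subplots()
ax.plot(months, product_a, label="Product A")
ax.plot(months, product_b, label="Product B")

# Remove the chart border (top & right spines) and gridlines
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.grid(False)

# Format the y-axis clearly: start at 0 and show "$1.5M" instead of 1.5e6
ax.set_ylim(bottom=0)
ax.yaxis.set_major_formatter(FuncFormatter(lambda value, pos: f"${value / 1e6:.1f}M"))

# Add context with a descriptive title and identify each line
ax.set_title("Product A revenue is growing faster than Product B")
ax.legend(frameon=False)
```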
STORYTELLING
Descriptive titles and data labels can be used to tell a clear story within your visuals
AFTER: Compelling chart
PRO TIPS:
✓ Leverage the title to guide the audience toward specific insights
✓ Insert text & shapes directly inside the chart
✓ Use data labels and annotations to draw attention to the main data points
✓ Use color strategically
COMMON ERRORS
Choosing the wrong visual to represent the type of data
Using a line chart, which is meant for time series data, with categorical data gives the false sense of a trend
Bar charts are great for showing comparison with categorical data
While a tree map can work, comparisons and compositions are harder to make than with a bar or pie chart; it’s best to use them with hierarchical data
PRO TIP: Don’t prioritize variety over effectiveness; use the right chart for the job!
COMMON ERRORS
Including too many series in a single visual
It’s hard to focus or extract any valuable information
Try highlighting the series you want, or aggregating other categories
You can also group the other categories into a single series
COMMON ERRORS
Providing little to no context with text and labels
What does each line represent?
What are these values?
What does each period represent?
When removing elements from a chart to reduce clutter and noise, remember to keep all the elements that add understanding
COMMON ERRORS
Using inconsistent colors between related visuals
Using different colors for the same series makes it difficult to associate them visually
Consistency gains more importance as the number of visuals increases, making it critical for dashboards
Using the same colors consistently makes them easier to understand, and in some cases allows you to remove the legend
KEY TAKEAWAYS
Always answer the 3 key questions to choose the right visual
• What type of data are you working with? What do you want to communicate? Who is the end user?
Do NOT prioritize variety over effectiveness
• Choose chart types based on how clearly they communicate the data underneath – you can customize later!
Eliminate noise and distractions to facilitate understanding
• “Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away”
Tell a story with the data to guide the user to the insights
• Use titles, strategic labels, and callouts to create a clear narrative
INTRO TO MATPLOTLIB
In this section we’ll introduce the Matplotlib library and use it to build & customize several
chart types, including line charts, bar charts, pie charts, scatterplots, and histograms
GOALS FOR THIS SECTION:
• Understand the difference between the two primary Matplotlib plotting frameworks
• Identify the key components of an object-oriented plot
• Build different variations of line, bar and pie charts, as well as scatterplots and histograms
• Customize your charts by adding custom titles, labels, legends, annotations and much more!
MEET MATPLOTLIB
Matplotlib is an open-source Python library built for data visualization that lets you
produce a wide variety of highly customizable charts & graphs
‘plt’ is the standard alias for Matplotlib’s pyplot module (import matplotlib.pyplot as plt)
The plot() function creates a line chart by default, using the index as the x-values and the list elements as the y-values
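A minimal sketch of that default behavior, using a made-up list of values:

```python
import matplotlib.pyplot as plt  # 'plt' is the conventional alias for pyplot

# With a single list, plot() draws a line chart: the positions 0, 1, 2, ...
# become the x-values and the list elements become the y-values
lines = plt.plot([3, 1, 4, 1, 5])
```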
COMPATIBLE DATA TYPES
Matplotlib can plot many data types, including base Python sequences, NumPy
Arrays, and Pandas Series & DataFrames
Python List Pandas Series Pandas DataFrame
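As a sketch, the same axis can take each of these types directly. The values are arbitrary, and the offsets (+1, +2, +3) only keep the lines from overlapping.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

values = [3, 1, 4, 1, 5]

fig, ax = plt.subplots()
ax.plot(values)                    # Python list: positions become the x-values
ax.plot(np.array(values) + 1)      # NumPy array
ax.plot(pd.Series(values) + 2)     # Pandas Series: the index becomes the x-values
df = pd.DataFrame({"a": values, "b": [v + 3 for v in values]})
ax.plot(df)                        # Pandas DataFrame: one line per column
```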
PLOTTING METHODS
Matplotlib has two plotting methods, or interfaces:
• PyPlot API: Charts are created with the plot() function, and modified with additional functions
• Object-Oriented: Charts are created by defining a plot object, and modified using figure & axis methods
Building an Object-Oriented plot:
1. Create the figure object and assign it to the ‘fig’ variable
2. Add a chart, or axis, object to the figure and assign it to the ‘ax’ variable
3. Call the axis plot() method to draw the chart
We’ll mostly focus on the Object-Oriented approach, as it provides clearer control over customization
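A side-by-side sketch of the two interfaces drawing the same made-up data. The Object-Oriented half follows the three numbered steps.

```python
import matplotlib.pyplot as plt

y = [2, 4, 3, 5]

# PyPlot API: plt functions act on the "current" figure and axis
plt.figure()
plt.plot(y)
plt.title("PyPlot API")

# Object-Oriented: create the objects, then call their methods
fig = plt.figure()        # 1. create the figure object
ax = fig.add_subplot()    # 2. add a chart (axis) object to the figure
ax.plot(y)                # 3. draw the chart with the axis plot() method
ax.set_title("Object-Oriented")
```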
OBJECT-ORIENTED PLOTTING
Object-Oriented plots are built by adding axes, or charts, to a figure
• The subplots() function lets you create the figure and axes in a single line of code
• You can then use figure & axis methods to customize the different elements in the plot
The example code creates the figure and axis, plots “y”, and adds a title to the figure and axis
We’ll start by adding a single subplot to each figure for now, but will dive deeper into subplots later in the course!
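The slide's code image isn't part of this text, so here is a minimal sketch of what those steps describe. The data and title strings are placeholders.

```python
import matplotlib.pyplot as plt

y = [2, 4, 3, 5]

fig, ax = plt.subplots()            # creates the figure and a single axis
ax.plot(y)                          # plots "y"
fig.suptitle("Figure title")        # title for the whole figure
ax.set_title("Axis (chart) title")  # title for this chart
```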
PLOTTING DATAFRAMES
When plotting DataFrames using the Object-Oriented interface, Matplotlib will
use the index as the x-axis and plot each column as a separate series by default
PLOTTING DATAFRAMES
Plotting each series independently allows for improved customization
• ax.plot(x-axis series, y-series values)
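A sketch contrasting the two approaches on a small hypothetical revenue DataFrame. Plotting the whole DataFrame gives one line per column, while plotting each series separately lets every line get its own label and style.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily revenue with dates as the index
df = pd.DataFrame(
    {"lodging": [100, 120, 90, 130], "other": [20, 25, 22, 30]},
    index=pd.date_range("2022-01-01", periods=4, freq="D"),
)

# Whole DataFrame: index -> x-axis, one line per column
fig, ax = plt.subplots()
ax.plot(df)

# Series by series: ax.plot(x-axis series, y-series values) plus per-line options
fig2, ax2 = plt.subplots()
ax2.plot(df.index, df["lodging"], label="Lodging", color="tab:blue")
ax2.plot(df.index, df["other"], label="Other", color="tab:orange", linestyle="--")
ax2.legend()
```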
ASSIGNMENT: PLOTTING DATAFRAMES
Results Preview
NEW MESSAGE
August 29, 2022
From: Ian Intern (Summer Consultant)
Subject: Do you know Matplotlib?
Hi!
I need someone who knows Matplotlib for help with some
client work.
Can you plot Lodging Revenue and Other Revenue over time
for our hotel client?
Thanks!
section02_assignments.ipynb
SOLUTION: PLOTTING DATAFRAMES
Solution Code
section02_solutions.ipynb
FORMATTING OPTIONS
Matplotlib has these formatting options for Object-Oriented and PyPlot plots:
Element             Object-Oriented       PyPlot
Figure Title        fig.suptitle()        plt.suptitle()
Chart Title         ax.set_title()        plt.title()
X-Axis Label        ax.set_xlabel()       plt.xlabel()
Y-Axis Label        ax.set_ylabel()       plt.ylabel()
Legend              ax.legend()           plt.legend()
X-Axis Limit        ax.set_xlim()         plt.xlim()
Y-Axis Limit        ax.set_ylim()         plt.ylim()
X-Axis Ticks        ax.set_xticks()       plt.xticks()
Y-Axis Ticks        ax.set_yticks()       plt.yticks()
Vertical Line       ax.axvline()          plt.axvline()
Horizontal Line     ax.axhline()          plt.axhline()
Text                ax.text()             plt.text()
Spines (borders)    ax.spines['side']     plt.gca().spines['side']
(Diagram: anatomy of a plot, labeling the figure, axes, figure & axis titles, legend, text, vertical line, axis labels, ticks, and spines such as spines['bottom'])
CHART TITLES
The set_title(), set_xlabel(), and set_ylabel() methods let you add chart titles and axis labels
• fig.suptitle() serves as an overall figure title
FONT SIZES
You can modify chart font sizes with the “fontsize” argument
• You can specify the size in points (10, 12, etc.) or relative size (“smaller”, “x-large”, etc.)
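A short sketch combining a figure title, a chart title, and axis labels, with font sizes given both in points and as relative size names. The label text is a placeholder.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 3, 2, 4])

fig.suptitle("Overall figure title", fontsize=16)    # size in points
ax.set_title("Chart title", fontsize="x-large")      # relative size name
ax.set_xlabel("Week", fontsize=12)
ax.set_ylabel("Revenue ($)", fontsize="small")
```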
CHART LEGENDS
The legend() method lets you add a chart legend to identify each series
• The series labels are used by default, but custom values can also be passed through
LEGEND LOCATION
You can change the legend location with the “loc” or “bbox_to_anchor” arguments
• “loc” lets you set a predetermined location option
• “bbox_to_anchor” lets you set specific (x, y) coordinates
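Location options for “loc”: best (default), upper right, upper left, upper center, lower right, lower left, lower center, center right, center left, center
Coordinates for “bbox_to_anchor” (with ax.legend()) run from 0 to 1 across the chart area, with (0, 0) at the bottom left and (1, 1) at the top right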
Setting coordinates beyond 1 will push the
legend outside the chart area
(useful when there is no whitespace!)
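A sketch of both legend options on two labeled lines. The series names are placeholders, and calling legend() a second time simply replaces the first legend.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], label="Lodging")   # each series label is picked up by legend()
ax.plot([2, 1, 3], label="Other")

# A predetermined location string
ax.legend(loc="upper left")

# Or anchor the legend's upper-left corner at (x, y) in axis coordinates;
# x > 1 pushes the legend past the right edge of the chart
ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1))

# Custom labels can also be passed in plotting order, e.g.:
# ax.legend(["Lodging Revenue", "Other Revenue"])
```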
LINE STYLE
You can change the line style with the “linestyle”, “linewidth”, and “color” arguments
• Common line styles are “solid”, “dashed”, or “dotted” (you can also use “-”, “--”, or “:”)
We will dive into colors in depth later, including changing
the default color palette and using hex color codes!
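A minimal sketch of the three line style arguments. The data and colors are arbitrary.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], linestyle="solid", linewidth=2, color="black")
ax.plot([2, 3, 4], linestyle="--", linewidth=1, color="gray")   # dashed
ax.plot([3, 4, 5], linestyle=":", color="red")                  # dotted
```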
AXIS LIMITS
The set_ylim() and set_xlim() functions let you modify the axis limits
• ax.set_xlim(lower limit, upper limit)
Your date x-axis ticks may change interval size!
PRO TIP: Keeping the base of the y-axis at 0
highlights the true magnitude of change across
periods and the differences between series
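A sketch of setting both limits on arbitrary data. The y-axis base is anchored at 0, as the pro tip recommends.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([100, 120, 115, 130])

# ax.set_ylim(lower limit, upper limit); ax.set_ylim(bottom=0) would instead
# fix only the base and keep the current upper limit
ax.set_ylim(0, 150)
ax.set_xlim(0, 3)
```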
FIGURE SIZE
You can adjust the figure size with the “figsize” argument
• figsize=(width, height) – the default is 6.4 x 4.8 inches
PRO TIP: Increasing figure size lets you add
whitespace to your visual, which can reduce
clutter and add space to crowded axes
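A minimal sketch. The 10 x 4 size is just an example of a wider-than-default figure.

```python
import matplotlib.pyplot as plt

# figsize=(width, height) in inches
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot([1, 3, 2, 4])
```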
CUSTOM X-TICKS
You can apply custom x-ticks with the set_xticks() and xticks() functions
• ax.set_xticks(iterable)
This sets the xticks at every 2nd date from
the index and rotates them by 45 degrees
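A sketch of the same idea on hypothetical weekly data. It uses ax.tick_params() for the rotation to stay in the Object-Oriented style; plt.xticks(rotation=45) rotates the current axis' tick labels in the same way.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical weekly values with dates as the index
s = pd.Series(range(8), index=pd.date_range("2022-01-02", periods=8, freq="W"))

fig, ax = plt.subplots()
ax.plot(s.index, s)
ax.set_xticks(s.index[::2])                 # a tick at every 2nd date in the index
ax.tick_params(axis="x", labelrotation=45)  # rotate the tick labels 45 degrees
```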
ADDING VERTICAL LINES
You can add vertical lines to mark key points with the axvline() function
Set the coordinate (in this case days since Jan 1, 1970)
and an optional color and style
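A sketch on hypothetical daily data. The timestamp is passed directly and Matplotlib converts it; matplotlib.dates.date2num() shows the underlying "days since the epoch" coordinate, assuming the default 1970-01-01 epoch.

```python
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical daily values with dates as the index
s = pd.Series([5, 7, 6, 9, 8], index=pd.date_range("2022-03-01", periods=5, freq="D"))

fig, ax = plt.subplots()
ax.plot(s.index, s)

# Mark a key date; Matplotlib converts the timestamp to its date coordinate
event = pd.Timestamp("2022-03-03")
ax.axvline(event, color="gray", linestyle="--")

# That coordinate is the number of days since 1970-01-01 (the default epoch)
days_since_epoch = mdates.date2num(event)
```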
TEXT
You can add text at specific coordinates with the text() function
• ax.text(x-coordinate, y-coordinate, string, additional text formatting)
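A minimal sketch. The coordinates are in data units, and the label text is a placeholder.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 3, 2, 4])

# ax.text(x-coordinate, y-coordinate, string, additional text formatting)
ax.text(1, 3.1, "Peak in week 2", fontsize=10, color="gray", ha="center")
```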
PRO TIP: ANNOTATIONS
Annotations are a great way to call out and label important datapoints
• ax.annotate(string, datapoint coordinate, text coordinate, arrow style dictionary, text formatting)
Annotations have many more options that we won’t cover in depth,
but the documentation has great examples worth looking into!
For more info on annotations, visit: https://matplotlib.org/stable/tutorials/text/annotations.html#sphx-glr-tutorials-text-annotations-py
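A sketch mapping each part of the signature above to an ax.annotate() argument. The data and text are placeholders.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 3, 2, 4])

ax.annotate(
    "Dip in week 3",                  # string
    xy=(2, 2),                        # datapoint coordinate (where the arrow points)
    xytext=(2.3, 3.2),                # text coordinate
    arrowprops={"arrowstyle": "->"},  # arrow style dictionary
    fontsize=9,                       # text formatting
)
```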
REMOVING CHART BORDERS
You can remove specific chart borders with ax.spines['side'].set_visible(False)
This removes the right and top borders
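A minimal sketch that removes the right and top borders:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 3, 2, 4])

# Hide the right and top spines; the left and bottom axes stay visible
for side in ["top", "right"]:
    ax.spines[side].set_visible(False)
```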
ASSIGNMENT: CHART FORMATTING
Results Preview
NEW MESSAGE
August 30, 2022
From: Ian Intern (Summer Consultant)
Subject: RE: Final Charts for Client
Hi there!
The data you plotted earlier looks good, but can you clean up
the chart a little bit? I want it to look polished for our client.
This is my last day in my summer internship and I want to get
hired back!
Thanks!
section02_assignments.ipynb
SOLUTION: CHART FORMATTING
Solution Code
section02_solutions.ipynb
LINE CHARTS
Line charts are used for showing trends over time
• ax.plot(x-axis series, series values, formatting options)
Column for each series
Dates as the index
PRO TIPS
Pivot tabular data to turn each unique series into a DataFrame column, and set the datetime as the index
Divide your series by the appropriate units while plotting to simplify the y-axis scale
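A sketch of both pro tips on a small hypothetical long-format table. pivot() turns each city into its own column with the dates as the index, and dividing by 1,000 while plotting keeps the y-axis in thousands.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical tabular (long) data: one row per date & city
long_df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-02", "2022-01-02", "2022-01-09", "2022-01-09"]),
    "city": ["Austin", "Denver", "Austin", "Denver"],
    "inventory": [12000, 8000, 12500, 7600],
})

# Pivot: one column per series, datetime as the index
wide = long_df.pivot(index="date", columns="city", values="inventory")

fig, ax = plt.subplots()
for city in wide.columns:
    # Divide by 1,000 so the y-axis reads in thousands
    ax.plot(wide.index, wide[city] / 1000, label=city)
ax.set_ylabel("Inventory (thousands)")
ax.legend()
```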
LINE CHARTS
EXAMPLE Available Housing Units by Week
STACKED LINE CHARTS
Use stackplot() to create a stacked line chart, which lets you visualize the overall
trend over time, as well as its composition by series
PRO TIP: Use the bottom series in the
stacked line chart to draw focus to its
individual trend – it’s the most visible!
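A sketch on hypothetical weekly sales by channel. The first series passed to stackplot() forms the bottom band, which is the one whose individual trend is easiest to read.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical weekly sales by channel
df = pd.DataFrame(
    {"online": [5, 6, 7, 8], "store": [10, 9, 9, 8]},
    index=pd.date_range("2022-01-02", periods=4, freq="W"),
)

fig, ax = plt.subplots()
# The first series listed is drawn at the bottom of the stack
ax.stackplot(df.index, df["online"], df["store"], labels=["Online", "Store"])
ax.legend(loc="upper left")
```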
PRO TIP: DUAL AXIS CHARTS
Use twinx() to create a dual axis chart, which lets you plot series with values on
significantly different scales inside a single visual
The “Inventory” values are so small compared to “Price” that
they appear to be 0 when plotted on the same y-axis
Create a second axis (ax2) with ax.twinx(),
then create the desired plot on ax2
Note that using the figure level
legend picks up both series
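A sketch with hypothetical "Price" and "Inventory" values on very different scales. ax.twinx() adds a second y-axis that shares the x-axis, and fig.legend() with no arguments collects the labeled lines from both axes.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data on very different scales
df = pd.DataFrame(
    {"price": [350000, 355000, 362000, 360000], "inventory": [120, 95, 80, 110]},
    index=pd.date_range("2022-01-02", periods=4, freq="W"),
)

fig, ax = plt.subplots()
ax.plot(df.index, df["price"], label="Price", color="tab:blue")
ax.set_ylabel("Price ($)")

ax2 = ax.twinx()   # second y-axis sharing the same x-axis
ax2.plot(df.index, df["inventory"], label="Inventory", color="tab:orange")
ax2.set_ylabel("Inventory (units)")

# A figure-level legend picks up the labeled lines from both axes
fig.legend(loc="upper left")
```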
ASSIGNMENT: LINE CHARTS
Results Preview
NEW MESSAGE
August 30, 2022
From: Ian Intern (Summer Consultant)
Subject: Re: Re: Final Charts for Client
Hey again,
Great work on those charts!
Final request - we want to compare room nights booked
vs. cancellations over time; we might need a dual axis chart to
do this effectively. I’m totally checked out, so can you do this?
You’ll be put in contact with the client soon.
Thanks!
section02_assignments.ipynb
SOLUTION: LINE CHARTS
Solution Code
section02_solutions.ipynb
BAR CHARTS
Bar charts are used to compare values across different categories
• ax.bar(category labels, bar heights, formatting options)
Values in a single column
Categories as the index
PRO TIPS
Use .groupby() and .agg() to aggregate your data by category and push the labels into the index
Use Seaborn or the Pandas plot API for grouped bar charts
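A sketch of the first pro tip on hypothetical home sales. groupby() and agg() aggregate by city and push the labels into the index, and sorting puts the bars in a visual order.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical home sales, one row per sale (prices in $K)
sales = pd.DataFrame({
    "city": ["Austin", "Denver", "Austin", "Boise", "Denver", "Boise"],
    "price": [450, 520, 470, 380, 540, 400],
})

# Aggregate by category; the city labels end up in the index
median_price = (
    sales.groupby("city")
    .agg({"price": "median"})
    .sort_values("price", ascending=False)
)

fig, ax = plt.subplots()
ax.bar(median_price.index, median_price["price"])  # (category labels, bar heights)
ax.set_title("Median Home Price by City ($K)")
```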
BAR CHARTS
EXAMPLE Median Home Price by City
PRO TIP: HORIZONTAL LINES
Use axhline() to add a horizontal line at a specified y-value on a bar chart
• This will typically be something to benchmark against, like a mean or target
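A sketch that adds the average of some hypothetical bar values as a benchmark line:

```python
import matplotlib.pyplot as plt

cities = ["Austin", "Boise", "Denver"]
prices = [460, 390, 530]             # hypothetical median prices ($K)
average = sum(prices) / len(prices)  # benchmark to compare against

fig, ax = plt.subplots()
ax.bar(cities, prices)
ax.axhline(average, color="black", linestyle="--", label="Average")
ax.legend()
```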
HORIZONTAL BAR CHARTS
Use barh() to create a horizontal bar chart
Note that barh() draws the first category at the bottom, so the bars appear in the opposite order from a vertical bar chart
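A sketch on hypothetical values. Sorting in ascending order before calling barh() puts the largest bar at the top.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical median prices ($K), sorted ascending so the largest lands on top
s = pd.Series([460, 530, 390], index=["Austin", "Denver", "Boise"]).sort_values()

fig, ax = plt.subplots()
ax.barh(s.index, s)   # first category (smallest) is drawn at the bottom
```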
PRO TIP: HIGHLIGHTS
Use the “color” argument to highlight the series you’d like to focus on
Use a list to specify the color for each bar
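A sketch that highlights one hypothetical city by building one color per bar and graying out the rest:

```python
import pandas as pd
import matplotlib.pyplot as plt

s = pd.Series([460, 530, 390], index=["Austin", "Denver", "Boise"])  # hypothetical

# One color per bar: highlight Denver, gray out everything else
colors = ["tab:orange" if city == "Denver" else "lightgray" for city in s.index]

fig, ax = plt.subplots()
ax.bar(s.index, s, color=colors)
```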
ASSIGNMENT: BAR CHARTS
Results Preview
NEW MESSAGE
September 1, 2022
From: Sarah Shark (Managing Director)
Subject: CHARTS NEEDED ASAP
Hello,
Our hotel client is concerned about our intern’s departure.
I need YOU to step up and make sure they’re happy with us.
Start by taking a quick look at room nights and lodging by
country for our top 10 countries by total nights booked.
I expect the results in my inbox by morning (more details in
the notebook attached).
-S
section02_assignments.ipynb
SOLUTION: BAR CHARTS
Solution Code
section02_solutions.ipynb
STACKED BAR CHARTS
You can create a stacked bar chart by setting the “bottom” argument for the
second “stacked” series as the values from the bars below it
• This will use those values as the baseline for the stacked bars instead of the x-axis
The Oregon bars are plotted by using the
California values as their “bottom”
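A sketch with hypothetical homes sold per quarter. The Oregon bars use the California values as their bottom, so they start where the California bars end.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical homes sold per quarter in two states
df = pd.DataFrame({"California": [30, 35, 40], "Oregon": [10, 12, 15]},
                  index=["Q1", "Q2", "Q3"])

fig, ax = plt.subplots()
ax.bar(df.index, df["California"], label="California")
# "bottom" makes the Oregon bars start at the top of the California bars
ax.bar(df.index, df["Oregon"], bottom=df["California"], label="Oregon")
ax.legend()
```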
100% STACKED BAR CHARTS
To create a 100% stacked bar chart, convert your DataFrame to row-level
percentages before plotting
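A sketch using the same kind of hypothetical data. Each row is divided by its total, so every stacked bar reaches 100.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical homes sold per quarter in two states
df = pd.DataFrame({"California": [30, 35, 40], "Oregon": [10, 12, 15]},
                  index=["Q1", "Q2", "Q3"])

# Convert each row to percentages of its own total
pct = df.div(df.sum(axis=1), axis=0) * 100

fig, ax = plt.subplots()
ax.bar(pct.index, pct["California"], label="California")
ax.bar(pct.index, pct["Oregon"], bottom=pct["California"], label="Oregon")
ax.legend()
```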
PRO TIP: GROUPED BAR CHARTS
You can create a grouped bar chart by reducing the width of each series and
shifting them evenly around their corresponding label
This shifts the bars to the left across
the x-axis by half their width
This shifts these bars to the right
Grouped bar charts are much easier to create
by using Seaborn or Pandas’ Matplotlib API
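A sketch of the manual approach on hypothetical revenue by country. Each series gets a narrower width and is shifted half a width left or right of the category position.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical revenue by country for two series
df = pd.DataFrame({"Lodging": [120, 90, 60], "Other": [30, 25, 20]},
                  index=["Portugal", "France", "Germany"])

x = np.arange(len(df))   # one numeric position per category label
width = 0.4              # narrower bars so two series fit side by side

fig, ax = plt.subplots()
ax.bar(x - width / 2, df["Lodging"], width=width, label="Lodging")  # shift left by half a width
ax.bar(x + width / 2, df["Other"], width=width, label="Other")      # shift right by half a width
ax.set_xticks(x)
ax.set_xticklabels(df.index)   # put the category labels back under each group
ax.legend()
```

For comparison, the Pandas plotting API draws the same grouped layout in a single call with df.plot.bar().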
PRO TIP: COMBO CHARTS
You can create a combo chart by specifying different chart types in a dual axis plot
PRO TIP: Use the “alpha” argument to
modify the transparency of each plot
(0 is invisible and 1 is solid)
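A sketch of a bar + line combo on a dual axis with hypothetical monthly values. The bars are semi-transparent (alpha=0.5) so the line stays readable.

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [100, 120, 90, 130]     # hypothetical values
margin_pct = [20, 22, 18, 24]

fig, ax = plt.subplots()
ax.bar(months, revenue, alpha=0.5, label="Revenue")   # bar chart on the left axis

ax2 = ax.twinx()                                      # line chart on the right axis
ax2.plot(months, margin_pct, color="black", marker="o", label="Margin %")

fig.legend(loc="upper left")
```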
ASSIGNMENT: ADVANCED BAR CHARTS
Results Preview
NEW MESSAGE
September 2, 2022
From: Sarah Shark (Managing Director)
Subject: RE: RE: CHARTS NEEDED ASAP
Hello,
Nice work…so far. I need some more detailed views on the
breakdown of lodging revenue vs. other revenue by country.
Build a grouped bar chart with the lodging revenue and other
revenue for each country. Then, build a 100% stacked bar
chart showing how much each revenue category contributes
to overall country revenue. Add a reference line at 80% to
help illustrate which countries get less than 80% of their
revenue from lodging.
-S
section02_assignments.ipynb
SOLUTION: ADVANCED BAR CHARTS
Solution Code
section02_solutions.ipynb
PIE CHARTS
Pie charts are used to compare proportions totaling 100%
• ax.pie(series values, labels= , startangle= , autopct=, pctdistance=, explode=)
Values in a single column
Labels as the index
PRO TIPS
Keep the number of slices low (<7) to enhance readability – you can group “others” into a single slice
Use bar charts if you want to compare the categories – pies are for showing how they make up a whole
Donut charts make great KPI progress trackers
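A sketch that maps each argument in the ax.pie() signature above to a value, using hypothetical homes-sold counts with the smaller cities grouped into "Other":

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical homes sold by city, largest first, small cities grouped as "Other"
homes = pd.Series({"Austin": 420, "Denver": 310, "Boise": 150, "Other": 120})

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    homes,                    # series values
    labels=homes.index,       # slice labels
    startangle=90,            # first slice starts at 12 o'clock
    counterclock=False,       # and runs clockwise from there
    autopct="%.0f%%",         # percentage labels, e.g. "42%"
    pctdistance=0.75,         # place percentages 75% of the way to the edge
    explode=[0.1, 0, 0, 0],   # pull the first slice out slightly
)
```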
PIE CHARTS
EXAMPLE Homes Sold by City
PRO TIP: DONUT CHARTS
You can create a donut chart by adding a “hole” to a pie chart and shifting the labels
How does this code work?
• It pushes the data labels 85% of the way towards the edge of the pie chart
• Then adds a white circle that covers the center of the pie chart to the figure
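A sketch of the two steps described above on hypothetical data. The 0.70 circle radius is an illustrative choice.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

values = [420, 310, 150, 120]                  # hypothetical values
labels = ["Austin", "Denver", "Boise", "Other"]

fig, ax = plt.subplots()
# Push the percentage labels 85% of the way towards the edge
ax.pie(values, labels=labels, autopct="%.0f%%", pctdistance=0.85)

# Cover the center with a white circle to turn the pie into a donut
hole = Circle((0, 0), 0.70, facecolor="white")
ax.add_artist(hole)
```

Passing wedgeprops={"width": 0.3} to ax.pie() is another way to get a ring without drawing a separate circle.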
ASSIGNMENT: PIE & DONUT CHARTS
Results Preview
NEW MESSAGE
September 3, 2022
From: Sarah Shark (Managing Director)
Subject: UPDATED CHARTS
Hello,
Our hotel client is looking for a pie/donut chart to represent
the share of revenue by country.
Create a pie chart with slices for the top 5 countries by
revenue, and a single “other” slice for the rest of the countries.
Need it ASAP.
Thx
section02_assignments.ipynb
SOLUTION: PIE & DONUT CHARTS
Solution Code
section02_solutions.ipynb
SCATTERPLOTS
Scatterplots are used to visualize the relationship between numerical variables
• ax.scatter(x-axis series, y-axis series, s= , alpha=)
One row per point x-series y-series
PRO TIPS
Modify the alpha (transparency) level to make overlapping points more visible
Bubble charts can be useful in some cases, but they often add confusion rather than clarity
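A sketch on randomly generated data, which is hypothetical and seeded for repeatability. A low alpha makes dense clusters of overlapping points show up as darker areas.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
months_supply = rng.uniform(1, 6, 200)                          # hypothetical x-series
list_price = 500 - 40 * months_supply + rng.normal(0, 30, 200)  # hypothetical y-series

fig, ax = plt.subplots()
ax.scatter(months_supply, list_price, alpha=0.4)   # transparency reveals overlaps
ax.set_xlabel("Months of Supply")
ax.set_ylabel("Median List Price ($K)")
```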
SCATTERPLOTS
EXAMPLE Months of Supply vs. Median List Price
BUBBLE CHARTS
To create a bubble chart, specify a third series in the “s” (marker size) argument of .scatter()
• You may need to apply some arithmetic to adjust the bubble sizes
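A sketch with hypothetical per-country averages. The "s" argument is the marker area in points squared, so the third series is scaled by an arbitrary factor of 5 to make the bubbles visible.

```python
import matplotlib.pyplot as plt

# Hypothetical per-country averages
avg_nights = [3.1, 4.2, 3.8]
avg_revenue = [420, 510, 480]
avg_nightly_revenue = [135, 121, 126]

# Scale the third series so the bubbles are a visible size (factor is arbitrary)
sizes = [v * 5 for v in avg_nightly_revenue]

fig, ax = plt.subplots()
ax.scatter(avg_nights, avg_revenue, s=sizes, alpha=0.5)
```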
HISTOGRAMS
Histograms are used to visualize the distribution of a numeric variable
• ax.hist(series, density= , alpha=, bins=)
numerical series
PRO TIPS
Modify the alpha (transparency) level to plot multiple distributions on the same axis
Set density=True to normalize the y-axis so the total area of the bars equals 1 (to show each bin’s share of the total instead, weight each value by 1/n)
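A sketch on randomly generated ages, which are hypothetical and seeded. It overlays two semi-transparent distributions and contrasts density=True (total bar area is 1) with 1/n weights (bar heights sum to 1).

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
ages_fr = rng.normal(45, 12, 500)   # hypothetical ages
ages_de = rng.normal(40, 10, 500)

fig, ax = plt.subplots()
# density=True: bar heights are scaled so the total bar area is 1
counts_fr, edges_fr, _ = ax.hist(ages_fr, bins=20, alpha=0.5, density=True, label="France")
counts_de, edges_de, _ = ax.hist(ages_de, bins=20, alpha=0.5, density=True, label="Germany")
ax.legend()

# Share of total per bin instead: weight every value by 1 / n
fig2, ax2 = plt.subplots()
share, _, _ = ax2.hist(ages_fr, bins=20, weights=np.ones(len(ages_fr)) / len(ages_fr))
```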
HISTOGRAMS
EXAMPLE Distribution of Y-o-Y Growth in Home Price for Calendar Weeks
ASSIGNMENT: SCATTERPLOTS & HISTOGRAMS
Results Preview
NEW MESSAGE
September 4, 2022
From: Sarah Shark (Managing Director)
Subject: Additional Customer Profiling
Not bad, rookie – thanks for the quick turnaround.
I need two more charts to help finalize a marketing strategy
targeting overseas guests:
1. A chart comparing average revenue per customer and
average nights stayed, with average nightly revenue as
the size of the bubbles (you’ll need to aggregate the data
by country)
2. The distribution of customer ages in France & Germany
-sent from my yPhone
section02_assignments.ipynb
SOLUTION: SCATTERPLOTS & HISTOGRAMS
Solution Code
section02_solutions.ipynb
KEY TAKEAWAYS
Matplotlib has two interfaces for plotting data: the PyPlot API & the object-oriented API
• Both can visualize many data types (lists, DataFrames, etc.), but object-oriented plots are easier to fully customize
Object-oriented plots are built by adding axes to a figure
• You can layer on different elements to these objects to modify the chart formatting
You can create common chart types by using Matplotlib functions
• Each chart type can be customized further to create more advanced variations
Matplotlib's extreme customizability also adds complexity
• Understanding the anatomy of a Matplotlib figure helps pinpoint how to change every component in your chart
PROJECT DATA: COFFEE PRODUCTION
PROJECT DATA: COFFEE IMPORTS
PROJECT DATA: COFFEE PRICES
ASSIGNMENT: MID-COURSE PROJECT
Key Objectives
1. Read in data from multiple csv files
2. Reshape the data to prepare it for visualization
3. Build & customize charts to communicate the key insights to the client
NEW MESSAGE
September 7, 2022
From: Sarah Shark (Managing Director)
Subject: Coffee Industry Deep Dive
Hi there,
I’m starting to trust you… which is rare. We just got an inquiry
from a major coffee trader looking to get an outside view on
the coffee industry. They’re particularly interested in Brazil’s
production relative to other nations.
We’ll also look at a comparison of importer volume vs the
prices they pay to understand if we can unlock margin by
diversifying into new markets.
Do well on this and you’ll be on promotion track.
section03_coffee_project_part1.ipynb
ADVANCED CUSTOMIZATION
In this section we’ll cover advanced customization techniques in Matplotlib, including
multi-chart figures, custom layouts & colors, style sheets, and more
GOALS FOR THIS SECTION:
• Understand how to build multi-chart figures both with
subplots and GridSpec layouts
• Learn how to customize chart colors by leveraging
colormaps and creating your own!
• Take a look at pre-built stylesheets, and dive into the
settings behind them that allow for extreme chart
customization
SUBPLOTS
Subplots let you create a grid of equally sized charts in a single figure
• fig, ax = plt.subplots(rows, columns) – this creates a grid with the specified rows & columns
          Column 0   Column 1
Row 0     (0, 0)     (0, 1)
Row 1     (1, 0)     (1, 1)
This creates a 2-row, 2-column grid that can be populated with individual charts
Specify ax[row][column] to create
and modify individual subplots
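A minimal sketch of filling the 2 × 2 grid above. The plotted values are placeholders. With more than one row and column, `ax` is a 2-D NumPy array of Axes, so `ax[row][column]` (or `ax[row, column]`) selects one subplot:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# 2 rows x 2 columns -> ax is a 2x2 array of Axes objects
fig, ax = plt.subplots(2, 2)
for row in range(2):
    for col in range(2):
        # ax[row][col] addresses a single subplot
        ax[row][col].plot([0, 1, 2], [row, col, row + col])
        ax[row][col].set_title(f"({row}, {col})")
fig.tight_layout()  # keep titles from overlapping neighbouring subplots
```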
SUBPLOTS
Use the “sharex” & “sharey” arguments to share the same axis limits across the plots
• These are False (“none”) by default, but can be set to True (“all”), “row”, or “col”
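A small sketch of `sharey` using placeholder data. Because the two subplots share a y-axis, the larger range sets the limits for both:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# sharey=True (equivalent to "all") links the y-limits of every subplot
fig, ax = plt.subplots(1, 2, sharey=True)
ax[0].plot([0, 1], [0, 10])
ax[1].plot([0, 1], [0, 100])  # this larger range now drives both y-axes

# sharex="col" would instead link x-limits only within each column
```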
SUBPLOTS
Subplots can be any chart type, and do not have to be the same type
ASSIGNMENT: SUBPLOTS
Results Preview
NEW MESSAGE
September 10, 2022
From: Wendy Whiz (Data Scientist)
Subject: Deeper Exploration
Hey there,
I want to get a quick read on the distribution of revenue by
customer for our top 5 countries – I’m working on a model for
a similar client and want to see if the distributions are similar.
Doesn’t need to be polished, just need the 5 histograms in a
single figure.
Thanks, and looking forward to working with you more!
Wendy
Section04_assignments.ipynb
SOLUTION: SUBPLOTS
Solution Code
Section04_solutions.ipynb
GRIDSPEC
You can build layouts with charts of varying sizes by setting a gridspec object
• This creates a grid with a specified number of rows & columns
• Each axis, or chart, can then occupy a group of squares in the grid
• Use a slice to specify the ranges of rows and columns for each axis
(Diagram: an 8-row x 4-column grid with rows 0-7 and columns 0-3; ax1 and ax2 sit side by side in the upper rows, and ax3 spans the lower rows)
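A sketch of an 8 × 4 layout like the one in the diagram. It uses `fig.add_gridspec()`, which is equivalent to creating `matplotlib.gridspec.GridSpec(8, 4, figure=fig)`. The exact row and column spans are an assumption read from the diagram:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 8))
gs = fig.add_gridspec(nrows=8, ncols=4)  # 8 rows x 4 columns of cells

# Slices choose the block of cells each chart occupies
ax1 = fig.add_subplot(gs[0:4, 0:2])  # rows 0-3, columns 0-1
ax2 = fig.add_subplot(gs[0:4, 2:4])  # rows 0-3, columns 2-3
ax3 = fig.add_subplot(gs[4:8, :])    # rows 4-7, all columns

for name, ax in [("ax1", ax1), ("ax2", ax2), ("ax3", ax3)]:
    ax.set_title(name)
```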
ASSIGNMENT: GRIDSPEC
Results Preview
NEW MESSAGE
September 12, 2022
From: Sarah Shark (Managing Director)
Subject: Revenue Report Format
Hi there,
Big meeting with our hotel client coming up – we want to
propose a report format that will help track their revenue,
specifically with respect to their goal to get French customers
to surpass German customers.
Can you create a figure with a line chart tracking revenue by
category, a bar chart with revenue for the top 5 countries, and
a chart indicating progress towards our French revenue goal?
Thanks!
section04_assignments.ipynb
SOLUTION: GRIDSPEC
Solution Code
GridSpec Layout (see notebook for chart code):
section04_solutions.ipynb
COLORS
You can pass colors to a plot as a list, using the “color” argument
This assigns each color in the
list to the corresponding bar in the plot
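A minimal sketch of a color list applied to bars. The country labels and values are hypothetical, and the list mixes named colors with one hex code:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

countries = ["PRT", "FRA", "DEU", "GBR", "ESP"]        # hypothetical labels
revenue = [50, 40, 35, 30, 25]                          # hypothetical values
colors = ["green", "#0055A4", "black", "red", "gold"]   # named colors or hex codes

fig, ax = plt.subplots()
# The color at each position in the list is applied to the bar in the same position
bars = ax.bar(countries, revenue, color=colors)
```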
COLORS
You can also loop through a list of colors to pass them to separate series in a plot
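A sketch of looping through a color list, one color per series. The series names and values are hypothetical, and `zip` pairs each series with the color at the same position:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Hypothetical monthly revenue for three series
series = {
    "Online": [10, 12, 15, 14],
    "Direct": [8, 9, 7, 10],
    "Groups": [5, 6, 9, 8],
}
colors = ["tab:blue", "tab:orange", "tab:green"]

fig, ax = plt.subplots()
for (label, values), color in zip(series.items(), colors):
    ax.plot(values, color=color, label=label)
ax.legend()
```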
COLORS
Hex codes can be used to supply specific colors
PRO TIP: Sites like Google have
helpful hexadecimal color pickers
PRO TIP: COLOR PALETTES
You can also modify the entire color palette for the series in a plot
Default Color Map:
The “Set2” color map is applied here
Series colors are applied in this sequential
order (after 10 series, the cycle repeats)
rcParams are the underlying settings for Matplotlib charts and can be
modified to gain a high level of customization (more on these soon!)
For more on color palettes, visit: https://matplotlib.org/3.5.0/tutorials/colors/colormaps.html
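One way to apply "Set2" to every series is to replace the `axes.prop_cycle` rcParam. This is a sketch, not necessarily how the course notebook does it. It assumes the `cycler` package, which is installed with Matplotlib:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from cycler import cycler
from matplotlib.colors import to_hex

# "Set2" is a qualitative (listed) colormap; .colors holds its 8 colors
set2_colors = plt.get_cmap("Set2").colors

# Replace the default 10-color series cycle for every plot created afterwards
plt.rcParams["axes.prop_cycle"] = cycler(color=set2_colors)

fig, ax = plt.subplots()
for i in range(3):
    ax.plot([0, 1], [i, i + 1])  # each new line takes the next Set2 color

line_colors = [to_hex(line.get_color()) for line in ax.get_lines()]
print(line_colors)
# plt.rcdefaults() restores Matplotlib's default settings
```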
ASSIGNMENT: COLORS
Results Preview
NEW MESSAGE
September 13, 2022
From: Sarah Shark (Managing Director)
Subject: Re: Revenue Report Format
Hi again,
Love the layout, HATE the colors! Let’s show some polish by
getting away from the defaults.
Apply the “Set2” colormap to the line chart and look up the
national color hex codes for the top 5 countries to use them
for the rest of the charts.
Thanks,
Sarah
section04_assignments.ipynb
SOLUTION: COLORS
Solution Code
Apply Set2 (see notebook for chart code):
Country Colors:
Donut Chart
section04_solutions.ipynb
STYLE SHEETS
Matplotlib (and Seaborn) have style sheets that can be used instead of the default
The style is set in advance with plt.style.use(), before the chart is built
The “fivethirtyeight” style
has larger font sizing, and
adds gridlines and a
background color
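A minimal sketch of setting a built-in style before plotting. The data values are hypothetical. `plt.style.use()` changes the global rcParams, so every chart created afterwards picks up the style:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

plt.style.use("fivethirtyeight")  # set the style first

fig, ax = plt.subplots()
ax.plot([2019, 2020, 2021, 2022], [3.1, 2.4, 2.9, 3.6])  # hypothetical values
ax.set_title("Styled with fivethirtyeight")

# To style a single chart only, use a context manager instead:
# with plt.style.context("fivethirtyeight"): ...
```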
• You can still customize individual formatting options after setting a style
The Seaborn library has
additional styles that can
be used with Matplotlib
charts, like “darkgrid”
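A sketch that applies Seaborn's "darkgrid" style to an ordinary Matplotlib chart through `sns.set_style()`. The data values are hypothetical. Matplotlib also ships copies of the Seaborn styles under version-dependent names, so the Seaborn call is used here instead:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("darkgrid")  # grey background with white gridlines

fig, ax = plt.subplots()   # a plain Matplotlib chart picks up the style
ax.plot([1, 2, 3, 4], [10, 14, 12, 17])  # hypothetical values
```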
ADDITIONAL STYLES
These are some of the additional styles available in both libraries:
ASSIGNMENT: STYLE SHEETS
Results Preview
NEW MESSAGE
September 14, 2022
From: Sarah Shark (Managing Director)
Subject: Re: Re: Revenue Report Format
Hi,
Layout and colors look great now, but can we spruce up the
chart styling?
Use a style sheet of your choice.
Once we’ve done that it should be ready to ship.
Thx
-S
section04_assignments.ipynb
SOLUTION: STYLE SHEETS
Solution Code
Style Setting Only (see notebook for chart code):
section04_solutions.ipynb
STYLE PARAMETERS
Viewing the parameters of a style sheet can help format charts properly and provide
inspiration for your own formatting changes
PARAMETER GROUPS
There are 300+ parameters that can be modified, which fall into parameter groups:
Group         Description                 Example settings
axes          Chart-level formatting      axes.spines.top = False, axes.titlesize = 'large'
date          Date formatting options     date.autoformatter.month = '%Y-%m'
figure        Figure-level formatting     figure.figsize = (8.5, 11), figure.facecolor = 'grey'
font          Font settings               font.size = 16, font.style = 'italic', font.weight = 'bold'
grid          Gridline settings           grid.linestyle = ':', grid.linewidth = 2
legend        Legend settings             legend.loc = 'lower right', legend.frameon = False
savefig       Saved figure settings       savefig.dpi = 1000, savefig.format = 'png'
text          Text settings               text.color = 'grey', text.usetex = True
xtick/ytick   X and Y tick settings       xtick.labelcolor = 'green', ytick.minor.visible = True
boxplot       Settings for boxplots       boxplot.whiskerprops.color = 'orange'
hist          Settings for histograms     hist.bins = 20
lines         Settings for line charts    lines.linewidth = 2, lines.color = 'red'
scatter       Settings for scatterplots   scatter.marker = '+'
For more on rcParams, visit: https://matplotlib.org/stable/api/matplotlib_configuration_api.html
MODIFYING PARAMETERS
There are two ways to modify parameters:
1. You can change individual parameters via assignment
2. You can change multiple parameters from the same group with the rc() function
In the example, the top and right spines are turned off, the default axes title size is changed to 20, and the figure size is modified to 8” x 6”
PRO TIP: Modify parameters to avoid having to
repeat the same formatting options on each chart
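A sketch of both approaches, reproducing the three changes described above. The chart itself is a placeholder:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# 1. Assign individual parameters
plt.rcParams["axes.spines.top"] = False
plt.rcParams["axes.spines.right"] = False

# 2. Change parameters in one group with rc()
plt.rc("axes", titlesize=20)
plt.rc("figure", figsize=(8, 6))

fig, ax = plt.subplots()  # every new chart now uses these defaults
ax.set_title("Defaults applied without per-chart formatting")
```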
SAVING FIGURES
The savefig() function will save figures as an image file
• Simply specify the desired filename and format
Screenshotting the images with your operating
system’s snipping tool will often be sufficient for
building plots into presentations like this course ;).
If no extension is specified in the filename, the
file will be saved as a .png. Most systems also support
.jpg, .jpeg, .svg, and .pdf, among others. The
default resolution is 100 dpi (dots per inch)
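A sketch of the savefig() behaviors described above. The file names are hypothetical, and the files are written to a temporary folder:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import os
import tempfile

fig, ax = plt.subplots()
ax.bar(["A", "B", "C"], [3, 5, 2])  # hypothetical values

out_dir = tempfile.mkdtemp()  # temporary folder for the example files

# No extension: saved in the default format (PNG), and ".png" is appended
fig.savefig(os.path.join(out_dir, "revenue_chart"))
# The extension selects the format
fig.savefig(os.path.join(out_dir, "revenue_chart.pdf"))
# dpi overrides the default resolution; bbox_inches="tight" trims extra whitespace
fig.savefig(os.path.join(out_dir, "revenue_chart_hires.png"), dpi=300, bbox_inches="tight")

saved = sorted(os.listdir(out_dir))
```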
KEY TAKEAWAYS
Subplots and GridSpec allow us to create multi-chart figures
• Subplots are equally sized grids, GridSpec allows for custom layouts
Colors can be set by specifying a colormap or by assigning colors to the data of
interest
• Common color names and hex codes can be used to assign colors to your data
Set a style to spruce up the default aesthetics, or use rcParams to completely
customize your charts
• Pre-built styles can add some nice aesthetic polish compared to the Matplotlib defaults
• Understanding how to modify rcParams gives you full control over chart customization and reduces the need
for manual formatting
PROJECT DATA: OVERVIEW
Coffee Production
PROJECT DATA: OVERVIEW
Prices Paid To Growers
ASSIGNMENT: MID-COURSE PROJECT
Key Objectives
1. Read in data from multiple csv files
2. Reshape the data with Pandas to set up charts
3. Build and customize line charts, bar charts, histograms and more to communicate key insights to our client
4. Modify chart colors to represent national flags
5. Combine modified charts into a single report by leveraging GridSpec and subplots
NEW MESSAGE
September 18, 2022
From: Clarissa Café (Coffee Client)
Subject: Summary Report
Hi there,
Sarah told me to reach out directly to you – we loved the work
you did on breaking down the industry, but we want to
summarize your findings on Brazil into a single figure we can
pass around.
Can you combine your findings into a single figure report?
We’ll also want to modify colors. There are more details in the
attached notebook.
Thanks!
Clarissa
section05_coffee_project_part2.ipynb
DATA VISUALIZATION WITH SEABORN
In this section we’ll cover data visualization with Seaborn, another Python library that
introduces new chart types and layouts, and interacts well with Matplotlib
GOALS FOR THIS SECTION:
• Introduce the basics of plotting data with Seaborn
• Build variations of Matplotlib charts like bar charts and
histograms, as well as new visuals like boxplots, violin
plots, and linear relationship plots
• Create FacetGrid layouts as an alternative to subplots
• Integrate Seaborn plots with Matplotlib objects to get
the best of both worlds
MEET SEABORN
Seaborn is a Python library built for easily visualizing Pandas DataFrames,
taking away some of the “drawing” required when using Matplotlib
‘sns’ is the standard alias for Seaborn
You simply need to specify
a DataFrame as the “data”
argument and set columns
as the “x” and “y” axes
Seaborn will automatically
aggregate the results!
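A minimal sketch of the automatic aggregation, using a hypothetical bookings table with several rows per country. Seaborn groups by the "x" column and draws the mean of "y" for each bar:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import pandas as pd
import seaborn as sns

# Hypothetical bookings: several rows per country
bookings = pd.DataFrame({
    "country": ["FRA", "FRA", "DEU", "DEU", "PRT"],
    "revenue": [100, 300, 150, 250, 400],
})

# One bar per country, with height = mean revenue (FRA 200, DEU 200, PRT 400)
ax = sns.barplot(data=bookings, x="country", y="revenue")
```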
You can change the aggregation method
and suppress the confidence intervals
CHART FORMATTING
You can apply chart formatting to Seaborn plots using Matplotlib arguments
• These are passed to the Matplotlib object that Seaborn creates internally
We’ll cover integration with Matplotlib later, which is where you’ll be able to
leverage the chart formatting skills you’ve learned throughout the course
CHART FORMATTING
Seaborn still has some useful chart formatting functions like despine()
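A short sketch of `sns.despine()` on a hypothetical bar chart. By default it removes the top and right spines of the current figure's axes:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"segment": ["Online", "Direct", "Groups"], "revenue": [120, 80, 40]})  # hypothetical
ax = sns.barplot(data=df, x="segment", y="revenue")

# Defaults: top=True, right=True; left=True would also hide the y-axis line
sns.despine()
```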
BAR CHARTS
Bar charts can be created in Seaborn with sns.barplot()
• Simply specify the desired category labels and series values as “x” & “y” arguments
Note that Seaborn automatically aggregates the data for the plot, using unique category values as the labels
for the bars, the mean of each category for the bar length, and the column headers as the axis labels
To create a horizontal bar chart, specify “x” as the data and
“y” as the labels. ci=None will suppress error bars (in seaborn 0.12 and later, use errorbar=None instead).
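A sketch of the horizontal version with hypothetical data. The numeric column goes on "x" and the category labels on "y", and each bar's width is the mean of its group:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import pandas as pd
import seaborn as sns

bookings = pd.DataFrame({
    "country": ["FRA", "FRA", "DEU", "DEU", "PRT", "PRT"],
    "room_nights": [2, 4, 3, 5, 6, 8],
})

# Numeric x + categorical y -> horizontal bars (means: FRA 3, DEU 4, PRT 7)
ax = sns.barplot(data=bookings, x="room_nights", y="country")
```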
GROUPED BAR CHARTS
Grouped bar charts can be created by specifying a categorical column as “hue”
You can also sort the bars by one of the
columns, and apply a different color map
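A sketch of a grouped bar chart built with "hue", sorted by total revenue and colored with a different palette. The data are hypothetical, and computing the sort order with pandas is one option among several:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "country": ["FRA", "FRA", "DEU", "DEU", "PRT", "PRT"],
    "segment": ["Online", "Direct"] * 3,
    "revenue": [120, 80, 150, 60, 200, 90],
})

# Sort the x-axis categories by total revenue, highest first
order = df.groupby("country")["revenue"].sum().sort_values(ascending=False).index

# hue creates one bar per segment within each country; palette swaps the colors
ax = sns.barplot(data=df, x="country", y="revenue", hue="segment", order=order, palette="Set2")
```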
HISTOGRAMS
Histograms can be created with sns.histplot() and a single “x” argument
• You can also specify the number of “bins” and add a kernel density estimate curve (kde=True)
The default style for Seaborn plots can be
nicer than their Matplotlib counterparts,
and vice versa, so choose the library that
works best for each chart!
ASSIGNMENT: BASIC CHARTS
Results Preview
NEW MESSAGE
September 20, 2022
From: Sarah Shark (Managing Director)
Subject: New Charts
Hi,
Need a few more views on the hotel data using Seaborn.
Can we look at the distribution of lodging revenue for each
booking? Only plot customers with less than 1,500 dollars to
weed out longer term stays.
Then, build a bar chart with the average room nights stayed
for our top 5 countries.
Thanks
section06_assignments.ipynb
SOLUTION: BASIC CHARTS
Solution Code
section06_solutions.ipynb
BOXPLOTS
Boxplots can be created with sns.boxplot()
• They visualize the distribution of a variable by plotting key statistics
(Diagram: a box running from Q1 to Q3 with a line at the median; whiskers extend toward the min and max, out to at most Q1 - 1.5*IQR and Q3 + 1.5*IQR; points beyond the whiskers are plotted as outliers)
Boxplot statistics:
• Median (50th percentile)
• 1st & 3rd Quartiles (25th & 75th percentiles)
• Interquartile Range (IQR = Q3 - Q1)
• Min & Max Values (the whiskers stop at the most extreme points within 1.5x the IQR of the box)
• Outliers (points beyond the whiskers)
Specify a second axis to create
separate boxplots by category
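A sketch that computes the boxplot statistics by hand with NumPy, using the 1.5 × IQR whisker rule (the default `whis=1.5`). It then draws one box per category. The values and column names are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical lodging revenue values, with one clear outlier
values = np.array([12, 15, 17, 18, 20, 21, 22, 24, 25, 60])

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
whisker_low = values[values >= lower_limit].min()   # most extreme point inside the limits
whisker_high = values[values <= upper_limit].max()
outliers = values[(values < lower_limit) | (values > upper_limit)]

# A second (categorical) axis splits the data into separate boxes
df = pd.DataFrame({
    "country": ["FRA"] * 10 + ["DEU"] * 10,
    "revenue": np.concatenate([values, values + 5]),
})
ax = sns.boxplot(data=df, x="country", y="revenue")
```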
VIOLIN PLOTS
Violin plots can be created with sns.violinplot()
• They show a mirrored kernel density estimate on each side of a boxplot-style summary, revealing the full shape of the distribution
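A sketch of `sns.violinplot()` comparing a synthetic "age" column across three hypothetical countries:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(5)
df = pd.DataFrame({
    "country": np.repeat(["FRA", "DEU", "PRT"], 200),
    "age": np.concatenate([
        rng.normal(40, 10, 200),
        rng.normal(45, 12, 200),
        rng.normal(38, 8, 200),
    ]),
})

# One violin per country; the default inner summary is a miniature boxplot
ax = sns.violinplot(data=df, x="country", y="age")
```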
ASSIGNMENT: BOX & VIOLIN PLOTS
Results Preview
NEW MESSAGE
September 24, 2022
From: Sarah Shark (Managing Director)
Subject: Re: New Charts
Hi,
Let’s view the distribution of lodging revenue using a boxplot
instead, once again capping the revenue at 1500.
Then filter the data to the top 5 countries and build a violin
plot of their lodging revenue, as well as their age distribution.
Sarah
section06_assignments.ipynb
SOLUTION: BOX & VIOLIN PLOTS
Solution Code
section06_solutions.ipynb
LINEAR RELATIONSHIP PLOTS
Seaborn has several plots to explore linear relationships:
• sns.scatterplot(x=, y=, data=): creates a scatterplot
• sns.regplot(x=, y=, data=): creates a scatterplot with a fitted regression line
• sns.lmplot(x=, y=, hue=, row=, col=, data=): creates a scatterplot with a fitted regression line, and can visualize multiple categories using color, or by splitting them into rows & columns
• sns.jointplot(x=, y=, kind=, data=): creates a scatterplot and adds the distribution of each variable
• sns.pairplot(data): creates a matrix of scatterplots comparing multiple variables, and shows the distribution of each one
REGPLOT()
sns.regplot() creates a scatterplot with a fitted regression line
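A sketch of `sns.regplot()` on synthetic data, where the hypothetical "lodging_revenue" is about 90 per room night plus noise. `np.polyfit` is used only to check that the fitted slope lands near 90:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)
nights = rng.integers(1, 10, 200)
df = pd.DataFrame({
    "room_nights": nights,
    "lodging_revenue": 90 * nights + rng.normal(0, 40, 200),  # synthetic relationship
})

# Scatterplot + least-squares line + confidence band
ax = sns.regplot(data=df, x="room_nights", y="lodging_revenue", scatter_kws={"alpha": 0.3})

slope, intercept = np.polyfit(df["room_nights"], df["lodging_revenue"], 1)
```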
LMPLOT()
sns.lmplot() lets you explore the impact of other variables on the relationship
Specify the ‘hue’ to
create a line for each
category in the specified
column and set a
different color for each
category
LMPLOT()
sns.lmplot() lets you explore the impact of other variables on the relationship
Specify the ‘row’ and ‘col’ arguments to
create regression plots for each
combination of category values
PRO TIP: This type of visual is great
for exploring your data, but way too
complex for a presentation!
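A sketch of `sns.lmplot()` with both "hue" and "col" on synthetic data (the column names are hypothetical). `lmplot` returns a FacetGrid, here with one column of plots per country:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(6)
n = 300
df = pd.DataFrame({
    "room_nights": rng.integers(1, 10, n),
    "segment": rng.choice(["Online", "Direct"], n),
    "country": np.tile(["FRA", "DEU", "PRT"], n // 3),
})
df["lodging_revenue"] = 90 * df["room_nights"] + rng.normal(0, 40, n)

# One regression line per segment (hue), one subplot per country (col)
g = sns.lmplot(data=df, x="room_nights", y="lodging_revenue", hue="segment", col="country")
```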
JOINTPLOT()
sns.jointplot() creates a scatterplot and adds the distribution of each variable
The ‘kind’ argument has
several options like
‘kde’, which plots the
kernel densities, and
‘reg’, which plots the
regression line
PAIRPLOT()
sns.pairplot() creates a matrix of scatterplots comparing multiple variables, and
shows the distribution for each one along the diagonal
This lets you see the relationship between a diamond’s
weight (carat) and its length (x), width (y), and depth (z)
You can see that the weight of the diamond has a positive
relationship with its length, width, and depth, with the
relationships being VERY strong for width and depth
ASSIGNMENT: LINEAR RELATIONSHIP PLOTS
Results Preview
NEW MESSAGE
September 26, 2022
From: Wendy Whiz (Data Scientist)
Subject: More Exploration
Hi there,
Can you produce charts to explore the relationship between
room nights and lodging revenue?
First for all the data and then for each of the top 5 countries.
Can you also produce a pairplot comparing lodging revenue
to several key variables? (more details in the notebook)
Best,
Wendy
section06_assignments.ipynb
SOLUTION: LINEAR RELATIONSHIP PLOTS
Solution Code
section06_solutions.ipynb
HEATMAPS
Create a heatmap to visualize a table of data with sns.heatmap()
PRO TIP: Pandas’ pivot_table
method is a great way to set up
the data needed for a heat map!
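A sketch of the pivot_table-to-heatmap workflow with hypothetical bookings. Countries become rows, segments become columns, and each cell holds the mean lodging revenue:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import pandas as pd
import seaborn as sns

bookings = pd.DataFrame({
    "country": ["FRA", "FRA", "DEU", "DEU", "PRT", "PRT", "PRT"],
    "segment": ["Online", "Direct", "Online", "Direct", "Online", "Direct", "Online"],
    "lodging_revenue": [300, 250, 280, 310, 200, 220, 260],
})

# Rows = countries, columns = segments, values = mean lodging revenue
table = bookings.pivot_table(index="country", columns="segment",
                             values="lodging_revenue", aggfunc="mean")

# annot=True writes each cell's value; fmt controls the number format
ax = sns.heatmap(table, annot=True, fmt=".0f", cmap="Blues")
```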
You can modify rcParams
with sns.set(rc={...}), but we’ll show the
syntax for combining Matplotlib
and Seaborn shortly!
ASSIGNMENT: HEATMAPS
Results Preview
NEW MESSAGE
September 26, 2022
From: Wendy Whiz (Data Scientist)
Subject: RE: More Exploration
Hi there,
Last piece to help me look at features for my modeling work.
Can you build a heatmap with countries as rows and market
segment as columns with the mean lodging revenue for each?
Then build a heatmap for a correlation matrix.
Thanks,
Wendy
section06_assignments.ipynb
SOLUTION: HEATMAPS
Solution Code
section06_solutions.ipynb
FACETGRID
Seaborn’s FacetGrid is a convenient alternative to Matplotlib’s subplot grids
• sns.FacetGrid(DataFrame, col= , col_wrap= )
This creates 7 charts, one for each
“color”, in a grid with 3 columns
This plots a histogram of
“price” for each “color” in
the DataFrame
MATPLOTLIB INTEGRATION
You can build Seaborn plots in Matplotlib objects, which lets you customize and
integrate Seaborn charts as if they were built using Matplotlib
This creates a Matplotlib figure and axis, sets a Seaborn style,
creates a Seaborn bar chart, and then adds Matplotlib labels
The “ax” argument lets you specify which
axes to plot the chart on
KEY TAKEAWAYS
Seaborn is a user-friendly extension of Matplotlib
• It has a simple interface, nice aesthetics, and works well with Pandas DataFrames
Seaborn adds new chart types that are useful in exploring data
• Boxplots, violin plots, and linear model plots help profile data and identify relationships between variables
Seaborn is very compatible with Matplotlib
• Seaborn charts are extensions of Matplotlib objects, so they can be placed in Matplotlib figures
• Matplotlib formatting arguments can be passed to corresponding Seaborn plotting functions
PROJECT DATA: USED CARS DATA
ASSIGNMENT: FINAL PROJECT
Key Objectives
1. Read in and manipulate data with Pandas
2. Build summary charts with Matplotlib and Seaborn
3. Leverage Seaborn’s advanced chart types to mine insights from the data and make a decision
NEW MESSAGE
October 10, 2022
From: Aaron Auto (VP of Fleet Management)
Subject: Optimal Fleet Truck Purchase
Hello,
We need an outside analysis on auto procurement for our
fleet of service vehicles. We lease trucks to contractors and
other businesses, but a recent spike in demand has meant
we’re unable to get cars from traditional suppliers.
I want to see an overview of the automotive auction industry,
before diving into where we can get Ford F150s for the most
affordable price on the market (more details in the notebook).
Thanks
section07_final_project.ipynb