Data Vis for Data Science
Usage of Python Visualisation Libraries
Amit Kapoor
@amitkaps
Data Science Pipeline
— Frame: Problem definition
— Acquire: Data ingestion
— Refine: Data wrangling
— Transform: Feature creation
— Explore: Feature selection
— Model: Model creation & assessment
— Insight: Solution communication
Role of Visualisation
— Frame: Structuring (issue tree, hypotheses)
— Acquire: Loading (progress, errors)
— Refine: Profiling (missing values, outliers)
— Transform: Univariate & Bivariate Vis (1D, 2D)
— Explore: Multi Dimensional Vis (3D ... ND)
— Model: Model Vis (predictions, errors, models)
— Insight: Vis Comm (chart, narrative, dashboard)
Understanding Visualisation
— Domain & Task Layer e.g. Tabular Data for EDA
— Data Layer e.g. Data Types, Transformation
— Visual Layer e.g. Encoding, Marks, Coordinate
— Annotation Layer e.g. Labels, Ticks, Titles
— Interaction Layer e.g. Filtering, Highlighting, Selection
Python Visualisation Libraries
— Matplotlib
— Pandas built-in plotting
— ggpy
— Altair
— Seaborn
— Plotly
— Bokeh
— HoloViews
— VisPy
— Lightning
— pygg
Choosing a Visualisation Library
— Ease of Learning: How hard is the API?
— Coverage: How many graphic types can it cover?
— Approach: Is it Charting or Grammar based?
— Documentation: How easy is it to make basic graphs?
— Community Support: How hard is it to make complex graphs?
Notes in Circulation
year | type | denom | value | money (INR Bn) | number (Bn notes) |
------- | -------| ------ | ------ | ------- | ------ |
1977 | Notes | 0001 | 1 | 2.72 | 2.720 |
1977 | Notes | 1000 | 1000 | 0.55 | 0.001 |
1977 | Notes | 0002 | 2 | 1.48 | 0.740 |
1977 | Notes | 0050 | 50 | 9.95 | 0.199 |
... | ... | ... | ... | ... | ... |
2015 | Notes | 0500 | 500 | 7853.75 | 15.708 |
2015 | Notes | 0001 | 1 | 3.09 | 3.090 |
2015 | Notes | 0010 | 10 | 320.15 | 32.015 |
2015 | Notes | 1000 | 1000 | 6325.68 | 6.326 |
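In every row shown, the `number` column appears to equal `money / value`, rounded to three decimals. If `money` is in INR billions (the unit used later as the y-axis label), `number` would then be billions of notes. The sketch below checks that relationship on the eight visible rows only. The full dataset is elided, so treat this as an assumption rather than a documented definition.

```python
import pandas as pd

# The eight rows visible in the table above (the rest of the data is elided)
notes = pd.DataFrame({
    "year":   [1977, 1977, 1977, 1977, 2015, 2015, 2015, 2015],
    "denom":  ["0001", "1000", "0002", "0050", "0500", "0001", "0010", "1000"],
    "value":  [1, 1000, 2, 50, 500, 1, 10, 1000],
    "money":  [2.72, 0.55, 1.48, 9.95, 7853.75, 3.09, 320.15, 6325.68],
    "number": [2.720, 0.001, 0.740, 0.199, 15.708, 3.090, 32.015, 6.326],
})

# Assumption: number = money / value, checked only against these rows
derived = notes["money"] / notes["value"]
max_gap = (derived - notes["number"]).abs().max()
print(f"largest difference after rounding: {max_gap:.4f}")
```

Any gap below 0.001 is consistent with the table having rounded `money / value` to three decimals.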
Use Pandas for Base Plotting
# Loading Data
import pandas as pd
notes = pd.read_csv('notes.csv')
# Data Transformation: long to wide, one column of 'money' per denomination
notes_wide = pd.pivot_table(data=notes, index="year",
                            columns="denom", values="money")
# Plotting: one line per denomination, with year on the x-axis
notes_wide.plot(kind="line")
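To see what the pivot step produces, here is a minimal sketch on the eight rows visible in the table rather than the full `notes.csv`. `denom` is kept as a zero-padded string here; reading the CSV with `pd.read_csv('notes.csv', dtype={'denom': str})` would preserve that form, whereas the default parse turns it into integers.

```python
import pandas as pd

# Only the rows shown in the "Notes in Circulation" table, not the full dataset
notes = pd.DataFrame({
    "year":  [1977, 1977, 1977, 1977, 2015, 2015, 2015, 2015],
    "denom": ["0001", "1000", "0002", "0050", "0500", "0001", "0010", "1000"],
    "value": [1, 1000, 2, 50, 500, 1, 10, 1000],
    "money": [2.72, 0.55, 1.48, 9.95, 7853.75, 3.09, 320.15, 6325.68],
})

# Long -> wide: one row per year, one column per denomination.
# Each (year, denom) pair occurs once, so the default mean aggregation
# simply returns the original value.
notes_wide = pd.pivot_table(data=notes, index="year",
                            columns="denom", values="money")
print(notes_wide)
```

Denominations that have no row for a given year come out as `NaN`. When the wide table is plotted, these show up as gaps in that denomination's line.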
Python Visualisation for Data Science
Use Matplotlib for Annotation
# Basic Styling
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (9,6)
plt.style.use('ggplot')
# Plotting
notes_wide.plot(kind="line")
# Adding Annotation
plt.ylabel('Value INR Bns')
plt.title('Notes in Circulation')
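The slide annotates through the implicit "current axes" (`plt.ylabel`, `plt.title`). One alternative, sketched here under the assumption that an explicit `Axes` object is preferred, is to pass `ax=` to pandas and annotate that object directly. The style is applied with `plt.style.context` so it does not leak into later plots. The data is a subset of the table rows (denominations 0001 and 1000 only). The `Agg` backend is used only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders without a display
import matplotlib.pyplot as plt
import pandas as pd

# Subset of the table: the two denominations present in both years shown
notes = pd.DataFrame({
    "year":  [1977, 1977, 2015, 2015],
    "denom": ["0001", "1000", "0001", "1000"],
    "money": [2.72, 0.55, 3.09, 6325.68],
})
notes_wide = pd.pivot_table(data=notes, index="year",
                            columns="denom", values="money")

# Apply the ggplot style only within this block instead of globally
with plt.style.context("ggplot"):
    fig, ax = plt.subplots(figsize=(9, 6))
    notes_wide.plot(kind="line", ax=ax)   # one line per denomination
    ax.set_ylabel("Value INR Bns")        # annotate the Axes object directly
    ax.set_title("Notes in Circulation")
```

Calling `fig.savefig(...)` afterwards writes the chart to a file. Keeping the `Axes` handle also makes it easier to place several charts side by side with `plt.subplots`.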
Ideally, use ggplot as in R
from ggplot import *
ggplot(notes, aes(x='year',
                  y='money',
                  color='denom')) + \
    geom_line()
Use Altair for Grammar Visualisation
from altair import Chart
Chart(notes).mark_line().encode(
    x='year:N',
    y='money',
    color='denom:N'  # treat denomination as a category, not a number
)
Personal Usage
— Use Pandas for base plotting and time series
— Use Matplotlib for matrices and customisation
— Use Seaborn for 1D & 2D statistical graphs, especially with categorical variables
— Use IPython Widgets for model interaction
— Use Datashader for Big Data Visualisation
— Experimenting with Altair
What about interactivity?
— Watch out for Altair: interaction will be built in soon
— Use Bokeh for web-based interactive dashboards, but it requires learning a different API
— Use Plotly for creating fully interactive charts; integration with Matplotlib is available
Get in touch with me
Amit Kapoor
@amitkaps
amitkaps.com
