Classification: confidential
Apache Superset
Data Exploration, Visualization & Analysis
co-star: Steampipe & Trino
Conclusion Code Café – 20 March 2023
Lucas Jellema, CTO & Architect AMIS | Conclusion
Apache Superset
• Data Visualization – ready to use product
• browser based UI & web server backend
• any SQL data source
• quick table-to-visualization & dashboard
• open source, end user friendly/self service
• Design principles:
• single or multi-user
• no data is stored in Superset (except metadata and an ephemeral cache)
• lightweight & optional semantic layer
• row-based access control applied
Conclusion Code Café - Apache Superset - Data Exploration, Visualization & Analysis 2
Apache Superset
• Typical workflow
• connect a data source and onboard “tables”
• explore / filter / aggregate / slice & dice data
• create visualizations and annotate findings
• compose and publish dashboards
History
• Originated at Airbnb in 2015 as the frontend for Apache Druid
• Druid: open source, multi-dimensional, in-memory, distributed, real-time time-series database
• Earlier names: Panoramix & Caravel
• Under Apache Software Foundation since 2017
• July 2022 – Release 2.0
• Tech debt resolved, better Databricks, Pinot & Trino support, much improved UI experience
• Technology stack
• JavaScript: React/Redux, D3.js, webpack
• Python, Flask, Pandas, SQLAlchemy
• Thriving open source project
• Used in many companies. Example: Airbnb – 600+ daily users, 100K+ charts
• Offered as SaaS: Preset
How to work with Apache Superset?
• Install, Configure & Run
=> for example Kubernetes, Docker Compose, Gitpod
• Configure Database Connection(s)
• or upload CSV data files
• Define Data Set(s) based on “Tables”
• explore/refine in SQL Lab
• define “Jinja templates” – custom filters for specific SQL or context data
• Create a Chart on that Data Set
• select type of visualization
• map data to visualization (x, y, series, time, ..)
• configure chart: color-scheme, titles, legend
• annotate chart – provide commentary
• publish chart – image, CSV/Excel/JSON, email, add to dashboard
• define alerts (on SQL condition) and schedule reports – Slack or Email
• Compose and Expose dashboard
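
The template step in the list above relies on Jinja templating in SQL Lab, which is switched on in `superset_config.py`. The following is a minimal sketch, assuming the `ENABLE_TEMPLATE_PROCESSING` feature flag and the `JINJA_CONTEXT_ADDONS` setting; the macro `fiscal_year_start`, the 1 April start date and the `orders` table are hypothetical and only serve as illustration.

```python
# superset_config.py (sketch; macro, table and column names are hypothetical)
from datetime import date

# Jinja templating in SQL Lab and virtual datasets is behind a feature flag.
FEATURE_FLAGS = {"ENABLE_TEMPLATE_PROCESSING": True}


def fiscal_year_start(today=None):
    """Return the ISO date of the most recent 1 April (assumed fiscal year start)."""
    today = today or date.today()
    year = today.year if today.month >= 4 else today.year - 1
    return date(year, 4, 1).isoformat()


# Functions registered here become callable inside templated SQL.
JINJA_CONTEXT_ADDONS = {"fiscal_year_start": fiscal_year_start}

# Templated SQL that could then be written in SQL Lab.
EXAMPLE_SQL = """
SELECT order_id, amount
FROM orders
WHERE order_date >= '{{ fiscal_year_start() }}'
"""
```

Keeping the date logic in a Python macro means the SQL itself stays short and the rule is defined in one place.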
How to work with Apache Superset?
• Install, Configure & Run
• Configure Database Connection(s)
• Define Data Set(s) based on “Tables”
• Create a Chart on that Data Set
• Compose and Expose dashboard
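
Database connections in Superset are defined as SQLAlchemy URIs. The sketch below shows how such a URI could be assembled, including percent-encoding of the password; hosts, users, passwords and database names are made up for illustration, and the helper `sqlalchemy_uri` is not part of Superset.

```python
from urllib.parse import quote_plus


def sqlalchemy_uri(dialect, user, host, port, database, password=None):
    """Build a SQLAlchemy-style URI: dialect://user[:password]@host:port/database."""
    credentials = quote_plus(user)
    if password is not None:
        # Percent-encode so characters such as '@' or '/' do not break the URI.
        credentials += ":" + quote_plus(password)
    return f"{dialect}://{credentials}@{host}:{port}/{database}"


# Hypothetical hosts and credentials, for illustration only.
postgres_uri = sqlalchemy_uri(
    "postgresql", "analyst", "db.example.com", 5432, "sales", "p@ss/word"
)
trino_uri = sqlalchemy_uri("trino", "analyst", "trino.example.com", 8080, "hive")

print(postgres_uri)
print(trino_uri)
```

The resulting string is what goes into the SQLAlchemy URI field when adding a database in the Superset UI. For Trino, the last path segment is the catalog.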
Demo Apache Superset
• Create Data Set from SQL Query
• Explore Data
• Create Visualization
• Demonstrate Dashboard
Photo: Rob Laughter on Unsplash.com
Define Data Set for SQL Table or View
Explore Data
Visualize Data Set
End of Demo Apache Superset
Predictive Analytics
• Predict values into the future
• extrapolate from past
• take seasonality into consideration
• Based on Prophet
• open source Python library
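
To make "extrapolate from past" and "take seasonality into consideration" concrete, here is a deliberately simple sketch in plain Python. It is not Prophet and not what Superset runs: it is a seasonal-naive forecast with an average growth term, chosen only because it needs no extra libraries. The function name and the sample numbers are hypothetical.

```python
def seasonal_forecast(history, season_length, horizon):
    """Repeat the last season and add the average growth between the last two seasons."""
    if len(history) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")
    last = history[-season_length:]
    previous = history[-2 * season_length:-season_length]
    # Average change per position between the two most recent seasons.
    growth = sum(l - p for l, p in zip(last, previous)) / season_length
    forecast = []
    for step in range(horizon):
        seasons_ahead = step // season_length + 1
        forecast.append(last[step % season_length] + growth * seasons_ahead)
    return forecast


# Two "years" of quarterly values (made-up numbers).
history = [10, 20, 30, 40, 12, 22, 32, 42]
print(seasonal_forecast(history, season_length=4, horizon=6))
# → [14.0, 24.0, 34.0, 44.0, 16.0, 26.0]
```

Prophet fits trend and seasonality components statistically and returns uncertainty intervals as well; the sketch only conveys the basic idea of carrying a seasonal pattern forward.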
Notifications - Alerts and Scheduled Reports
• When?
• condition
• schedule
• What?
• To whom?
• Channel/Method?
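
The "condition" part of an alert boils down to running a SQL query that returns one value and comparing it with a threshold. The sketch below illustrates that idea with SQLite from the Python standard library; it is not Superset's implementation, and the `jobs` table, the helper `alert_triggered` and the thresholds are hypothetical.

```python
import operator
import sqlite3

OPERATORS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}


def alert_triggered(conn, sql, op, threshold):
    """Run a query that returns a single value and compare it with a threshold."""
    row = conn.execute(sql).fetchone()
    if row is None or row[0] is None:
        return False  # nothing to compare, so no alert
    return OPERATORS[op](row[0], threshold)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [("load", "failed"), ("export", "ok"), ("sync", "failed")],
)

failed_sql = "SELECT COUNT(*) FROM jobs WHERE status = 'failed'"
should_notify = alert_triggered(conn, failed_sql, ">", 1)
print(should_notify)
```

When the condition holds on the configured schedule, the notification (what, to whom, via which channel) is sent; otherwise nothing happens.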
Annotation – Multiple Layers – Label Time Intervals and Timestamps
Security
• Users identified through OAuth2 providers such as GitHub, Twitter, LinkedIn, Google, Azure, and custom OAuth2 providers
• Users are associated with roles
• Roles are authorized on data sources, views, dashboards
• Row level access / Group level data filters
• Define a security filter for a table and associate the filter with a specific group
• Any data access on that table by someone in the group will have the filter applied “transparently”
• Multiple filters will be combined
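
The effect of such security filters can be illustrated as extra WHERE clauses that are added to whatever query the user runs. The sketch below assumes the filters are combined with AND and uses SQLite from the standard library; the `orders` table, the filter clauses and the helper `apply_row_filters` are hypothetical, and this is not Superset's actual implementation.

```python
import sqlite3


def apply_row_filters(sql, clauses):
    """Wrap a query so that every filter clause is ANDed onto its result."""
    if not clauses:
        return sql
    where = " AND ".join(f"({clause})" for clause in clauses)
    return f"SELECT * FROM ({sql}) AS filtered WHERE {where}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 120.0), (2, "US", 80.0), (3, "EU", 15.0)],
)

# Hypothetical filters attached to the group of the current user.
group_filters = ["region = 'EU'", "amount >= 100"]
query = apply_row_filters("SELECT id, region, amount FROM orders", group_filters)
rows = conn.execute(query).fetchall()
print(rows)  # → [(1, 'EU', 120.0)]
```

The user writes the inner query only; the wrapping happens without the user seeing it, which is what "transparently" refers to.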
[Diagram labels: table, Group, Security Filter, SQL]
Custom Visualization Plugins
• Adding custom data visualizations to Superset is well supported
• Steps:
• Generate Skeleton for custom plugin (CLI, Yeoman)
• Register plugin (JSON)
• Configure plugin – valid input, labels, hooks (JSON)
• Implement/link React component that actually renders data (TypeScript/JS)
• At runtime: Superset
• exposes custom plugin in gallery
• allows users to set relevant configuration for plugin
• passes data – query result set – to plugin
• embeds the rendered outcome appropriately in the webpage
Data Source Reach of Apache Superset
• Superset can process data in SQL enabled sources
Trino (previously known as PrestoSQL)
• Distributed Federated Query Engine
• OLAP system, enables data mesh
• does not store data itself
• MPP architecture
• Trino processes SQL queries against multiple data engines
• SQL and NoSQL
• database and other (queue, event broker, file system, cache)
• combines results across sources: join, union, group by / aggregate
• Started in 2012 at Facebook as Presto
• to replace Hive
• Offered as SaaS: Starburst Galaxy
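
Combining results across sources comes down to one SQL statement that names tables from different catalogs. The sketch below is an analogy that runs without a Trino cluster: two attached SQLite databases stand in for two catalogs. In Trino the names would be three-part (`catalog.schema.table`); the catalogs `crm` and `billing`, the tables and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two attached databases stand in for two Trino catalogs (names are hypothetical).
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0)],
)

# One statement joins and aggregates across both "catalogs".
federated_sql = """
SELECT c.name, SUM(i.amount) AS total
FROM crm.customers AS c
JOIN billing.invoices AS i ON i.customer_id = c.id
GROUP BY c.name
ORDER BY c.name
"""
totals = conn.execute(federated_sql).fetchall()
print(totals)  # → [('Acme', 150.0), ('Globex', 75.0)]
```

With Trino, the two catalogs could be entirely different systems, for example a relational database and an event broker, while the query shape stays the same.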
Superset can access data via Trino using SQL
Superset can access & combine non-SQL and SQL sources via Trino
Steampipe
• 102 plugins for various data sources
• data can be joined, filtered, unioned/minused, aggregated
Via Steampipe, Superset has access to 100+ more sources
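
Steampipe in service mode exposes a PostgreSQL endpoint, so Superset can treat it as an ordinary PostgreSQL database. The sketch below builds such a connection URI, assuming Steampipe's defaults (port 9193, database and user both named `steampipe`); the password and host are placeholders, and the sample query assumes the AWS plugin with its `aws_s3_bucket` table is installed.

```python
from urllib.parse import quote_plus


def steampipe_uri(password, host="localhost", port=9193):
    """SQLAlchemy URI for a Steampipe service, assuming its default database and user."""
    return f"postgresql://steampipe:{quote_plus(password)}@{host}:{port}/steampipe"


# Placeholder password; a real one is shown when the Steampipe service starts.
uri = steampipe_uri("s3cret")
print(uri)

# A query that could then be run from SQL Lab (assumes the AWS plugin).
example_sql = "select name, region from aws_s3_bucket order by name"
```

From Superset's point of view every Steampipe plugin table then looks like a regular table in that database.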
Extending Source Reach of Apache Superset – across platforms, data formats, protocols and query languages
Shillelagh
Google Sheets
HTTP => JSON, CSV
GitHub
GraphQL
Datasette
HTML Table
S3
Weather API
Socrata
Gitpod Workspace for Trying Out Apache Superset and Steampipe
[Diagram labels: workspace; Superset web app at port 8088; docker-compose running 6 containers; database connection; plugin]
Summary
• Data Visualization
• Across virtually any data source (also leveraging Trino, Steampipe, etc.)
• User friendly
• Appealing, insightful visualizations
• Data exploration (slice & dice)
• Customizable (custom visualizations)
• Open source, open architecture
• Fine-grained security
• Free (and better?) alternative to Tableau, Qlik, Power BI
Gitpod Workspace for Trying Out Apache Superset
[Diagram: workspace running the containers superset_app (web app at port 8088), superset_worker, superset_init, superset_worker_beat, superset_cache (Redis) and superset_db]


Apache Superset - open source data exploration and visualization (Conclusion Code Café, March 2023)


Editor's Notes

  • #9: https://unsplash.com/photos/WW1jsInXgwM
  • #13: https://unsplash.com/photos/WW1jsInXgwM