SparkOscope: Enabling Apache
Spark Optimization Through Cross-
Stack Monitoring and Visualization
Yiannis Gkoufas
IBM Research Dublin, Ireland
High Performance Systems
whoami
• Research Software Engineer in IBM Research,
Ireland since 2012
• Work on Analytics Foundations Middleware
– Distributed Frameworks, Anything Java/Scala based,
Web-based POCs
• High Performance Systems Group: Kostas,
Andrea, Dimitris, Khalid, Michael, Michele,
Mustafa, Pierre, Sri
Spark Experience
• We love developing our analytical workloads in
Spark and have fully embraced it since the early
1.0.x versions
• Over the last few years, we have used it to run jobs
on large volumes of energy-related sensor data
Jobs on Daily Basis
• Once we managed to develop the needed jobs,
they were executed in a recurring fashion
• We were receiving a new batch of data every
day
Fighting Bugs
• When there was a bug in our code, it was very
easy to discover it through the Spark Web UI
• We could easily retrieve information about the
job, stage and line number in our source code
Fighting bottlenecks
• However, we couldn’t easily spot which jobs and
stages were causing a slowdown
• Which part of our code was the bottleneck?
Ganglia Extension
• We had the option to use the Ganglia
Extension to export the metrics but:
– We would need to maintain/configure yet another
external system
– There is no association with the Spark
jobs/stages/source code
Spark Monitoring Framework
• We could use the built-in Spark Monitoring
Framework but:
– Collecting CSVs from the worker nodes and
aggregating them seems cumbersome
– Again, we couldn’t easily associate the metrics with
the source code of our job
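To illustrate why collecting and aggregating the per-executor CSVs is cumbersome, here is a minimal sketch of the merge step in Python. It assumes each executor's gauge file has already been copied off its worker node and has a `t,value` header (an assumption about the CSV layout; adjust the column names to match the actual files). The function names and executor labels are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]  # (timestamp, executor id, metric value)


def merge_executor_csvs(csv_by_executor: Dict[str, str]) -> List[Row]:
    """Merge the CSV text of one metric from several executors into one sorted list.

    Assumes every CSV starts with a `t,value` header line.
    """
    rows: List[Row] = []
    for executor_id, text in csv_by_executor.items():
        for record in csv.DictReader(io.StringIO(text)):
            rows.append((int(record["t"]), executor_id, float(record["value"])))
    rows.sort()  # order by timestamp, then by executor id
    return rows


def mean_per_timestamp(rows: List[Row]) -> Dict[int, float]:
    """Average the metric across executors for each timestamp."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp, _executor, value in rows:
        buckets[timestamp].append(value)
    return {t: sum(vs) / len(vs) for t, vs in sorted(buckets.items())}
```

Even with this in place, nothing in the merged rows says which job, stage or line of source code produced a given sample, which is the second limitation listed above.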
Current Monitoring Architecture
[Diagram: Spark Worker1 and Spark Worker2 host Executor1 to Executor6. During job execution, each executor's Executor Source feeds the Monitoring Framework, which writes one CSV output per executor to the local filesystem.]
Enter SparkOscope
SparkOscope Overview
• Extension to enrich Spark’s Monitoring
Framework with OS-level Metrics
• Enhancement of the Web UI to plot all the
available metrics plus the newly developed
OS-level metrics
SparkOscope Modules
• SigarSource: Attached to the executor, leveraging
Hyperic Sigar library to get OS-Level Metrics
• HDFSSink: Exports all available metrics to an HDFS
directory
• MQTTSink: Publishes all available metrics on an MQTT
Topic
• Modified Web UI: Modified Spark Web UI to plot
historical and realtime plots, generated from the modules
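The source-and-sink split behind these modules can be sketched conceptually in Python. This is not SparkOscope's or Spark's actual API: the class and function names are hypothetical, and disk usage from the standard library stands in for the OS-level values that SigarSource would obtain through Hyperic Sigar. The idea shown is that sources expose named gauges, and a polling step samples every gauge and hands the batch to every configured sink.

```python
import shutil
import time
from typing import Callable, Dict, List, Optional, Tuple

Gauge = Callable[[], float]
Sample = Tuple[float, str, float]  # (timestamp, metric name, value)


class InMemorySink:
    """Hypothetical sink that keeps samples in a list.

    An HDFS-style sink would append them to files, and an MQTT-style
    sink would publish them on a topic instead.
    """

    def __init__(self) -> None:
        self.samples: List[Sample] = []

    def report(self, batch: List[Sample]) -> None:
        self.samples.extend(batch)


def os_level_source() -> Dict[str, Gauge]:
    """Hypothetical OS-level source: gauges read from the operating system."""
    return {
        "disk.free_bytes": lambda: float(shutil.disk_usage(".").free),
        "disk.used_bytes": lambda: float(shutil.disk_usage(".").used),
    }


def poll_once(sources: List[Dict[str, Gauge]], sinks: List[InMemorySink],
              now: Optional[float] = None) -> List[Sample]:
    """Sample every gauge of every source once and forward the batch to all sinks."""
    timestamp = time.time() if now is None else now
    batch = [(timestamp, name, gauge())
             for source in sources
             for name, gauge in sorted(source.items())]
    for sink in sinks:
        sink.report(batch)
    return batch
```

Because sinks only see finished samples, adding a new destination does not require touching the sources, which is what lets the same metrics go to HDFS, MQTT, or both.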
SparkOscope Flavors
• Historical Plots: View metrics on the UI after
the job has finished
• Realtime Plots: View metrics on the UI in
realtime as the job is being executed
• Headless: Use SigarSource, HDFSSink,
MQTTSink without viewing the plots on the UI
– https://github.com/ibm-research-ireland/sparkoscope-headless
SparkOscope High-level Architecture - Historical plots
[Diagram: Spark Worker1 and Spark Worker2 host Executor1 to Executor6, each with a Sigar Source attached. During job execution, the Monitoring Framework writes the metrics to HDFS under /custom-metrics/app-xxxxxxx, with one subdirectory per executor (/executor1 to /executor6). The Spark Web UI is shown alongside the HDFS directory.]
SparkOscope High-level Architecture - Realtime plots
[Diagram: Spark Worker1 and Spark Worker2 host Executor1 to Executor6, each with a Sigar Source attached. During job execution, the Monitoring Framework connects to the Master, which is shown with an MQTT Broker, the label /custom-metrics/app-xxxxxxx and the Spark Web UI.]
SparkOscope Basic Installation
• Clone the git repo:
https://github.com/ibm-research-ireland/sparkoscope
• Build Spark
• Modify the configuration files:
metrics.properties, spark-defaults.conf
SparkOscope OS-level Metrics
• Download the Hyperic Sigar library to all the worker nodes
• Extract it anywhere on the system
• Modify the configuration files:
metrics.properties, spark-env.sh
SparkOscope Realtime Plots
• Modify the configuration files:
metrics.properties, spark-defaults.conf
• Make sure that no other service on the Master is already using the
specified ports
• Make sure that executor.sink.mqtt.port is the same as
spark.moquette.conf
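The port check above can be sketched in Python as follows. This is a minimal illustration, not part of SparkOscope: it reads `executor.sink.mqtt.port` (the key named on this slide) from the text of a properties file with a simplified parser, then tries to bind the port locally to see whether another service already holds it. The function names are hypothetical, and the check must be run on the Master itself.

```python
import socket
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple `key=value` lines, skipping blanks and # or ! comments.

    A simplified reader; it does not handle line continuations or escapes.
    """
    props: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        key, separator, value = line.partition("=")
        if separator:
            props[key.strip()] = value.strip()
    return props


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding host:port fails, i.e. something else holds the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
    return False
```

A typical use would be to call `parse_properties` on the contents of metrics.properties, convert the value of `executor.sink.mqtt.port` to an integer, and pass it to `port_in_use` before starting the job.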
SparkOscope Headless Installation
• Clone the git repo:
https://github.com/ibm-research-ireland/sparkoscope-headless
• Build the Maven project
• Modify the configuration files as described for SigarSource,
HDFSSink, MQTTSink
• Additionally, append the paths of the created jars to
spark.executor.extraClassPath
• No need to have the patched Spark version, since the metrics
are not displayed in the UI
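Appending the jar paths to spark.executor.extraClassPath can be sketched with a small Python helper. The helper name and the jar paths in the usage note are hypothetical; the sketch assumes a Unix-style `:` classpath separator and leaves any existing entries in place.

```python
from typing import Iterable


def append_to_classpath(existing: str, jar_paths: Iterable[str],
                        separator: str = ":") -> str:
    """Return `existing` with each jar path appended once, preserving order."""
    entries = [entry for entry in existing.split(separator) if entry]
    for jar in jar_paths:
        if jar not in entries:  # avoid duplicate classpath entries
            entries.append(jar)
    return separator.join(entries)
```

For example, `append_to_classpath("/opt/libs/a.jar", ["/opt/sparkoscope/sink.jar"])` yields `/opt/libs/a.jar:/opt/sparkoscope/sink.jar`, which can then be set as the value of spark.executor.extraClassPath in spark-defaults.conf.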
Demo!
Roadmap
• Expand the range of available Sinks and
Sources
• Smart recommendations on infrastructure needs
derived from patterns of resource utilization of
jobs
• Work with the open-source ecosystem to improve
it and target more use cases
Thank You.
Questions?
email: yiannisg@ie.ibm.com
