SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitoring with Yiannis Gkoufas

SparkOscope: Enabling Apache
Spark Optimization Through Cross-
Stack Monitoring and Visualization
Yiannis Gkoufas
IBM Research Dublin, Ireland
High Performance Systems

whoami
• Research Software Engineer in IBM Research,
Ireland since 2012
• Work on Analytics Foundations Middleware
– Distributed Frameworks, Anything Java/Scala based,
Web-based POCs
• High Performance Systems Group: Kostas,
Andrea, Dimitris, Khalid, Michael, Michele,
Mustafa, Pierre, Sri

Spark Experience
• We love developing our analytical workloads in
Spark and have fully embraced it since the early
1.0.x versions
• Over the last few years, we have used it to run jobs
on large volumes of energy-related sensor data

Jobs on a Daily Basis
• Once we managed to develop the needed jobs,
they were executed in a recurring fashion
• We were receiving a new batch of data every
day

Fighting Bugs
• When there was a bug in our code, it was very
easy to discover it in the Spark Web UI
• We could easily retrieve information about the
job, stage and line number in our source code

Fighting bottlenecks
• However, we couldn’t easily spot which jobs and
stages were causing a slowdown
• What was the part of our code that was the
bottleneck?

Ganglia Extension
• We had the option to use the Ganglia
Extension to export the metrics but:
– We would need to maintain/configure yet another
external system
– There is no association with the Spark
jobs/stages/source code

Spark Monitoring Framework
• We could use the built-in Spark Monitoring
Framework but:
– Collecting CSVs from the worker nodes and
aggregating them seems cumbersome
– Again, we couldn’t easily associate the metrics
with the source code of the job
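
To make the aggregation step concrete, here is a minimal Python sketch of merging per-executor metric CSVs into one structure. The file naming ('<app-id>.<executor-id>.<metric>.csv') and the 't,value' header are assumptions made for this sketch, not details taken from the slides, and `merge_metric_csvs` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict


def merge_metric_csvs(files):
    """Merge metric CSVs into {metric: {executor_id: [(t, value), ...]}}.

    `files` maps a file name to its text content. Names are assumed to look
    like '<app-id>.<executor-id>.<metric>.csv' and every file is assumed to
    start with a 't,value' header (both are assumptions of this sketch).
    """
    merged = defaultdict(dict)
    for name, text in files.items():
        stem = name[:-len(".csv")] if name.endswith(".csv") else name
        # Split only twice so that dotted metric names stay intact.
        _app_id, executor_id, metric = stem.split(".", 2)
        rows = csv.DictReader(io.StringIO(text))
        merged[metric][executor_id] = [
            (int(row["t"]), float(row["value"])) for row in rows
        ]
    return dict(merged)


if __name__ == "__main__":
    # Two hypothetical files, as if copied from two worker nodes.
    sample = {
        "app-1.0.jvm.heap.used.csv": "t,value\n1,100\n2,150\n",
        "app-1.1.jvm.heap.used.csv": "t,value\n1,90\n2,95\n",
    }
    for metric, by_executor in merge_metric_csvs(sample).items():
        print(metric, by_executor)
```

Even with such a merge in place, the result carries no link back to jobs, stages or source lines, which is the second limitation noted above.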

Current Monitoring Architecture
[Diagram: Spark Worker1 and Spark Worker2 host Executor1 to Executor6. Each executor has an Executor Source; the Job Execution Monitoring Framework writes one CSV per executor to the local filesystem.]

SparkOscope Overview
• Extension to enrich Spark’s Monitoring
Framework with OS-level Metrics
• Enhancement of the Web UI to plot all the
available metrics + the newly developed OS-
level metrics

SparkOscope Modules
• SigarSource: Attached to the executor, leveraging
Hyperic Sigar library to get OS-Level Metrics
• HDFSSink: Exports all available metrics to an HDFS
directory
• MQTTSink: Publishes all available metrics on an MQTT
Topic
• Modified Web UI: Modified Spark Web UI to plot
historical and realtime plots, generated from the modules

SparkOscope Flavors
• Historical Plots: View metrics on the UI after
the job has finished
• Realtime Plots: View metrics on the UI in
realtime as the job is being executed
• Headless: Use SigarSource, HDFSSink,
MQTTSink without viewing the plots on the UI
– https://github.com/ibm-research-ireland/sparkoscope-headless

SparkOscope High-level
Architecture - Historical plots
[Diagram: six executors, each with a Sigar Source, feed the Job Execution Monitoring Framework. Metrics are stored on HDFS under /custom-metrics/app-xxxxxxx, with one subdirectory per executor (/executor1 to /executor6). The Spark Web UI is the remaining component shown.]
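
The source-to-sink flow in this architecture can be sketched in a few lines of Python. This is an illustrative sketch only, not the SparkOscope implementation: `GaugeSource`, `DirectorySink` and `poll` are hypothetical names, a local temporary directory stands in for HDFS, and the 't,value' file layout is an assumption.

```python
import csv
import os
import tempfile
import time


class GaugeSource:
    """Holds named gauges: zero-argument callables that return a number."""

    def __init__(self):
        self.gauges = {}

    def register(self, name, fn):
        self.gauges[name] = fn

    def snapshot(self):
        # Read every gauge once and return {name: current value}.
        return {name: fn() for name, fn in self.gauges.items()}


class DirectorySink:
    """Appends readings to <root>/<app_id>/<executor_id>/<metric>.csv."""

    def __init__(self, root, app_id, executor_id):
        self.dir = os.path.join(root, app_id, executor_id)
        os.makedirs(self.dir, exist_ok=True)

    def report(self, timestamp, readings):
        for name, value in readings.items():
            path = os.path.join(self.dir, name + ".csv")
            is_new = not os.path.exists(path)
            with open(path, "a", newline="") as f:
                writer = csv.writer(f)
                if is_new:
                    writer.writerow(["t", "value"])  # header on first write
                writer.writerow([timestamp, value])


def poll(source, sink, times, clock=time.time):
    """Take `times` snapshots of the source and hand each one to the sink."""
    for _ in range(times):
        sink.report(int(clock()), source.snapshot())


if __name__ == "__main__":
    root = tempfile.mkdtemp()  # stand-in for /custom-metrics on HDFS
    source = GaugeSource()
    source.register("cpu_count", lambda: os.cpu_count() or 0)
    sink = DirectorySink(root, "app-xxxxxxx", "executor1")
    poll(source, sink, times=2)
    print(os.listdir(os.path.join(root, "app-xxxxxxx", "executor1")))
```

Keeping one directory per application and executor mirrors the layout in the diagram and lets a reader of the files select metrics per executor without parsing file names.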

SparkOscope High-level
Architecture - Realtime plots
[Diagram: six executors, each with a Sigar Source, feed the Job Execution Monitoring Framework. The other components shown are the Master, the path /custom-metrics/app-xxxxxxx, the MQTT Broker and the Spark Web UI.]

SparkOscope Basic Installation
• Clone the git repo: https://github.com/ibm-research-ireland/sparkoscope
• Build Spark
• Modify the configuration files:
metrics.properties, spark-defaults.conf

SparkOscope OS-level Metrics
• Download the Hyperic Sigar library to all the slave nodes
• Extract it anywhere in the system
• Modify the configuration files:
metrics.properties, spark-env.sh

SparkOscope Realtime Plots
• Modify the configuration files:
metrics.properties, spark-defaults.conf
• Make sure that no other service is already running on the Master
on the specified ports
• Make sure that executor.sink.mqtt.port is the same as
spark.moquette.conf

SparkOscope Headless Installation
• Clone the git repo: https://github.com/ibm-research-ireland/sparkoscope-headless
• Build the Maven project
• Modify the configuration files as described for SigarSource,
HDFSSink, MQTTSink
• Additionally, you need to append the paths of the created jars
to spark.executor.extraClassPath
• There is no need for the patched Spark version, since the metrics
are not displayed in the UI
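
As a rough illustration of the classpath step, the Python sketch below appends jar paths to spark.executor.extraClassPath in the text of a spark-defaults.conf file. It assumes whitespace-separated "key value" lines and ':' as the classpath separator; the helper name `append_extra_classpath` and the jar paths are hypothetical.

```python
def append_extra_classpath(conf_text, jar_paths,
                           key="spark.executor.extraClassPath", sep=":"):
    """Return conf_text with jar_paths appended to the `key` entry.

    Existing entries are kept, duplicates are skipped, and the key is
    added at the end if the file does not define it yet.
    """
    out, found = [], False
    for line in conf_text.splitlines():
        parts = line.split(None, 1)  # "key value", split on first whitespace
        if parts and parts[0] == key:
            entries = parts[1].strip().split(sep) if len(parts) > 1 else []
            entries += [p for p in jar_paths if p not in entries]
            out.append(f"{key} {sep.join(entries)}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key} {sep.join(jar_paths)}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    conf = "spark.master spark://host:7077\n"
    # Hypothetical jar locations; use the paths produced by your own build.
    jars = ["/opt/sparkoscope/source.jar", "/opt/sparkoscope/sinks.jar"]
    print(append_extra_classpath(conf, jars), end="")
```

The jars must exist at the same paths on every node that runs executors, since the property is resolved on the executor side.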

Roadmap
• Expand the range of available Sinks and
Sources
• Smart recommendations on infrastructure needs
derived from patterns of resource utilization of
jobs
• Work with the open-source ecosystem to improve
it and target more use cases

Thank You.
Questions?
email: yiannisg@ie.ibm.com
