Aljoscha Krettek / Till Rohrmann
Flink committers
Co-founders @ data Artisans
aljoscha@apache.org / trohrmann@apache.org
Data Analysis With Apache Flink
What is Apache Flink?
1
Functional API
Relational API
Graph API
Machine Learning
…
Iterative Dataflow Engine
Apache Flink Stack
2
Python
Gelly
Table
FlinkML
SAMOA
Batch Optimizer
DataSet (Java/Scala) DataStream (Java/Scala)
Stream Builder
Hadoop
M/R
Distributed Runtime
Local Remote Yarn Tez Embedded
Dataflow
Dataflow
*current Flink master + few PRs
Table
Example Use Case: Log Analysis
3
What Seems to be the Problem?
 Collect clicks from a webserver log
 Find interesting URLs
 Combine with user data
4
[Diagram: Web server log → Extract Clicks → Combine with user database → Massage → Interesting User Data]
The Execution Environment
 Entry point for all Flink programs
 Creates DataSets from data sources
5
ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();
Getting at Those Clicks
6
DataSet<String> log = env.readTextFile("hdfs:///log");
DataSet<Tuple2<String, Integer>> clicks = log.flatMap(
  (String line, Collector<Tuple2<String, Integer>> out) -> {
    String[] parts = line.split("*magic regex*");
    if (isClick(parts)) {
      out.collect(new Tuple2<>(parts[1], Integer.parseInt(parts[2])));
    }
  }
);
post /foo/bar… 313
get /data/pic.jpg 128
post /bar/baz… 128
post /hello/there… 42
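The isClick helper used above is not shown on the slide; a minimal sketch of what it could look like for the sample log lines (the field layout of method, URL, and numeric value is an assumption, not from the slides):

// Hypothetical helper, not from the slides: treat POST requests with
// enough fields as clicks.
private static boolean isClick(String[] parts) {
  return parts.length >= 3 && "post".equalsIgnoreCase(parts[0]);
}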
The Table Environment
 Environment for dealing with Tables
 Converts between DataSet and Table
7
TableEnvironment tableEnv = new TableEnvironment();
Counting those Clicks
8
Table clicksTable = tableEnv.toTable(clicks, "url, userId");
Table urlClickCounts = clicksTable
.groupBy("url, userId")
.select("url, userId, url.count as count");
Getting the User Information
9
Table userInfo = tableEnv.toTable(…, "name, id, …");
Table resultTable = urlClickCounts.join(userInfo)
.where("userId = id && count > 10")
.select("url, count, name, …");
The Final Step
10
class Result {
public String url;
public int count;
public String name;
…
}
DataSet<Result> set =
tableEnv.toSet(resultTable, Result.class);
DataSet<Result> result =
set.groupBy("url").reduceGroup(new ComplexOperation());
result.writeAsText("hdfs:///result");
env.execute();
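ComplexOperation is left abstract on the slide; a minimal sketch of such a GroupReduceFunction, assuming it simply keeps the most-clicked record per URL (purely illustrative):

// Hypothetical stand-in for ComplexOperation: per URL group, emit the
// record with the highest click count.
public static class ComplexOperation implements GroupReduceFunction<Result, Result> {
  @Override
  public void reduce(Iterable<Result> values, Collector<Result> out) {
    Result best = null;
    for (Result r : values) {
      if (best == null || r.count > best.count) {
        best = r;
      }
    }
    if (best != null) {
      out.collect(best);
    }
  }
}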
API in a Nutshell
 Element-wise
• map, flatMap, filter
 Group-wise
• groupBy, reduce, reduceGroup, combineGroup, mapPartition, aggregate, distinct
 Binary
• join, coGroup, union, cross
 Iterations
• iterate, iterateDelta
 Physical re-organization
• rebalance, partitionByHash, sortPartition
 Streaming
• window, windowMap, coMap, ...
11
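A minimal sketch chaining a few of the operators listed above (element-wise, group-wise, and a physical re-organization); it reuses the clicks DataSet from the earlier slides and is illustrative only:

// Illustrative only: filter, distinct, and partitionByHash chained together.
DataSet<Tuple2<String, Integer>> sample = clicks
  .filter(click -> click.f0.startsWith("/data"))  // element-wise: keep clicks on /data URLs
  .distinct()                                     // group-wise: drop duplicate (url, userId) pairs
  .partitionByHash(0);                            // physical re-organization by URL
sample.writeAsText("hdfs:///sample");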
What happens under the hood?
12
From Program to Dataflow
13
Flink Program
Dataflow Plan
Optimized Plan
Distributed Execution
14
Master: orchestration, recovery
Worker: memory management, serialization, streaming, network
Advanced Analysis: Website Recommendation
15
Going Further
 Log analysis result: which user visited which website how often
 Which other websites might they like?
 Recommendation by collaborative filtering
16
Collaborative Filtering
 Recommend items based on users with similar preferences
 Latent factor models capture underlying characteristics of items and preferences of users
 Predicted preference:
17
$\hat{r}_{u,i} = x_u^{\top} y_i$
Matrix Factorization
18
$$\min_{X,Y} \sum_{r_{u,i} \neq 0} \left( r_{u,i} - x_u^{\top} y_i \right)^2 + \lambda \left( \sum_u n_u \lVert x_u \rVert^2 + \sum_i n_i \lVert y_i \rVert^2 \right)$$

$$R \approx X^{\top} Y$$
Alternating least squares
 Iterative approximation
1. Fix X and optimize Y
2. Fix Y and optimize X
 Communication and computation intensive
19
[Diagram: the factorization R ≈ Xᵀ Y, solved alternately for X and Y]
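For reference, step 1 above has a closed-form solution per user; one common formulation (weighted-λ regularization, matching the n_u and n_i factors in the objective, and not spelled out on the slide) is:

$$x_u = \left( Y_{I_u} Y_{I_u}^{\top} + \lambda n_u I \right)^{-1} Y_{I_u} r_u$$

where $Y_{I_u}$ holds the factor vectors of the items rated by user $u$, $r_u$ the corresponding ratings, and $n_u$ their number; the update for each $y_i$ in step 2 is symmetric.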
Matrix Factorization Pipeline
20
val featureExtractor = HashingFT()
val factorizer = ALS()
val pipeline = featureExtractor.chain(factorizer)
val clickstreamDS =
env.readCsvFile[(String, String, Int)](clickStreamData)
val parameters = ParameterMap()
.add(HashingFT.NumFeatures, 1000000)
.add(ALS.Iterations, 10)
.add(ALS.NumFactors, 50)
.add(ALS.Lambda, 1.5)
val factorization = pipeline.fit(clickstreamDS, parameters)
Pipeline: Clickstream Data → Hashing Feature Extractor → ALS Matrix Factorization
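HashingFT is the hashing-trick feature extractor from the pipeline above; a minimal sketch of the underlying idea (not FlinkML's actual implementation), mapping a raw string to one of NumFeatures indices:

// Illustrative only: map an arbitrary string (e.g. a URL) to a feature
// index in [0, numFeatures) using its hash code.
static int hashFeature(String value, int numFeatures) {
  int index = value.hashCode() % numFeatures;
  return index < 0 ? index + numFeatures : index;  // keep the index non-negative
}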
Does it Scale?
21
• 40-node GCE cluster, highmem-8
• 10 ALS iterations with 50 latent factors
• Based on Spark MLlib’s implementation
Scale of Netflix or Spotify
What Else Can You Do?
 Classification using SVMs
• Conversion goal prediction
 Clustering
• Visitor segmentation
 Multiple linear regression
• Visitor prediction
22
Closing
23
What Have You Seen?
 Flink is a general-purpose analytics system
 Highly expressive Table API
 Advanced analysis with Flink’s machine learning library
 Jobs are executed on a powerful distributed dataflow engine
24
Flink Roadmap for 2015
 Additions to Machine Learning library
 Streaming Machine Learning
 Support for interactive programs
 Optimization for Table API queries
 SQL on top of Table API
25
26
flink.apache.org
@ApacheFlink
Backup Slides
28
WordCount in DataSet API
29
case class Word (word: String, frequency: Int)
val env = ExecutionEnvironment.getExecutionEnvironment()
val lines = env.readTextFile(...)
lines
.flatMap {line => line.split(" ").map(word => Word(word,1))}
.groupBy("word").sum("frequency”)
.print()
env.execute()
Java and Scala APIs offer the same functionality.
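For comparison, a sketch of the same WordCount in the Java DataSet API (positional keys instead of the case class; the input path is assumed):

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> lines = env.readTextFile("hdfs:///input");
lines.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
    @Override
    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
      for (String word : line.split(" ")) {
        out.collect(new Tuple2<>(word, 1));
      }
    }
  })
  .groupBy(0)  // group by the word
  .sum(1)      // sum the frequencies
  .print();
env.execute();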
Log Analysis Code
30
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
TableEnvironment tableEnv = new TableEnvironment();
DataSet<String> log = env.readTextFile("hdfs:///log");
DataSet<Tuple2<String, Integer>> clicks = log.flatMap(
new FlatMapFunction<String, Tuple2<String, Integer>>() {
public void flatMap(String in, Collector<Tuple2<String, Integer>> out) {
String[] parts = in.split("*magic regex*");
if (parts[0].equals("click")) {
out.collect(new Tuple2<>(parts[1], Integer.parseInt(parts[4])));
}
}
});
Table clicksTable = tableEnv.toTable(clicks, "url, userId");
Table urlClickCounts = clicksTable
.groupBy("url, userId")
.select("url, userId, url.count as count");
Table userInfo = tableEnv.toTable(…, "name, id, …");
Table resultTable = urlClickCounts.join(userInfo)
.where("userId = id && count > 10")
.select("url, count, name, …");
DataSet<Result> result = tableEnv.toSet(resultTable, Result.class);
result.writeAsText("hdfs:///result");
env.execute();
Log Analysis Dataflow Graph
31
[Dataflow graph: Log → Map → Join with AggUsers → Group → Result; the physical plan adds combine, partition, sort, and merge steps between the operators]
Pipelined Execution
32
Only 1 stage (depending on join strategy)
Data transfer in-memory, and to disk if needed
Note: intermediate DataSets are not necessarily “created”!



Editor's Notes

  • #3: Engine is Batch or Streaming
  • #7: Works also with Scala API
  • #15: Visualization of program to plan to optimized plan to JobGraph. What you see is not what you get.
  • #16: Pipelined Execution
  • #27: Algorithms: decision trees and random forests, PCA, CCA. More transformers: scaler, centering, whitening. Feature extractors: count vectorizer, outlier detector. Support for cross-validation. Improved pipeline support: automatic pre- and post-processing pipeline. SAMOA support: pending PR which will be merged with the upcoming milestone release. Integration with Zeppelin, an IPython-Notebook-like web interface for explorative data analysis.
  • #34: Visualization of JobGraph to ExecutionGraph