ECLAIRJS = NODE.JS + APACHE SPARK
David Fallside
IBM
Why EclairJS?
• Digital businesses are looking to improve customer interactions, capture “perishable” insights (Forrester), and use new data sources
• Today’s interactive, user-facing applications are often developed in JavaScript using Node.js
  • npm, the Node.js package registry, is the largest package repository (www.modulecounts.com)
  • Node.js handles very large numbers of simultaneous requests
  • Compute-intensive workloads are handed off to back-end engines
• Apache Spark as a back-end engine
  • Scalable; handles static and streaming data; Spark SQL, ML analytics, graph engine
• But there is no Spark API for Node.js/JavaScript, hence EclairJS
• So let’s look at an EclairJS application …
Demo
Program Flow (diagram): Flight Data, Kafka, Spark, Spark SQL TempTable, Node.js, and a Radial Graph UI with Airport Selection
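Word Count
var spark = require('eclairjs');
// Run Spark locally, using all available cores
var sc = new spark.SparkContext("local[*]", "Word Count");
// Read the text file as an RDD of lines
var file = __dirname + '/dream.txt';
var rdd = sc.textFile(file);
// Split each line into words
var rdd2 = rdd.flatMap(function(sentence) {
  return sentence.split(" ");
});
// Drop empty strings left behind by repeated spaces
var rdd3 = rdd2.filter(function(word) {
  return word.trim().length > 0;
});
// Pair each lowercased word with a count of 1; the lambda runs on the
// cluster, so spark.Tuple is passed in through the bind-argument list
var rdd4 = rdd3.mapToPair(function(word, Tuple) {
  return new Tuple(word.toLowerCase(), 1);
}, [spark.Tuple]);
// Sum the counts for each word
var rdd5 = rdd4.reduceByKey(function(value1, value2) {
  return value1 + value2;
});
// Swap to (count, word) so the count becomes the key
var rdd6 = rdd5.mapToPair(function(tuple, Tuple) {
  return new Tuple(tuple[1], tuple[0]);
}, [spark.Tuple]);
// Sort by count, highest first
var rdd7 = rdd6.sortByKey(false);
// take() runs the job and returns a promise for the first 10 pairs
rdd7.take(10).then(function(val) {
  console.log("Success:", val);
}, function(err) {
  console.error("Failed:", err);
});
Callout: each chained call on an RDD (flatMap, filter, mapToPair, reduceByKey, sortByKey, take) is a Spark operator.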
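To make concrete what each Spark operator in the pipeline computes, here is a minimal sketch of the same word count in plain Node.js, with no Spark or EclairJS involved. It runs on an in-memory array of lines on one machine, so it illustrates the semantics only, not distributed execution. The function name `wordCount` and the sample lines are hypothetical; it assumes Node.js 11 or later (for `Array.prototype.flatMap` and a stable sort).

```javascript
'use strict';

// Hypothetical local equivalent of the EclairJS word-count pipeline.
// Each step mirrors one Spark operator but runs on an in-memory array.
function wordCount(lines, n) {
  // flatMap: split every line into words
  const words = lines.flatMap(line => line.split(' '));
  // filter: drop empty strings produced by repeated spaces
  const nonEmpty = words.filter(word => word.trim().length > 0);
  // mapToPair + reduceByKey: add 1 per occurrence of each lowercased word
  const counts = new Map();
  for (const word of nonEmpty) {
    const key = word.toLowerCase();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // mapToPair (swap) + sortByKey(false): order [count, word] pairs by count,
  // highest first; ties keep first-seen order here, which Spark does not promise
  const swapped = Array.from(counts, ([word, count]) => [count, word]);
  swapped.sort((a, b) => b[0] - a[0]);
  // take(n): keep only the first n pairs
  return swapped.slice(0, n);
}

const lines = ['I have a dream', 'a dream  deeply rooted'];
console.log(JSON.stringify(wordCount(lines, 3))); // prints [[2,"a"],[2,"dream"],[1,"i"]]
```

On a cluster, Spark partitions the lines across executors and shuffles pairs by key for `reduceByKey`; the local version collapses that into a single `Map`.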
EclairJS Stack (diagram)
• Desktop, etc.: Node.js Application using EclairJS-Node
• Web Browser: Jupyter notebooks
• Cloud/IT: Jupyter Gateway, Jupyter NB Server
• Cluster/Driver: Toree*, EclairJS-Nashorn (Java, Nashorn), Spark Context
• Cluster/Worker: EclairJS-Nashorn (Java, Nashorn), Spark Executor
*Toree is in the Apache Incubator
Notebooks
• Notebooks are designed for (data) scientists and are widely used for data cleaning and transformation, numerical simulation, statistical modeling, etc.
• They appear in the browser as cells, which can contain live code, visualizations, formatted text, widgets, etc.
• Jupyter notebooks have a pluggable kernel architecture that supports different languages (jupyter.org)
• EclairJS provides a JavaScript kernel so that data engineers and web developers can try out code and work with data in notebooks
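The pluggable kernel architecture works because every notebook front end talks to its kernel through the same Jupyter messaging protocol, whatever language the kernel runs. As a minimal sketch, assuming version 5 of that protocol, the following builds the `execute_request` message a front end sends when a cell is run. The helper name `makeExecuteRequest` and the placeholder user name are hypothetical, and the transport (ZeroMQ or WebSocket) and HMAC signing are deliberately left out.

```javascript
'use strict';
const crypto = require('crypto');

// Hypothetical helper: builds a Jupyter messaging-protocol execute_request.
// Transport and message signing are omitted; this only shows the shape.
function makeExecuteRequest(code, sessionId) {
  return {
    header: {
      msg_id: crypto.randomBytes(16).toString('hex'), // unique per message
      session: sessionId,            // identifies the client session
      username: 'notebook-user',     // placeholder user name
      date: new Date().toISOString(),
      msg_type: 'execute_request',
      version: '5.0',                // messaging protocol version
    },
    parent_header: {},               // empty: this request replies to nothing
    metadata: {},
    content: {
      code: code,                    // cell source, in the kernel's language
      silent: false,
      store_history: true,
      user_expressions: {},
      allow_stdin: false,
      stop_on_error: true,
    },
  };
}

// The message shape is the same whether the kernel runs Python, Scala or JavaScript.
const msg = makeExecuteRequest("var rdd = sc.textFile('dream.txt');", 'session-1');
console.log(msg.header.msg_type);
```

Because only the `code` string is language-specific, a JavaScript kernel can be plugged in without any change on the notebook side.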
Example based on spark-movie-lens. Copyright 2016 Jose A. Dianes
In Closing
• EclairJS for web application development in Node.js and JavaScript
• For data engineers working with JavaScript in notebooks
• Project under active development on GitHub; see eclairjs.org
  • Examples, documentation, getting started, etc.
  • EclairJS Node and EclairJS Nashorn
• Open source, Apache v2 license
• Looking for collaborators!
THANK YOU.
eclairjs.org
fallside at us.ibm.com
