Apache Spark:
The Analytics Operating System
Anjul Bhambhri
Vice President, IBM Big Data Engineering
IBM: 100 years of (supporting) innovation
Deep Blue, SQL, RISC, DNA Transistor, Magnetic Tape, Linux, PC, Fortran, DRAM, Mainframe, Watson, Floppy Disk, UPC, Punch Card
The Analytics Operating System
Apache Spark
At IBM, We Love Spark!
• Enhance it! Spark Technology Center @ SF
• Offer it! On-prem and on the cloud
• Leverage it! Inside our products
IBM Cloud Data Services, now featuring Spark, is open for data
IBM is Building on Apache Spark
• IBM Analytics
• IBM Commerce
• IBM Watson
• IBM Research
• IBM Cloud
Quarks from IBM
Announced Feb 2016
• Open-source platform for building IoT applications
• Light-weight & embeddable
• Integrates with Spark
Spark for daily weather
The Weather Company, an IBM Business
• Lambda Architecture and Spark enable efficient batch and streaming analytics
• Visualization at every step of data discovery enables better self service
The Weather Company clusters running hot:
• ~30 billion API requests per day
• ~120 million active mobile users
• #3 most active mobile user base
• Billions of events per day (1.3M/sec)
• ~360 PB of traffic daily
• Need to keep data forever
The use case:
• Efficient batch + streaming analysis
• Self-serve data science
• BI / visualization tool support
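The Lambda Architecture named on this slide can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not Spark code and not The Weather Company's implementation: a batch layer recomputes totals from the full history, a speed layer tracks events that arrived since the last batch run, and a query merges the two. The record shape, class name, and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical record shape for illustration: (source, event_count)
Event = Tuple[str, int]


def batch_view(history: Iterable[Event]) -> Counter:
    """Batch layer: recompute totals from the full, immutable history."""
    totals: Counter = Counter()
    for key, count in history:
        totals[key] += count
    return totals


class SpeedLayer:
    """Speed layer: incrementally count events newer than the last batch run."""

    def __init__(self) -> None:
        self.recent: Counter = Counter()

    def ingest(self, event: Event) -> None:
        key, count = event
        self.recent[key] += count

    def reset(self) -> None:
        # Called once a new batch view has absorbed these events.
        self.recent.clear()


def query(batch: Counter, speed: SpeedLayer, key: str) -> int:
    """Serving layer: merge the batch view with the real-time delta."""
    return batch[key] + speed.recent[key]


# Made-up sample data, not figures from the slides.
history = [("api", 10), ("mobile", 4), ("api", 5)]
batch = batch_view(history)

speed = SpeedLayer()
speed.ingest(("api", 2))
speed.ingest(("mobile", 1))

api_total = query(batch, speed, "api")
mobile_total = query(batch, speed, "mobile")
```

In a Spark deployment the batch view would typically come from a scheduled job over stored data and the speed layer from a streaming job, which is why a single engine for both is attractive for this use case.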
Spark in Health Care
Health Care Data Lakes
• Improve how healthcare is delivered
• Collect and combine data from dozens of sources
• Clinical, Operational, Financial
• Inside and outside your enterprise
Benefits
• Better medical outcomes for patients
• Control cost and improve quality
SystemML on Spark
• Predictive Risk Modeling
• Right patient intervention relating to adverse health events
Spark in Telecom
The challenge:
• Improve customer satisfaction rates
• Multiple channels for customer interactions
• Very large data volumes
The need:
• Create a 360-degree view of a customer
• Stitch all interactions across channels into a “Customer Experience Journey”
• Classify interaction sentiment and take necessary actions

• Spark Streaming brings all the data together
• Spark Core is used to process and transform text and voice data
• Spark MLlib algorithms stitch interactions on a journey and score “sentiment”
• Spark SQL drives interactive queries via visual dashboards
[Diagram labels: PUB / SUB (MQTT / WebSockets / Flume / Kafka); Voice & Text Data; Interaction & Journey Data; Journey Dashboards]
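The "stitch interactions on a journey and score sentiment" step can be illustrated with a minimal plain-Python sketch. It is not the pipeline described on the slide: the slide names Spark MLlib algorithms, whereas this example swaps in a toy keyword lexicon purely to show the shape of the computation. The `Interaction` record, the word lists, and the function names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record for one customer contact on one channel."""
    customer_id: str
    channel: str
    timestamp: int
    text: str


# Toy lexicon, an assumption for illustration only (stands in for a trained model).
POSITIVE = {"thanks", "great", "resolved"}
NEGATIVE = {"angry", "cancel", "broken"}


def stitch_journeys(interactions: Iterable[Interaction]) -> Dict[str, List[Interaction]]:
    """Group interactions by customer and order each group by time."""
    grouped: Dict[str, List[Interaction]] = defaultdict(list)
    for item in interactions:
        grouped[item.customer_id].append(item)
    return {cid: sorted(items, key=lambda i: i.timestamp)
            for cid, items in grouped.items()}


def sentiment(text: str) -> int:
    """Score one interaction: positive word count minus negative word count."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def journey_sentiment(journey: List[Interaction]) -> int:
    """Score a whole journey as the sum of its interaction scores."""
    return sum(sentiment(i.text) for i in journey)


# Made-up sample interactions across three channels.
events = [
    Interaction("c1", "voice", 20, "still broken and I am angry"),
    Interaction("c2", "chat", 5, "great service thanks"),
    Interaction("c1", "web", 10, "my router is broken"),
    Interaction("c1", "chat", 30, "issue resolved thanks"),
]

journeys = stitch_journeys(events)
c1_channels = [i.channel for i in journeys["c1"]]
scores = {cid: journey_sentiment(j) for cid, j in journeys.items()}
```

Grouping by customer and ordering by time is what turns separate channel logs into a single journey; the per-journey score is then the kind of value a dashboard query could filter on.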
Apache Spark:
The Analytics Operating System
THANK YOU!

Spark Summit Presentation by Anjul Bhambhri
