Apache Spark:
The Analytics Operating System
Anjul Bhambhri
Vice President, IBM Big Data Engineering
Deep Blue SQL RISC
DNA Transistor Magnetic Tape Linux PC
Fortran DRAM Mainframe Watson
Floppy Disk UPC
Punch Card
IBM: 100 years of (supporting) innovation
The Analytics Operating System
Apache Spark
Enhance it! Offer it! Leverage it!
Spark Technology Center @ SF
On-prem and on the cloud
Inside our products
At IBM, We Love Spark!
IBM Cloud Data Services is open for data, now featuring Spark
IBM is Building on Apache Spark
• IBM Analytics
• IBM Commerce
• IBM Watson
• IBM Research
• IBM Cloud
Quarks from IBM
Announced Feb 2016
• Open-source platform for building IoT applications
• Light-weight & embeddable
• Integrates with Spark
• Lambda Architecture and Spark enable efficient batch and streaming analytics
• Visualization at every step of data discovery enables better self service
The Weather Company clusters running hot:
• ~30 billion API requests per day
• ~120 million active mobile users
• #3 most active mobile user base
• Billions of events per day (1.3M/sec)
• ~360 PB of traffic daily
• Need to keep data forever
The use case:
Efficient batch + streaming analysis
Self-serve data science
BI / visualization tool support
An IBM Business
Spark for daily weather
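The Lambda Architecture named on this slide pairs a batch layer, which recomputes views from the full event log, with a speed layer, which folds in events that arrived since the last batch run; queries merge the two. The following is a minimal plain-Python sketch of that idea only. It does not use Spark, it is not The Weather Company's implementation, and the function names, the `station` field, and the sample events are all hypothetical.

```python
from collections import Counter


def build_batch_view(master_events):
    """Batch layer: recompute per-station totals from the full event log."""
    return Counter(event["station"] for event in master_events)


def update_speed_view(speed_view, event):
    """Speed layer: fold one newly arrived event into an incremental view."""
    speed_view[event["station"]] += 1
    return speed_view


def query(batch_view, speed_view, station):
    """Serving layer: merge the batch and speed views at read time."""
    return batch_view[station] + speed_view[station]


# Hypothetical events already covered by the last batch run.
master_events = [{"station": "SFO"}, {"station": "SFO"}, {"station": "JFK"}]
batch_view = build_batch_view(master_events)

# Hypothetical events that arrived after the last batch run.
speed_view = Counter()
for event in [{"station": "SFO"}, {"station": "BOS"}]:
    update_speed_view(speed_view, event)

sfo_total = query(batch_view, speed_view, "SFO")
bos_total = query(batch_view, speed_view, "BOS")
print(sfo_total, bos_total)
```

In a Spark deployment the batch view would typically come from a scheduled job over stored data and the speed view from a streaming job; the merge-at-query step is the part this sketch is meant to show.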
Spark in Health Care
Health Care Data Lakes
• Improve how healthcare is delivered
• Collect and combine data from dozens of sources
• Clinical, Operational, Financial
• Inside and outside your enterprise
Benefits
• Better medical outcomes for patients
• Control cost and improve quality
SystemML on Spark
• Predictive Risk Modeling
• Right patient intervention relating to adverse health events
Spark in Telecom
The challenge:
• Improve customer satisfaction rates
• Multiple channels for customer interactions
• Very large data volumes
The need:
• Create a 360 degree view of a customer
• Stitch all interactions across channels – “Customer Experience Journey”
• Classify interaction sentiment and take necessary actions
• Spark Streaming brings all the data together
• Spark Core is used to process and transform text and voice data
• Spark MLlib algorithms stitch interactions on a journey and score “sentiment”
• Spark SQL drives interactive queries via visual dashboards
[Diagram labels: PUB / SUB; MQTT / WebSockets / Flume / Kafka; Voice & Text Data; Interaction & Journey Data; Journey Dashboards]
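The "stitch interactions on a journey and score sentiment" step can be illustrated with a small plain-Python sketch. This is an assumption-laden stand-in, not the Spark MLlib pipeline the slide refers to: the word lists, the field names (`customer_id`, `channel`, `ts`, `text`), and the sample interactions are hypothetical, and the lexicon count is a toy substitute for a trained sentiment model.

```python
from collections import defaultdict

# Toy sentiment lexicon (assumption; a real system would use a trained model).
POSITIVE = {"thanks", "great", "resolved"}
NEGATIVE = {"angry", "dropped", "cancel"}


def stitch_journeys(interactions):
    """Group interactions by customer and order each group by timestamp."""
    journeys = defaultdict(list)
    for item in interactions:
        journeys[item["customer_id"]].append(item)
    for steps in journeys.values():
        steps.sort(key=lambda item: item["ts"])
    return dict(journeys)


def score_sentiment(text):
    """Count positive words minus negative words in one interaction."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def journey_sentiment(journeys):
    """Sum the per-interaction scores along each customer's journey."""
    return {
        customer: sum(score_sentiment(step["text"]) for step in steps)
        for customer, steps in journeys.items()
    }


# Hypothetical interactions arriving from several channels, out of order.
interactions = [
    {"customer_id": "c1", "channel": "chat", "ts": 2, "text": "issue resolved thanks"},
    {"customer_id": "c1", "channel": "voice", "ts": 1, "text": "call dropped again"},
    {"customer_id": "c2", "channel": "email", "ts": 1, "text": "please cancel my plan"},
]

journeys = stitch_journeys(interactions)
scores = journey_sentiment(journeys)
print(scores)
```

Grouping by customer and sorting by time is what turns separate channel events into a single journey; the per-journey score is what a dashboard query would then read.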
Apache Spark:
The Analytics Operating System
THANK YOU!


Keynote at Spark Summit East: Anjul Bhambhri
