Apache Spark:
The Analytics Operating System
Anjul Bhambhri
Vice President, IBM Big Data Engineering
IBM: 100 years of (supporting) innovation
Deep Blue, SQL, RISC, DNA Transistor, Magnetic Tape, Linux, PC, Fortran, DRAM, Mainframe, Watson, Floppy Disk, UPC, Punch Card
The Analytics Operating System
Apache Spark
At IBM, We Love Spark!
• Enhance it! Spark Technology Center @ SF
• Offer it! On-prem and on the cloud
• Leverage it! Inside our products
IBM Cloud Data Services, now featuring Spark, is open for data
IBM is Building on Apache Spark
• IBM Analytics
• IBM Commerce
• IBM Watson
• IBM Research
• IBM Cloud
Quarks from IBM
Announced Feb 2016
• Open-source platform for building IoT applications
• Light-weight & embeddable
• Integrates with Spark
Spark for daily weather
• Lambda Architecture and Spark enable efficient batch and streaming analytics
• Visualization at every step of data discovery enables better self-service
The Weather Company (an IBM Business) clusters running hot:
• ~30 billion API requests per day
• ~120 million active mobile users
• #3 most active mobile user base
• Billions of events per day (1.3M/sec)
• ~360 PB of traffic daily
• Need to keep data forever
The use case:
• Efficient batch + streaming analysis
• Self-serve data science
• BI / visualization tool support
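The Lambda Architecture named on this slide pairs a batch layer (periodic recomputation over the full history) with a speed layer (incremental updates for recent events) and merges the two at query time. The following is a minimal, library-free sketch of that idea, not The Weather Company's implementation. The event shape, the station keys, and the names batch_view, SpeedLayer, and query are all assumptions made for illustration; in a Spark deployment the batch layer would typically be a batch job and the speed layer a streaming job.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event shape: (key, count), e.g. API requests per weather station.
Event = Tuple[str, int]


def batch_view(master_dataset: Iterable[Event]) -> Counter:
    """Batch layer: recompute totals from the full, immutable history."""
    view = Counter()
    for key, n in master_dataset:
        view[key] += n
    return view


class SpeedLayer:
    """Speed layer: track only events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.realtime_view = Counter()

    def ingest(self, event: Event) -> None:
        key, n = event
        self.realtime_view[key] += n

    def reset(self) -> None:
        # Call once a new batch view has absorbed the recent events.
        self.realtime_view.clear()


def query(key: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the batch view with the real-time delta."""
    return batch[key] + speed.realtime_view[key]


# Illustrative data only.
history = [("nyc", 3), ("sfo", 2), ("nyc", 1)]
batch = batch_view(history)

speed = SpeedLayer()
speed.ingest(("nyc", 5))
speed.ingest(("bos", 1))

print(query("nyc", batch, speed))  # 4 from the batch view + 5 recent = 9
```

The design trade-off is that the batch view can be rebuilt from scratch at any time (which suits the "keep data forever" requirement), while the speed layer stays small because it is discarded after each batch run.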
Spark in Health Care
Health Care Data Lakes
• Improve how healthcare is delivered
• Collect and combine data from dozens of sources
• Clinical, Operational, Financial
• Inside and outside your enterprise
Benefits
• Better medical outcomes for patients
• Control cost and improve quality
SystemML on Spark
• Predictive Risk Modeling
• Right patient intervention relating to adverse health events
Spark in Telecom
The challenge:
• Improve customer satisfaction rates
• Multiple channels for customer interactions
• Very large data volumes
The need:
• Create a 360-degree view of a customer
• Stitch all interactions across channels: “Customer Experience Journey”
• Classify interaction sentiment and take necessary actions
• Spark Streaming brings all the data together
• Spark Core is used to process and transform text and voice data
• Spark MLlib algorithms stitch interactions on a journey and score “sentiment”
• Spark SQL drives interactive queries via visual dashboards
[Architecture diagram. Labels: PUB / SUB (MQTT / WebSockets / Flume / Kafka); Voice & Text Data; Interaction & Journey Data; Journey Dashboards]
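The telecom slide describes stitching interactions from several channels into one journey per customer and scoring sentiment along it. The sketch below shows one way that logic could look in plain Python. It is an assumption-laden illustration, not the pipeline from the talk: the Interaction record, the sample data, and the keyword lexicon are hypothetical, and the lexicon stands in for the trained Spark MLlib models the slide mentions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record for a single customer touchpoint."""
    customer_id: str
    channel: str    # e.g. "call", "chat", "email"
    timestamp: int  # seconds since epoch
    text: str       # transcript or message body


# Toy lexicon; a real system would use a trained classifier instead.
NEGATIVE = {"slow", "dropped", "cancel", "angry"}
POSITIVE = {"thanks", "great", "resolved", "happy"}


def stitch_journeys(interactions: Iterable[Interaction]) -> Dict[str, List[Interaction]]:
    """Group interactions by customer and order each group by time."""
    journeys: Dict[str, List[Interaction]] = defaultdict(list)
    for item in interactions:
        journeys[item.customer_id].append(item)
    for steps in journeys.values():
        steps.sort(key=lambda item: item.timestamp)
    return dict(journeys)


def sentiment(text: str) -> int:
    """Positive-word count minus negative-word count."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def journey_score(steps: List[Interaction]) -> int:
    """Sum of per-interaction sentiment across one journey."""
    return sum(sentiment(item.text) for item in steps)


# Illustrative data only.
events = [
    Interaction("c1", "chat", 200, "still slow and calls dropped"),
    Interaction("c1", "call", 100, "my internet is slow"),
    Interaction("c2", "email", 150, "issue resolved thanks"),
]
journeys = stitch_journeys(events)

for customer, steps in journeys.items():
    print(customer, [s.channel for s in steps], journey_score(steps))
```

A journey with a strongly negative score could then be flagged for follow-up, which corresponds to the "take necessary actions" step on the slide.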
Apache Spark:
The Analytics Operating System
THANK YOU!

Spark Summit East Keynote by Anjul Bhambhri
