© 2015 IBM Corporation
How Spark Enables the Internet of Things:
Efficient Integration of Multiple Spark
Components for Smart City Use Cases
Paula Ta-Shma
IBM Research
paula@il.ibm.com
Joint work with:
Adnan Akbar, University of Surrey
Michael Factor, IBM Research
Guy Hadash, IBM Research
Juan Sancho, ATOS
The Evolution of Data Collection
Internet of Things
IoT: The Biggest Big Data
The IoT market will grow to $1.7 trillion in 2020 (IDC)
By 2020 the number of networked devices will be 30 billion (IDC), more than 4 times the entire global population
[Figure: Global data volume in exabytes, 2005 / 2012 / 2017]
EMT Madrid Bus Company Needs to Make Decisions According to Current and Predicted Future Traffic State
• The Problem
– EMT needs to staff control rooms where employees manually analyze Madrid traffic sensor output. This can be slow and costly.
• Objective
– Improve customer satisfaction and reduce costs by responding more efficiently and quickly to real-time traffic problems
• Approach
– Monitor data from up to 3000 sensors. React by rerouting buses, modifying traffic lights, etc., based upon knowledge derived from historical data
[Figure: Today / Tomorrow]
Generic IoT Architecture – Data Flow
1. Collect historical time series data
– Collect data from devices
– Aggregate into objects
– Index and/or partition
[Diagram components: IoT, Secor, Swift]
Generic IoT Architecture – Data Flow
2. Learn patterns in data
– May be time/location dependent
– Generate thresholds, classifiers etc.
[Diagram components: Secor, Swift]
Generic IoT Architecture – Data Flow
3. Apply what was learned on real time data stream
– Take action
[Diagram components: IoT, Secor, CEP, Swift]
Generic IoT Architecture – Data Flow
[Diagram components: IoT, CEP, Secor, Swift]
Green Flows: Real time
Purple Flows: Batch
IoT Architecture – Madrid Traffic – Ingestion Flow
Aim: Collect historical time series data for analysis
– Continuously collect data from up to 3000 Madrid council traffic sensors via web service
- Data includes traffic speeds and intensities, updated every 5 minutes
– Push the messages to Kafka
– Use Secor to aggregate multiple messages into a single Swift object
- According to policy, e.g., every 60 minutes
- Possibly partition the data, e.g., according to date
- Convert to Parquet format
- Annotate with metadata, e.g., min/max speed, start/end time
– Index Swift objects according to their metadata using Elasticsearch
[Diagram components: IoT, Secor, Swift]
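The aggregation and annotation step can be sketched in plain Python. This is a minimal illustration, not Secor's actual code: the Reading and ObjectBatch types, the ts field and the aggregate function are hypothetical names, and the one-hour window mirrors the "every 60 minutes" policy example. The field names codigo, velocidad and intensidad are borrowed from the Spark SQL query shown in the deck.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    codigo: str        # sensor id
    velocidad: float   # traffic speed
    intensidad: float  # traffic intensity
    ts: int            # epoch seconds (hypothetical field name)


@dataclass
class ObjectBatch:
    readings: list
    metadata: dict


def aggregate(readings, window_secs=3600):
    """Group readings into fixed time windows and annotate each batch
    with min/max speed and start/end time."""
    windows = {}
    for r in readings:
        windows.setdefault(r.ts // window_secs, []).append(r)
    batches = []
    for key in sorted(windows):
        rs = windows[key]
        metadata = {
            "min_speed": min(r.velocidad for r in rs),
            "max_speed": max(r.velocidad for r in rs),
            "start_time": min(r.ts for r in rs),
            "end_time": max(r.ts for r in rs),
        }
        batches.append(ObjectBatch(rs, metadata))
    return batches
```

Each batch would correspond to one stored object, and its metadata dictionary is what a search index could later be built from.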
IoT Architecture – Madrid Traffic – Data Access
Aim: Access data efficiently and cost effectively
– Store IoT data in OpenStack Swift object storage
- Open source, low cost deployment, and highly scalable
– Parquet data is accessible via Spark SQL
– Optimized predicate pushdown
- Custom Spark SQL external data source driver
- Uses object metadata indexes
- Searches for Swift objects whose min/max values overlap requested ranges
Get all data for morning traffic:
SELECT codigo, intensidad, velocidad FROM madridtraffic
WHERE tf >= '08:00:00' AND tf <= '12:00:00'
Brute force method: 13245 Swift requests
Optimized predicate pushdown: 616 Swift requests
21.5 times fewer Swift requests
[Diagram component: Swift]
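The idea behind the metadata-based pushdown can be illustrated with a small range-overlap filter. This is a sketch under stated assumptions, not the custom Spark SQL driver itself: the index dictionary, the object names and the min_tf/max_tf keys are hypothetical, and times are compared as zero-padded HH:MM:SS strings. Only the two request counts at the end come from the slide.

```python
def overlaps(meta, lo, hi):
    """True if the object's [min_tf, max_tf] range intersects [lo, hi]."""
    return meta["min_tf"] <= hi and meta["max_tf"] >= lo


def select_objects(index, lo, hi):
    """Return names of objects that may contain rows with lo <= tf <= hi."""
    return [name for name, meta in index.items() if overlaps(meta, lo, hi)]


# Hypothetical metadata index: object name -> min/max of the tf column.
index = {
    "obj-0000": {"min_tf": "00:00:00", "max_tf": "05:55:00"},
    "obj-0600": {"min_tf": "06:00:00", "max_tf": "08:55:00"},
    "obj-0900": {"min_tf": "09:00:00", "max_tf": "11:55:00"},
    "obj-1200": {"min_tf": "12:00:00", "max_tf": "14:55:00"},
    "obj-1500": {"min_tf": "15:00:00", "max_tf": "17:55:00"},
}

# Objects worth fetching for the morning-traffic query above.
wanted = select_objects(index, "08:00:00", "12:00:00")

# Request counts reported on the slide: brute force vs. pushdown.
improvement = 13245 / 616
```

Objects whose range cannot overlap the predicate are never requested, which is where the reduction in Swift requests comes from.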
IoT Architecture – Madrid Traffic – Machine Learning
Aim: Learn to differentiate between ‘good’ and ‘bad’ traffic
– Depends on context
- Time (morning/evening), Day (weekday/weekend)
- Location
– Use Spark MLlib k-means clustering
– Produce threshold values for real-time decision making
– Re-run algorithm when quality of clusters decreases
- Can use silhouette index to measure quality
[Diagram component: Swift]
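For readers unfamiliar with the silhouette index, the following plain-Python sketch computes the mean silhouette coefficient for one-dimensional data. It is an illustration only and does not use Spark; the function name, the sample values and the retraining cut-off of 0.5 are assumptions, not values from the deck.

```python
def silhouette(points, labels):
    """Mean silhouette coefficient for 1-D points.
    Requires at least two distinct cluster labels."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(abs(p - q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


speeds = [1.0, 2.0, 10.0, 11.0]
good = silhouette(speeds, [0, 0, 1, 1])  # well-separated clusters
bad = silhouette(speeds, [0, 1, 0, 1])   # clusters that mix both groups
needs_retraining = bad < 0.5             # hypothetical cut-off
```

A score close to 1 indicates tight, well-separated clusters; a falling score on fresh data is one possible signal to re-run the clustering.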
IoT Architecture – Madrid Traffic – Machine Learning
Event Detection:
• Use Spark MLlib k-means clustering to separate data into 2 clusters
• Find the midpoint between the 2 cluster centres
• Use this midpoint to generate the thresholds
• Repeat for each context, e.g., time period (morning, afternoon, evening, night)
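The event-detection steps can be sketched as follows. To keep the example self-contained it swaps in a plain-Python Lloyd's algorithm on one-dimensional speed values in place of Spark MLlib k-means; the function names and the sample speeds per context are hypothetical.

```python
def two_means_1d(values, iters=50):
    """Lloyd's algorithm with k=2 on 1-D data; returns (low, high) centres.
    In one dimension, assigning each value to its nearest centre is the
    same as splitting the data at the midpoint of the two centres."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        mid = (lo + hi) / 2
        left = [v for v in values if v <= mid]
        right = [v for v in values if v > mid]
        if not left or not right:
            break
        new_lo = sum(left) / len(left)
        new_hi = sum(right) / len(right)
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    return lo, hi


def threshold(values):
    """Midpoint between the two cluster centres."""
    lo, hi = two_means_1d(values)
    return (lo + hi) / 2


# Hypothetical speed samples, one list per context.
samples = {
    "morning": [10.0, 12.0, 14.0, 50.0, 52.0, 54.0],
    "night": [40.0, 42.0, 44.0, 80.0, 82.0, 84.0],
}
thresholds = {ctx: threshold(vals) for ctx, vals in samples.items()}
```

With real data the clustering would likely run over several features (for example speed and intensity), in which case the midpoint yields one threshold per feature.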
Anomaly Detection:
• Use a single cluster and define an anomaly to be further than a certain distance from the cluster centre
[Figure: Morning Traffic on Weekdays]
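A minimal sketch of the single-cluster anomaly test, assuming two features (speed, intensity) and Euclidean distance. The sample points and the distance limit of 50 are hypothetical, and in practice the features would probably need scaling because they have different units.

```python
import math


def centre(points):
    """Centroid of a list of equal-length tuples (the single cluster centre)."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))


def is_anomaly(point, c, max_dist):
    """A point is anomalous if it lies further than max_dist from the centre."""
    return math.dist(point, c) > max_dist


# Hypothetical (speed, intensity) history for one context.
history = [(50.0, 100.0), (52.0, 110.0), (48.0, 90.0), (50.0, 100.0)]
c = centre(history)

normal = is_anomaly((51.0, 102.0), c, max_dist=50.0)
outlier = is_anomaly((10.0, 400.0), c, max_dist=50.0)
```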
IoT Architecture – Madrid Traffic – Real Time Decision Making
Aim: Respond in real time to traffic conditions
– Use Complex Event Processing (CEP) approach
- Rule based
- Process events record by record
- CEP rules are typically defined manually, but in many cases it is difficult to get them right
- We automate this process by generating rule thresholds from the learned traffic patterns
- uCEP has a small footprint and can be run at the edge
[Diagram components: CEP, IoT]
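A record-by-record rule with context-dependent thresholds might look like the sketch below. It does not reproduce uCEP or any real CEP engine: the context boundaries, the threshold values, the event fields and the rule itself (speed below threshold means congestion) are all assumptions made for illustration.

```python
def context_of(hour):
    """Map an hour of day to a time-period context (hypothetical boundaries)."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 22:
        return "evening"
    return "night"


# Hypothetical learned speed thresholds, one per context.
THRESHOLDS = {"morning": 32.0, "afternoon": 40.0, "evening": 35.0, "night": 55.0}


def process(event):
    """Evaluate one incoming record against the rule for its context.
    Returns a derived congestion event, or None if the rule does not fire."""
    ctx = context_of(event["hour"])
    if event["velocidad"] < THRESHOLDS[ctx]:
        return {"type": "congestion", "codigo": event["codigo"], "context": ctx}
    return None


alert = process({"codigo": "s1", "hour": 8, "velocidad": 20.0})
quiet = process({"codigo": "s1", "hour": 8, "velocidad": 45.0})
```

Because each record is handled independently and the rule state is just a small lookup table, this style of processing has a small footprint.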
Work in Progress
Proactive approach:
• Use Spark streaming linear regression to predict traffic behavior (e.g. speed, intensity) for the near future
• Apply CEP on predicted data
• Respond proactively to predicted events such as traffic congestion
– e.g. EMT can proactively re-route buses
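The proactive idea can be sketched with an online linear regression updated one sample at a time. This plain-Python stochastic gradient descent stands in for Spark's streaming linear regression and is not the project's implementation; the class name, the synthetic stream, the scaling of speeds to [0, 1] and the 0.32 threshold are hypothetical.

```python
class OnlineLinearRegression:
    """y ~ w * x + b, updated per sample by stochastic gradient descent."""

    def __init__(self, lr=0.5):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err


model = OnlineLinearRegression(lr=0.5)

# Synthetic stream (assumption): next speed = 0.5 * current speed + 0.1,
# with speeds scaled to [0, 1] so the fixed learning rate stays stable.
for step in range(2000):
    x = (step % 5) / 4
    model.update(x, 0.5 * x + 0.1)


def predicted_congestion(current_speed, threshold):
    """Apply the threshold rule to the predicted speed, not the observed one."""
    return model.predict(current_speed) < threshold


slow_ahead = predicted_congestion(0.2, threshold=0.32)
clear_ahead = predicted_congestion(1.0, threshold=0.32)
```

Feeding the prediction rather than the latest observation into the rule is what lets a response start before congestion is actually measured.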
Demo
Our Architecture Applies to Many IoT Use Cases
• Energy/utilities
– Anomaly detection
- Pipe leakage
- Appliance malfunction
– Occupancy detection
• Healthcare
– Patient monitoring/alert/response
• Insurance
– Driver behavior and location monitoring
• Transportation
– Connected vehicles, engine diagnostics, automated service scheduling
• Logistics
– Goods tracking, sensitive goods management
The Madrid Traffic Use Case on IBM Bluemix
[Architecture diagram. Stages: Data Sources, Message Bus, Data Storage, Data Analytics, Data Visualization. Components: Madrid Traffic Sensors, MQTT, Node-RED, Secor, Object Storage, Apache Spark, Freeboard Dashboard]
Joint work with Naeem Altaf and team
Thank You!
Backup
COSMOS
• Funding: EU FP7 at level of 2PY x 3 years
• Started: Sept 2013
• Coordinator: ATOS
• Technical partners: IBM, NTUA, Univ Surrey, Siemens, ATOS
• Use Case Partners: Hildebrand/Camden, EMT Madrid Bus Transport/Madrid Council, III Taiwan – Smart Cities use cases
• Project Vision: Enable ‘things’ to interact with each other based on shared experience, trust, reputation etc.
IBM Bluemix Data Analytics for IoT Architecture
Kafka + Secor
• What is it?
– Apache Kafka is a high throughput distributed publish/subscribe messaging system.
– Secor is an open source tool developed by Pinterest, which aggregates Kafka messages and saves them as S3 objects.
• What extensions were needed?
– Support for OpenStack Swift as a Secor target. We also added support for Parquet format and for annotating objects with metadata to support indexing and search.
• What is the value of integration with Swift?
– Enables bringing new data and applications to Swift, which is an open source solution. Parquet and metadata search enable improved performance for batch analytics.
• Status
– We contributed OpenStack Swift support to the Secor community and it is now part of Secor.
Parquet
• What is it?
– A column-based, semi-structured, schema-based storage format supported by Hadoop and Spark. Enables column-wise compression and projection pushdown.
• What integration is needed?
– Since Swift is now part of the Hadoop ecosystem, no additional integration is needed. Data in Swift can be stored in Apache Parquet format, inheriting associated advantages.
• Status
– Spark SQL supports storing tabular data in Parquet format in Hadoop compatible storage systems such as Swift.
Elasticsearch
• What is it?
– A distributed, scalable, real-time search and analytics engine, built on Apache Lucene.
• What integration is needed?
– Index object metadata, allowing search for objects by attributes.
• What is the value of integration with Swift?
– Use search to select objects for further processing, e.g., relevant objects for analytics.
- Note that S3 does not yet have native search according to metadata.
• Status
– The IBM SoftLayer object service includes a basic implementation of metadata search; at IBM Research, we added extensions such as data type support and range searches.
Power of data. Simplicity of design. Speed of innovation.
IBM Spark
For up-to-date information and news about Spark and the Spark Technology Center, sign up for our newsletter at www.spark.tc
