Scalable Realtime
Analytics with
declarative, SQL like,
Complex Event
Processing Scripts
Srinath Perera
Director, Research WSO2
Apache Member
(@srinath_perera)
srinath@wso2.com
(Batch) Analytics
Scientists have been doing this for 25 years with
MPI (1991) on special hardware
Took off with Google’s MapReduce
paper (2004); Apache Hadoop, Hive and a
whole ecosystem were created.
It was successful, so we are here!!
But, processing takes time.
Value of Some Insights Degrades Fast!
For some use cases (e.g. stock markets, traffic, surveillance, patient
monitoring) the value of insights degrades very quickly with time.
- E.g. stock markets and speed of light
We need technology that can produce
outputs fast
- Static Queries, but need very fast output
(Alerts, Realtime control)
- Dynamic and Interactive Queries ( Data
exploration)
History
Realtime Analytics are not new either!!
- Active Databases (2000+)
- Stream processing (Aurora, Borealis (2005+)
and later Storm)
- Distributed Streaming Operators (e.g.
Database research topic around 2005)
- CEP vendor roadmap (from http://www.complexevents.com/2014/12/03/cep-tooling-market-survey-2014/)
Realtime Analytics Tools
I. Stream Processing
Program a set of processors and wire them up; data flows through
the graph.
A middleware framework handles data flow, distribution, and fault
tolerance (e.g. Apache Storm, Samza)
Processors may be in the same machine or multiple machines
II. Complex Event Processing
III. Micro Batch
Process data in small batches, and
then combine results for final results
(e.g. Spark)
Works for simple aggregates, but
tricky to do this for complex
operations (e.g. Event Sequences)
Can do it with MapReduce as well if
the deadlines are not too tight.
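As a rough illustration of the micro-batch idea, the Python sketch below counts failures one small batch at a time and then merges the partial counts. The function names, the (host, status) event shape, and the batch size are assumptions made for illustration; none of them come from Spark or any other engine.

```python
from collections import Counter
from itertools import islice

def micro_batches(events, batch_size):
    """Yield successive lists of at most batch_size events."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def count_failures(events, batch_size=3):
    """Count failures per host batch by batch, then combine the partial results."""
    total = Counter()
    for batch in micro_batches(events, batch_size):
        # Per-batch result: failure count for each host in this batch only.
        partial = Counter(host for host, status in batch if status == "FAIL")
        # Combine step: partial counts simply add up.
        total.update(partial)
    return dict(total)
```

Counts combine trivially because addition is associative. An event sequence that straddles two batches has no such simple merge, which is why complex operations are tricky in this model.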
IV. OLAP Style In Memory Computing
Usually done to support interactive
queries
Index data to make it
readily accessible so you can respond
to queries fast. (e.g. Apache Drill)
Tools like Druid, VoltDB and SAP
Hana can do this with all data in
memory to make things really fast.
Realtime Analytics Patterns
Simple counting (e.g. failure count)
Counting with Windows (e.g. failure count every hour)
Preprocessing: filtering, transformations (e.g. data cleanup)
Alerts, thresholds (e.g. alarm on high temperature)
Data Correlation, Detect missing events, detecting erroneous data
(e.g. detecting failed sensors)
Joining event streams (e.g. detect a hit on soccer ball)
Merge with data in a database, collect, update data conditionally
Realtime Analytics Patterns (contd.)
Detecting Event Sequence Patterns (e.g. small transaction followed
by large transaction)
Tracking - follow some related entity’s state in space, time etc. (e.g.
location of airline baggage, vehicle, tracking wild life)
Detect trends: rise, turn, fall, outliers, complex trends like triple
bottom, etc. (e.g. algorithmic trading, SLA, load balancing)
Learning a Model (e.g. Predictive maintenance)
Predicting next value and corrective actions (e.g. automated car)
Apache Hive
A SQL like data processing language
Since many understand SQL, Hive
made large scale (Big Data) processing
accessible to many
Expressive, short, and sweet.
Define core operations that cover 90%
of problems
Lets experts dig in when they like!
(Batch Processing, Hive)
(Realtime Analytics, X)
What is X?
CEP = SQL for Realtime Analytics
Easy to follow from SQL
Expressive, short, and sweet.
Define core operations that cover 90% of
problems
Lets experts dig in when they like!
Let's look at the core operations.
Operators: Filters
Assume a temperature stream
Here weather:convertFtoC() is a
user defined function. Such functions
are used to extend the language.
define stream TempStream (ts long, roomNo int, temp double);
from TempStream[weather:convertFtoC(temp) > 30.0
and roomNo != 2043]
select roomNo, temp
insert into HotRoomsStream ;
Usecases:
- Alerts , thresholds (e.g. Alarm on
high temperature)
- Preprocessing: filtering,
transformations (e.g. data cleanup)
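To make the filter semantics concrete, here is a minimal Python sketch of what the query above does to each event. The dictionary event shape is an assumption, and convert_f_to_c is a hypothetical stand-in for the weather:convertFtoC() user defined function; it is not the engine's implementation.

```python
def convert_f_to_c(temp_f):
    """Hypothetical stand-in for the weather:convertFtoC() user defined function."""
    return (temp_f - 32.0) * 5.0 / 9.0

def hot_rooms(temp_stream):
    """Yield (roomNo, temp) for every event that passes the filter condition."""
    for event in temp_stream:
        # Same condition as the query: converted temperature above 30 C,
        # and the room is not 2043.
        if convert_f_to_c(event["temp"]) > 30.0 and event["roomNo"] != 2043:
            yield (event["roomNo"], event["temp"])
```

A filter keeps no state between events, so it is the cheapest operator and the easiest one to run in parallel.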
Operators: Windows and Aggregation
Support many window types
- Batch Windows, Sliding windows, Custom windows
Usecases
- Simple counting (e.g. failure count)
- Counting with Windows ( e.g. failure count every hour)
from TempStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
insert into HotRoomsStream ;
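A sliding time window can be pictured as a queue that drops events once they are older than the window length. The sketch below is one possible Python rendering, assuming events arrive in timestamp order as (ts, temp) pairs with ts in milliseconds; it ignores roomNo and is not the engine's actual window implementation.

```python
from collections import deque

def sliding_avg(events, window_ms=60_000):
    """For each (ts, temp) event, yield (ts, average temp over the last window_ms)."""
    window = deque()
    total = 0.0
    for ts, temp in events:
        window.append((ts, temp))
        total += temp
        # Expire events that have fallen out of the time window.
        while window and window[0][0] <= ts - window_ms:
            _, old_temp = window.popleft()
            total -= old_temp
        # The current event is always inside the window, so len(window) >= 1.
        yield ts, total / len(window)
```

Keeping a running total means each event is added once and removed once, so the average costs constant amortized work per event regardless of window size.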
Operators: Patterns
Models a followed-by relation: e.g.
event A followed by event B
Very powerful tool for tracking
and detecting patterns
from every (a1 = TempStream)
-> a2 = TempStream[temp > a1.temp + 5]
within 1 day
select a2.ts as ts, a2.temp - a1.temp as diff
insert into HotDayAlertStream;
Usecases
- Detecting Event Sequence Patterns
- Tracking
- Detect trends
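The followed-by pattern can be approximated by keeping a list of partial matches, one per earlier event, and checking every new event against them. The Python sketch below assumes time-ordered (ts, temp) events with ts in milliseconds; the names and the exact matching semantics are assumptions, and a real engine may treat overlapping matches differently.

```python
DAY_MS = 24 * 60 * 60 * 1000

def hot_day_alerts(events, rise=5.0, within_ms=DAY_MS):
    """Yield (ts, diff) whenever an event is more than `rise` above an earlier one."""
    pending = []  # earlier events (a1) still waiting for a matching follower (a2)
    for ts, temp in events:
        still_pending = []
        for a1_ts, a1_temp in pending:
            if ts - a1_ts > within_ms:
                continue  # partial match expired: outside the "within" bound
            if temp > a1_temp + rise:
                yield ts, temp - a1_temp  # a1 followed by a2: match complete
            else:
                still_pending.append((a1_ts, a1_temp))
        # "every": each incoming event may also start a new match.
        still_pending.append((ts, temp))
        pending = still_pending
```

The pending list is the state the engine has to hold, which is why the "within" bound matters: without it, partial matches would accumulate forever.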
Operators: Joins
Join two data streams based on a condition and windows
Usecases
- Data Correlation, Detect missing events, detecting erroneous data
- Joining event streams
from TempStream[temp > 30.0]#window.time(1 min) as T
join RegulatorStream[isOn == false]#window.length(1) as R
on T.roomNo == R.roomNo
select T.roomNo, R.deviceID, 'start' as action
insert into RegulatorActionStream;
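One way to picture the windowed join: each side keeps its own window, and an event arriving on either side is matched against the other side's current window contents. The Python sketch below assumes a single time-ordered input of tagged tuples, with ts in milliseconds; the tuple layout and function name are illustrative assumptions, not the engine's API.

```python
from collections import deque

def regulator_actions(events, window_ms=60_000):
    """Join hot-temperature events with the latest switched-off regulator event.

    events: time-ordered tuples, either ("temp", ts, roomNo, temp)
    or ("reg", ts, roomNo, deviceID, isOn).
    Yields (roomNo, deviceID, "start") for every joined pair.
    """
    hot = deque()     # time window over temperature events with temp > 30.0
    last_off = None   # length(1) window over regulator events with isOn == false
    for event in events:
        kind, ts = event[0], event[1]
        # Expire temperature events older than the time window.
        while hot and hot[0][0] <= ts - window_ms:
            hot.popleft()
        if kind == "temp":
            _, _, room, temp = event
            if temp > 30.0:
                hot.append((ts, room))
                if last_off is not None and last_off[0] == room:
                    yield room, last_off[1], "start"
        else:
            _, _, room, device, is_on = event
            if not is_on:
                last_off = (room, device)
                for _, hot_room in hot:
                    if hot_room == room:
                        yield room, device, "start"
```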
Operators: Access Data from the Disk
Event tables allow users to map a database to a window and join a
data stream with the window
Usecases
- Merge with data in a database, collect, update data conditionally
define stream TempStream (ts long, temp double);
define table HistTempTable (day long, avgT double);
from TempStream#window.length(1) join HistTempTable
on getDayOfYear(ts) == HistTempTable.day and temp > HistTempTable.avgT
select ts, temp
insert into PurchaseUserStream;
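The event-table join amounts to a lookup per incoming event. The Python sketch below assumes the intent is to compare each temperature reading with the stored historical average for that day, models the table as a dictionary, and treats ts as milliseconds since the epoch in UTC. day_of_year is a hypothetical stand-in for getDayOfYear().

```python
from datetime import datetime, timezone

def day_of_year(ts_ms):
    """Hypothetical stand-in for getDayOfYear(): day of year (1-366) of a UTC timestamp in ms."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).timetuple().tm_yday

def hotter_than_history(temp_stream, hist_temp_table):
    """Yield (ts, temp) when temp exceeds the stored average for that day.

    hist_temp_table maps day of year -> historical average temperature.
    """
    for ts, temp in temp_stream:
        avg = hist_temp_table.get(day_of_year(ts))
        # Days missing from the table produce no join result.
        if avg is not None and temp > avg:
            yield ts, temp
```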
Revisit Patterns
Predictive Analytics
- Build models and use them with
WSO2 CEP, BAM and ESB using the
upcoming WSO2 Machine Learner
product (2015 Q2)
- Build models using R, export them as
PMML, and use them within WSO2 CEP
- Call R scripts from CEP queries
- Regression and Anomaly Detection
Operators in CEP
Case Study: Realtime Soccer Analysis
Watch at: https://www.youtube.com/watch?v=nRI6buQ0NOM
TFL Traffic Analysis
Built using TFL
( Transport for
London) open data
feeds.
http://goo.gl/04tX6k
http://goo.gl/9xNiCm
Great, Does it Scale?
Idea 1: Network of CEP Nodes
For scaling, we arrange CEP
processing nodes in a graph, as with
stream processing.
The graph can be implemented
using a stream processing engine
like Apache Storm
Idea II: Compile SQL like Queries to a
Network of CEP Nodes
from TempStream[temp > 33]
insert into HighTempStream;
from HighTempStream#window.time(1 hour)
select max(temp) as max
insert into HourlyMaxTempStream;
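The compilation idea can be sketched as one processing node per query, wired so that the output stream of one node is the input stream of the next. The Python below is an illustrative simplification: the node functions are assumptions, events are (ts, temp) pairs with ts in milliseconds, and the hourly window is approximated by fixed one-hour buckets rather than a sliding window.

```python
def high_temp(temp_stream):
    """Node 1: the filter query, forwards events with temp > 33."""
    for ts, temp in temp_stream:
        if temp > 33:
            yield ts, temp

def hourly_max(high_temp_stream, hour_ms=3_600_000):
    """Node 2: emits (hour_index, max temp) once each hour bucket closes."""
    current_hour, current_max = None, None
    for ts, temp in high_temp_stream:
        hour = ts // hour_ms
        if current_hour is not None and hour != current_hour:
            yield current_hour, current_max
            current_max = None
        current_hour = hour
        current_max = temp if current_max is None else max(current_max, temp)
    if current_hour is not None:
        yield current_hour, current_max  # flush the last open bucket

def run_pipeline(temp_stream):
    """Wire the nodes: HighTempStream connects node 1 to node 2."""
    return list(hourly_max(high_temp(temp_stream)))
```

In a distributed deployment each node would run as a separate processor (for example a Storm bolt) and the generator hand-off would become a network stream.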

How do We partition the Data to scale
up the Analysis?
Let's follow MapReduce
MapReduce does not scale by itself; it asks users to break
the problem into many small independent problems.
Idea III: Let the Users specify Parallelism
The language includes parallel constructs:
partitions, pipelines, distributed
operators
Assign each partition to a different
node, and partition the data accordingly
define partition on TempStream.region {
from TempStream[temp > 33]
insert into HighTempStream;
}
from HighTempStream#window.time(1 hour)
select max(temp) as max
insert into HourlyMaxTempStream;
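Partitioning by key can be sketched as hashing the partition attribute to pick a node, so that all events for one region are handled by the same node. The Python below is a minimal sketch under assumed names; events are (region, ts, temp) tuples, and zlib.crc32 is used only because it gives a stable hash across runs.

```python
import zlib
from collections import defaultdict

def node_for(region, num_nodes):
    """Deterministically map a partition key to a node index."""
    return zlib.crc32(region.encode("utf-8")) % num_nodes

def route(events, num_nodes):
    """Split (region, ts, temp) events so each region lands on exactly one node."""
    per_node = defaultdict(list)
    for event in events:
        per_node[node_for(event[0], num_nodes)].append(event)
    return per_node

def high_temp_per_node(per_node):
    """Run the partitioned filter query independently on every node."""
    return {node: [e for e in evts if e[2] > 33] for node, evts in per_node.items()}
```

Because every node sees complete data for its regions, the partitioned query needs no coordination between nodes. Only the unpartitioned hourly maximum has to gather results from all of them.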
Handling Ordering
When the data is processed in
parallel, output might be generated
out of order.
Due to the lack of a global time, we
cannot trigger windows and other
time sensitive constructs
Solution: the current time needs to
be propagated through the graph
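One common way to propagate time, similar in spirit to watermarks in other stream processors, is for each downstream node to track the latest timestamp reported by every upstream node and treat the minimum as the time it can safely act on. The Python class below is a sketch of that idea under assumed names; the deck does not say how WSO2 CEP implements it.

```python
class TimeTracker:
    """Tracks the latest timestamp seen from each upstream node."""

    def __init__(self, upstream_ids):
        self.latest = {uid: None for uid in upstream_ids}

    def observe(self, upstream_id, ts):
        """Record an event or heartbeat timestamp and return the current safe time.

        The safe time is the minimum over all upstreams; it is None until
        every upstream has reported at least once.
        """
        prev = self.latest[upstream_id]
        # Late (out of order) timestamps never move an upstream's clock backwards.
        self.latest[upstream_id] = ts if prev is None else max(prev, ts)
        if any(t is None for t in self.latest.values()):
            return None
        return min(self.latest.values())
```

A window at the downstream node would fire only once the safe time passes its end, so a slow partition delays output instead of producing wrong results.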
Putting Everything Together
WSO2 CEP & Big Data Platform
CEP = SQL for Realtime Analytics
Easy to follow from SQL
Expressive, short, sweet and fast!!
Define core operations that cover 90% of
problems
Lets experts dig in when they like!
And it Scales!!
Questions?
Visit us at Booth 1025 http://wso2.com/landing/strata-hadoop-world-ca-2015/
Scalable Realtime Analytics with declarative SQL like Complex Event Processing Scripts

  • 1. Scalable Realtime Analytics with declarative, SQL like, Complex Event Processing Scripts Srinath Perera Director, Research WSO2 Apache Member (@srinath_perera) srinath@wso2.com
  • 2. (Batch) Analytics Scientists have been doing this for 25 years with MPI (1991) on special hardware. It took off with Google’s MapReduce paper (2004); Apache Hadoop, Hive, and a whole ecosystem followed. It was successful, so we are here! But processing takes time.
  • 3. Value of Some Insights Degrades Fast! For some use cases (e.g. stock markets, traffic, surveillance, patient monitoring) the value of insights degrades very quickly with time. - E.g. stock markets and the speed of light. We need technology that can produce outputs fast - Static queries, but with very fast output (alerts, realtime control) - Dynamic and interactive queries (data exploration)
  • 4. History Realtime Analytics are not new either!! - Active Databases (2000+) - Stream processing (Aurora, Borealis (2005+) and later Storm) - Distributed Streaming Operators (e.g. Database research topic around 2005) - CEP vendor roadmap ( from http://www.complexevents.com/2014/12/03/cep- tooling-market-survey-2014/)
  • 7. I. Stream Processing Program a set of processors and wire them up; data flows through the graph. A middleware framework handles data flow, distribution, and fault tolerance (e.g. Apache Storm, Samza). Processors may be on the same machine or on multiple machines.
  • 8. II. Complex Event Processing
  • 9. III. Micro Batch Process data in small batches, and then combine the batch results into final results (e.g. Spark). Works for simple aggregates, but is tricky for complex operations (e.g. event sequences). Can be done with MapReduce as well if the deadlines are not too tight.
  • 10. IV. OLAP Style In Memory Computing Usually done to support interactive queries. Index data to make it readily accessible so you can respond to queries fast (e.g. Apache Drill). Tools like Druid, VoltDB, and SAP HANA can do this with all data in memory to make things really fast.
  • 11. Realtime Analytics Patterns - Simple counting (e.g. failure count) - Counting with windows (e.g. failure count every hour) - Preprocessing: filtering, transformations (e.g. data cleanup) - Alerts, thresholds (e.g. alarm on high temperature) - Data correlation, detecting missing events, detecting erroneous data (e.g. detecting failed sensors) - Joining event streams (e.g. detect a hit on a soccer ball) - Merge with data in a database, collect, update data conditionally
  • 12. Realtime Analytics Patterns (contd.) - Detecting event sequence patterns (e.g. small transaction followed by large transaction) - Tracking: follow some related entity’s state in space, time, etc. (e.g. location of airline baggage, vehicles, tracking wildlife) - Detecting trends: rise, turn, fall, outliers, complex trends like triple bottom, etc. (e.g. algorithmic trading, SLA, load balancing) - Learning a model (e.g. predictive maintenance) - Predicting the next value and corrective actions (e.g. automated car)
  • 13. Apache Hive A SQL-like data processing language. Since many understand SQL, Hive made large-scale Big Data processing accessible to many. Expressive, short, and sweet. Defines core operations that cover 90% of problems. Lets experts dig in when they like!
  • 14. (Batch Processing, Hive) (Realtime Analytics, X) What is X?
  • 15. CEP = SQL for Realtime Analytics Easy to follow from SQL. Expressive, short, and sweet. Defines core operations that cover 90% of problems. Lets experts dig in when they like! Let's look at the core operations.
  • 16. Operators: Filters Assume a temperature stream. Here weather:convertFtoC() is a user-defined function; such functions are used to extend the language. define stream TempStream (ts long, roomNo int, temp double); from TempStream[weather:convertFtoC(temp) > 30.0 and roomNo != 2043] select roomNo, temp insert into HotRoomsStream; Usecases: - Alerts, thresholds (e.g. alarm on high temperature) - Preprocessing: filtering, transformations (e.g. data cleanup)
  • 17. Operators: Windows and Aggregation Supports many window types - Batch windows, sliding windows, custom windows Usecases - Simple counting (e.g. failure count) - Counting with windows (e.g. failure count every hour) from TempStream#window.time(1 min) select roomNo, avg(temp) as avgTemp insert into HotRoomsStream;
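To make the sliding time window concrete, here is a minimal Python sketch of what a one-minute `avg(temp)` window has to keep track of. It is an illustration only, not how WSO2 CEP implements windows; the class name `SlidingTimeAverage` is hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque


class SlidingTimeAverage:
    """Running average over events from the last `window_ms` milliseconds.

    Hypothetical helper for illustration; assumes in-order timestamps.
    """

    def __init__(self, window_ms):
        self.window_ms = window_ms
        self.events = deque()  # (timestamp_ms, value), oldest first
        self.total = 0.0

    def add(self, ts_ms, value):
        # Admit the new event, then expire everything that fell out of the window.
        self.events.append((ts_ms, value))
        self.total += value
        while self.events and self.events[0][0] <= ts_ms - self.window_ms:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)
```

Each call to `add` returns the current average, mirroring how the query emits an updated `avgTemp` as events arrive.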
  • 18. Operators: Patterns Models a "followed by" relation: e.g. event A followed by event B. A very powerful tool for tracking and detecting patterns. from every (a1 = TempStream) -> a2 = TempStream[temp > a1.temp + 5] within 1 day select a2.ts as ts, a2.temp - a1.temp as diff insert into HotDayAlertStream; Usecases - Detecting event sequence patterns - Tracking - Detecting trends
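The "followed by" pattern can be approximated in a few lines of Python. This is a rough sketch of the matching idea under simplifying assumptions (events in timestamp order, a single stream), not the engine's actual algorithm; `detect_rises` and its parameters are hypothetical names.

```python
def detect_rises(events, min_rise=5.0, within_ms=24 * 60 * 60 * 1000):
    """Find pairs (a1, a2) where a2 follows a1 and a2.temp > a1.temp + min_rise.

    events: iterable of (ts_ms, temp) in timestamp order.
    Returns a list of (a2_ts, a2_temp - a1_temp), one per completed match.
    """
    pending = []  # earlier events still waiting for a matching follower
    matches = []
    for ts, temp in events:
        still_pending = []
        for a1_ts, a1_temp in pending:
            if ts - a1_ts > within_ms:
                continue  # the "within" period has passed; drop this partial match
            if temp > a1_temp + min_rise:
                matches.append((ts, temp - a1_temp))  # a1 -> a2 completed
            else:
                still_pending.append((a1_ts, a1_temp))
        # "every": each arriving event also starts a new partial match
        still_pending.append((ts, temp))
        pending = still_pending
    return matches
```

The `pending` list is the state the pattern operator must hold, which is why the `within` clause matters: it bounds how long partial matches stay in memory.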
  • 19. Operators: Joins Join two data streams based on a condition, using a window on each stream. Usecases - Data correlation, detecting missing events, detecting erroneous data - Joining event streams from TempStream[temp > 30.0]#window.time(1 min) as T join RegulatorStream[isOn == false]#window.length(1) as R on T.roomNo == R.roomNo select T.roomNo, R.deviceID, 'start' as action insert into RegulatorActionStream;
  • 20. Operators: Access Data from the Disk Event tables allow users to map a database to a window and join a data stream with the window. Usecases - Merge with data in a database, collect, update data conditionally define stream TempStream (ts long, temp double); define table HistTempTable (day long, avgT double); from TempStream#window.length(1) join HistTempTable on getDayOfYear(ts) == HistTempTable.day and temp > HistTempTable.avgT select ts, temp insert into PurchaseUserStream;
  • 22. Predictive Analytics - Build models and use them with WSO2 CEP, BAM and ESB using the upcoming WSO2 Machine Learner product (2015 Q2) - Build models using R, export them as PMML, and use them within WSO2 CEP - Call R scripts from CEP queries - Regression and anomaly detection operators in CEP
  • 23. Case Study: Realtime Soccer Analysis Watch at: https://www.youtube.com/watch?v=nRI6buQ0NOM
  • 24. TFL Traffic Analysis Built using TFL (Transport for London) open data feeds. http://goo.gl/04tX6k http://goo.gl/9xNiCm
  • 25. Great, Does it Scale?
  • 26. Idea I: Network of CEP Nodes For scaling, we arrange CEP processing nodes in a graph, as in stream processing. The graph can be implemented using a stream processing engine like Apache Storm.
  • 27. Idea II: Compile SQL like Queries to a Network of CEP Nodes from TempStream[temp > 33] insert into HighTempStream; from HighTempStream#window(1h) select max(temp) as max insert into HourlyMaxTempStream;
  • 28. How do We Partition the Data to Scale up the Analysis? Let's follow MapReduce. MapReduce does not scale by itself; it asks users to break the problem into many small independent problems.
  • 29. Idea III: Let the Users Specify Parallelism The language includes parallel constructs: partitions, pipelines, distributed operators. Assign each partition to a different node, and partition the data accordingly. define partition on TempStream.region { from TempStream[temp > 33] insert into HighTempStream; } from HighTempStream#window(1h) select max(temp) as max insert into HourlyMaxTempStream;
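As a rough illustration of partitioning by `region`, the Python sketch below routes each event to a node chosen by a stable hash of its partition key and applies the filter from the partition block on that node. The hashing scheme, function names, and in-process "nodes" are assumptions for illustration; the slides do not say how WSO2 CEP assigns partitions to nodes.

```python
import zlib
from collections import defaultdict


def node_for(key, num_nodes):
    # Stable hash, so the same region is always routed to the same node.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def run_partitioned(events, num_nodes, threshold=33.0):
    """Route (region, temp) events to nodes and filter them there.

    Returns {node_index: [(region, temp), ...]} holding only the events
    that pass the temp > threshold filter.
    """
    per_node = defaultdict(list)
    for region, temp in events:
        node = node_for(region, num_nodes)
        if temp > threshold:  # the filter inside the partition block
            per_node[node].append((region, temp))
    return dict(per_node)
```

Because all events for a region land on one node, any per-region state (windows, partial pattern matches) stays local to that node, which is what makes the partitions independent.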
  • 30. Handling Ordering When data is processed in parallel, output might be generated out of order. Due to the lack of a global time, we cannot trigger windows and other time-sensitive constructs. Solution: the current time needs to be propagated through the graph.
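One common way to propagate time through a graph is for each upstream node to report its current time, and for a downstream node to release only events at or before the minimum time reported across all inputs. The Python sketch below shows that idea; it is an assumption about how such propagation could work, not a description of the WSO2 implementation, and `OrderedMerge` is a hypothetical name. It assumes each input reports non-decreasing times.

```python
import heapq


class OrderedMerge:
    """Merge several parallel inputs back into timestamp order."""

    def __init__(self, num_inputs):
        self.latest = [None] * num_inputs  # last time reported by each input
        self.buffer = []                   # min-heap of (ts, seq, payload)
        self.seq = 0                       # tie-breaker for equal timestamps

    def on_event(self, input_id, ts, payload):
        heapq.heappush(self.buffer, (ts, self.seq, payload))
        self.seq += 1
        return self.on_time(input_id, ts)

    def on_time(self, input_id, ts):
        """An input reports its current time, with or without an event."""
        self.latest[input_id] = ts
        if any(t is None for t in self.latest):
            return []  # some input has not reported yet
        safe = min(self.latest)  # no input can still send anything older
        released = []
        while self.buffer and self.buffer[0][0] <= safe:
            t, _, payload = heapq.heappop(self.buffer)
            released.append((t, payload))
        return released
```

The `on_time` call without an event matters: an idle partition must still advance its clock, otherwise downstream windows would never fire.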
  • 32. WSO2 CEP & Big Data Platform
  • 33. CEP = SQL for Realtime Analytics Easy to follow from SQL. Expressive, short, sweet, and fast! Defines core operations that cover 90% of problems. Lets experts dig in when they like! And it scales!
  • 34. Questions? Visit us at Booth 1025 http://wso2.com/landing/strata-hadoop-world-ca-2015/