Query-Driven Descriptive Analytics for IoT
and Edge Computing
Moysis Symeonides*, Demetris Trihinas✝, Zacharias Georgiou*,
George Pallis*, Marios D. Dikaiakos*
IEEE International Conference on Cloud Engineering (IC2E 2019)
*Department of Computer Science
University of Cyprus
✝Department of Computer Science
University of Nicosia
D. Trihinas
trihinas@cs.ucy.ac.cy
Laboratory for
Internet Computing
StreamSight - IC2E 2019
Distributed Data Processing Engines
● Frameworks like Hadoop and Spark are contributing to the democratization
of big data analytics by hiding the complexity related to:
○ Machine communication and resource management -> dealing with the
underlying infrastructure.
○ Task scheduling and supervision for analytic jobs.
○ Fault tolerance for both the infrastructure and execution state.
○ Monitoring and logging.
○ ...
The Internet of Things
● Transforming the physical world into an information system.
● 3.6 billion IoT devices are in daily use [1], and these devices are projected to generate 500 ZB of data [2] by the end of the year (2019).
● It only seems “natural” that IoT services offload analytic jobs to the cloud
for data processing.
● But… IoT services usually come with near real-time requirements and
moving data “centrally” for processing penalizes analytics timeliness.
[1] Next big things in IoT predictions for 2020, ITPro, 2018
[2] Global Cloud Index, Cisco, 2018
Edge Computing… Saving IoT Analytics
[Figure: IoT services, the “Edge”, and the cloud, producing analytic insights]
● Data processing is now possible in place, or within the local network.
○ Shorter response times for latency-critical IoT services.
○ More efficient processing by offloading “centralized” components.
● Possible because hardware for mobile/fog/edge is scaling up [1].
● But… bandwidth and battery capacity are NOT scaling at the same rate [2].
[1] EdgeIoT: Mobile Edge Computing for the Internet of Things, X. Sun et al, IEEE Communications, 2016.
[2] Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of Things, D. Trihinas et al., IEEE, Trans. on Services Computing, 2018.
IoT Analytics Over the Edge
How to process enormous volumes of streaming data at
the edge to provide query-driven analytic insights while
also minimizing response times?
Query-Driven Analytics
Abstractions required for modelling knowledge extraction from data streams
Challenge 1: Expressing (ad-hoc) analytic queries
● One must have specific knowledge of the programming model of the underlying processing engine.
● Queries are bound to the underlying processing engine, which limits query portability.
Example: compute the average of a metric using a 60s sliding window.
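As a point of reference, the example task on this slide (averaging a metric over a 60s sliding window) could be sketched without any particular processing engine as follows. This is a minimal illustration, not StreamSight or Spark code: the class name SlidingWindowMean and its methods are hypothetical, and records are assumed to arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class SlidingWindowMean:
    """Mean of the values observed during the last `window_s` seconds.

    Hypothetical helper for illustration; assumes in-order timestamps.
    """

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        self._items: Deque[Tuple[float, float]] = deque()  # (timestamp, value)
        self._total = 0.0

    def add(self, timestamp: float, value: float) -> None:
        """Insert a record and evict everything that left the window."""
        self._items.append((timestamp, value))
        self._total += value
        # A record is expired once it is at least window_s older than the newest one.
        while self._items and self._items[0][0] <= timestamp - self.window_s:
            _, expired = self._items.popleft()
            self._total -= expired

    def mean(self) -> Optional[float]:
        """Return the current window mean, or None if the window is empty."""
        if not self._items:
            return None
        return self._total / len(self._items)


window = SlidingWindowMean(window_s=60.0)
window.add(0.0, 10.0)
window.add(30.0, 20.0)
window.add(70.0, 30.0)  # the record at t=0 has left the 60s window
current_mean = window.mean()
```

Even this small task needs explicit state and eviction logic, which is the kind of engine-specific detail the query model aims to hide.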
Query-Driven Analytics
Challenge 2: Geo-distributed deployments are the norm for IoT services, not the exception
● A naive “edge” deployment can impose compute and communication penalties for intermediate recomputations and data exchange.
● Network bandwidth between geo-distributed entities is far from uniform.
[Figure: “Naive Deployment” versus “Re-using intermediate results”; legend: data exchange and computation, result exchange]
Pixida: Optimizing data parallel jobs in wide-area data analytics, K. Kloudas et al., VLDB, 2015.
Mechanisms to avoid data movement and recomputations are needed.
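One way to avoid shipping raw records between sites, sketched here under the assumption that the insight is a mean, is for each edge site to send only a (sum, count) pair and let the receiver merge the pairs. The names PartialMean and summarize are hypothetical and are not part of StreamSight.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PartialMean:
    """Mergeable summary of a set of values (hypothetical illustration)."""

    total: float
    count: int

    def merge(self, other: "PartialMean") -> "PartialMean":
        # Merging two summaries needs no access to the raw records.
        return PartialMean(self.total + other.total, self.count + other.count)

    def value(self) -> float:
        return self.total / self.count


def summarize(values: Iterable[float]) -> PartialMean:
    """Computed locally at each edge site over its own records."""
    values = list(values)
    return PartialMean(sum(values), len(values))


# Each region exchanges two numbers instead of all of its records.
region_a = summarize([4.0, 6.0, 8.0])
region_b = summarize([10.0])
global_mean = region_a.merge(region_b).value()
```

Note that averaging the two regional means directly (6.0 and 10.0) would give 8.0, not the correct 7.0, which is why the count travels with the sum.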
Outline of Today’s Talk
● IoT analytics over geo-distributed topologies.
● Abstract query model for query-driven IoT analytics.
● The StreamSight Framework
○ Query plan compilation.
○ Edge computing improvements.
○ Experimentation.
● Future research directions and open research questions.
Abstract Query Model
● Queries are applied on metric streams with the intent to derive insights.
● Insights can be reused, transformed, and composed with other metric streams to create new insights.
Metric record (one element of a metric stream):
<bus_id, bus99>, <bus_delay, 5>, <bus_region, NW>, ...
Abstract Query Model
Insight = COMPUTE <Expression> EVERY <Interval> [WITH <Optimizations>]
COMPUTE
➢ The composition, transformation and aggregation of multiple metric streams (e.g., expression, composite, aggregate).
EVERY
➢ Denotes the interval at which the expression is evaluated; it can be time-based (e.g., every 1min) or tuple-based (e.g., every 1000 records).
WITH
➢ Optional statement for capturing user-defined optimizations and constraints for data streams and edge topologies.
[Figure: metric streams feed an expression that is evaluated EVERY interval and emits an insight stream]
Smart City Bus Network
● Buses are equipped with GPS tracking devices that emit updates to the local edge server of the region each bus is currently navigating through.
● Bus updates include: bus id, location coordinates, operating city region, an
estimation of the current bus route delay, etc.
● Inspired by Dublin smart city bus network.
Insight Operations
1. Window Operations: Aggregation of values within a time period
COMPUTE
ARITHMETIC_MEAN(bus_delay, 10 MINUTES)
EVERY 5 SECONDS
Here ARITHMETIC_MEAN is the aggregate, bus_delay the raw metric stream, 10 MINUTES the time period, and EVERY 5 SECONDS the time interval. Apache Spark: 14 ops.
COMPUTE
ARITHMETIC_MEAN(bus_delay, 10 MINUTES)
BY city_segment EVERY 5 SECONDS
BY groups the result by a metric key. Apache Spark: 15 ops.
Examples of aggregates: sum, count, sdev, median, percentile, etc.
Insight Operations
2. Temporal Compositions: Compositions with different time windows
COMPUTE (
ARITHMETIC_MEAN(bus_delay, 10 MINUTES)
/
ARITHMETIC_MEAN(bus_delay, 60 MINUTES)
) EVERY 5 SECONDS
Apache Spark: 32 ops.
3. Accumulated Compositions: Updates on previously computed data
COMPUTE EWMA[0.85](passengers) BY bus_stop EVERY 1 TUPLE
Examples: running_mean, running_max, running_sdev, etc.
Apache Spark: 24 ops.
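To make the accumulated composition concrete, the following sketch updates an exponentially weighted moving average per key on every tuple, in the spirit of EWMA[0.85](passengers) BY bus_stop EVERY 1 TUPLE. The slides do not define how the bracketed parameter is applied, so treating it as the weight of the newest value is an assumption, and the class name KeyedEwma is hypothetical.

```python
from typing import Dict, Hashable


class KeyedEwma:
    """Per-key exponentially weighted moving average (illustrative only)."""

    def __init__(self, alpha: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        # Assumption: alpha weights the newest value, (1 - alpha) the history.
        self.alpha = alpha
        self._state: Dict[Hashable, float] = {}

    def update(self, key: Hashable, value: float) -> float:
        """Fold one new tuple into the running average of its key."""
        previous = self._state.get(key)
        if previous is None:
            current = value  # the first observation seeds the average
        else:
            current = self.alpha * value + (1.0 - self.alpha) * previous
        self._state[key] = current
        return current


ewma = KeyedEwma(alpha=0.5)
first = ewma.update("stop_1", 10.0)
second = ewma.update("stop_1", 20.0)
other = ewma.update("stop_2", 4.0)  # keys are tracked independently
```

Only one number per key is kept, so the update costs constant time and memory regardless of how much history the stream has produced.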
Insight Operations
4. Hybrid Compositions: Combining window and accumulated operations
COMPUTE (
ARITHMETIC_MEAN( bus_delay, 10 MINUTES)
-
EWMA[0.65]( bus_delay)
) BY city_segment EVERY 5 SECONDS
ARITHMETIC_MEAN is the window operation and EWMA the accumulated operation. Apache Spark: 34 ops.
5. Filtered Compositions: Filter input and output streams
COMPUTE bus_delay
WHEN > ( RUNNING_MEAN(bus_delay) + 3 * RUNNING_SDEV(bus_delay) )
BY city_segment EVERY 5 SECONDS;
The WHEN clause is the filter predicate. Apache Spark: 41 ops.
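The filter predicate of the filtered composition (keep a value only when it exceeds the running mean plus three running standard deviations) can be sketched with Welford's online algorithm. This is an illustration rather than StreamSight's implementation: the names RunningStats and outliers are hypothetical, the population standard deviation is used, each value is compared against the statistics of the values seen before it, and the warm-up length is an added assumption.

```python
import math
from typing import Iterable, Iterator


class RunningStats:
    """Running mean and standard deviation via Welford's algorithm."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def sdev(self) -> float:
        # Population standard deviation; 0.0 until two values were seen.
        return math.sqrt(self._m2 / self.n) if self.n > 1 else 0.0


def outliers(values: Iterable[float], k: float = 3.0, warmup: int = 5) -> Iterator[float]:
    """Yield values above running_mean + k * running_sdev.

    Assumption: the first `warmup` values only train the statistics.
    """
    stats = RunningStats()
    for x in values:
        if stats.n >= warmup and x > stats.mean + k * stats.sdev():
            yield x
        stats.update(x)


delays = [5.0, 5.0, 6.0, 5.0, 6.0, 5.0, 40.0, 5.0]
flagged = list(outliers(delays))
```

A per-segment variant (BY city_segment) would keep one RunningStats instance per key, for example in a dictionary.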
Collaborative Edge Services
● Infrastructures of multiple stakeholders that are
geographically distributed
● Inspired by publicly available data from:
○ the New York transportation authority,
○ the Dublin smart city bus network and
○ Uber
● Enriched with real-time weather data from open-access meteorological stations
● Companies, Employees and Clients can easily
submit their queries
Collaborative Edge Services
Geo-analytic queries: a travel app user interested in finding the closest taxis or car-sharing vehicles (FROM combines multiple sources).
COMPUTE vehicleID
FROM (taxis, car_sharing)
WHEN GEOHASH[10](cusLoc) == GEOHASH[10](vehLoc)
EVERY 1 MINUTES
The city segment with the least number of vehicles in a 15min sliding window when the temperature drops below 10 °C.
COMPUTE MIN(
COUNT(buses, 15 MINUTES) BY city_segment +
COUNT(taxis, 15 MINUTES) BY city_segment +
COUNT(sharing, 15 MINUTES) BY city_segment
) WHEN temperature <= 10
EVERY 10 MINUTES
Data-driven suggestions: the top-5 city areas based on current and previous month average amount (the third argument is a 1 MONTH offset).
COMPUTE TOP_K[5] (
MEAN(total_amount, 1 MONTH)-
MEAN(total_amount, 1 MONTH, 1 MONTH )
) BY city_segment EVERY 1 HOURS
StreamSight Framework
Specification, compilation, and execution of streaming IoT analytic queries on distributed processing engines optimized for edge computing environments.
[Figure: processing engines currently supported and future adapters]
StreamSight: A Query-Driven Framework for Streaming Analytics in Edge Computing. Z. Georgiou et al, IEEE/ACM UCC, 2018.
Query Model Translation
Insight Description
COMPUTE
ARITHMETIC_MEAN( bus_delay, 10 MINUTES)
BY city_segment EVERY 5 SECONDS
Abstract Syntax Tree
● Each node corresponds to a grammar rule of the language.
● Leaves are the tokens and symbols of the language.
● The parser performs early validation to verify the syntactic correctness of the query.
Compilation Phase
● Constructs the Query Execution Plan, assembling the pipeline of stream operations from the AST representation.
● A recursive algorithm traverses the AST.
● Each node is mapped to a stream operation of the underlying processing engine.
● A naive AST mapping is extremely inefficient because it ignores the geo-distributed nature of edge realms:
○ Unnecessary intermediate re-computations
○ Increased data movement
● The AST mapping must account for these.
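The recursive traversal described for the compilation phase can be illustrated with a toy compiler that maps each AST node to a Python callable. This is a deliberately naive sketch: the node classes (Metric, Aggregate, BinaryOp), the operator tables, and compile_node are all hypothetical, window lengths and the EVERY interval are omitted, and a real target would be the stream operations of the underlying engine rather than plain functions.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Callable, Dict, List, Union


@dataclass
class Metric:  # leaf: a named metric stream
    name: str


@dataclass
class Aggregate:  # e.g. ARITHMETIC_MEAN(<child>)
    func: str
    child: "Node"


@dataclass
class BinaryOp:  # e.g. <left> - <right>
    op: str
    left: "Node"
    right: "Node"


Node = Union[Metric, Aggregate, BinaryOp]
Window = Dict[str, List[float]]  # metric name -> values in the current window

AGGREGATES = {"ARITHMETIC_MEAN": fmean, "MAX": max, "SUM": sum}
OPERATORS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "/": lambda a, b: a / b,
}


def compile_node(node: Node) -> Callable[[Window], object]:
    """Recursively map an AST node to an executable operation."""
    if isinstance(node, Metric):
        name = node.name
        return lambda window: window[name]
    if isinstance(node, Aggregate):
        child = compile_node(node.child)
        aggregate = AGGREGATES[node.func]
        return lambda window: aggregate(child(window))
    if isinstance(node, BinaryOp):
        left, right = compile_node(node.left), compile_node(node.right)
        operator = OPERATORS[node.op]
        return lambda window: operator(left(window), right(window))
    raise TypeError(f"unsupported AST node: {node!r}")


ast = BinaryOp(
    "-",
    Aggregate("MAX", Metric("bus_delay")),
    Aggregate("ARITHMETIC_MEAN", Metric("bus_delay")),
)
plan = compile_node(ast)
result = plan({"bus_delay": [2.0, 4.0, 6.0]})
```

Because every node is compiled independently, a sub-expression that appears twice is also evaluated twice, which is the kind of unnecessary re-computation a naive mapping incurs.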
System Optimizations
Reusing intermediate results
● StreamSight caches expressions, composites and results and broadcasts them across worker nodes to reduce unnecessary re-computations.
Insight 1: Calculate current average bus_delay
Insight 2: Calculate the ratio between current and last hour bus_delay
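The two insights on this slide share the current average of bus_delay, so a cache keyed by (operator, metric, window) lets the second insight reuse what the first already computed. The sketch below shows only that local caching idea under assumed names (ResultCache, evaluate_insights, and the 10-minute "current" window are hypothetical); broadcasting cached results across worker nodes is not modelled.

```python
from statistics import fmean
from typing import Callable, Dict, Hashable, List, Tuple


class ResultCache:
    """Cache of intermediate results for one evaluation round (illustrative)."""

    def __init__(self) -> None:
        self._results: Dict[Hashable, float] = {}
        self.computations = 0  # how many times something was actually computed

    def get(self, key: Hashable, compute: Callable[[], float]) -> float:
        if key not in self._results:
            self._results[key] = compute()
            self.computations += 1
        return self._results[key]

    def clear(self) -> None:
        """Call at the start of each evaluation interval."""
        self._results.clear()


def evaluate_insights(
    delays_10m: List[float], delays_60m: List[float], cache: ResultCache
) -> Tuple[float, float]:
    def mean_10m() -> float:
        return cache.get(("MEAN", "bus_delay", 600), lambda: fmean(delays_10m))

    def mean_60m() -> float:
        return cache.get(("MEAN", "bus_delay", 3600), lambda: fmean(delays_60m))

    insight_1 = mean_10m()               # current average bus_delay
    insight_2 = mean_10m() / mean_60m()  # ratio of current to last hour
    return insight_1, insight_2


cache = ResultCache()
insight_1, insight_2 = evaluate_insights([4.0, 6.0], [5.0, 10.0, 15.0], cache)
```

Three aggregate evaluations are requested, but only two are computed because the 10-minute mean is served from the cache the second time.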
User Optimizations
Sampling enables the execution of an insight description on a portion of the streamed measurements for approximate but timely answers (k << N).
● Uniform Sampling
● Weighted Hierarchical Reservoir Sampling (WHRS) [1]
○ Applies on-the-fly reservoir + stratified sampling
StreamSight allows the user to prioritize insights.
● During high-load influx or network uncertainties, critical queries are not delayed while less important ones are queued.
[1] ApproxIoT: Approximate Analytics for Edge Computing, Z. Wen et al, ICDCS, 2018
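For readers unfamiliar with reservoir sampling, the basic single-reservoir technique (often called Algorithm R) is sketched below: it keeps a uniform sample of k items from a stream of unknown length N. This is not WHRS; the weighted, hierarchical and stratified parts described in ApproxIoT are not implemented here, and the function name reservoir_sample is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Uniform sample of up to k items from a stream, in one pass."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1000), k=10, rng=random.Random(42))
short = reservoir_sample(range(3), k=10)  # fewer items than k: keep them all
```

Memory use is bounded by k regardless of the stream length, which is what makes the approach attractive on resource-constrained edge nodes.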
User Optimizations
Prioritization: on high-load influx, critical queries are not delayed.
COMPUTE MAX(taxis_fare_amount, 60 MINUTES)
BY city_segment EVERY 1 MINUTES
WITH SALIENCE 1
SALIENCE sets the priority; higher is better.
Sampling with Error Margin & Confidence: query execution with bounded error guarantees for sampling.
COMPUTE
ARITHMETIC_MEAN(taxi_passengers, 10 MINUTES)
EVERY 30 SECONDS
WITH MAX_ERROR 0.05 AND CONFIDENCE 0.95
MAX_ERROR is the error upper bound and CONFIDENCE the confidence level.
Uniform Sampling: query execution on a portion of the data stream.
COMPUTE ARITHMETIC_MEAN(bus_delay, 60 MINUTES)
BY stop_id EVERY 5 MINUTES
WITH SALIENCE 1 AND SAMPLE 0.2
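The prioritization behaviour (critical insights run first, less important ones wait in a queue) can be approximated with a priority queue ordered by salience. The sketch assumes, as the slide annotation suggests, that a higher SALIENCE value means higher priority; the class InsightQueue and the insight names are hypothetical, and how StreamSight actually schedules insights is not shown in the slides.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class InsightQueue:
    """Queue that releases the most salient insight first (illustrative)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._order = itertools.count()  # tie-breaker keeps submission order

    def submit(self, name: str, salience: int = 0) -> None:
        # heapq is a min-heap, so the salience is negated to pop the highest first.
        heapq.heappush(self._heap, (-salience, next(self._order), name))

    def next_insight(self) -> Optional[str]:
        """Return the next insight to evaluate, or None if nothing is queued."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = InsightQueue()
queue.submit("mean_bus_delay", salience=0)
queue.submit("max_taxi_fare", salience=1)
queue.submit("count_taxis", salience=0)
execution_order = [queue.next_insight() for _ in range(3)]
```

Insights with equal salience are served in submission order, so low-priority work is delayed under load but not reordered arbitrarily.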
User Optimizations
Dedicated Execution: execution of crucial queries on dedicated nodes.
COMPUTE COUNT(taxis)
BY city_segment
EVERY 1 SECONDS
WITH ALLOW ON DEDICATED[5]
DEDICATED[5] gives the number of dedicated nodes.
Awareness on Computations: try to minimize the computations. Minimize the computation footprint of execution for less significant queries while keeping the error below 5%.
COMPUTE
PEWMA[0.5](bus_delay) BY bus_id
EVERY 30 SECONDS
WITH MAX_ERROR 0.05 AND CONFIDENCE 0.95
AND AWARENESS ON COMPUTATIONS
Accuracy Aware Execution: try to maximize the accuracy. Only in high-influx periods sacrifice a portion of the accuracy, but keep the error below 5%.
COMPUTE
PEWMA[0.5](bus_delay) BY bus_id
EVERY 30 SECONDS
WITH MAX_ERROR 0.05 AND CONFIDENCE 0.95
AND AWARENESS ON ACCURACY
Evaluation
Dublin Bus Workload
Real-World Datasets
● Dublin Smart City Buses Network [1]
○ 968 buses (Jan 2014)
○ 16 metrics/record, including: bus_id, bus_delay, city_segment
○ Used 7 insights of actual interest for bus operators
16 Edge servers
● 1 vCPU, 1 GB memory, 2 Mbps upload / 16 Mbps download
Evaluation Metric
● Batch Processing Time
[Figure: results plot with regions marked “Unstable System” and “Stable System”]
➢ StreamSight achieved a 1.4x speedup over the baseline
➢ StreamSight+WHRS achieved a 4.3x speedup over the baseline
[1] Dublin, “Smart City ITS,” https://data.smartdublin.ie/, 2018
Re-usage of Intermediate Results
● Dublin Bus Workload
● Average Processing Time ( Fixed Input rate 700 req/s )
StreamSight DOES NOT
incur a performance
overhead
Baseline configuration failed
Re-usage of Intermediate Results
● Same composition across different insights: different queries but with common operators.
● Same operators across different compositions: e.g., MEAN is composed of a SUM divided by a COUNT. If either SUM or COUNT is available, reuse it.
● Same composition across different offsets [1]: in the query below, we can cache and reuse the composition for 10 minutes.
COMPUTE
ARITHMETIC_MEAN(consumption, 10 MINUTES)/
ARITHMETIC_MEAN(consumption, 10 MINUTES, 10 MINUTES)
EVERY 15 MINUTES
● Re-use insights across users: involves tracking shared results across deployments and users, privacy protection, etc. (possibly use of blockchain?)
[1] SlickDeque: High Throughput and Low Latency Incremental Sliding-Window Aggregation. A. Shein et al, EDBT, 2018.
Query Execution Placement
● Query model operators: DEDICATED, SALIENCE, ALLOW ON, AWARENESS, etc.
● Still… fog-device-user mobility and network uncertainties affect IoT services QoS, cost, and energy consumption.
● Analytics job scheduling requires “intelligent” consideration of data placement when orchestrating dynamic IoT services.
● Ignoring this can result in IoT services placed for optimal responsiveness but failing to guarantee timely insight refreshment.
ADMin: Adaptive Monitoring Dissemination for the Internet of Things, D. Trihinas et al., IEEE INFOCOM, 2017.
Multiple and Heterogeneous Data Processing Engines
● Moving to the “edge” means not only are data sources diverse but possibly even the data processing engines.
● These engines must “speak” the same language.
● Open specification vs federation layer?
OpenFog Consortium and OpenEdge Initiative
Data-less Query Execution
● Do we always need to actually compute the answer on the entire data?
○ Sampling…
○ Yes, but we need bounded approximations… and these approximations must be computed efficiently across geo-distributed environments.
■ Beware… substituting one computation with another must be beneficial in terms of performance (e.g., multivariate and dependent metrics) [1].
● Do we always need to actually compute the answer?
○ Or... can a bounded approximation based on recent history be satisfactory? [2]
[1] ATMoN: Adapting the “Temporality” in Large-Scale Dynamic Networks, D Trihinas et al, IEEE ICDCS, 2018.
[2] Towards intelligent distributed data systems for scalable efficient and accurate analytics, P. Triantafyllou et al, IEEE ICDCS, 2018.
Security & Privacy
● Query model provides provisions for data confidentiality, restricted access control and data movement constraints across geo-locations.
COMPUTE patient_stream
EVERY 5 MINUTES
WITH ALLOW
WHEN MEAN( heart_beat, 1 MINUTES ) >= 190
AND doctor_id IN (doctor_ids)
AND region == clinic_region
The WHEN clause after WITH ALLOW is the evaluation rule.
● Offloading sensitive data to the cloud exposes it to man-in-the-middle attacks… on the other hand… processing “in place” exposes “easier” attack surfaces (e.g., low-power IoT devices) to attacks such as DDoS.
● Query model NOT enough… geo-distributed analytics requires task scheduling algorithms to acknowledge privacy-aware compute… How to do this efficiently?
Conclusion
● Abstract query model for query-driven IoT analytics
○ Use cases (smart city, energy, health, microservices) illustrating value of the query model.
● A prototype framework called StreamSight
○ A framework for the specification, compilation, and execution of streaming analytic
queries on the “Edge” .
○ Optimizations:
■ Intermediate results
■ User-optimizations
○ StreamSight can achieve up to 4.3x speedup compared to a naive deployment.
● Many open research challenges for geo-distributed and query-driven
analytics in edge/fog topologies.
Reduce compute and network load on
the Edge
THANK YOU
This work is partially supported by the European Commission under the Unicorn 731846 H2020 project (H2020-ICT-2016-1).
Download StreamSight at: https://github.com/UCY-LINC-LAB/StreamSight.git
Energy Consumption in Micro-DCs
● Micro-DCs, also denoted as Green-DCs, powered by:
○ National electricity providers and
○ Photovoltaic power harvesting stations placed near to the DCs
● A wide range of sensors is placed in all datacenter racks and the photovoltaic stations, generating measurements like:
○ Temperature and energy consumption per data center, per rack or per node
○ Energy generation per photovoltaic panel
○ Weather data from the station, like humidity, wind, temperature, etc.
● Inspired by the ENEDI project http://enedi.eu
ENEDI: Energy Saving in Datacenters, Tryfonos et al, IEEE Global IoT, 2018.
Insight Prioritization
● Dublin Bus Workload
● Average Processing Time (fixed workload)
● 1 insight with high priority and 3 insights with low priority
● Introduced artificial latency (x2) between worker nodes
[Figure: processing time (s); the prioritized insight experiences no delay, while non-prioritized queries are queued]

More Related Content

PPTX
Composable Energy Modeling for ML-Driven Drone Applications
PDF
Low-Cost Approximate and Adaptive Techniques for the Internet of Things
PDF
Exascale Computing for Autonomous Driving
PDF
IRJET - Realization of Power Optimised Carry Skip Adder using AOI Logic
PDF
第12回 配信講義 計算科学技術特論A(2021)
PDF
Energy and latency aware application
PDF
SX Aurora TSUBASA (Vector Engine) a Brand-new Vector Supercomputing power in...
PDF
Binding CIM and Modelica for Consistent Power System Dynamic Model Exchange a...
Composable Energy Modeling for ML-Driven Drone Applications
Low-Cost Approximate and Adaptive Techniques for the Internet of Things
Exascale Computing for Autonomous Driving
IRJET - Realization of Power Optimised Carry Skip Adder using AOI Logic
第12回 配信講義 計算科学技術特論A(2021)
Energy and latency aware application
SX Aurora TSUBASA (Vector Engine) a Brand-new Vector Supercomputing power in...
Binding CIM and Modelica for Consistent Power System Dynamic Model Exchange a...

What's hot (20)

PDF
Energy-Efficient Virtual Machines Placement - SBRC2014
PDF
Integrated Model Discovery and Self-Adaptation of Robots
PDF
IRJET- Collaborative Task Execution for Application as a General Topology in ...
PDF
Software Based calculations of Electrical Machine Design
PDF
Design of area and power efficient half adder using transmission gate
PDF
IRJET - Wavelet based Image Fusion using FPGA for Biomedical Application
PDF
Static Energy Prediction in Software: A Worst-Case Scenario Approach
PDF
A04230105
PPTX
HPC with Clouds and Cloud Technologies
PDF
EFFINET - Initial Presentation
PDF
Implementation of an arithmetic logic using area efficient carry lookahead adder
PDF
HSO: A Hybrid Swarm Optimization Algorithm for Reducing Energy Consumption in...
PPTX
A Study of Virtual Machine Placement Optimization in Data Centers (CLOSER'2017)
PDF
Estimation of Optimized Energy and Latency Constraint for Task Allocation in ...
PDF
A SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLE
PPTX
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
PDF
cug2011-praveen
PPTX
Distance and Time Based Node Selection for Probabilistic Coverage in People-C...
PDF
Analysis of Impact of Graph Theory in Computer Application
PDF
Mobility insights at Swisscom - Understanding collective mobility in Switzerland
Energy-Efficient Virtual Machines Placement - SBRC2014
Integrated Model Discovery and Self-Adaptation of Robots
IRJET- Collaborative Task Execution for Application as a General Topology in ...
Software Based calculations of Electrical Machine Design
Design of area and power efficient half adder using transmission gate
IRJET - Wavelet based Image Fusion using FPGA for Biomedical Application
Static Energy Prediction in Software: A Worst-Case Scenario Approach
A04230105
HPC with Clouds and Cloud Technologies
EFFINET - Initial Presentation
Implementation of an arithmetic logic using area efficient carry lookahead adder
HSO: A Hybrid Swarm Optimization Algorithm for Reducing Energy Consumption in...
A Study of Virtual Machine Placement Optimization in Data Centers (CLOSER'2017)
Estimation of Optimized Energy and Latency Constraint for Task Allocation in ...
A SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLE
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
cug2011-praveen
Distance and Time Based Node Selection for Probabilistic Coverage in People-C...
Analysis of Impact of Graph Theory in Computer Application
Mobility insights at Swisscom - Understanding collective mobility in Switzerland
Ad

Similar to StreamSight - Query-Driven Descriptive Analytics for IoT and Edge Computing (20)

PDF
StreamSight: A Query-Driven Framework Extending Streaming IoT Analytics to th...
PDF
IOT_MODULE_4.pd easy to understand notes
PPT
Intelligent Data Processing for the Internet of Things
PDF
WSO2Con ASIA 2016: IoT Analytics
PPTX
Internet of Things & Big Data
PPTX
Building a Data Analytics PaaS for Smart Cities
PDF
Sensing the world with Data of Things
PDF
Sensing the world with data of things
PPT
Internet of Things and Large-scale Data Analytics
PDF
IoT Analytics
PDF
STEAM++ AN EXTENSIBLE END-TO-END FRAMEWORK FOR DEVELOPING IOT DATA PROCESSING...
PDF
Steam++ An Extensible End-to-end Framework for Developing IoT Data Processing...
PDF
Adding Edge Data to Your AI and Analytics Strategy
PDF
Real time Analytics in IoT - Marcel Lattmann Codit Switzerland @.NET Day 2019
PPT
Smart Cities and Data Analytics: Challenges and Opportunities
PPTX
Io t research_arpanpal_iem
PPT
Internet of Things and Data Analytics for Smart Cities and eHealth
PPT
What makes smart cities “Smart”?
PPT
Physical-Cyber-Social Data Analytics & Smart City Applications
PPT
Dynamic Data Analytics for the Internet of Things: Challenges and Opportunities
StreamSight: A Query-Driven Framework Extending Streaming IoT Analytics to th...
IOT_MODULE_4.pd easy to understand notes
Intelligent Data Processing for the Internet of Things
WSO2Con ASIA 2016: IoT Analytics
Internet of Things & Big Data
Building a Data Analytics PaaS for Smart Cities
Sensing the world with Data of Things
Sensing the world with data of things
Internet of Things and Large-scale Data Analytics
IoT Analytics
STEAM++ AN EXTENSIBLE END-TO-END FRAMEWORK FOR DEVELOPING IOT DATA PROCESSING...
Steam++ An Extensible End-to-end Framework for Developing IoT Data Processing...
Adding Edge Data to Your AI and Analytics Strategy
Real time Analytics in IoT - Marcel Lattmann Codit Switzerland @.NET Day 2019
Smart Cities and Data Analytics: Challenges and Opportunities
Io t research_arpanpal_iem
Internet of Things and Data Analytics for Smart Cities and eHealth
What makes smart cities “Smart”?
Physical-Cyber-Social Data Analytics & Smart City Applications
Dynamic Data Analytics for the Internet of Things: Challenges and Opportunities
Ad

More from Demetris Trihinas (16)

PDF
Rapidly Testing ML-Driven Drone Applications - The FlockAI Framework
PPTX
Towards Energy and Carbon Footprint and Testing for AI-driven IoT Services
PDF
Telling a Story – or Even Propaganda – Through Data Visualization
PDF
Machine Learning Introduction
PPTX
Απεικόνιση και Αλληλεπίδραση Δεδομένων Μεγάλου Όγκου με Διαδραστικούς Χάρτες
PDF
The Data Science Process: From Mining Raw Data to Story Visualization
PDF
From Mining Raw Data to Story Visualization
PDF
Designing Scalable and Secure Microservices by Embracing DevOps-as-a-Service ...
PPTX
Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...
PPTX
Adam - Adaptive Monitoring in 5min
PPTX
Low-Cost Adaptive Monitoring Techniques for the Internet of Things
PPTX
AdaM: an Adaptive Monitoring Framework for Sampling and Filtering on IoT Devices
PPTX
Find A Project
PPTX
Cloud Elasticity and the CELAR Project
PDF
[ccgrid2014] JCatascopia: Monitoring Elastically Adaptive Applications in the...
PPTX
[SummerSoc 2014] Monitoring Elastic Cloud Services
Rapidly Testing ML-Driven Drone Applications - The FlockAI Framework
Towards Energy and Carbon Footprint and Testing for AI-driven IoT Services
Telling a Story – or Even Propaganda – Through Data Visualization
Machine Learning Introduction
Απεικόνιση και Αλληλεπίδραση Δεδομένων Μεγάλου Όγκου με Διαδραστικούς Χάρτες
The Data Science Process: From Mining Raw Data to Story Visualization
From Mining Raw Data to Story Visualization
Designing Scalable and Secure Microservices by Embracing DevOps-as-a-Service ...
Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...
Adam - Adaptive Monitoring in 5min
Low-Cost Adaptive Monitoring Techniques for the Internet of Things
AdaM: an Adaptive Monitoring Framework for Sampling and Filtering on IoT Devices
Find A Project
Cloud Elasticity and the CELAR Project
[ccgrid2014] JCatascopia: Monitoring Elastically Adaptive Applications in the...
[SummerSoc 2014] Monitoring Elastic Cloud Services

Recently uploaded (20)

PPTX
Introduction to Information and Communication Technology
PPTX
INTERNET------BASICS-------UPDATED PPT PRESENTATION
PDF
💰 𝐔𝐊𝐓𝐈 𝐊𝐄𝐌𝐄𝐍𝐀𝐍𝐆𝐀𝐍 𝐊𝐈𝐏𝐄𝐑𝟒𝐃 𝐇𝐀𝐑𝐈 𝐈𝐍𝐈 𝟐𝟎𝟐𝟓 💰
PPTX
Funds Management Learning Material for Beg
PDF
Testing WebRTC applications at scale.pdf
PPTX
artificial intelligence overview of it and more
PDF
Sims 4 Historia para lo sims 4 para jugar
PDF
Vigrab.top – Online Tool for Downloading and Converting Social Media Videos a...
PPT
tcp ip networks nd ip layering assotred slides
PDF
Automated vs Manual WooCommerce to Shopify Migration_ Pros & Cons.pdf
PPTX
presentation_pfe-universite-molay-seltan.pptx
PDF
Slides PDF The World Game (s) Eco Economic Epochs.pdf
PDF
WebRTC in SignalWire - troubleshooting media negotiation
PDF
Best Practices for Testing and Debugging Shopify Third-Party API Integrations...
PDF
Cloud-Scale Log Monitoring _ Datadog.pdf
PPTX
PptxGenJS_Demo_Chart_20250317130215833.pptx
PDF
Unit-1 introduction to cyber security discuss about how to secure a system
PPTX
Introuction about WHO-FIC in ICD-10.pptx
PPTX
Digital Literacy And Online Safety on internet
PPTX
CHE NAA, , b,mn,mblblblbljb jb jlb ,j , ,C PPT.pptx
Introduction to Information and Communication Technology
INTERNET------BASICS-------UPDATED PPT PRESENTATION
💰 𝐔𝐊𝐓𝐈 𝐊𝐄𝐌𝐄𝐍𝐀𝐍𝐆𝐀𝐍 𝐊𝐈𝐏𝐄𝐑𝟒𝐃 𝐇𝐀𝐑𝐈 𝐈𝐍𝐈 𝟐𝟎𝟐𝟓 💰
Funds Management Learning Material for Beg
Testing WebRTC applications at scale.pdf
artificial intelligence overview of it and more
Sims 4 Historia para lo sims 4 para jugar
Vigrab.top – Online Tool for Downloading and Converting Social Media Videos a...
tcp ip networks nd ip layering assotred slides
Automated vs Manual WooCommerce to Shopify Migration_ Pros & Cons.pdf
presentation_pfe-universite-molay-seltan.pptx
Slides PDF The World Game (s) Eco Economic Epochs.pdf
WebRTC in SignalWire - troubleshooting media negotiation
Best Practices for Testing and Debugging Shopify Third-Party API Integrations...
Cloud-Scale Log Monitoring _ Datadog.pdf
PptxGenJS_Demo_Chart_20250317130215833.pptx
Unit-1 introduction to cyber security discuss about how to secure a system
Introuction about WHO-FIC in ICD-10.pptx
Digital Literacy And Online Safety on internet
CHE NAA, , b,mn,mblblblbljb jb jlb ,j , ,C PPT.pptx

StreamSight - Query-Driven Descriptive Analytics for IoT and Edge Computing

  • 1. Query-Driven Descriptive Analytics for IoT and Edge Computing Moysis Symeonides*, Demetris Trihinas✝, Zacharias Georgiou*, George Pallis*, Marios D. Dikaiakos* IEEE International Conference on Cloud Engineering (IC2E 2019) *Department of Computer Science University of Cyprus ✝Department of Computer Science University of Nicosia
  • 2. D. Trihinas [email protected] Laboratory for Internet Computing StreamSight - IC2E 2019 Distributed Data Processing Engines 2 2 ● Frameworks like Hadoop and Spark are contributing to the democratization of big data analytics by hiding the complexity related to: ○ Machine communication and resource management -> dealing with the underlying infrastructure. ○ Task scheduling and supervision for analytic jobs. ○ Fault tolerance for both the infrastructure and execution state. ○ Monitoring and logging. ○ ...
  • 3. D. Trihinas [email protected] Laboratory for Internet Computing StreamSight - IC2E 2019 ● Transforming the physical world into an information system. ● 3.6 Billion IoT devices are being used daily1 with these devices projected to generate 500 ZB of data2 by the end of the year (2019). The Internet of Things 3 ● It only seems “natural” that IoT services offload analytic jobs to the cloud for data processing. ● But… IoT services usually come with near real-time requirements and moving data “centrally” for processing penalizes analytics timeliness. [1] Next big things in IoT predictions for 2020, ITPro, 2018 [2] Global Cloud Index, Cisco, 2018 Analytic Insights IoT services
  • 4. D. Trihinas [email protected] Laboratory for Internet Computing StreamSight - IC2E 2019 Edge Computing… Saving IoT Analytics 4 Cloud Analytic Insights IoT services The “Edge” ● Data processing now possible in place -or within- local network. ○ Shorter response times for latency critical IoT services. ○ More efficient processing by offloading “centralized” components. ● Possible because hardware for mobile/fog/edge is scaling-up1. ● But… bandwidth and battery capacity NOT scaling at same rate2. [1] EdgeIoT: Mobile Edge Computing for the Internet of Things, X. Sun et al, IEEE Communications, 2016. [2] Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of Things, D. Trihinas et al., IEEE, Trans. on Services Computing, 2018.
  • 5. IoT Analytics Over the Edge: How to process enormous volumes of streaming data at the edge to provide query-driven analytic insights while also minimizing response times? [Figure labels: Cloud, Analytic Insights, IoT services, The “Edge”]
  • 6. Query-Driven Analytics: Abstractions required for modelling knowledge extraction from data streams. Challenge 1: Expressing (ad-hoc) analytic queries ● One must have specific knowledge of the programming model of the underlying processing engine. ... ... Example task: compute the average of a metric using a 60s sliding window. ● Queries are bound to the underlying processing engine (query portability).
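To make the example task on this slide concrete, here is a minimal engine-independent sketch of a 60s sliding-window average in plain Python. It is not taken from the talk or from any processing engine; the class name `SlidingWindowMean` and the choice to evict samples that are exactly `window_s` seconds old are assumptions made for illustration.

```python
from collections import deque


class SlidingWindowMean:
    """Mean of the values seen in the last `window_s` seconds (hypothetical helper)."""

    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s
        self.samples = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0        # running sum of the values still in the window

    def add(self, timestamp: float, value: float) -> float:
        """Insert a sample, evict expired ones, and return the current mean."""
        self.samples.append((timestamp, value))
        self.total += value
        # Assumption: a sample expires once it is window_s seconds old or older.
        while timestamp - self.samples[0][0] >= self.window_s:
            _, old_value = self.samples.popleft()
            self.total -= old_value
        return self.total / len(self.samples)
```

Even this small task needs explicit state handling and eviction logic, which is the kind of engine-specific detail the slide argues a query model should hide.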
  • 7. Query-Driven Analytics. Challenge 2: Geo-distributed deployments are the norm for IoT services, not the exception ● A naive “edge” deployment can impose compute and communication penalties for intermediate recomputations and data exchange. ● Network bandwidth between geo-distributed entities is far from uniform. Mechanisms to avoid data movement and recomputations are needed. Reference: Pixida: Optimizing data parallel jobs in wide-area data analytics, K. Kloudas, VLDB, 2015. [Figure: naive deployment vs. re-using intermediate results; legend: data exchange and computation, result exchange]
  • 8. Outline of Today’s Talk ● IoT analytics over geo-distributed topologies. ● Abstract query model for query-driven IoT analytics. ● The StreamSight Framework ○ Query plan compilation. ○ Edge computing improvements. ○ Experimentation. ● Future research directions and open research questions.
  • 9. Abstract Query Model ● Queries are applied on metric streams with the intent to derive insights. ● Insights can be reused, transformed, and composed with other metric streams to create new insights. Metric record example: <bus_id, bus99>, <bus_delay, 5>, <bus_region, NW>, ... [Figure labels: Metric Record, Metric Stream]
  • 10. Abstract Query Model: Insight = COMPUTE <Expression> EVERY <Interval> [WITH <Optimizations>] COMPUTE ➢ The composition, transformation and aggregation of multiple metric streams (e.g., expression, composite, aggregate). EVERY ➢ Denotes the interval at which the expression is evaluated; it can be a time interval (e.g., every 1min) or tuple-based (e.g., every 1000 records). WITH ➢ Optional statement for capturing user-defined optimizations and constraints for data streams and edge topologies. [Figure labels: Metric Stream, Expression, EVERY, Insight Stream]
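As a rough illustration of the three-clause shape `COMPUTE <Expression> EVERY <Interval> [WITH <Optimizations>]`, the sketch below splits a statement into its parts with a regular expression. This is not StreamSight's parser, and the real grammar is richer than a regex can capture; `INSIGHT_RE`, `Insight`, and `parse_insight` are hypothetical names used only here.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern covering only the top-level clause structure.
INSIGHT_RE = re.compile(
    r"^\s*COMPUTE\s+(?P<expr>.+?)\s+EVERY\s+(?P<count>\d+)\s+(?P<unit>\w+)"
    r"(?:\s+WITH\s+(?P<opts>.+?))?\s*;?\s*$",
    re.DOTALL,
)


@dataclass
class Insight:
    expression: str               # text between COMPUTE and EVERY
    every_count: int              # e.g. 5
    every_unit: str               # e.g. SECONDS, MINUTES, TUPLE
    optimizations: Optional[str]  # text after WITH, or None if absent


def parse_insight(text: str) -> Insight:
    """Split an insight statement into its COMPUTE / EVERY / WITH parts."""
    m = INSIGHT_RE.match(text)
    if m is None:
        raise ValueError("not a COMPUTE ... EVERY ... [WITH ...] statement")
    return Insight(m["expr"], int(m["count"]), m["unit"], m["opts"])
```

For example, `parse_insight("COMPUTE ARITHMETIC_MEAN(bus_delay, 10 MINUTES) EVERY 5 SECONDS")` yields an `Insight` whose `optimizations` field is `None`, since the `WITH` clause is optional.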
  • 11. Smart City Bus Network ● Buses are equipped with GPS tracking devices that emit updates to the respective local edge server of the region each bus is currently navigating through. ● Bus updates include: bus id, location coordinates, operating city region, an estimation of the current bus route delay, etc. ● Inspired by the Dublin smart city bus network. [Figure label: edge server]
  • 12. Insight Operations 1. Window Operations: Aggregation of values within a time period. COMPUTE ARITHMETIC_MEAN(bus_delay, 10 MINUTES) EVERY 5 SECONDS (annotated parts: aggregate, raw metric stream, time period, time interval). Group by a metric key: COMPUTE ARITHMETIC_MEAN(bus_delay, 10 MINUTES) BY city_segment EVERY 5 SECONDS. Examples of aggregates: sum, count, sdev, median, percentile, etc. [Slide annotations: Apache Spark 14 ops; Apache Spark 15 ops]
  • 13. Insight Operations 2. Temporal Compositions: Compositions with different time windows. COMPUTE ( ARITHMETIC_MEAN(bus_delay, 10 MINUTES) / ARITHMETIC_MEAN(bus_delay, 60 MINUTES) ) EVERY 5 SECONDS 3. Accumulated Compositions: Updates on previously computed data. COMPUTE EWMA[0.85](passengers) BY bus_stop EVERY 1 TUPLE Examples: running_mean, running_max, running_sdev, etc. [Slide annotations: Apache Spark 32 ops; Apache Spark 24 ops]
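An accumulated composition such as `EWMA[0.85](passengers) BY bus_stop EVERY 1 TUPLE` updates a stored value on each new record instead of re-aggregating a window. The sketch below shows one common form of that update in plain Python. It assumes the bracketed parameter is the weight given to the newest sample; the slide does not state which convention StreamSight uses, and the names `EWMA` and `update_by_key` are hypothetical.

```python
from typing import Dict, Hashable, Optional


class EWMA:
    """Exponentially weighted moving average, updated one tuple at a time."""

    def __init__(self, alpha: float):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # assumption: weight of the newest sample
        self.value: Optional[float] = None

    def update(self, x: float) -> float:
        if self.value is None:
            self.value = x  # the first sample initialises the average
        else:
            self.value = self.alpha * x + (1.0 - self.alpha) * self.value
        return self.value


def update_by_key(state: Dict[Hashable, EWMA], key: Hashable, x: float,
                  alpha: float = 0.85) -> float:
    """Keep one EWMA per group key, mimicking a BY clause."""
    if key not in state:
        state[key] = EWMA(alpha)
    return state[key].update(x)
```

Only one number per key is kept as state, which is why accumulated operators are cheap to maintain on resource-constrained edge nodes.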
  • 14. Insight Operations 4. Hybrid Compositions: Combining window and accumulated operations. COMPUTE ( ARITHMETIC_MEAN( bus_delay, 10 MINUTES) - EWMA[0.65]( bus_delay) ) BY city_segment EVERY 5 SECONDS 5. Filtered Compositions: Filter input and output streams. COMPUTE bus_delay WHEN > ( RUNNING_MEAN(bus_delay) + 3 * RUNNING_SDEV(bus_delay) ) BY city_segment EVERY 5 SECONDS; [Slide annotations: Window Operation, Accumulated Operation, Filter Predicate; Apache Spark 34 ops; Apache Spark 41 ops]
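The filter predicate in the filtered composition, a value exceeding `RUNNING_MEAN + 3 * RUNNING_SDEV`, can be maintained incrementally. The sketch below uses Welford's online algorithm for the running mean and standard deviation. It is an illustrative stand-in rather than StreamSight's implementation: the use of the sample (n - 1) standard deviation and the decision to test a value against the statistics of earlier samples before folding it in are both assumptions.

```python
import math


class RunningStats:
    """Running mean and standard deviation via Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def sdev(self) -> float:
        # Assumption: sample standard deviation (n - 1 in the denominator).
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def exceeds_threshold(stats: RunningStats, x: float, k: float = 3.0) -> bool:
    """Test x against mean + k * sdev of earlier samples, then fold x in."""
    flagged = stats.n > 1 and x > stats.mean + k * stats.sdev
    stats.update(x)
    return flagged
```

With a `BY city_segment` clause, one `RunningStats` instance per segment would be kept, for instance in a dictionary keyed by segment.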
  • 15. Collaborative Edge Services ● Infrastructures of multiple stakeholders that are geographically distributed. ● Inspired by publicly available data from: ○ the New York transportation authority, ○ the Dublin smart city bus network and ○ Uber. ● Enriched with real-time weather data from open-access meteorological stations. ● Companies, employees and clients can easily submit their queries.
  • 16. Collaborative Edge Services. Geo-analytic Queries: Travel app user interested in finding the closest taxis or car-sharing vehicles. COMPUTE vehicleID FROM (taxis, car_sharing) WHEN GEOHASH[10](cusLoc) == GEOHASH[10](vehLoc) EVERY 1 MINUTES Multiple Sources: The city segment with the least number of vehicles in a 15min sliding window when the temperature drops below 10°C. COMPUTE MIN( COUNT(buses, 15 MINUTES) BY city_segment + COUNT(taxis, 15 MINUTES) BY city_segment + COUNT(sharing, 15 MINUTES) BY city_segment ) WHEN temperature <= 10 EVERY 10 MINUTES Data-driven suggestions: The top-5 city areas based on current and previous month average amount. COMPUTE TOP_K[5] ( MEAN(total_amount, 1 MONTH) - MEAN(total_amount, 1 MONTH, 1 MONTH) ) BY city_segment EVERY 1 HOURS (the third argument of the second MEAN is a 1 MONTH offset).
  • 17. Outline of Today’s Talk ● IoT analytics over geo-distributed topologies. ● Abstract query model for query-driven IoT analytics. ● The StreamSight Framework ○ Query plan compilation. ○ Edge computing improvements. ○ Experimentation. ● Future research directions and open research questions.
  • 18. StreamSight Framework: Specification, compilation, and execution of streaming IoT analytic queries on distributed processing engines optimized for edge computing environments. Reference: StreamSight: A Query-Driven Framework for Streaming Analytics in Edge Computing, Z. Georgiou et al., IEEE/ACM UCC, 2018. [Figure labels: Currently Supporting, Future Adapters ...]
  • 19. Query Model Translation. Insight Description: COMPUTE ARITHMETIC_MEAN( bus_delay, 10 MINUTES) BY city_segment EVERY 5 SECONDS Abstract Syntax Tree: ● Nodes correspond to a grammar rule of the language. ● Leaves are the tokens and symbols of the language. ● Parser performs early validation to verify syntactic correctness of the query.
  • 20. Compilation Phase ● Constructs the Query Execution Plan, assembling the pipeline of stream operations from the AST representation. ● A recursive algorithm traverses the AST. ● Each node is mapped to a stream operation of the underlying processing engine. ● Naive AST mapping... extremely inefficient because it ignores the geo-distributed nature of edge realms: ○ Unnecessary intermediate re-computations ○ Increased data movement ● The AST must acknowledge these. ... ...
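The recursive traversal described on this slide can be pictured with a toy AST. The sketch below is a deliberately naive post-order mapping in which every node becomes one operation; operations are plain strings rather than calls into a real processing engine. `Node`, `compile_plan`, and the node kinds are hypothetical and do not reflect StreamSight's internal classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                 # hypothetical node kinds: "metric", "window_mean", "divide"
    value: str = ""           # optional parameter, e.g. a metric name or a window size
    children: List["Node"] = field(default_factory=list)


def compile_plan(node: Node, plan: List[str]) -> List[str]:
    """Naive post-order traversal: emit the children first, then the node itself."""
    for child in node.children:
        compile_plan(child, plan)
    plan.append(f"{node.kind}({node.value})" if node.value else node.kind)
    return plan


# AST for: ARITHMETIC_MEAN(bus_delay, 10 MINUTES) / ARITHMETIC_MEAN(bus_delay, 60 MINUTES)
ast = Node("divide", children=[
    Node("window_mean", "10 MINUTES", [Node("metric", "bus_delay")]),
    Node("window_mean", "60 MINUTES", [Node("metric", "bus_delay")]),
])
plan = compile_plan(ast, [])
```

In the resulting plan, `metric(bus_delay)` appears twice, once per branch. That duplication is a small instance of the unnecessary re-computation the slide attributes to a naive mapping.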
  • 21. System Optimizations: Reusing intermediate results ● StreamSight caches expressions, composites and results and broadcasts them across worker nodes to reduce unnecessary re-computations. Insight 1: Calculate current average bus_delay. Insight 2: Calculate the ratio between current and last hour bus_delay.
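The two insights on this slide share one sub-expression, the current average bus_delay. A minimal single-process sketch of reusing it is shown below; it only memoizes results per batch and does not model the broadcast across worker nodes. `IntermediateCache` and its cache keys are hypothetical names, not StreamSight's API.

```python
from typing import Callable, Dict, Hashable


class IntermediateCache:
    """Per-batch cache of intermediate results, keyed by sub-expression."""

    def __init__(self):
        self._results: Dict[Hashable, float] = {}
        self.computations = 0  # how many sub-expressions were actually evaluated

    def get(self, key: Hashable, compute: Callable[[], float]) -> float:
        if key not in self._results:
            self._results[key] = compute()
            self.computations += 1
        return self._results[key]

    def next_batch(self) -> None:
        """Drop cached results when a new batch of records arrives."""
        self._results.clear()


def mean(values):
    return sum(values) / len(values)


current_window = [4.0, 6.0]            # made-up delays in the current window
last_hour = [2.0, 4.0, 4.0, 6.0]       # made-up delays over the last hour

cache = IntermediateCache()
current_key = ("mean", "bus_delay", "current")
hour_key = ("mean", "bus_delay", "last_hour")

insight_1 = cache.get(current_key, lambda: mean(current_window))
insight_2 = (cache.get(current_key, lambda: mean(current_window))
             / cache.get(hour_key, lambda: mean(last_hour)))
```

Three sub-expression lookups result in only two evaluations, because Insight 2 reuses the average already computed for Insight 1.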
  • 22. User Optimizations. Sampling enables the execution of an insight description on a portion of the streamed measurements for approximate but timely answers (k << N). ● Uniform Sampling ● Weighted Hierarchical Reservoir Sampling (WHRS) [1] ● Applies on-the-fly reservoir + stratified sampling. StreamSight allows the user to prioritize insights. ● Under high-load influx or network uncertainties, critical queries are not delayed while less important ones are queued. [1] ApproxIoT: Approximate Analytics for Edge Computing, Z. Wen et al., ICDCS, 2018.
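To illustrate the idea of keeping k << N items from a stream, the sketch below implements plain uniform reservoir sampling (the classic "Algorithm R"). It is not WHRS: the weighting, hierarchy, and stratification used by ApproxIoT are not modelled here. The function name and the optional `rng` parameter are choices made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)       # uniform over 0..i, both ends inclusive
            if j < k:
                reservoir[j] = item     # item i replaces a slot with probability k / (i + 1)
    return reservoir
```

The stream is read once and only k items are held in memory, which is what makes this family of techniques attractive on edge nodes.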
  • 23. User Optimizations. Prioritization: On high-load influx, critical queries are not delayed. COMPUTE MAX(taxis_fare_amount, 60 MINUTES) BY city_segment EVERY 1 MINUTES WITH SALIENCE 1 (SALIENCE is the priority; higher is better). Uniform Sampling: Query execution on a portion of the data stream. COMPUTE ARITHMETIC_MEAN(bus_delay, 60 MINUTES) BY stop_id EVERY 5 MINUTES WITH SALIENCE 1 AND SAMPLE 0.2 Sampling with Error Margin & Confidence: Query execution with bounded error guarantees for sampling. COMPUTE ARITHMETIC_MEAN(taxi_passengers, 10 MINUTES) EVERY 30 SECONDS WITH MAX_ERROR 0.05 AND CONFIDENCE 0.95 (MAX_ERROR is the error upper bound; CONFIDENCE is the confidence interval).
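The effect of `SALIENCE` (higher is better) can be pictured as a priority queue of pending insights. The sketch below uses Python's `heapq` for this; it is only an illustration of the ordering, not how StreamSight schedules queries. `InsightQueue` and the first-come-first-served tie-break among equal salience values are assumptions.

```python
import heapq
import itertools


class InsightQueue:
    """Pending insights ordered by salience; higher salience is served first."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: submission order

    def submit(self, name: str, salience: int) -> None:
        # heapq is a min-heap, so the salience is negated to pop the highest first.
        heapq.heappush(self._heap, (-salience, next(self._order), name))

    def next_insight(self) -> str:
        """Remove and return the most salient pending insight."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

Submitting two insights with salience 0 and one with salience 1 returns the salience-1 insight first, and then the other two in submission order.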
  • 24. User Optimizations. Dedicated Execution: Execution of crucial queries on dedicated nodes. COMPUTE COUNT(taxis) BY city_segment EVERY 1 SECONDS WITH ALLOW ON DEDICATED[5] (5 is the number of dedicated nodes). Awareness on Computations (try to minimize the computations): Minimize the computation footprint of execution for less significant queries while keeping the error below 5%. COMPUTE PEWMA[0.5](bus_delay) BY bus_id EVERY 30 SECONDS WITH MAX_ERROR 0.05 AND CONFIDENCE 0.95 AND AWARENESS ON COMPUTATIONS Accuracy Aware Execution (try to maximize the accuracy): Only in high influx periods sacrifice a portion of the accuracy, but keep the error below 5%. COMPUTE PEWMA[0.5](bus_delay) BY bus_id EVERY 30 SECONDS WITH MAX_ERROR 0.05 AND CONFIDENCE 0.95 AND AWARENESS ON ACCURACY
  • 25. Evaluation
  • 26. Dublin Bus Workload. Real-World Datasets ● Dublin Smart City Buses Network [1] ○ 968 buses (Jan 2014) ○ 16 metrics/record, including: bus_id, bus_delay, city_segment ○ Used 7 insights of actual interest for bus operators. 16 Edge servers ● 1 vCPU, 1GB MEM, 2↑ 16↓ Mbps. Evaluation Metric ● Batch Processing Time. ➢ StreamSight achieved 1.4x speedup over the baseline. ➢ StreamSight+WHRS achieved 4.3x speedup over the baseline. [1] Dublin, “Smart City ITS,” https://data.smartdublin.ie/, 2018. [Figure labels: Unstable System, Stable System]
  • 27. Re-usage of Intermediate Results ● Dublin Bus Workload ● Average Processing Time (fixed input rate 700 req/s) [Figure annotations: StreamSight DOES NOT incur a performance overhead; Baseline configuration failed]
  • 28. Outline of Today’s Talk ● IoT analytics over geo-distributed topologies. ● Abstract query model for query-driven IoT analytics. ● The StreamSight Framework ○ Query plan compilation. ○ Edge computing improvements. ○ Experimentation. ● Future research directions and open research questions.
  • 29. Reusage of Intermediate Results ● Same composition across different insights - different queries but with common operators. ● Same operators across different compositions - e.g., MEAN is composed from a SUM divided by a COUNT. If either SUM or COUNT is available, then reuse it. ● Same composition across different offsets [1]. Example: COMPUTE ARITHMETIC_MEAN(consumption, 10 MINUTES) / ARITHMETIC_MEAN(consumption, 10 MINUTES, 10 MINUTES) EVERY 15 MINUTES, where we can cache and reuse the composition for 10 minutes. ● Re-use insights across users - involves tracking shared results across deployments and users, privacy protection, etc. (possibly use of blockchain?) [1] SlickDeque: High Throughput and Low Latency Incremental Sliding-Window Aggregation, A. Shein et al., EDBT, 2018.
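The observation that MEAN is a SUM divided by a COUNT suggests keeping those two partial aggregates and deriving the others from them. The sketch below shows that decomposition for a single window in plain Python; `WindowPartials` is a hypothetical name, and window eviction and distribution across nodes are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WindowPartials:
    """Shared partial aggregates from which SUM, COUNT and MEAN are all derived."""
    total: float = 0.0
    count: int = 0

    def add(self, x: float) -> None:
        self.total += x
        self.count += 1

    def sum(self) -> float:
        return self.total

    def mean(self) -> Optional[float]:
        # MEAN reuses the stored SUM and COUNT instead of rescanning the records.
        return self.total / self.count if self.count else None
```

A query asking for SUM, one asking for COUNT, and one asking for MEAN over the same window can then all be answered from the same two stored numbers.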
  • 30. Query Execution Placement ● Query model operators: DEDICATED, SALIENCE, ALLOW ON, AWARENESS, etc. ● Still… fog-device-user mobility and network uncertainties affect IoT services' QoS, cost, and energy consumption. ● Analytics job scheduling requires “intelligent” consideration of data placement when orchestrating dynamic IoT services. ● Ignoring this can result in IoT services placed for optimal responsiveness but failing to guarantee timely insight refreshment. Reference: ADMin: Adaptive Monitoring Dissemination for the Internet of Things, D. Trihinas et al., IEEE INFOCOM, 2017.
  • 31. Multiple and Heterogeneous Data Processing Engines ● Moving to the “edge” means not only are data sources diverse but possibly even the data processing engines. ● These engines must “speak” the same language. ● Open specification vs federation layer? [Slide labels: OpenFog Consortium and OpenEdge Initiative]
  • 32. Data-less Query Execution ● Do we always need to actually compute the answer on the entire data? ○ Sampling… ○ Yes, but we need bounded approximations… and these approximations must be computed efficiently across geo-distributed environments. ■ Beware… substituting one computation with another must be beneficial in terms of performance (e.g., multivariate and dependent metrics) [1]. ● Do we always need to actually compute the answer? ○ or... can a bounded approximation on recent history be satisfactory [2]? [1] ATMoN: Adapting the “Temporality” in Large-Scale Dynamic Networks, D. Trihinas et al., IEEE ICDCS, 2018. [2] Towards intelligent distributed data systems for scalable efficient and accurate analytics, P. Triantafyllou et al., IEEE ICDCS, 2018.
  • 33. Security & Privacy ● Query model provides provisions for data confidentiality, restricted access control and data movement constraints across geo-locations. ● Offloading sensitive data to the cloud hinders man-in-the-middle attacks… on the other hand… processing “in place” hinders attacks (e.g., DDoS) on “easier” attacking planes (e.g., low-power IoT devices). ● Query model NOT enough… geo-distributed analytics requires task scheduling algorithms to acknowledge privacy-aware compute… How to do this efficiently? Example with an evaluation rule (the WITH ALLOW WHEN clause): COMPUTE patient_stream EVERY 5 MINUTES WITH ALLOW WHEN MEAN( heart_beat, 1 MINUTES ) >= 190 AND doctor_id IN (doctor_ids) AND region == clinic_region
  • 34. Conclusion ● Abstract query model for query-driven IoT analytics ○ Use cases (smart city, energy, health, microservices) illustrating the value of the query model. ● A prototype framework called StreamSight ○ A framework for the specification, compilation, and execution of streaming analytic queries on the “Edge”. ○ Optimizations that reduce compute and network load on the Edge: ■ Intermediate results ■ User-optimizations ○ StreamSight can achieve up to 4.3x speedup compared to a naive deployment. ● Many open research challenges for geo-distributed and query-driven analytics in edge/fog topologies.
  • 35. THANK YOU. This work is partially supported by the European Commission in the context of the Unicorn 731846 H2020 project (H2020-ICT-2016-1). Download StreamSight at: https://github.com/UCY-LINC-LAB/StreamSight.git
  • 36. Energy Consumption in Micro-DCs ● Micro-DCs, also denoted as Green-DCs, are powered by: ○ National electricity providers and ○ Photovoltaic power harvesting stations placed near the DCs. ● A wide range of sensors are placed in all datacenter racks and the photovoltaic stations, generating measurements like: ○ Temperature and energy consumption per data center, per rack or per node ○ Energy generation per photovoltaic panel ○ Weather data from stations, like humidity, wind, temperature, etc. ● Inspired by the ENEDI project, http://enedi.eu. Reference: ENEDI: Energy Saving in Datacenters, Tryfonos et al., IEEE Global IoT, 2018.
  • 37. Insight Prioritization ● Dublin Bus Workload ● Average Processing Time (fixed workload) ● 1 insight with high priority and 3 insights with low priority. [Figure: Processing Time (s); annotations: introduced artificial latency (x2) between worker nodes; prioritized insight experiences no delay; non-prioritized queries are queued]