Ingesting and Processing IoT Data -
using MQTT, Kafka Connect and KSQL
Guido Schmutz
Kafka Summit 2018 – 16.10.2018
@gschmutz guidoschmutz.wordpress.com
Guido Schmutz
Working at Trivadis for more than 21 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, and Software Architect for Java, Oracle, SOA, and
Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Agenda
1. Introduction
2. IoT Logistics use case – Kafka Ecosystem "in Action"
3. Stream Data Integration – IoT Device to Kafka over MQTT
4. Stream Analytics with KSQL
5. Summary
Introduction
Big Data Reference Architecture for Data Analytics Solutions
[Architecture diagram, shown on two consecutive slides: bulk and event sources (DB, File, IoT Data, Mobile Apps, Social, Telemetry) feed via File/SQL Import, Change Data Capture, Data Flows, and an Event Hub into Raw/Refined Storage with Parallel Processing (Big Data / Hadoop cluster), plus Stream Analytics (Stream Processor with State), Microservices, Enterprise Apps, and SQL / Search / BI Tools / Enterprise Data Warehouse services]
Two Types of Stream Processing
(from Gartner)
Stream Data Integration
• Primarily covers streaming ETL
• Integration of data sources and data sinks
• Filter and transform data
• (Enrich data)
• Route data
Stream Analytics
• Covers analytics use cases
• Calculates aggregates and detects patterns to generate higher-level, more relevant summary information (complex events => what used to be called CEP)
• Complex events may signify threats or opportunities that require a response
Stream Data Integration and Stream Analytics with Kafka
[Diagram, shown on two consecutive slides: a Source Connector feeds the trucking_driver topic on the Kafka Broker, Stream Processing reads from and writes to topics, and a Sink Connector delivers results downstream]
Unified Architecture for Modern Data Analytics Solutions
[Architecture diagram: the same building blocks as the reference architecture, unified around the Event Hub / Event Stream as the central backbone between sources, storage, parallel processing, stream analytics, and microservices]
Various IoT Data Protocols
• MQTT (Message Queue Telemetry Transport)
• CoAP (Constrained Application Protocol)
• AMQP
• DDS (Data Distribution Service)
• STOMP
• REST
• WebSockets
• …
IoT Logistics use case – Kafka
Ecosystem "in Action"
Demo - IoT Logistics Use Case
Trucks are sending driving info and geo-position
data in one single message
Testdata-Generator originally by Hortonworks

Position & Driving Info (raw message):

{"timestamp":1537343400827,"truckId":87,"driverId":13,"routeId":987179512,"eventType":"Normal","latitude":38.65,"longitude":-90.21,"correlationId":"-3208700263746910537"}

Formatted for readability:

{
  "timestamp":1537343400827,
  "truckId":87,
  "driverId":13,
  "routeId":987179512,
  "eventType":"Normal",
  "latitude":38.65,
  "longitude":-90.21,
  "correlationId":"-3208700263746910537"
}
Stream Data Integration – IoT
Device to Kafka over MQTT
Stream Data Integration
[Diagram: the Source Connector -> Kafka Broker (trucking_driver) -> Sink Connector pipeline, with the Source Connector part highlighted]
(I) IoT Device sends data via MQTT
Message Queue Telemetry Transport (MQTT)
• Pub/Sub architecture with Message Broker
• Built-in retry / QoS mechanism
• Last Will and Testament (LWT)
• Not all MQTT brokers are scalable / available
• Does not provide state (history)
[Diagram: trucks publish their Position & Driving Info message to the MQTT topics truck/nn/position]
MQTT to Kafka using Confluent MQTT Connector
IoT Device sends data via MQTT – how to get the data into Kafka?
[Diagram: MQTT topics truck/nn/position on one side, the Kafka topic truck_position on the other, with a "?" in between]
2 Ways for MQTT with Confluent Streaming Platform
Confluent MQTT Connector (Preview)
• Pull-based
• Integrates with (existing) MQTT servers
• Can be used both as a Source and a Sink
• Output is an envelope with all of the properties of the incoming message
• Value: body of the MQTT message
• Key: the MQTT topic the message was written to
• Can consume multiple MQTT topics and write to one single Kafka topic
• The RegexRouter SMT can be used to change topic names (see the sketch below)
Confluent MQTT Proxy
• Push-based
• Enables MQTT clients to use the MQTT protocol to publish data directly to Kafka
• MQTT Proxy is stateless and independent of other instances
• Simple mapping scheme of MQTT topics to Kafka topics based on regular expressions
• Reduced lag in message publishing compared to traditional MQTT brokers
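
As a sketch of the RegexRouter point above: adding the SMT to the connector config rewrites the derived Kafka topic name. The regex and replacement here are illustrative, not from the original demo:

"transforms": "renameTopic",
"transforms.renameTopic.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.renameTopic.regex": "truck_(.*)",
"transforms.renameTopic.replacement": "trucking_$1"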
(II) MQTT to Kafka using Confluent MQTT Connector
[Diagram: MQTT topics truck/nn/position -> "mqtt to kafka" Source Connector -> Kafka topic truck_position, consumed with kafkacat]
Confluent MQTT Connector
Currently available as a Preview on Confluent Hub:

confluent-hub install confluentinc/kafka-connect-mqtt:1.0.0-preview

Set up plugin.path to include the additional folder:

plugin.path=/usr/share/java,/etc/kafka-connect/custom-plugins,/usr/share/confluent-hub-components
Create an instance of the Confluent MQTT Connector

#!/bin/bash
curl -X "POST" "http://192.168.69.138:8083/connectors" \
  -H "Content-Type: application/json" \
  -d $'{
  "name": "mqtt-source",
  "config": {
    "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
    "tasks.max": "1",
    "name": "mqtt-source",
    "mqtt.server.uri": "tcp://mosquitto:1883",
    "mqtt.topics": "truck/+/position",
    "kafka.topic": "truck_position",
    "mqtt.clean.session.enabled": "true",
    "mqtt.connect.timeout.seconds": "30",
    "mqtt.keepalive.interval.seconds": "60",
    "mqtt.qos": "0"
  }
}'
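
To verify that messages arrive, one can tail the Kafka topic with kafkacat; the broker address below is an assumption matching the demo environment:

kafkacat -b broker-1:9092 -t truck_position -o end -q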
(III) MQTT to Kafka using Confluent MQTT Proxy
[Diagram: trucks publish Position & Driving Info and Engine Metrics directly to the MQTT Proxy, which writes them to the Kafka topics truck position and engine metrics, each read by a console consumer]
Configure MQTT Proxy
Configure MQTT Proxy (kafka-mqtt.properties):

listeners=0.0.0.0:1883
bootstrap.servers=PLAINTEXT://broker-1:9092
topic.regex.list=truck_position:.*position,engine_metric:.*engine_metric
confluent.topic.replication.factor=1

Start MQTT Proxy:

bin/kafka-mqtt-start kafka-mqtt.properties
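
A quick way to exercise the proxy is to publish a test message with any MQTT client; mosquitto_pub and the shortened payload here are illustrative:

mosquitto_pub -h localhost -p 1883 -t 'truck/99/position' \
  -m '{"timestamp":1537343400827,"truckId":99,"eventType":"Normal","latitude":38.65,"longitude":-90.21}'

Given the topic.regex.list mapping above (.*position -> truck_position), this message would land in the truck_position Kafka topic.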
MQTT Connector vs. MQTT Proxy
MQTT Connector
• Pull-based
• Uses existing MQTT infrastructures
• Bi-directional
MQTT Proxy
• Push-based
• Does not provide all MQTT functionality
• Only uni-directional
[Diagram: two multi-datacenter topologies compared. With the MQTT Connector, regional MQTT brokers (REGION-1 DC, REGION-2 DC) are pulled into the truck position and truck driving info topics in the Headquarter DC; with the MQTT Proxy, devices in each region push directly into the headquarter Kafka topics.]
(IV) MQTT to Kafka using StreamSets Data Collector
[Diagram: MQTT topics truck/nn/position -> StreamSets "mqtt to kafka" pipeline -> Kafka topic truck_position -> console consumer]
MQTT to Kafka using StreamSets Data Collector
Wait … there is more ….
[Diagram: via the MQTT Proxy and the mqtt-to-kafka pipeline, Position & Driving Info now lands in two Kafka topics, truck_position and truck_driving_info, each read by a console consumer. What about some analytics?]
Stream Analytics with KSQL
Stream Analytics
[Diagram: the Source Connector -> Kafka Broker -> Sink Connector pipeline, with the Stream Processing part highlighted]
KSQL - Terminology
Stream
• "History"
• An unbounded sequence of structured data ("facts")
• Facts in a stream are immutable
• New facts can be inserted into a stream; existing facts can never be updated or deleted
• Streams can be created from a Kafka topic or derived from an existing stream
Table
• "State"
• A view of a stream, or of another table, representing a collection of evolving facts
• Facts in a table are mutable
• New facts can be inserted into the table; existing facts can be updated or deleted
• Tables can be created from a Kafka topic or derived from existing streams and tables
KSQL enables stream processing with zero coding required: the simplest way to process streams of data in real-time.
(V) Create STREAM on truck_position and use it in KSQL CLI
[Diagram: truck/nn/position -> mqtt-to-kafka -> truck-position topic -> KSQL Stream, queried interactively from the KSQL CLI]
Create a STREAM on truck_driving_info

ksql> CREATE STREAM truck_driving_info_s
        (ts VARCHAR,
         truckId VARCHAR,
         driverId BIGINT,
         routeId BIGINT,
         eventType VARCHAR,
         latitude DOUBLE,
         longitude DOUBLE,
         correlationId VARCHAR)
      WITH (kafka_topic='truck_driving_info',
            value_format='JSON');

 Message
----------------
 Stream created
Describe the STREAM

ksql> describe truck_driving_info_s;

Field         | Type
---------------------------------
ROWTIME       | BIGINT
ROWKEY        | VARCHAR(STRING)
TS            | VARCHAR(STRING)
TRUCKID       | VARCHAR(STRING)
DRIVERID      | BIGINT
ROUTEID       | BIGINT
EVENTTYPE     | VARCHAR(STRING)
LATITUDE      | DOUBLE
LONGITUDE     | DOUBLE
CORRELATIONID | VARCHAR(STRING)
KSQL - SELECT
Selects rows from a KSQL stream or table.
The result of this statement is not persisted in a Kafka topic; it is only printed to the console.
from_item is one of the following: stream_name, table_name

SELECT select_expr [, ...]
  FROM from_item
  [ LEFT JOIN join_table ON join_criteria ]
  [ WINDOW window_expression ]
  [ WHERE condition ]
  [ GROUP BY grouping_expression ]
  [ HAVING having_expression ]
  [ LIMIT count ];
Use SELECT to browse a Stream

ksql> SELECT * FROM truck_driving_info_s;
1539711991642 | truck/24/position | null | 24 | 10 | 1198242881 | Normal | 36.84 | -94.83 | -6187001306629414077
1539711991691 | truck/26/position | null | 26 | 13 | 1390372503 | Normal | 42.04 | -88.02 | -6187001306629414077
1539711991882 | truck/66/position | null | 66 | 22 | 1565885487 | Normal | 38.33 | -94.35 | -6187001306629414077
1539711991902 | truck/22/position | null | 22 | 26 | 1198242881 | Normal | 36.73 | -95.01 | -6187001306629414077

ksql> SELECT * FROM truck_driving_info_s WHERE eventType != 'Normal';
1539712101614 | truck/67/position | null | 67 | 11 | 160405074 | Lane Departure | 38.98 | -92.53 | -6187001306629414077
1539712116450 | truck/18/position | null | 18 | 25 | 987179512 | Overspeed | 40.76 | -88.77 | -6187001306629414077
1539712120102 | truck/31/position | null | 31 | 12 | 927636994 | Unsafe following distance | 38.22 | -91.18 | -6187001306629414077
(VI) – CREATE AS … SELECT …
[Diagram: a detect_dangerous_driving query reads the truck-position Stream and writes a new Dangerous-driving Stream]
CREATE STREAM … AS SELECT …
Creates a new KSQL stream along with the corresponding Kafka topic and streams the result of the SELECT query as a changelog into the topic.
The WINDOW clause can only be used if the from_item is a stream.

CREATE STREAM stream_name
  [WITH ( property_name = expression [, ...] )]
  AS SELECT select_expr [, ...]
  FROM from_stream
  [ LEFT | FULL | INNER ] JOIN [join_table | join_stream]
    [ WITHIN [(before TIMEUNIT, after TIMEUNIT) | N TIMEUNIT] ] ON join_criteria
  [ WHERE condition ]
  [ PARTITION BY column_name ];
INSERT INTO … SELECT …
Streams the result of the SELECT query into an existing stream and its underlying topic.
The schema and partitioning column produced by the query must match the stream's schema and key; if they are incompatible, the statement returns an error.
stream_name and from_stream must both refer to a Stream. Tables are not supported!

CREATE STREAM stream_name ...;

INSERT INTO stream_name
  SELECT select_expr [, ...]
  FROM from_stream
  [ WHERE condition ]
  [ PARTITION BY column_name ];
CREATE AS … SELECT …

ksql> CREATE STREAM dangerous_driving_s
        WITH (kafka_topic='dangerous_driving_s',
              value_format='JSON')
        AS SELECT *
        FROM truck_driving_info_s
        WHERE eventtype != 'Normal';

 Message
----------------------------
 Stream created and running

ksql> SELECT * FROM dangerous_driving_s;
1539712399201 | truck/67/position | null | 67 | 11 | 160405074 | Unsafe following distance | 38.65 | -90.21 | -6187001306629414077
1539712416623 | truck/67/position | null | 67 | 11 | 160405074 | Unsafe following distance | 39.1 | -94.59 | -6187001306629414077
1539712430051 | truck/18/position | null | 18 | 25 | 987179512 | Lane Departure | 35.1 | -90.07 | -6187001306629414077
Windowing
Streams are unbounded, so we need meaningful time frames to do computations (i.e. aggregations).
Computations over events are done using windows of data.
Windows are tracked per unique key.
Window types: Fixed Window, Sliding Window, Session Window (a KSQL sketch follows below).
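
A minimal sketch of the three window types in KSQL syntax, reusing the demo's dangerous_driving_s stream; the window sizes are illustrative:

-- Fixed (tumbling) window: non-overlapping 30-second buckets
ksql> SELECT eventType, count(*) FROM dangerous_driving_s
        WINDOW TUMBLING (SIZE 30 SECONDS) GROUP BY eventType;

-- Sliding (hopping) window: 60-second windows advancing every 30 seconds
ksql> SELECT eventType, count(*) FROM dangerous_driving_s
        WINDOW HOPPING (SIZE 60 SECONDS, ADVANCE BY 30 SECONDS) GROUP BY eventType;

-- Session window: events grouped per key until a 60-second gap of inactivity
ksql> SELECT driverId, count(*) FROM dangerous_driving_s
        WINDOW SESSION (60 SECONDS) GROUP BY driverId;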
(VII) Aggregate and Window
[Diagram: a count_by_eventType query aggregates the Dangerous-driving Stream into a Dangerous-driving-count Table]
SELECT COUNT … GROUP BY

ksql> CREATE TABLE dangerous_driving_count AS
        SELECT eventType, count(*) nof
        FROM dangerous_driving_s
        WINDOW TUMBLING (SIZE 30 SECONDS)
        GROUP BY eventType;

 Message
----------------------------
 Table created and running

ksql> SELECT TIMESTAMPTOSTRING(ROWTIME, 'yyyy-MM-dd HH:mm:ss.SSS'),
        eventType, nof
      FROM dangerous_driving_count;
2018-10-16 05:12:19.408 | Unsafe following distance | 1
2018-10-16 05:12:38.926 | Unsafe following distance | 1
2018-10-16 05:12:39.615 | Unsafe tail distance | 1
2018-10-16 05:12:43.155 | Overspeed | 1
Joining
[Diagram: three join patterns over time: Stream-to-Static (Table) Join, Stream-to-Stream Join (one window join), and Stream-to-Stream Join (two window join); a KSQL sketch follows below]
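
A sketch of a windowed stream-to-stream join in KSQL; the engine_metrics_s stream and its engineTemp column are hypothetical (the demo itself only joins against a table):

ksql> CREATE STREAM position_with_metrics_s AS
        SELECT p.truckId, p.latitude, p.longitude, m.engineTemp
        FROM truck_driving_info_s p
        INNER JOIN engine_metrics_s m
          WITHIN 1 HOUR
          ON p.truckId = m.truckId;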
(VIII) – Join Table to enrich with Driver data
[Diagram: a jdbc-to-kafka connector streams the Truck Driver table (e.g. 27, Walter, Ward, Y, 24-JUL-85, 2017-10-02 15:19:00, arriving as {"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}) into the truck-driver topic backing a driver Table; a join query enriches the Dangerous-driving Stream with it into a Dangerous-driving & driver Stream]
Join Table to enrich with Driver data

#!/bin/bash
curl -X "POST" "http://192.168.69.138:8083/connectors" \
  -H "Content-Type: application/json" \
  -d $'{
  "name": "jdbc-driver-source",
  "config": {
    "connector.class": "JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db/sample?user=sample&password=sample",
    "mode": "timestamp",
    "timestamp.column.name": "last_update",
    "table.whitelist": "driver",
    "validate.non.null": "false",
    "topic.prefix": "truck_",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "name": "jdbc-driver-source",
    "transforms": "createKey,extractInt",
    "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createKey.fields": "id",
    "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractInt.field": "id"
  }
}'
Create Table with Driver State

ksql> CREATE TABLE driver_t
        (id BIGINT,
         first_name VARCHAR,
         last_name VARCHAR,
         available VARCHAR)
      WITH (kafka_topic='truck_driver',
            value_format='JSON',
            key='id');

 Message
----------------
 Table created
Join the dangerous driving Stream with the Driver Table

ksql> CREATE STREAM dangerous_driving_and_driver_s
        WITH (kafka_topic='dangerous_driving_and_driver_s',
              value_format='JSON', partitions=8)
        AS SELECT driverId, first_name, last_name, truckId, routeId, eventtype,
                  latitude, longitude
        FROM dangerous_driving_s
        LEFT JOIN driver_t
          ON dangerous_driving_s.driverId = driver_t.id;

 Message
----------------------------
 Stream created and running

ksql> SELECT * FROM dangerous_driving_and_driver_s;
1539713095921 | 11 | 11 | Micky | Isaacson | 67 | 160405074 | Lane Departure | 39.01 | -93.85
1539713113254 | 11 | 11 | Micky | Isaacson | 67 | 160405074 | Unsafe following distance | 39.0 | -93.65
(IX) – Custom UDF for calculating Geohash
[Diagram: in addition to the previous pipeline, a dangerous-driving-by-geo query turns the Dangerous-driving Stream into a dangerous-driving-geohash Stream]
Custom UDF for calculating Geohashes
Geohash is a geocoding scheme which encodes a geographic location into a short string of letters and digits: a hierarchical spatial data structure which subdivides space into buckets of grid shape.

Length | Area (width x height)
1      | 5,009.4km x 4,992.6km
2      | 1,252.3km x 624.1km
3      | 156.5km x 156km
4      | 39.1km x 19.5km
5      | 4.9km x 4.9km
12     | 3.7cm x 1.9cm

ksql> SELECT latitude, longitude,
        geohash(latitude, longitude, 4)
      FROM dangerous_driving_s;
38.31 | -91.07 | 9yz1
37.7 | -92.61 | 9ywn
34.78 | -92.31 | 9ynm
42.23 | -91.78 | 9zw8
...

http://geohash.gofreerange.com/
Add a UDF sample
Geohash and join to some important messages for drivers

@UdfDescription(name = "geohash",
    description = "returns the geohash for a given LatLong")
public class GeoHashUDF {

  @Udf(description = "encode lat/long to geohash of specified length.")
  public String geohash(final double latitude, final double longitude,
                        final int length) {
    return GeoHash.encodeHash(latitude, longitude, length);
  }

  @Udf(description = "encode lat/long to geohash.")
  public String geohash(final double latitude, final double longitude) {
    return GeoHash.encodeHash(latitude, longitude);
  }
}
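
To make the UDF available, the compiled jar has to be deployed to the KSQL server; a sketch assuming KSQL 5.x's extension-directory mechanism (jar name and path are illustrative):

# copy the UDF jar into the extension directory and restart the server
cp geohash-udf.jar /etc/ksql/ext/

# ksql-server.properties
ksql.extension.dir=/etc/ksql/ext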
Summary
Two ways to bring in MQTT data => MQTT Connector or MQTT Proxy
KSQL is another way to work with data in Kafka => you can (re)use some of your SQL knowledge
• Similar semantics to SQL, but for queries on continuous, streaming data
Well-suited for structured data (that's the "S" in KSQL)
There is more
• Stream to Stream Join
• REST API for executing KSQL
• Avro Format & Schema Registry
• Using Kafka Connect to write results to data stores
• …
Choosing the Right API
Consumer, Producer API
• Java, C#, C++, Scala, Python, Node.js, Go, PHP …
• subscribe()
• poll()
• send()
• flush()
• Anything Kafka
Kafka Streams
• Fluent Java API
• mapValues()
• filter()
• flush()
• Stream Analytics
KSQL
• SQL dialect
• SELECT … FROM …
• JOIN ... WHERE
• GROUP BY
• Stream Analytics
Kafka Connect
• Declarative
• Configuration
• REST API
• Out-of-the-box connectors
• Stream Integration
Flexibility <-> Simplicity
Source: adapted from Confluent
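
For comparison with the KSQL statements above, a minimal Kafka Streams sketch of the same dangerous-driving filter; the topic names follow the demo, while the string-based JSON check is a simplification rather than proper JSON deserialization:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class DangerousDrivingApp {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "dangerous-driving-app");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092");
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

    StreamsBuilder builder = new StreamsBuilder();
    // read the raw JSON messages as plain strings and keep only non-"Normal" events
    KStream<String, String> positions = builder.stream("truck_driving_info");
    positions
        .filter((key, value) -> !value.contains("\"eventType\":\"Normal\""))
        .to("dangerous_driving");

    new KafkaStreams(builder.build(), props).start();
  }
}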
Technology on its own won't help you.
You need to know how to use it properly.
