KSQL
The Open Source Streaming SQL Engine for Apache Kafka
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
LinkedIn
@KaiWaehner
www.confluent.io
www.kai-waehner.de
A Brief History of Apache Kafka and Confluent
Timeline from 2012 to 2018, oldest release first:
• 0.7 Cluster mirroring
• 0.8 Intra-cluster replication
• 0.9 Data integration (Connect API)
• 0.10 Data processing (Streams API)
• 0.11 Exactly-once semantics
• 1.0 Enterprise Ready
• CP 4.1 KSQL GA
KSQL – The Streaming SQL Engine for Apache Kafka
KSQL - Streaming SQL for Apache Kafka
Why KSQL?
[Chart: Population vs. Coding Sophistication. Kafka Streams: the realm of stream processing (core developers). KSQL: the new, expanded realm (core developers who don’t like Java, data engineers, BI analysts).]
Shoulders of Streaming Giants
subscribe(), poll(), send(),
flush(), beginTransaction(), …
KStream, KTable, filter(), map(), flatMap(), join(),
aggregate(), transform(), …
CREATE STREAM, CREATE TABLE,
SELECT, JOIN, GROUP BY, SUM, …
KSQL UDFs
KSQL for Data Exploration and Debugging
An easy way to inspect your data in Kafka
SHOW TOPICS;
SELECT page, user_id, status, bytes
FROM clickstream
WHERE user_agent LIKE 'Mozilla/5.0%';
PRINT 'my-topic' FROM BEGINNING;
KSQL for Data Transformation
Quickly make derivations of existing data in Kafka
CREATE STREAM clicks_by_user_id
WITH (PARTITIONS=6,
TIMESTAMP='view_time',
VALUE_FORMAT='JSON') AS
SELECT * FROM clickstream
PARTITION BY user_id;
1. Change number of partitions
2. Convert data to JSON
3. Repartition the data
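To make the repartitioning that PARTITION BY triggers more concrete, here is a minimal Python sketch of key-based partitioning. It is an illustration only: Kafka's default partitioner hashes the key bytes with murmur2, whereas this sketch uses CRC32 as a stand-in, and the sample rows and helper names are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 6  # mirrors PARTITIONS=6 in the statement above


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition (CRC32 stand-in for Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def repartition(records, key_field):
    """Group records by the partition that their new key hashes to."""
    partitions = defaultdict(list)
    for record in records:
        partitions[partition_for(str(record[key_field]))].append(record)
    return dict(partitions)


# Hypothetical sample rows from the clickstream.
clicks = [
    {"user_id": "alice", "page": "/home"},
    {"user_id": "bob", "page": "/cart"},
    {"user_id": "alice", "page": "/checkout"},
]
by_partition = repartition(clicks, "user_id")
```

The point of the sketch is the invariant: all records with the same user_id end up in the same partition, which is what later per-user joins and aggregations rely on.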
KSQL for Real-Time, Streaming ETL
Filter, cleanse, process data while it is in motion
CREATE STREAM clicks_from_vip_users AS
SELECT c.user_id, u.country, page, action
FROM clickstream c
LEFT JOIN users u ON c.user_id = u.user_id
WHERE u.level = 'Platinum';
1. Pick only VIP users
Example: CDC from DB via Kafka to Elastic
KSQL for Real-time Data Enrichment
Join data from a variety of sources to see the full picture
CREATE STREAM enriched_payments AS
SELECT payment_id, c.country, total
FROM payments_stream p
LEFT JOIN customers_table c
ON p.user_id = c.user_id;
1. Stream-Table Join
2. Stream-Stream Join
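The stream-table LEFT JOIN in the enrichment query can be sketched in plain Python. This is a simplified model, not KSQL's implementation: table updates overwrite the latest row per key, only stream records produce output, and a missing table row yields None on the right side. The event list and helper names are hypothetical.

```python
def left_join_stream_table(events, key_field, select):
    """Replay interleaved table updates and stream records in arrival order."""
    table = {}  # latest row per key, like a KSQL TABLE
    for kind, payload in events:
        if kind == "table":
            table[payload[key_field]] = payload  # upsert the customer row
        else:
            match = table.get(payload[key_field])  # None if no row has arrived yet
            yield select(payload, match)


def select(payment, customer):
    """Project the columns used in enriched_payments."""
    return {
        "payment_id": payment["payment_id"],
        "country": customer["country"] if customer else None,
        "total": payment["total"],
    }


# Hypothetical events: customer rows and payments arriving over time.
events = [
    ("table", {"user_id": 1, "country": "DE"}),
    ("stream", {"payment_id": "p1", "user_id": 1, "total": 20.0}),
    ("stream", {"payment_id": "p2", "user_id": 2, "total": 5.0}),
    ("table", {"user_id": 2, "country": "ES"}),
    ("stream", {"payment_id": "p3", "user_id": 2, "total": 7.5}),
]
enriched = list(left_join_stream_table(events, "user_id", select))
```

Payment p2 arrives before customer 2 exists in the table, so it is emitted with no country; p3 arrives afterwards and is enriched. The result depends on arrival order, which is the key difference from a join in a static database.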
Example: Retail
KSQL for Real-Time Monitoring
Derive insights from events (IoT, sensors, etc.) and turn them into actions
CREATE TABLE failing_vehicles AS
SELECT vehicle, COUNT(*)
FROM vehicle_monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE event_type = 'ERROR'
GROUP BY vehicle
HAVING COUNT(*) >= 5;
1. Now we know to alert, and whom
Example: IoT, Automotive, Connected Cars
KSQL for Anomaly Detection
Aggregate data to identify patterns and anomalies in real-time
CREATE TABLE possible_fraud AS
SELECT card_number, COUNT(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 30 SECONDS)
GROUP BY card_number
HAVING COUNT(*) > 3;
1. Aggregate data
2. … per 30-sec windows
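What the possible_fraud query computes can be sketched as a batch calculation in Python: assign each attempt to a 30-second tumbling window, count per card and window, and keep only counts above 3. KSQL maintains this result continuously as events arrive; the sketch below works on a finished list, and the sample data and function names are hypothetical.

```python
from collections import Counter

WINDOW_MS = 30_000  # WINDOW TUMBLING (SIZE 30 SECONDS)


def window_start(ts_ms: int, size_ms: int = WINDOW_MS) -> int:
    """Tumbling windows are aligned to the epoch and never overlap."""
    return ts_ms - (ts_ms % size_ms)


def possible_fraud(attempts, threshold: int = 3):
    """Count attempts per (card, window) and keep counts above the threshold."""
    counts = Counter((card, window_start(ts)) for card, ts in attempts)
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical (card_number, timestamp in ms) pairs.
attempts = [
    ("4111", 1_000),
    ("4111", 5_000),
    ("4111", 12_000),
    ("4111", 29_999),
    ("4111", 30_000),  # falls into the next window
    ("5500", 2_000),
]
flagged = possible_fraud(attempts)
```

Card 4111 has four attempts in the window starting at 0 and is flagged; its fifth attempt lands in the next window and does not count toward the first one.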
Example: Anomaly Detection with Deep Learning (Autoencoder)
CREATE STREAM AnomalyDetection AS
SELECT sensor_id, detectAnomaly(sensor_values)
FROM car_engine;
User Defined Function (UDF)
https://github.com/kaiwaehner/ksql-udf-deep-learning-mqtt-iot
Independent Dev / Test / Prod of different Apps and Microservices
No Matter Where it Runs
KSQL Concepts
● No need for source code
• Zero, none at all, not even one line.
• No SerDes, no generics, no lambdas, ...
● All the Kafka and Kafka Streams “magic” out-of-the-box
• Exactly Once Semantics
• Windowing
• Event-time aggregation
• Late-arriving data
• Distributed, fault-tolerant, scalable, ...
KSQL is equally viable for S / M / L / XL / XXL use cases
Ok. Ok. Ok.
… and KSQL is ready for production, including 24/7 support!
Fault-Tolerance, powered by Kafka
STREAM and TABLE as first-class citizens
WINDOWing
● Not ANSI SQL! → Continuous Queries
• TUMBLING
• SELECT appname, ip, COUNT(appname) AS problem_count FROM
logstream WINDOW TUMBLING (size 1 minute) WHERE loglevel='ERROR'
GROUP BY appname, ip;
• HOPPING
• SELECT itemid, SUM(arraycol[0]) FROM orders WINDOW HOPPING
(size 20 second, advance by 5 second) GROUP BY itemid;
• SESSION
• SELECT itemid, SUM(sales_price) FROM orders WINDOW SESSION
(20 second) GROUP BY itemid;
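The difference between the window types is easiest to see in how events are assigned to windows. The Python sketch below is a simplified model under stated assumptions: hopping windows are epoch-aligned and a record belongs to every window that covers its timestamp, and a session is closed once the gap to the next event exceeds the configured inactivity gap. Function names and sample timestamps are hypothetical.

```python
def hopping_window_starts(ts_ms: int, size_ms: int, advance_ms: int) -> list:
    """Start times of all hopping windows [start, start + size) that contain ts_ms."""
    first = max(0, ts_ms - size_ms + advance_ms) // advance_ms * advance_ms
    return list(range(first, ts_ms + 1, advance_ms))


def session_windows(timestamps_ms, gap_ms: int) -> list:
    """Merge events of one key into sessions separated by more than gap_ms of inactivity."""
    sessions = []
    for ts in sorted(timestamps_ms):
        if sessions and ts - sessions[-1][1] <= gap_ms:
            sessions[-1][1] = ts  # extend the current session
        else:
            sessions.append([ts, ts])  # start a new session
    return [tuple(s) for s in sessions]


# WINDOW HOPPING (size 20 second, advance by 5 second): one event, four overlapping windows.
starts = hopping_window_starts(22_000, size_ms=20_000, advance_ms=5_000)

# WINDOW SESSION (20 second): a 35-second pause splits the events into two sessions.
sessions = session_windows([0, 5_000, 15_000, 50_000, 60_000], gap_ms=20_000)
```

With a 20-second size and a 5-second advance, each event contributes to size / advance = 4 windows, so the same order is summed four times across overlapping results. Session windows have no fixed size at all; their boundaries come from the data.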
KSQL - Components
KSQL has 3 main components:
1. The Engine which actually runs the Kafka Streams topologies
2. The REST server interface enables an Engine to receive instructions from the CLI
or any other client
3. The CLI, designed to be familiar to users of MySQL, Postgres etc.
(Note that you also need a Kafka Cluster… KSQL is deployed independently)
KSQL can be used interactively + programmatically
ksql> POST /query
1. UI
2. CLI
3. REST
4. Headless
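For the programmatic REST option, a request to the /query endpoint can be sketched with the Python standard library. Assumptions to check against your own setup: the server address (http://localhost:8088 is used here as a typical local default), the KSQL content type, and the offset-reset property; the helper name is hypothetical. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

KSQL_SERVER = "http://localhost:8088"  # assumption: a locally running KSQL server


def build_query_request(statement: str, offset_reset: str = "earliest") -> urllib.request.Request:
    """Build a POST /query request that carries a KSQL SELECT statement."""
    body = {
        "ksql": statement,
        # Optional per-request properties; here: read the topic from the beginning.
        "streamsProperties": {"ksql.streams.auto.offset.reset": offset_reset},
    }
    return urllib.request.Request(
        url=f"{KSQL_SERVER}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


request = build_query_request("SELECT page, user_id FROM clickstream;")
# To run it against a live server, pass `request` to urllib.request.urlopen()
# and read the response incrementally, since the query streams rows until stopped.
```

Because a continuous query never finishes on its own, a client should read the response as a stream and close the connection when it has seen enough.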
Architecture (Client – Server Mode)
[Diagram: the KSQL CLI or any REST client connects to several KSQL Servers, each running in its own JVM; the KSQL Servers talk to the Kafka Cluster.]
Architecture (Headless Mode)
[Diagram: several KSQL Servers, each running in its own JVM, talk to the Kafka Cluster; no CLI or REST client is shown.]
Dedicating resources
Join Engines to the same ‘service pool’ by means of the ksql.service.id property
User Defined Functions (UDF, UDAF)
Write UDF code in Java, mark with annotations @UdfDescription, @Udf.
Make UDF available to KSQL (next slides), then use it like any other KSQL function in your queries:
SELECT address, STRINGLENGTH(address->street) FROM orders;
The UDF name in KSQL queries is whatever you define in the `name` field in the annotation (here: “stringLength”).
Live Demo
KSQL in Action
KSQL Quick Start – Getting Started in Minutes!
https://docs.confluent.io/current/quickstart/index.html
Local runtime
or
Docker container
Demo - Clickstream Analysis
• https://docs.confluent.io/current/ksql/docs/tutorials/clickstream-docker.html#ksql-clickstream-docker
• Leverages Apache Kafka, Kafka Connect, KSQL, Elasticsearch and Grafana
• 5min screencast: https://www.youtube.com/watch?v=A45uRzJiv7I
• Setup in 5 minutes (with or without Docker)
SELECT STREAM
CEIL(timestamp TO HOUR) AS timeWindow, productId,
COUNT(*) AS hourlyOrders, SUM(units) AS units
FROM Orders GROUP BY CEIL(timestamp TO HOUR),
productId;
timeWindow | productId | hourlyOrders | units
------------+-----------+--------------+-------
08:00:00 | 10 | 2 | 5
08:00:00 | 20 | 1 | 8
09:00:00 | 10 | 4 | 22
09:00:00 | 40 | 1 | 45
... | ... | ... | ...
KSQL Recipes
https://www.confluent.io/stream-processing-cookbook
Resources and Next Steps
Get Involved
• Try the Quickstart on GitHub
• Check out the code
• Play with the examples
KSQL is GA… You can already use it for production deployments!
https://github.com/confluentinc/ksql
http://confluent.io/ksql
https://slackpass.io/confluentcommunity #ksql
KSQL is the Streaming SQL Engine for Apache Kafka
Questions?

KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain 2018)
