UNLOCK THE POWER OF
STREAMING DATA
Mathew Hawkins, Principal Solutions Architect, Kinetica
Chong Yan, Solutions Architect, Confluent
Mathew Hawkins
Principal Solutions Architect
Kinetica
Chong Yan
Solutions Architect
Confluent
OUR SPEAKERS
CONSTANTLY GROWING NUMBER OF SMART DEVICES DRIVING STREAMING EXTREME DATA
Explosion of Data
•Structured
•Unstructured
•Smart devices
•Connected cars
•Sensors
Real-time Demands
•Need for driven
•Demand for analytics on data not stale
Existing tech doesn’t work
•Workloads are I/O & compute bound
•Too complex – involves technologies together
•Batch oriented
COMPANY OVERVIEW
100+ employees
worldwide
Exponential
growth
$50M Series A
Top Tier Investors
CORE CONCEPTS
GPU-Accelerated Memory-first Columnar Database
KINETICA CORE DIFFERENTIATION
Location-based
Analysis, Rendering,
Discovery & Insights
Data-driven,
Streamlined Machine
Learning
BREAKTHROUGH
SPEED WITH +
Advanced Analytics on
Extreme Data:
Static & Streaming
INSIDE KINETICA
OLAP
optimized
Native Geospatial
Datatypes & Functions
Distributed, Linear
Scale Out
User Defined
Functions (UDFs) &
Models
Native REST API
Full Text Search
SQL92
Visual
Dashboards
Ecosystem
Connectors
1 NODE (1TB/2GPU)
PARALLEL
INGEST
1 NODE (1TB/2GPU)
1 NODE (1TB/2GPU)
Each node of the system can share the task of data ingest,
which provides higher throughput. Ingest can be made
faster simply by adding more nodes.
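The idea of every node sharing the ingest work can be sketched as follows. This is a minimal illustration, not Kinetica's implementation: the cluster size, the `shard_for` hash routing, and the `load` stand-in are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_NODES = 3  # hypothetical cluster size


def shard_for(key: str, num_nodes: int) -> int:
    """Pick a node for a record from a stable hash of its key."""
    return crc32(key.encode("utf-8")) % num_nodes


def ingest(records, num_nodes=NUM_NODES):
    """Split records by shard, then let each node load its batch in parallel."""
    batches = [[] for _ in range(num_nodes)]
    for rec in records:
        batches[shard_for(rec["id"], num_nodes)].append(rec)

    def load(batch):
        # Stand-in for a node writing its batch to local storage.
        return len(batch)

    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        return list(pool.map(load, batches))


records = [{"id": f"tweet-{i}", "text": "sample"} for i in range(1000)]
per_node = ingest(records)  # number of records each node loaded
```

Because no single node is a funnel for all writes, adding a node adds another batch that loads concurrently.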
PARALLEL INGEST
PROVIDES HIGH
PERFORMANCE
STREAMING
[Chart: Time to ingest 100M Tweets, in seconds (axis 0 to 1200). Recovered series labels: Leading In-memory DB, NoSQL DB. Recovered values: 150s, 753s, 1029s.]
KINETICA & CONFLUENT IN YOUR ECOSYSTEM
ETL / STREAM
PROCESSING
SQL
Native
APIs
PARALLEL
INGEST
Geospatial
WMS
Custom
Connectors
BI DASHBOARDS
BI / GIS / APPS
CUSTOM APPS
& GEOSPATIAL
KINETICA ‘REVEAL’
STREAMING DATA
UDFs
ON DEMAND SCALE OUT +
Built-in Machine Learning
CUSTOM
LOGIC
BIDMach
ERP / CRM / TRANSACTIONAL
CERTIFIED
CONNECTOR
CERTIFIED
CONNECTOR
BUILT FOR BUSINESS USERS & DATA SCIENTISTS
MACHINE
LEARNING
MASSIVE
PARALLEL
COMPUTING
CUSTOM
APPLICATIONS
GEOSPATIAL
VISUALIZATION
STREAMING DATA
ANALYSIS
ADVANCED
ANALYTICS
BUSINESS
USERS
DATA SCIENTISTS / DEVELOPERS
CABLE & BROADCASTING |
REAL-TIME VIEWERSHIP ANALYSIS
LARGE US CABLE PROVIDER
BUSINESS OBJECTIVE
Real-time analysis of live viewership across all broadcast channels,
particularly for live events (e.g., Super Bowl, Olympics)
NEW CAPABILITIES DELIVERED
Ability to collect data streaming from set-top boxes and analyze it in
real time, so the senior executive team can track viewership
ADTECH | REAL-TIME CAMPAIGN REPORTING
BUSINESS OBJECTIVE
Be first to market with game-changing
technologies that put publishers’ needs first
NEW CAPABILITIES DELIVERED
High-speed capabilities to ingest, store, and persist data
Ad-hoc analytics on ad impression and bid data
Confidential
Confluent Platform
Technical Overview
Chong Yan, Solutions Architect, Confluent
What is Apache Kafka®?
True Storage
Real-Time
Processing
Scalability
More than messaging
A streaming platform
+ Distributed Clustered Storage
Kafka is a blend of messaging, stream processing, ETL and
modern database designs built around a distributed log
+ Streaming Platform
Pub/Sub
Messaging
ETL
Connectors
Spark
Flink
Beam
IBM MQ
TIBCO
RabbitMQ
Mulesoft
Talend
Informatica
Kafka is much more than messaging
+ Exactly Once
+ Designed for the Cloud
+ Inter-DC Replication
+ Schema Evolution
Stream
Processing
Why do we need Kafka?
Many systems are a bit of a mess…
What does a streaming platform do?
Publish and subscribe to streams of data
Similar to a message queue or enterprise messaging system
Store streams of data
In a fault-tolerant way
Process streams of data
In real time, as they occur
A streaming platform has many benefits
•Lower latency—better
customer experience
•Decoupled architecture—
future-proof, reduce risk,
reduce costs, easier to run
•Highly performant
and scalable
Why Confluent?
Kafka Cluster
Apache Kafka
Kafka
●A distributed commit log
●Publish and subscribe to streams of records
●Highly scalable, high throughput
●Supports transactions
●Persisted data
Reads are a single seek and scan
Writes are
append only
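The commit-log bullets above (append-only writes, reads as a seek followed by a scan) can be illustrated with a toy single-partition log. This is a conceptual sketch only: the `CommitLog` class and its method names are made up for the example, and it keeps records in memory where Kafka persists them to disk.

```python
class CommitLog:
    """A single-partition, append-only log addressed by offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Write at the tail only; return the new record's offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Seek to an offset, then scan forward sequentially."""
        return self._records[offset:offset + max_records]


log = CommitLog()
for event in ["login", "view", "purchase", "logout"]:
    log.append(event)

# Two readers track their own offsets; reading does not remove records.
analytics_batch = log.read(0, max_records=2)
audit_batch = log.read(2)
```

Since records are never modified in place, any number of subscribers can re-read the same data from whatever offset they last reached.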
Apache Kafka
Kafka Streams API
Write standard Java applications and microservices
to process your data in real time
Kafka Connect API
Reliable and scalable integration of Kafka
with other systems—no coding required
Orders
Table
Customers
Kafka Streams API
Confluent Open Source
Connectors and Clients
Native Apache Kafka producer/consumer client
libraries, plus connectors for Kafka Connect
KSQL
Streaming SQL engine for Apache Kafka
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
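To make the semantics of the possible_fraud statement concrete, here is a rough Python equivalent of a 5-second tumbling-window count with a threshold. It is a batch-style sketch over an in-memory list, not how KSQL executes the query; the `(timestamp, card_number)` tuple layout and the function name are assumptions for illustration.

```python
from collections import Counter

WINDOW_SECONDS = 5
THRESHOLD = 3


def possible_fraud(attempts, window=WINDOW_SECONDS, threshold=THRESHOLD):
    """attempts: iterable of (timestamp_seconds, card_number).

    Returns {(window_start, card_number): count} for counts above threshold.
    """
    counts = Counter()
    for ts, card in attempts:
        # Tumbling windows are fixed-size, non-overlapping buckets.
        window_start = int(ts // window) * window
        counts[(window_start, card)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


attempts = [
    (0.5, "4111"), (1.0, "4111"), (2.2, "4111"), (4.9, "4111"),  # window [0, 5)
    (5.1, "4111"),                                               # window [5, 10)
    (1.5, "5500"), (6.0, "5500"),
]
flagged = possible_fraud(attempts)
```

Only the card with more than three attempts inside a single window is flagged; the fifth attempt at 5.1 s starts a new window and is counted separately.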
Confluent Open Source
REST Proxy
Send and receive data to/from Apache Kafka
using REST calls
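As a hedged sketch of what producing through the REST Proxy looks like, the snippet below builds (but does not send) a produce request. The path and content type follow the REST Proxy v2 JSON format as I understand it; the host, port, topic, and payload are placeholders, so check the documentation for the version you deploy.

```python
import json
from urllib.request import Request


def build_produce_request(base_url, topic, values):
    """Build a POST that publishes JSON values to a topic via REST Proxy."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return Request(
        url=f"{base_url}/topics/{topic}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
    )


# Placeholder address and payload, for illustration only.
req = build_produce_request(
    "http://localhost:8082",
    "authorization_attempts",
    [{"card_number": "4111"}],
)
# To send it against a running proxy: urllib.request.urlopen(req)
```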
Schema Registry
Store and enforce the schemas used per topic
{
"type": "record",
"name": "LOGON",
"namespace": "ORCL.SOE2",
"fields": [
{
"name": "table",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "op_type",
"type": [
"null",
"string"
],
"default": null
},
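The schema excerpt above is cut off on the slide, but its two visible fields show the common pattern of a nullable string with a null default. A simplified sketch of what enforcing such a schema means for a record follows; it is plain Python rather than an Avro library or the Schema Registry API, and the `conform` helper and sample values are hypothetical.

```python
# Map the two Avro type names used in the excerpt to Python types.
PY_TYPES = {"null": type(None), "string": str}

# Only the two fields visible on the slide; the rest of the schema is not shown.
LOGON_FIELDS = [
    {"name": "table", "type": ["null", "string"], "default": None},
    {"name": "op_type", "type": ["null", "string"], "default": None},
]


def conform(record, fields=LOGON_FIELDS):
    """Fill defaults and reject values outside each field's union type."""
    out = {}
    for field in fields:
        value = record.get(field["name"], field["default"])
        allowed = tuple(PY_TYPES[t] for t in field["type"])
        if not isinstance(value, allowed):
            raise TypeError(
                f"{field['name']}: {value!r} does not match {field['type']}"
            )
        out[field["name"]] = value
    return out


ok = conform({"table": "ORCL.SOE2.LOGON"})  # op_type falls back to its default
```

A record with `op_type` set to a number would raise `TypeError`, which is the kind of mismatch schema enforcement is meant to catch before data reaches consumers.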
Confluent Open Source
Docker images, deb/yum
installers
Easier to install and deploy, also AWS and Azure
quickstart templates
Confluent CLI
Easily work with Confluent Platform on a
single-node sandbox environment
Confluent Enterprise
Multi-Datacenter Replication
Easily configure and maintain cross-cluster
replication
Control Center
Kafka monitoring for the enterprise
Confluent Enterprise
Auto Data Balancer
Dynamically move partitions to optimize resource
utilization and reliability
JMS Client
Integrate with existing JMS applications,
migrate seamlessly away from legacy JMS
MQs
Before
After
Rebalance
Enhanced Security
ACL support for REST Proxy and Schema
Registry
KSQL from Confluent
KSQL
The open source streaming SQL
engine for Apache Kafka
Confidential 31
KSQL: the streaming SQL engine for Apache Kafka from Confluent
✓All you need is SQL
✓No separate processing cluster required
✓Powered by Kafka: elastic, scalable,
distributed, battle-tested
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
CREATE STREAM vip_actions AS
SELECT userid, page, action
FROM clickstream c
LEFT JOIN users u
ON c.userid = u.userid
WHERE u.level = 'Platinum';
KSQL is the simplest way to process streams of data in real time
✓Perfect for streaming ETL, anomaly detection,
event monitoring and more
✓Part of Confluent Open Source
https://github.com/confluentinc/ksql
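The vip_actions statement enriches each click event with the user's current row and keeps only Platinum users. A rough Python sketch of that stream-table left join is shown below; the sample rows and the generator function are invented for illustration, and the real engine keeps the table continuously updated from a Kafka topic.

```python
# Table side: the latest row per userid.
users = {
    "u1": {"level": "Platinum"},
    "u2": {"level": "Silver"},
}

# Stream side: an unbounded sequence in practice, a short list here.
clickstream = [
    {"userid": "u1", "page": "/home", "action": "view"},
    {"userid": "u2", "page": "/cart", "action": "add"},
    {"userid": "u3", "page": "/home", "action": "view"},  # no matching user
    {"userid": "u1", "page": "/checkout", "action": "buy"},
]


def vip_actions(events, user_table):
    """Look up each event's user, then keep events from Platinum users."""
    for event in events:
        user = user_table.get(event["userid"])  # LEFT JOIN: None if no match
        level = user["level"] if user else None
        if level == "Platinum":
            yield {k: event[k] for k in ("userid", "page", "action")}


result = list(vip_actions(clickstream, users))
```

Note that the WHERE clause on `u.level` discards unmatched events, so the left join behaves like an inner join for this particular filter.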
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
KSQL: the simplest way to do stream processing
CREATE TABLE error_counts AS
SELECT error_code, count(*)
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;
CREATE STREAM vip_actions AS
SELECT userid, page, action
FROM clickstream c
LEFT JOIN users u ON c.userid = u.userid
WHERE u.level = 'Platinum';
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
KSQL: the simplest way to do stream processing
1. Streaming ETL
2. Anomaly detection
3. Monitoring
1) How to run KSQL: standalone, aka “local mode”
• Starts a CLI, an engine and a REST server all in the same JVM
• Ideal for laptop development
• Start with default settings:
> bin/ksql-cli local
• Or with customized settings:
> bin/ksql-cli local --properties-file foo/bar/ksql.properties
2) How to run KSQL: client-server
• Start any number of server nodes
• > bin/ksql-server-start
• Start any number of CLIs and specify “remote” server address
• > bin/ksql-cli remote http://myserver:8090
• All running engines share the processing load
• Technically, instances of the same Kafka Streams application
• Scale up/down without restart
3) How to run KSQL: as an application
• Start any number of engine instances
• Pass a file of KSQL statements to execute
> bin/ksql-node query-file=foo/bar.sql
• Ideal for streaming ETL application deployment
• Version control your queries and transformations as code
• All running engines share the processing load
• Technically, instances of the same Kafka Streams application
• Scale up/down without restart
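The point that all running engines share the processing load can be pictured as topic partitions being divided among instances of one application. The sketch below uses a simple round-robin split to show the effect of adding an instance; it is illustrative only, the instance names are invented, and Kafka's actual assignment strategy is more involved.

```python
def assign(partitions, instances):
    """Spread partitions round-robin over the running instances."""
    plan = {name: [] for name in instances}
    for i, partition in enumerate(partitions):
        plan[instances[i % len(instances)]].append(partition)
    return plan


partitions = list(range(6))

# Two engines split six partitions three each.
two = assign(partitions, ["engine-a", "engine-b"])

# Starting a third engine triggers a new plan; no engine needs a restart.
three = assign(partitions, ["engine-a", "engine-b", "engine-c"])
```

With six partitions, going from two to three engines lowers each engine's share from three partitions to two, which is the scale-out behaviour the bullets describe.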
DEMO
Live data from
our San
Francisco
Firewall
DEMO
Machine Learning
Reveal Dashboard for Analyst
SQL Developer to produce
INSIGHTS
Real time joins on
streaming data
INSIGHTS
Kafka
Netflow
Blacklist
KSQL Threats
SQL
DEVELOPER
FRONT-END
USER
DATA
SCIENTIST
PERSONAS
DEMO
Fast
moving data
DEMO RECAP
Bringing together all types of
analysts on real time data
Manipulate real
time data to
trigger events
INSIGHTS
Kafka
Netflow
Blacklist
KSQL Threats
RESOURCES
Kinetica Partner Page
https://www.confluent.io/connector/kinetica-db-connector/
Website
www.confluent.io
Twitter
@ConfluentInc
Confluent Partner Page
https://www.kinetica.com/partner/confluent/
Website
www.kinetica.com
Twitter
@KineticaHQ
Q&A

Webinar: Unlock the Power of Streaming Data with Kinetica and Confluent
