1
Simplified Machine Learning Architecture
with an Event Streaming Platform
Kai Waehner | Technology Evangelist, Confluent
contact@kai-waehner.de | LinkedIn | @KaiWaehner | www.confluent.io | www.kai-waehner.de
2
Machine Learning to Improve Traditional
and to Build New Use Cases
Seconds Minutes Hours
Windows of Opportunity
Real Time
Tracking
Predictive
Maintenance
Fraud
Detection
Cross Selling
Transportation
Rerouting
Customer
Service
Inventory
Management
Autonomous
Driving
Face
Recognition
Robotics
Speech
Translation
Video
Generation
Supply Chain
Optimization
Strategic
Planning
3
Global Automotive Company
Builds Connected Car Infrastructure
Digital Transformation
• Improve customer experience
• Increase revenue
• Reduce risk
Time
3 years ago: Project begins
Today: Connected car infrastructure in production for first use cases
2 years in the future: Improved processes leveraging machine learning (predictive maintenance, cross-selling)
4
Streaming Analytics for
Predictive Maintenance at Scale
IoT
Integration
Layer
Batch
Analytics
Platform
BI
Dashboard
Streaming
Platform
Big Data
Integration
Layer
Car Sensors
Streaming Platform
Other Components
Real Time
Monitoring
System
All
Data
Critical
Data
Ingest
Data
Human
Intelligence
5
Machine Learning (ML)
...allows computers to find hidden insights without
being explicitly programmed where to look.
Machine
Learning
• Decision Trees
• Naïve Bayes
• Clustering
• Neural Networks
• Etc.
Deep
Learning
• CNN
• RNN
• Transformer
• Autoencoder
• Etc.
6
Streaming Analytics for
Predictive Maintenance at Scale
IoT
Integration
Layer
Batch
Analytics
Platform
BI
Dashboard
Streaming
Platform
Big Data
Integration
Layer
Car Sensors
Streaming Platform
Analytics Platform
Other Components
Real Time
Monitoring
System
All
Data
Critical
Data
Ingest
Data
Potential
Detect
Data
Processing
Analytics
Platform
Train Analytic
Model
Consume
Data
Preprocess
Data
Analytic Model
Deploy Analytic
Model
7
The First
Analytic Models
How to deploy the models
in production?
…real-time processing?
…at scale?
…24/7 with zero downtime?
8
Hidden Technical Debt
in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
9
Scalable,
Technology-Agnostic
Machine Learning
Infrastructures
https://www.infoq.com/presentations/netflix-ml-meson
https://eng.uber.com/michelangelo
https://www.infoq.com/presentations/paypal-data-service-fraud
10
Event Streaming Platform –
The Commit Log
Time
P
C1 C2
C3
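The commit-log idea on this slide (one producer P appending, consumers C1 to C3 each reading at their own position) can be illustrated with a minimal in-memory sketch. The `CommitLog` class and its method names are hypothetical and only model the concept; they are not the Kafka client API.

```python
class CommitLog:
    """Append-only log; each consumer tracks its own read offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> next offset to read

    def append(self, record):
        """Producer side: append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Consumer side: return unread records and advance this consumer's offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = CommitLog()
for i in range(5):
    log.append(f"event-{i}")

first_c1 = log.poll("C1", max_records=3)  # C1 reads the first three events
all_c2 = log.poll("C2")                   # C2 reads independently from offset 0
rest_c1 = log.poll("C1")                  # C1 resumes where it left off
```

Because reading does not remove records, a slow consumer and a fast consumer can share the same log without affecting each other.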
11
Event Streaming Platform –
A Distributed System
Broker 1
Topic1
partition1
Broker 2 Broker 3 Broker 4
Topic1
partition1
Topic1
partition1
Leader Follower
Topic1
partition2
Topic1
partition2
Topic1
partition2
Topic1
partition3
Topic1
partition4
Topic1
partition3
Topic1
partition3
Topic1
partition4
Topic1
partition4
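The layout on this slide (four brokers, four partitions of Topic1, each with one leader and two followers) can be approximated with a simple round-robin placement. This is a simplified sketch: `assign_replicas` and `fail_over` are hypothetical helpers, and Kafka's real replica assignment and leader election (in-sync replicas, controller) are more involved.

```python
import copy


def assign_replicas(num_partitions, brokers, replication_factor):
    """Round-robin placement: the first replica of each partition acts as leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    assignment = {}
    for p in range(num_partitions):
        replicas = [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        assignment[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return assignment


def fail_over(assignment, failed_broker):
    """Drop the failed broker and promote the first surviving follower where it led."""
    for entry in assignment.values():
        entry["followers"] = [b for b in entry["followers"] if b != failed_broker]
        if entry["leader"] == failed_broker:
            # Assumes at least one follower survives (replication factor >= 2).
            entry["leader"] = entry["followers"].pop(0)
    return assignment


brokers = ["broker1", "broker2", "broker3", "broker4"]
layout = assign_replicas(num_partitions=4, brokers=brokers, replication_factor=3)
after_failure = fail_over(copy.deepcopy(layout), "broker1")
```

Spreading leaders across brokers balances load, and keeping followers on other brokers lets a partition stay available when its leader's broker fails.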
12
A Streaming Platform
is the Underpinning of an Event-driven Architecture
Microservices
DBs
SaaS apps
Mobile
Customer 360
Real-time fraud
detection
Data warehouse
Producers
Consumers
Database
change
Microservices
events
SaaS
data
Customer
experiences
Streams of real time events
Stream processing apps
Connectors
Connectors
Stream processing apps
13
Apache Kafka at Scale
at Tech Giants
> 4.5 trillion messages / day > 6 Petabytes / day
“You name it”
* Kafka is not just used by tech giants
** Kafka is not just used for big data
14
Business Value per Use Case
Business
Value
Improve
Customer
Experience
(CX)
Increase
Revenue
(make money)
Decrease
Costs
(save money)
Core Business
Platform
Increase
Operational
Efficiency
Migrate to
Cloud
Mitigate Risk
(protect money)
Key Drivers
Strategic Objectives
(sample)
Fraud
Detection
IoT sensor
ingestion
Digital
replatforming/
Mainframe Offload
Connected Car: Navigation & improved in-car experience: Audi
Customer 360
Simplifying Omni-channel Retail at Scale:
Target
Faster transactional
processing / analysis
incl. Machine Learning / AI
Mainframe Offload: RBC
Microservices
Architecture
Online Fraud Detection
Online Security
(syslog, log aggregation,
Splunk replacement)
Middleware
replacement
Regulatory
Digital
Transformation
Application Modernization: Multiple
Examples
Website / Core
Operations
(Central Nervous System)
The [Silicon Valley] Digital Natives;
LinkedIn, Netflix, Uber, Yelp...
Predictive Maintenance: Audi
Streaming Platform in a regulated
environment (e.g. Electronic Medical
Records): Celmatix
Real-time app
updates
Real Time Streaming Platform for
Communications and Beyond: Capital One
Developer Velocity - Building Stateful
Financial Applications with Kafka Streams:
Funding Circle
Detect Fraud & Prevent Fraud in Real
Time: PayPal
Kafka as a Service - A Tale of Security and
Multi-Tenancy: Apple
Example Use Cases
$↑
$↓
$↔
Example Case Studies
(of many)
15
Apache Kafka’s
Open Ecosystem as Infrastructure for ML
16
Apache Kafka’s
Open Ecosystem as Infrastructure for ML
Kafka
Streams /
KSQL
Kafka
Connect
REST Proxy
Schema Registry
Go / .NET / Python
Kafka Producer
KSQL
Kafka
Streams
17
Ingestion of
IoT Data
Replication
MirrorMaker /
Confluent Replicator
Kafka Connect
Analytics /
Machine
Learning
Cars
Cars
Cars
Cars
Cars
18
Data
Preprocessing
Preprocessing
Filter, transform, anonymize, extract features
Streams
Data Ready
For Model Training
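The four preprocessing steps named on this slide (filter, transform, anonymize, extract features) might look as follows for a single event. This is a sketch under assumptions: the field names `car_id` and `sensor_input` are borrowed from the deck's KSQL query, while the Fahrenheit unit, the salt, and the chosen features are invented for illustration.

```python
import hashlib
import statistics


def preprocess(event, salt="demo-salt"):
    """Return a training-ready record, or None if the event is filtered out."""
    readings = event.get("sensor_input") or []
    # Filter: drop events without usable sensor readings.
    if not readings:
        return None
    # Transform: convert Fahrenheit readings to Celsius (assumed unit).
    celsius = [(f - 32.0) * 5.0 / 9.0 for f in readings]
    # Anonymize: replace the car ID with a truncated salted hash.
    digest = hashlib.sha256((salt + event["car_id"]).encode("utf-8")).hexdigest()
    # Extract features: summary statistics over the readings.
    return {
        "car": digest[:16],
        "mean_c": statistics.fmean(celsius),
        "max_c": max(celsius),
        "spread_c": max(celsius) - min(celsius),
    }


raw_events = [
    {"car_id": "car-1", "sensor_input": [212.0, 32.0]},
    {"car_id": "car-2", "sensor_input": []},  # filtered out: no readings
]
ready = [r for r in map(preprocess, raw_events) if r is not None]
```

In the architecture described here, the same logic would run continuously inside a stream processing application rather than over a list.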
19
SELECT c.car_id, c.event_id, c.car_model_id, c.sensor_input
FROM car_sensor c
LEFT JOIN car_models m ON c.car_model_id = m.car_model_id
WHERE m.car_model_type = 'Audi_A8';
Preprocessing
with KSQL
20
Data Ingestion into a Data Store for Model Training
(and Consumption by other Decoupled Applications)
Connect
Preprocessed
Data
Batch Near Real Time Real Time
21
Extreme scale using
TensorFlow and
TPUs in the cloud!
Analytic Model
Model Training
Using an Elastic
Infrastructure in
the Cloud
22
TensorFlow Model —
Autoencoder for Anomaly Detection
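Autoencoder-based anomaly detection usually scores an input by how badly the model reconstructs it, then compares that error with a threshold calibrated on normal data. The sketch below shows only this scoring step. The `reconstruct` function is a stand-in for a trained autoencoder, not a neural network, and the mean-plus-three-standard-deviations threshold is one common choice assumed here, not something stated in the deck.

```python
def reconstruction_error(original, reconstructed):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def calibrate_threshold(errors, factor=3.0):
    """Threshold = mean + factor * standard deviation of errors on normal data."""
    mean = sum(errors) / len(errors)
    variance = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean + factor * variance ** 0.5


def reconstruct(vector):
    """Stand-in for a trained autoencoder: it has 'learned' that normal
    readings are nearly flat, so it reproduces each input as its average."""
    avg = sum(vector) / len(vector)
    return [avg] * len(vector)


normal = [[1.0, 1.1, 0.9], [2.0, 2.3, 1.8], [0.5, 0.6, 0.3]]
threshold = calibrate_threshold(
    [reconstruction_error(v, reconstruct(v)) for v in normal]
)


def is_anomaly(vector):
    """Flag inputs whose reconstruction error exceeds the calibrated threshold."""
    return reconstruction_error(vector, reconstruct(vector)) > threshold


spike_flagged = is_anomaly([1.0, 5.0, 1.0])  # a sudden spike reconstructs poorly
flat_flagged = is_anomaly([3.0, 3.0, 3.0])   # a flat reading reconstructs exactly
```

With a real model, `reconstruct` would be replaced by a forward pass through the trained network; the thresholding logic stays the same.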
23
Direct streaming ingestion
for model training
with TensorFlow I/O + Kafka Plugin
(no additional data storage
like S3 or HDFS required!)
Time
Model B
Model A
Producer
Distributed Commit Log
Streaming Ingestion and Model Training
with TensorFlow IO
https://github.com/tensorflow/io
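The point of this slide, that models can train directly from the log without copying data into a separate store, can be sketched in plain Python. The list `log`, the helper `batches_from_log`, and the toy `RunningMeanModel` are assumptions for illustration; they do not use or mirror the TensorFlow I/O Kafka API.

```python
def batches_from_log(log, start_offset=0, batch_size=2):
    """Yield consecutive mini-batches of records, starting at the given offset."""
    for begin in range(start_offset, len(log), batch_size):
        yield log[begin:begin + batch_size]


class RunningMeanModel:
    """Toy 'model' trained incrementally: it tracks the mean of what it has seen."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def train_on_batch(self, batch):
        for value in batch:
            self.count += 1
            self.mean += (value - self.mean) / self.count


log = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # stand-in for one topic partition

model_a, model_b = RunningMeanModel(), RunningMeanModel()
for batch in batches_from_log(log, start_offset=0):
    model_a.train_on_batch(batch)  # Model A trains on the full history
for batch in batches_from_log(log, start_offset=4):
    model_b.train_on_batch(batch)  # Model B trains only on recent records
```

Because the log retains records, a second model can be trained later, or from a different offset, without re-ingesting the data.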
24
Local Predictions
Model Training
in Cloud
Model Deployment
at the Edge
Analytic Model
Separation of
Model Training and Model Inference
25
Streams
Input Event
Prediction
Request
Response
Model Serving
TensorFlow Serving
gRPC / HTTP
Application
Stream Processing with External Model and RPC
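In the RPC pattern, the stream processing application sends each input event to a model server and waits for the response. The sketch below builds a request in the JSON shape used by TensorFlow Serving's REST predict endpoint (`{"instances": [...]}` in, `{"predictions": [...]}` out). The host, port, and model name `anomaly` are hypothetical, and `fake_post` replaces the network call so the example runs offline.

```python
import json
import urllib.request


def _http_post(url, payload):
    """Real transport: POST JSON bytes and return the raw response body."""
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=2.0) as response:
        return response.read()


def predict_via_rest(instances, url, post=None):
    """Send one prediction request and return the list of predictions."""
    payload = json.dumps({"instances": instances}).encode("utf-8")
    response_body = (post or _http_post)(url, payload)
    return json.loads(response_body)["predictions"]


def fake_post(url, payload):
    """Offline stand-in for the model server: 'predicts' the sum of each instance."""
    instances = json.loads(payload)["instances"]
    return json.dumps({"predictions": [sum(x) for x in instances]}).encode("utf-8")


# Hypothetical endpoint; a real deployment would supply its own host and model name.
URL = "http://localhost:8501/v1/models/anomaly:predict"

events = [[1.0, 2.0], [3.0, 4.0]]
scores = [predict_via_rest([event], URL, post=fake_post)[0] for event in events]
```

Each event costs a network round trip here, which is the main trade-off against embedding the model in the stream processor.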
26
Prediction
Stream Processing
Model
doPrediction()
return value
Stream Processing
with Embedded Model
Streams
Input Event
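With an embedded model, the prediction is a local function call inside the stream processor, so no network hop is involved. In this sketch, `EmbeddedModel` is a hypothetical linear stand-in for a loaded TensorFlow model, and `do_prediction` mirrors the `doPrediction()` call named on the slide.

```python
class EmbeddedModel:
    """Stand-in for a model loaded into the stream processor's own process."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def do_prediction(self, features):
        """Local, in-process inference: a weighted sum plus bias."""
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


def process_stream(events, model):
    """Apply the model to each input event and yield (event, prediction) pairs."""
    for event in events:
        yield event, model.do_prediction(event)


model = EmbeddedModel(weights=[0.5, 2.0], bias=1.0)
predictions = [p for _, p in process_stream([[2.0, 1.0], [0.0, 0.0]], model)]
```

The trade-off is that updating the model means redeploying or reloading it inside the application, whereas a model server can be updated independently.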
27
CREATE STREAM AnomalyDetection AS
SELECT sensor_id, detectAnomaly(sensor_values)
FROM car_engine;
User Defined Function (UDF)
Model Deployment with
Apache Kafka, KSQL
and TensorFlow
28
Streaming Analytics with
Kafka and TensorFlow
MQTT
Proxy
Elasticsearch
Grafana
Kafka
Cluster
Kafka
Connect
Car Sensors
Kafka Ecosystem
TensorFlow
Other Components
Kafka
Streams
Application
All
Data
Critical
Data
Ingest
Data
Potential
Detect
KSQL
TensorFlow
Train
Analytic Model
Consume
Data
Preprocess
Data
Analytic Model
Deploy Analytic
Model
29
Demo: 100,000 Connected Cars
(Kafka + KSQL + MQTT + TensorFlow)
https://github.com/kaiwaehner/hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference
30
Machine Learning + Apache Kafka
→ Examples @ GitHub
https://github.com/kaiwaehner
31
Key Takeaways
Don’t underestimate the
Hidden Technical Debt
in Machine Learning
Systems
Leverage the Apache
Kafka Open Source
Ecosystem as scalable
and flexible Event
Streaming Platform
Use Streaming Machine
Learning with Kafka
and TensorFlow IO to
simplify your Big Data
Architecture
32
11 November 2019, Steigenberger Frankfurter Hof (cnfl.io/cse19frankfurt)
13 November 2019, NOVOTEL Zürich City West (cnfl.io/cse19zurich)
Ben Stopford, Office of the CTO, Confluent
Axel Löhn, Senior Project Manager, Deutsche Bahn
Kai Waehner, Technologist, Confluent
Ralph Debusmann, IoT Solution Architect, Bosch Power Tools
33
Questions?
Feedback?
Let’s Connect!
Kai Waehner | Technology Evangelist
• contact@kai-waehner.de
• @KaiWaehner
• www.kai-waehner.de
• www.confluent.io
• LinkedIn
