1
User behavior analysis with Session Windows and Apache Kafka’s Streams API
Michael G. Noll
Product Manager
2
Attend the whole series!
What’s New in Apache Kafka 0.10.2 and Confluent 3.2
Date: Thursday, March 9, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Clarke Patterson, Senior Director, Product Marketing
Monitoring and Alerting Apache Kafka with Confluent Control Center
Date: Thursday, March 16, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Nick Dearden, Director, Engineering and Product
Data Pipelines Made Simple with Apache Kafka
Date: Thursday, March 23, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Ewen Cheslack-Postava, Engineer, Confluent
Using Apache Kafka to Analyze Session Windows
Date: Thursday, March 30, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Michael Noll, Product Manager, Confluent
Simplify Governance for Streaming Data in Apache Kafka
Date: Thursday, April 6, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Gwen Shapira, Product Manager, Confluent
https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/
3
Kafka Streams API: to build real-time apps that power your core business
Key benefits
• Makes your Java apps highly scalable, elastic, fault-tolerant, stateful, distributed
• No additional cluster
• Easy to run as a service
• Supports large aggregations and joins
• Security and permissions fully integrated from Kafka
Example Use Cases
• Microservices
• Reactive applications
• Continuous queries
• Continuous transformations
• Event-triggered processes
[Diagram: Your App as App Instance 1 … App Instance N, each with the Streams API, connected to the Kafka Cluster]
4
Use case examples
Industry | Use case examples
Travel | Build applications with the Kafka Streams API to make real-time decisions to find the most suitable pricing for individual customers, to cross-sell additional services, and to process bookings and reservations
Finance | Build applications to aggregate data sources for real-time views of potential exposures and for detecting and minimizing fraudulent transactions
Logistics | Build applications to track shipments fast, reliably, and in real time
Retail | Build applications to decide in real time on next best offers, personalized promotions, pricing, and inventory management
Automotive, Manufacturing | Build applications to ensure production lines perform optimally, to gain real-time insights into supply chains, and to monitor telemetry data from connected cars to decide if an inspection is needed
And many more …
5
Some public use cases in the wild
• Why Kafka Streams: towards a real-time streaming architecture, by Sky Betting and Gaming
  • http://engineering.skybettingandgaming.com/2017/01/23/streaming-architectures/
• Applying Kafka’s Streams API for social messaging at LINE Corp.
  • http://developers.linecorp.com/blog/?p=3960
  • Production pipeline at LINE, a social platform based in Japan with 220+ million users
• Microservices and Reactive Applications at Capital One
  • https://speakerdeck.com/bobbycalderwood/commander-decoupled-immutable-rest-apis-with-kafka-streams
• Containerized Kafka Streams applications in Scala, by Hive Streaming
  • https://www.madewithtea.com/processing-tweets-with-kafka-streams.html
• Geo-spatial data analysis
  • http://www.infolace.com/blog/2016/07/14/simple-spatial-windowing-with-kafka-streams/
• Language classification with machine learning
  • https://dzone.com/articles/machine-learning-with-kafka-streams
6
Kafka Summit NYC, May 09
Here, the community will share the latest Kafka Streams use cases.
http://kafka-summit.org/
7
Agenda
• Why are session windows so important?
• Recap: What is windowing?
• Session windows – example use case
• Session windows – how they work
• Session windows – API
8
Why are session windows so important?
• We want to analyze user behavior, which is a very common use case area
  • For example, user behavior on news sites, social platforms, video sharing sites, booking sites, etc.
• AND tailor the analysis to the individual user
  • Specifically, analyses of the type “how many X in one go?”, e.g. how many movies watched in one go?
  • Achieved through a per-user sessionization step on the input data
• AND this tailoring must be convenient and scalable
  • Achieved by automating the sessionization step, i.e. auto-discovery of sessions
• Session-based analyses can range from simple metrics (e.g. count of user visits on a news website or social platform) to more complex metrics (e.g. customer conversion funnels and event flows)
9
What is windowing?
• Aggregations such as “counting things” are key-based operations
• Before you can aggregate your input data, it must first be grouped by key
[Figure: events for Alice, Bob, and Dave along the event-time axis (6 AM, 7 AM, 8 AM)]
10
What is windowing?
• Aggregations such as “counting things” are key-based operations
“Let me COUNT how many movies each user has watched (IN TOTAL)”
Alice: 10 movies
Bob: 11 movies
Dave: 8 movies
[Figure: events for Alice, Bob, and Dave along the event-time axis (Feb 5, Feb 6, Feb 7)]
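A key-based count without any windowing can be sketched in plain Java as follows. This is only an illustration of "group by key, then count", not Kafka Streams code; the class name `TotalCountSketch` and the sample events are made up for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TotalCountSketch {
    // Counts events per key. Each list element is the user (key) of one movie view.
    static Map<String, Long> countByUser(List<String> viewers) {
        Map<String, Long> counts = new TreeMap<>();
        for (String user : viewers) {
            counts.merge(user, 1L, Long::sum); // group by key and add 1 per event
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input: one entry per movie view.
        List<String> viewers = List.of("Alice", "Bob", "Alice", "Dave", "Bob", "Alice");
        System.out.println(countByUser(viewers)); // {Alice=3, Bob=2, Dave=1}
    }
}
```

Because the only grouping criterion is the key, every event of a user ends up in the same bucket regardless of when it happened.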
11
What is windowing?
• Windowing allows you to further “sub-group” the input data for each user
“Let me COUNT how many movies each user has watched PER DAY”
Feb 5:
Alice: 4 movies
Bob: 3 movies
Dave: 2 movies
[Figure: events for Alice, Bob, and Dave along the event-time axis (Feb 5, Feb 6, Feb 7)]
12
What is windowing?
• Windowing allows you to further “sub-group” the input data for each user
“Let me COUNT how many movies each user has watched PER DAY”
Feb 6:
Alice: 1 movie
Bob: 2 movies
Dave: 4 movies
[Figure: events for Alice, Bob, and Dave along the event-time axis (Feb 5, Feb 6, Feb 7)]
13
What is windowing?
• Windowing allows you to further “sub-group” the input data for each user
“Let me COUNT how many movies each user has watched PER DAY”
Feb 7:
Alice: 4 movies
Bob: 4 movies
Dave: 1 movie
[Figure: events for Alice, Bob, and Dave along the event-time axis (Feb 5, Feb 6, Feb 7)]
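Per-day windowing can be pictured as extending the grouping key from "user" to "user plus day". The sketch below does this in plain Java; it is not Kafka Streams code, and the class name `DailyWindowSketch`, the `user@day` bucket format, the UTC day boundaries, and the sample timestamps (including the year) are assumptions made for illustration.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Map;
import java.util.TreeMap;

class DailyWindowSketch {
    // Adds one movie view to the (user, day) bucket that its event-time falls into.
    static void addView(Map<String, Long> counts, String user, Instant eventTime) {
        LocalDate day = eventTime.atZone(ZoneOffset.UTC).toLocalDate(); // assumed: days in UTC
        counts.merge(user + "@" + day, 1L, Long::sum);
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new TreeMap<>();
        addView(counts, "Alice", Instant.parse("2017-02-05T18:00:00Z"));
        addView(counts, "Alice", Instant.parse("2017-02-05T21:00:00Z"));
        addView(counts, "Alice", Instant.parse("2017-02-06T20:00:00Z"));
        addView(counts, "Bob", Instant.parse("2017-02-06T09:30:00Z"));
        System.out.println(counts);
        // {Alice@2017-02-05=2, Alice@2017-02-06=1, Bob@2017-02-06=1}
    }
}
```

Note that the window boundaries are fixed by the clock, not by the user's behavior: two views a minute apart around midnight land in different buckets.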
14
Session windows: use case
• Session windows allow for “how many X in one go?” analyses, tailored to each key
• Sessions are auto-discovered from the input data (we will see how later)
“Let me COUNT how many movies each user has watched PER SESSION”
Alice: 1, 4, 1, 4 movies (4 sessions)
Bob: 4, 6 movies (2 sessions)
Dave: 3, 5 movies (2 sessions)
[Figure: events for Alice, Bob, and Dave along the event-time axis (Feb 5, Feb 6, Feb 7)]
15
Comparing results
• Let’s compare how results differ
         IN TOTAL        PER DAY          PER SESSION
         (no windows)    (time windows)   (session windows)
Alice    10              3.0 (avg)        2.5 (avg)
Bob      11              3.0 (avg)        5.0 (avg)
Dave     8               2.3 (avg)        4.0 (avg)
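The per-day and per-session averages can be reproduced from the per-window counts shown on the earlier slides. The following plain-Java sketch does the arithmetic; the class name `AverageSketch` and its helper methods are made up for this illustration.

```java
import java.util.List;
import java.util.Locale;

class AverageSketch {
    // Mean of a list of per-window counts (0.0 for an empty list).
    static double average(List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).average().orElse(0.0);
    }

    static void print(String user, List<Integer> perDay, List<Integer> perSession) {
        System.out.println(String.format(Locale.ROOT, "%s: %.1f per day, %.1f per session",
                user, average(perDay), average(perSession)));
    }

    public static void main(String[] args) {
        // Per-day counts (Feb 5, Feb 6, Feb 7) and per-session counts as shown on the slides.
        print("Alice", List.of(4, 1, 4), List.of(1, 4, 1, 4)); // Alice: 3.0 per day, 2.5 per session
        print("Bob", List.of(3, 2, 4), List.of(4, 6));         // Bob: 3.0 per day, 5.0 per session
        print("Dave", List.of(2, 4, 1), List.of(3, 5));        // Dave: 2.3 per day, 4.0 per session
    }
}
```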
16
Comparing results
• Let’s compare how results differ if our task were to rank the top users
         IN TOTAL        PER DAY          PER SESSION
         (no windows)    (time windows)   (session windows)
Alice    #2              #1               #3
Bob      #1              #1               #1
Dave     #3              #3               #2
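The rankings follow from the averages once ties share a rank. A minimal sketch, assuming "competition ranking" (rank = 1 + number of users with a strictly higher score), which is the scheme that yields #1, #1, #3 for the per-day column; the class name `RankSketch` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RankSketch {
    // Competition ranking: a user's rank is 1 plus the number of strictly higher scores.
    static Map<String, Integer> rank(Map<String, Double> scores) {
        Map<String, Integer> ranks = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            int higher = 0;
            for (double other : scores.values()) {
                if (other > e.getValue()) {
                    higher++;
                }
            }
            ranks.put(e.getKey(), higher + 1);
        }
        return ranks;
    }

    public static void main(String[] args) {
        Map<String, Double> perDay = new LinkedHashMap<>();
        perDay.put("Alice", 3.0);
        perDay.put("Bob", 3.0);
        perDay.put("Dave", 2.3);
        System.out.println(rank(perDay)); // {Alice=1, Bob=1, Dave=3}

        Map<String, Double> perSession = new LinkedHashMap<>();
        perSession.put("Alice", 2.5);
        perSession.put("Bob", 5.0);
        perSession.put("Dave", 4.0);
        System.out.println(rank(perSession)); // {Alice=3, Bob=1, Dave=2}
    }
}
```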
17
Session windows: how they work
18
Session windows: how they work
• The definition of a session in the Kafka Streams API is based on a configurable period of inactivity
• Example: “If Alice hasn’t watched another movie in the past 3 hours, then the next movie starts a new session!”
[Figure: inactivity period]
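The inactivity-gap rule can be sketched as a simple pass over one user's sorted event-times. This is a plain-Java illustration of the idea, not the Kafka Streams implementation; the class name `SessionizeSketch`, the use of whole hours as timestamps, and the choice that a gap strictly larger than the inactivity period starts a new session are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class SessionizeSketch {
    // Splits one user's event-times (sorted ascending, in hours) into sessions and
    // returns the number of events per session. A gap larger than inactivityGap
    // between two consecutive events starts a new session.
    static List<Integer> sessionCounts(List<Long> sortedTimes, long inactivityGap) {
        List<Integer> counts = new ArrayList<>();
        for (int i = 0; i < sortedTimes.size(); i++) {
            boolean newSession = i == 0
                    || sortedTimes.get(i) - sortedTimes.get(i - 1) > inactivityGap;
            if (newSession) {
                counts.add(1);
            } else {
                int last = counts.size() - 1;
                counts.set(last, counts.get(last) + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical data: Alice watches movies at hours 1, 2, 3 and again at 10, 11.
        List<Long> aliceViews = List.of(1L, 2L, 3L, 10L, 11L);
        System.out.println(sessionCounts(aliceViews, 3)); // [3, 2]
    }
}
```

Unlike the fixed per-day windows, the session boundaries here depend only on the data: each user gets as many sessions, of whatever length, as their own activity pattern produces.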
19
Auto-discovering sessions, per user
[Figure: events for Alice, Bob, and Dave along the event-time axis]
20
Auto-discovering sessions, per user
Example: “How many movies does Alice watch on average per session?”
[Figure: events for Alice, Bob, and Dave along the event-time axis, with the inactivity period (e.g. 3 hours) marked]
21
Auto-discovering sessions, per user
Example: “How many movies does Alice watch on average per session?”
[Figure: events for Alice, Bob, and Dave along the event-time axis]
22
Late-arriving data is handled transparently
• Handling of late-arriving data is important because, in practice, a lot of data arrives late
23
Late-arriving data: example
Users with mobile phones board an airplane and lose Internet connectivity
Emails are written during the 8h flight
Internet connectivity is restored; the phones now send the queued emails, though with an 8h delay
Bob writes Alice an email at 2 P.M.
Bob’s email is finally sent at 10 P.M.
24
Late-arriving data is handled transparently
• Handling of late-arriving data is important because, in practice, a lot of data arrives late
• Good news: late-arriving data is handled transparently and efficiently for you
• Also, in your applications, you can define a grace period after which late-arriving data will be discarded (default: 1 day), and you can set it individually for each windowed operation
• Example: “I want to sessionize the input data based on 15-min inactivity periods, and late-arriving data should be discarded if it is more than 12 hours late”
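The effect of a grace period can be sketched with a small accept-or-discard check. This is a deliberate simplification written in plain Java: it measures lateness against the highest event-time seen so far, whereas a real engine decides per window. The class name `GracePeriodSketch` and the method `accept` are hypothetical and not part of any Kafka API.

```java
import java.util.concurrent.TimeUnit;

class GracePeriodSketch {
    private final long graceMs;
    private long streamTimeMs = Long.MIN_VALUE; // highest event-time seen so far

    GracePeriodSketch(long graceMs) {
        this.graceMs = graceMs;
    }

    // Returns true if the record is accepted, false if it is too late and discarded.
    boolean accept(long eventTimeMs) {
        streamTimeMs = Math.max(streamTimeMs, eventTimeMs);
        return streamTimeMs - eventTimeMs <= graceMs;
    }

    public static void main(String[] args) {
        GracePeriodSketch check = new GracePeriodSketch(TimeUnit.HOURS.toMillis(12));
        System.out.println(check.accept(TimeUnit.HOURS.toMillis(22))); // true: on time (10 P.M.)
        System.out.println(check.accept(TimeUnit.HOURS.toMillis(14))); // true: 8h late (2 P.M. email)
        System.out.println(check.accept(TimeUnit.HOURS.toMillis(9)));  // false: 13h late
    }
}
```

With a 12-hour grace period, the email written at 2 P.M. and delivered at 10 P.M. is still counted, while anything more than 12 hours behind is dropped.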
25
Late-arriving data is handled transparently
• Late-arriving data may (1) create new sessions or (2) merge existing sessions
[Figure: events for Alice, Bob, and Dave along the event-time axis]
26
Sessions potentially merge as new events arrive
Session Window
27
Sessions potentially merge as new events arrive
Session Window
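How an arriving event can either open a new session or merge existing ones can be sketched for a single user as follows. This is a plain-Java illustration under stated assumptions, not the Kafka Streams implementation: the class name `SessionMergeSketch` is made up, timestamps are whole hours, and an event joins every session whose boundaries lie within the inactivity gap of it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class SessionMergeSketch {
    // Each session is stored as {start, end, eventCount}; times are in hours.
    final List<long[]> sessions = new ArrayList<>();
    final long gap;

    SessionMergeSketch(long gap) {
        this.gap = gap;
    }

    // Adds one event. Every existing session within `gap` of the event is absorbed
    // into a single merged session; if none is close enough, a new session is created.
    void add(long t) {
        long[] merged = {t, t, 1};
        Iterator<long[]> it = sessions.iterator();
        while (it.hasNext()) {
            long[] s = it.next();
            if (t >= s[0] - gap && t <= s[1] + gap) {
                merged[0] = Math.min(merged[0], s[0]);
                merged[1] = Math.max(merged[1], s[1]);
                merged[2] += s[2];
                it.remove();
            }
        }
        sessions.add(merged);
    }

    public static void main(String[] args) {
        SessionMergeSketch alice = new SessionMergeSketch(3);
        alice.add(11);
        alice.add(12);
        alice.add(18);
        alice.add(19);
        System.out.println(alice.sessions.size()); // 2: sessions [11,12] and [18,19]
        alice.add(15); // late event that bridges both sessions
        System.out.println(alice.sessions.size()); // 1: merged session [11,19] with 5 events
        alice.add(2);  // late event far from any session
        System.out.println(alice.sessions.size()); // 2: a new session [2,2] was created
    }
}
```

Because existing sessions are always more than one gap apart, a single pass is enough: the merged session can never reach a session that the arriving event itself did not reach.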
28
Late-arriving data is handled transparently
[Figure: events for Alice, Bob, and Dave along the event-time axis]
29
Late-arriving data is handled transparently
[Figure: events for Alice, Bob, and Dave along the event-time axis]
30
Session windows: API
31
Session windows: API in Confluent 3.2 / Apache Kafka 0.10.2
Imports used by the snippets below
    import java.util.concurrent.TimeUnit;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.SessionWindows;
    import org.apache.kafka.streams.kstream.Windowed;
Defining a session window
    // A session window with an inactivity gap of 3h; discard data that is 12h late
    SessionWindows.with(TimeUnit.HOURS.toMillis(3)).until(TimeUnit.HOURS.toMillis(12));
Full example: aggregating with session windows
    // Key (String) is user, value (Avro record) is the movie view event for that user.
    KStream<String, GenericRecord> movieViews = ...;
    // Count movie views per session, per user
    KTable<Windowed<String>, Long> sessionizedMovieCounts =
        movieViews
            .groupByKey(Serdes.String(), genericAvroSerde)
            .count(SessionWindows.with(TimeUnit.HOURS.toMillis(3)), "views-per-session");
More details with documentation and examples at:
http://docs.confluent.io/current/streams/developer-guide.html#session-windows
https://github.com/confluentinc/examples
33
Why Confluent? More than just enterprise software
Complete support across the entire adoption lifecycle
Confluent Platform: The only enterprise open source streaming platform based entirely on Apache Kafka
Professional Services: Best-practice consultation for future Kafka deployments, and optimization of performance and scalability for existing ones
Enterprise Support: 24x7 support for the entire Apache Kafka project, not just a portion of it
Kafka Training: Comprehensive hands-on courses for developers and operators from the Apache Kafka experts
34
Get Started with Apache Kafka Today!
https://www.confluent.io/downloads/
THE place to start with Apache Kafka!
• Thoroughly tested and quality assured
• More extensible developer experience
• Easy upgrade path to Confluent Enterprise
35
Discount code: kafcom17
• Use the Apache Kafka community discount code to get $50 off
• www.kafka-summit.org
Kafka Summit New York: May 8
Kafka Summit San Francisco: August 28

