1
Introducing Exactly Once
Semantics in Apache Kafka
Jason Gustafson, Guozhang Wang, Sriram
Subramaniam, and Apurva Mehta
2
On deck..
• Kafka’s existing delivery semantics.
• Why did we improve them?
• What’s new?
• How do you use it?
• Summary.
3
Apache Kafka’s existing semantics
4
Existing Semantics
5
Existing Semantics
6
Existing Semantics
7
Existing Semantics
8
Existing Semantics
9
Existing Semantics
10
Existing Semantics
11
Existing Semantics
12
Existing Semantics
13
TL;DR – What we have today
• At-least-once, in-order delivery per partition.
• Producer retries can introduce duplicates.
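A minimal sketch of how a retry can introduce a duplicate, assuming a toy in-memory log in place of a real broker. The names `RetryDuplicateDemo`, `PartitionLog` and `sendWithRetry` are hypothetical and exist only for this illustration: the write succeeds, the acknowledgement is lost, and the producer sends the record again.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: a partition log plus a producer that retries when no ack arrives. */
public class RetryDuplicateDemo {

    /** Hypothetical stand-in for one partition's log on the broker. */
    static class PartitionLog {
        final List<String> records = new ArrayList<>();

        /** Appends the record; returns false when the ack is "lost" on the way back. */
        boolean append(String record, boolean ackLost) {
            records.add(record);   // the write itself always succeeds in this model
            return !ackLost;       // but the producer may never hear about it
        }
    }

    /** Sends one record, retrying until an ack is received (at-least-once). */
    public static List<String> sendWithRetry(String record, int lostAcks) {
        PartitionLog log = new PartitionLog();
        int attempt = 0;
        boolean acked = false;
        while (!acked) {
            // The first `lostAcks` attempts are written, but their acks are dropped.
            acked = log.append(record, attempt < lostAcks);
            attempt++;
        }
        return log.records;
    }

    public static void main(String[] args) {
        // One lost ack: the producer retries and the log holds the record twice.
        System.out.println(sendWithRetry("transfer-42", 1)); // [transfer-42, transfer-42]
    }
}
```

With no lost acks the log holds the record once; every lost ack adds one more copy, which is the duplicate the bullet above refers to.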
14
Why improve?
15
Why improve?
• Stream processing is becoming an ever bigger part of the
data landscape.
• Apache Kafka is the heart of the streams platform.
• Strengthening Kafka’s semantics expands the universe of
streaming applications.
16
A motivating example..
A peer to peer lending platform which processes micro-loans
between users.
17
A Peer to Peer Lender
18
The Basic Flow
19
Offset commits
20
Reprocessed transfer, eek!
21
Lost money! Eek eek!
22
What’s new?
23
What’s new
• Exactly-once, in-order delivery per partition
• Atomic writes across multiple partitions
• Performance considerations
24
What’s new, Part 1
Exactly once, in order, delivery per partition
25
The idempotent producer
26
The idempotent producer
27
The idempotent producer
28
The idempotent producer
29
The idempotent producer
30
The idempotent producer
31
The idempotent producer
32
The idempotent producer
33
TL;DR
• Sequence numbers and producer ids:
  • enable de-dup
  • are in the log.
• Hence de-dup works transparently across leader changes.
• Will not de-dup application-level resends.
• Works transparently – no API changes.
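The de-dup rule can be sketched as follows, assuming a toy log that remembers the last sequence number per producer id. The class `IdempotentLogDemo` is hypothetical. A real broker tracks this state per partition and also handles details not shown on these slides, so treat this as an illustration of the idea rather than the broker's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of broker-side de-duplication keyed by producer id and sequence number. */
public class IdempotentLogDemo {
    private final List<String> records = new ArrayList<>();
    // Last sequence number appended for each producer id (sequences assumed to start at 0).
    private final Map<Long, Integer> lastSequence = new HashMap<>();

    /** Returns true if the record was appended, false if it was a duplicate retry. */
    public boolean append(long producerId, int sequence, String value) {
        int expected = lastSequence.getOrDefault(producerId, -1) + 1;
        if (sequence < expected) {
            return false; // already in the log: drop the retry
        }
        if (sequence > expected) {
            throw new IllegalStateException("out-of-order sequence " + sequence);
        }
        records.add(value);
        lastSequence.put(producerId, sequence);
        return true;
    }

    public List<String> records() {
        return records;
    }

    public static void main(String[] args) {
        IdempotentLogDemo log = new IdempotentLogDemo();
        log.append(7L, 0, "a");
        log.append(7L, 0, "a"); // producer retry: same id and sequence, dropped
        log.append(7L, 1, "b");
        log.append(7L, 2, "a"); // application-level resend: new sequence, kept
        System.out.println(log.records()); // [a, b, a]
    }
}
```

The last call shows why application-level resends are not de-duplicated: the application sends the same payload again, but it arrives with a fresh sequence number, so the log has no way to recognise it as a repeat.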
34
What’s new, part 2
Multi partition writes.
35
Introducing ‘transactions’
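producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(record0);
    producer.send(record1);
    // Commit the consumed offsets as part of the same transaction.
    producer.sendOffsetsToTransaction(…);
    producer.commitTransaction();
} catch (ProducerFencedException e) {
    // Another producer with the same transactional.id has taken over; this one must stop.
    producer.close();
} catch (KafkaException e) {
    // Any other failure: abort so that none of the writes become visible.
    producer.abortTransaction();
}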
36
Introducing ‘transactions’
37
Initializing ‘transactions’
38
Transactional sends – part 1
39
Transactional sends – part 2
40
Commit – phase 1
41
Commit – phase 2
42
Commit – phase 2
43
Success!
44
Let’s review the APIs
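producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(record0);
    producer.send(record1);
    // Commit the consumed offsets as part of the same transaction.
    producer.sendOffsetsToTransaction(…);
    producer.commitTransaction();
} catch (ProducerFencedException e) {
    // Another producer with the same transactional.id has taken over; this one must stop.
    producer.close();
} catch (KafkaException e) {
    // Any other failure: abort so that none of the writes become visible.
    producer.abortTransaction();
}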
45
Let’s review the APIs
46
Let’s review the APIs
47
Let’s review the APIs
48
Let’s review the APIs
49
Consumer returns only committed messages
50
Some notes on consuming transactions
• Two ‘isolation levels’: read_committed and read_uncommitted.
• Messages are read in offset order.
• read_committed consumers read only up to the first message that belongs to a
transaction which is still open.
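The read_committed rule can be sketched like this, assuming a toy log in which every entry is tagged with a transaction id. `ReadCommittedDemo` and its `Entry` type are hypothetical names for illustration, not part of the Kafka client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of what a read_committed reader is allowed to see. */
public class ReadCommittedDemo {

    /** One log entry: the transaction it belongs to and its payload. */
    public static final class Entry {
        final String txnId;
        final String value;

        public Entry(String txnId, String value) {
            this.txnId = txnId;
            this.value = value;
        }
    }

    /**
     * Walks the log in offset order, stops at the first entry of a still-open
     * transaction, and skips entries of aborted transactions.
     */
    public static List<String> readCommitted(List<Entry> log,
                                             Set<String> openTxns,
                                             Set<String> abortedTxns) {
        List<String> visible = new ArrayList<>();
        for (Entry e : log) {
            if (openTxns.contains(e.txnId)) {
                break; // nothing past the first open transaction is returned yet
            }
            if (abortedTxns.contains(e.txnId)) {
                continue; // aborted writes are never returned
            }
            visible.add(e.value);
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Entry> log = Arrays.asList(
                new Entry("t1", "a"), new Entry("t1", "b"),
                new Entry("t2", "c"), new Entry("t3", "d"));
        Set<String> t2 = new HashSet<>(Arrays.asList("t2"));
        Set<String> none = Collections.emptySet();
        System.out.println(readCommitted(log, t2, none)); // t2 open:    [a, b]
        System.out.println(readCommitted(log, none, t2)); // t2 aborted: [a, b, d]
    }
}
```

In the first call "d" is already committed but stays hidden, because it sits behind a message of the open transaction t2 and messages are delivered in offset order.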
51
TL;DR
• Transaction coordinator and transaction log maintain
transaction state.
• Use the new producer APIs for transactions.
• Consumers can read only committed messages.
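A rough sketch of the two commit phases from the walkthrough, assuming a toy coordinator that keeps its transaction log and the data partitions as in-memory lists. The class `TwoPhaseCommitDemo` and the state labels `ONGOING`, `PREPARE_COMMIT` and `COMPLETE_COMMIT` are illustrative names chosen for this example, not the broker's on-disk format.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the two commit phases driven by a transaction coordinator. */
public class TwoPhaseCommitDemo {
    // Hypothetical stand-in for the transaction log: an append-only list of state changes.
    public final List<String> transactionLog = new ArrayList<>();
    // Hypothetical stand-in for the data partitions: one line per write or marker.
    public final List<String> partitionWrites = new ArrayList<>();
    private final Set<String> partitionsInTxn = new LinkedHashSet<>();

    public void begin() {
        transactionLog.add("ONGOING");
    }

    /** A transactional send: the partition is registered, then the record is written. */
    public void send(String partition, String value) {
        partitionsInTxn.add(partition);
        partitionWrites.add(partition + ":" + value);
    }

    public void commit() {
        // Phase 1: the decision is recorded in the transaction log first.
        transactionLog.add("PREPARE_COMMIT");
        // Phase 2: a commit marker (a control message) goes to every partition written.
        for (String partition : partitionsInTxn) {
            partitionWrites.add(partition + ":COMMIT_MARKER");
        }
        transactionLog.add("COMPLETE_COMMIT");
        partitionsInTxn.clear();
    }

    public static void main(String[] args) {
        TwoPhaseCommitDemo txn = new TwoPhaseCommitDemo();
        txn.begin();
        txn.send("p0", "record0");
        txn.send("p1", "record1");
        txn.commit();
        System.out.println(txn.transactionLog);
        System.out.println(txn.partitionWrites);
    }
}
```

Recording the decision before writing any marker is the point of the split: if the coordinator fails between the phases, the transaction log still says which outcome the remaining markers must carry.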
52
Part 3
Performance!
53
What’s new, part 3: Performance boost!
• Up to +20% producer throughput
• Up to +50% consumer throughput
• Up to -20% disk utilization
• Savings start when you batch
• Details: https://bit.ly/kafka-eos-perf
54
Too good to be true?
Let’s understand how!
55
The old message format
56
The new format
57
The new format -> new fields
58
The new format -> new fields
59
The new format -> delta encoding
60
A visual comparison with 7 records, 10 bytes each
61
TL;DR
• With a batch size of 2, the new format starts saving
space.
• Savings are maximal for large batches of small
messages.
• Hence higher throughput when IO bound.
• Works as soon as you upgrade to the new format.
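Why delta encoding saves space can be illustrated with a small calculation. The sketch below looks at a single field only (the offset) and compares a fixed 8-byte value per record with one 8-byte base value per batch plus a variable-length delta per record. `DeltaEncodingDemo` is a hypothetical name, and the byte counts are for this toy model, not the full accounting of either message format.

```java
/** Toy comparison: fixed-width offsets versus a base offset plus varint deltas. */
public class DeltaEncodingDemo {

    /** Number of bytes a zigzag-encoded varint needs for the given value. */
    public static int varintSize(long value) {
        long zigzag = (value << 1) ^ (value >> 63); // small magnitudes map to small codes
        int bytes = 1;
        while ((zigzag & ~0x7FL) != 0) {
            zigzag >>>= 7; // each varint byte carries 7 payload bits
            bytes++;
        }
        return bytes;
    }

    /** Bytes needed to store every offset as a fixed 8-byte long. */
    public static int fixedSize(long[] offsets) {
        return offsets.length * Long.BYTES;
    }

    /** Bytes needed for one 8-byte base offset plus a varint delta per record. */
    public static int deltaSize(long[] offsets) {
        int total = Long.BYTES;
        for (long offset : offsets) {
            total += varintSize(offset - offsets[0]);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] offsets = new long[7];
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = 1_000_000L + i;
        }
        System.out.println("fixed: " + fixedSize(offsets) + " bytes"); // fixed: 56 bytes
        System.out.println("delta: " + deltaSize(offsets) + " bytes"); // delta: 15 bytes
    }
}
```

In this toy model a single record is slightly larger with delta encoding (9 bytes against 8), two records already come out ahead (10 against 16), and the gap widens with every record added to the batch.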
62
Cool!
But how do I use this?
63
Producer Configs
• enable.idempotence = true
• max.in.flight.requests.per.connection = 1
• acks = “all”
• retries > 0 (preferably MAX_INT)
• transactional.id = ‘some unique id’
• enable.idempotence = true
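The producer settings above could be assembled as plain `java.util.Properties`, roughly as below. The key names are written as the Kafka client spells them (note `max.in.flight.requests.per.connection`). The class `ProducerConfigSketch`, the bootstrap address and the transactional id are placeholders for this example, and the serializer settings a real producer also needs are left out because the slides do not cover them.

```java
import java.util.Properties;

/** Sketch: producer settings for idempotent, transactional sends. */
public class ProducerConfigSketch {

    /** Builds the settings; both arguments are supplied by the caller. */
    public static Properties transactionalProducerProps(String bootstrapServers,
                                                        String transactionalId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Idempotent producer settings.
        props.put("enable.idempotence", "true");
        props.put("max.in.flight.requests.per.connection", "1");
        props.put("acks", "all");
        props.put("retries", String.valueOf(Integer.MAX_VALUE));
        // Transactions additionally need an id that is unique to this producer.
        props.put("transactional.id", transactionalId);
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" and "loan-processor-1" are placeholder values.
        Properties props = transactionalProducerProps("localhost:9092", "loan-processor-1");
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```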
64
Consumer configs
• isolation.level:
  • “read_committed”, or
  • “read_uncommitted”
65
Streams config
• processing.guarantee = “exactly_once”
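The consumer and Streams settings can be written down the same way. In released Kafka Streams the key is `processing.guarantee`, and `exactly_once` is the value introduced with 0.11. The class `ReaderConfigSketch`, the group id and the application id are placeholders for this example. Connection and deserializer settings are omitted.

```java
import java.util.Properties;

/** Sketch: consumer and Streams settings for exactly-once processing. */
public class ReaderConfigSketch {

    /** Consumer that only returns messages from committed transactions. */
    public static Properties readCommittedConsumerProps(String groupId) {
        Properties props = new Properties();
        props.put("group.id", groupId);
        props.put("isolation.level", "read_committed");
        return props;
    }

    /** Streams application with the exactly-once processing guarantee switched on. */
    public static Properties exactlyOnceStreamsProps(String applicationId) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        // "loan-readers" and "loan-app" are placeholder names.
        System.out.println(readCommittedConsumerProps("loan-readers"));
        System.out.println(exactlyOnceStreamsProps("loan-app"));
    }
}
```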
66
Putting it together
• We understood Kafka’s existing delivery semantics
• Understood why we want to improve them
• Learned how these have been strengthened
• Learned how the new semantics work
67
When is it available?
Available to try in Kafka 0.11, June 2017.
68
Thank You!

Editor's Notes

  • #34: Stress the application level resends bit. Encourage people to rely on the producer retries and not re-send messages from their apps.
  • #42: New concept – control messages. Mention that commit markers are special messages which log the producer id and the result of the transaction. These messages are not passed on to the application – the client interprets them and acts accordingly.
  • #50: Mention ‘read_uncommitted’. Mention that the buffering is broker side.
  • #51: Mention transaction expiration.
  • #62: Specify that you don’t need to use any of the new features to get these performance savings.
  • #64: max.in.flight.requests.per.connection = 1 is required for idempotence. It will cause a slowdown because the producer now effectively sends synchronously, one request at a time per connection.