Apache Kafka - Message Compression
Last Updated: 18 Mar, 2023
Kafka producers write data to topics, and topics are made of partitions. A producer automatically knows which broker and partition to write to based on the message, and if a broker in the cluster fails, the producer automatically recovers from it. This resilience is a large part of why Kafka is so widely used today. In a typical diagram, the producer sits on the left-hand side and sends data into each of the partitions of a topic.
One important producer setting is message compression. Before looking at it, let's first understand the anatomy of a Kafka message.
Kafka Message Anatomy
Kafka messages are created by the producer, and the first fundamental concept is the key. The key can be null, and its type is binary. Binary means raw bytes (0s and 1s), but keys usually start out as strings or numbers, which the producer converts into bytes before sending.
So we have the key, a binary field that can be null, and the value, which is the content of your message and can also be null. The key and the value are the two most important parts of a message, but other things go into it as well. For example, a message can be compressed, and the compression type is indicated as part of the message. A type of none means no compression; beyond that, Kafka offers four compression types: gzip, snappy, lz4, and zstd.
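To make the conversion of a key or value into binary concrete, here is a minimal sketch in plain Python. The helper names serialize_string and serialize_int are hypothetical and exist only for this illustration; they encode a string as UTF-8 bytes and an integer as 4 big-endian bytes, which is an assumption about a reasonable wire format rather than a statement of what any particular Kafka client does.

```python
import struct


def serialize_string(text):
    """Encode a string as UTF-8 bytes; None stays None (a null key or value)."""
    return None if text is None else text.encode("utf-8")


def serialize_int(number):
    """Encode an integer as 4 big-endian bytes (assumed format for this sketch)."""
    return struct.pack(">i", number)


key_bytes = serialize_string("user-42")
id_bytes = serialize_int(42)

print(key_bytes)  # b'user-42'
print(id_bytes)   # b'\x00\x00\x00*'
```

Whatever the original type, the broker only ever sees the resulting bytes.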
Apache Kafka Message Compression
Producers usually send text-based data; most of the time it is JSON. JSON is verbose and comparatively large, so it is worth enabling compression on the producer.
The compression type setting (compression.type) can take several values: none (the default, meaning no compression), gzip, snappy, lz4, and zstd. Compression is more effective on bigger batches of messages, so the more data you send to Kafka, the more compression helps. Here's how it works.
The producer groups messages into batches on its own. A producer batch might hold Message 1, Message 2, Message 3, and so on up to Message 100, because the producer sends many messages and prefers to send them together when it can. When compression is enabled, the producer compresses the whole batch before sending it to Kafka, which makes the batch much smaller.
The smaller batch is quicker to send to Kafka and quicker to replicate across brokers, so latency goes down. Because less data moves from producer to broker and between brokers during replication, you also use less network bandwidth. Those are the advantages of compressing a batch.
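The effect of compressing a whole batch rather than single messages can be sketched with Python's standard gzip module. This is only an illustration of the principle, not Kafka's actual batch format: the 100 JSON messages are made up, and joining them with newlines stands in for the producer batch.

```python
import gzip
import json

# 100 hypothetical JSON messages with the same shape, as event streams often have.
messages = [
    json.dumps({"order_id": i, "status": "CREATED", "currency": "USD"}).encode("utf-8")
    for i in range(1, 101)
]

raw_size = sum(len(m) for m in messages)

# Compress every message on its own: each one pays the gzip header overhead
# and cannot share repeated field names with its neighbours.
one_by_one = sum(len(gzip.compress(m)) for m in messages)

# Compress the batch as a single unit: repeated keys and values are encoded once.
as_batch = len(gzip.compress(b"\n".join(messages)))

print(f"raw:          {raw_size} bytes")
print(f"one by one:   {one_by_one} bytes")
print(f"whole batch:  {as_batch} bytes")
```

On data like this, the batch compresses far better than the individual messages, which is why bigger batches make compression more worthwhile.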
Advantages of Kafka Message Compression
- The producer request sent to Kafka is much smaller.
- Data transfers faster over the network, which leads to lower latency and better throughput.
- Disk utilization on the brokers improves, because messages are stored in compressed format, leaving capacity for more messages.
Disadvantages of Kafka Message Compression
- When you do compression, producers must commit some CPU cycles to complete that compression.
- Similarly, the consumers must commit some CPU cycles to decompress the data.
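The CPU cost on both sides can be observed with a small timing sketch. It uses Python's standard zlib module (the DEFLATE algorithm that gzip is built on) as a stand-in for a Kafka codec, and the payload is invented sensor data; absolute timings depend entirely on the machine, so treat the numbers as indicative only.

```python
import json
import time
import zlib

# Hypothetical payload: 10,000 small JSON records joined into one block.
payload = b"\n".join(
    json.dumps({"sensor": i % 10, "reading": i * 0.5, "unit": "celsius"}).encode("utf-8")
    for i in range(10_000)
)

# Producer side: time spent compressing before the data is sent.
start = time.perf_counter()
compressed = zlib.compress(payload)
compress_seconds = time.perf_counter() - start

# Consumer side: time spent decompressing after the data is fetched.
start = time.perf_counter()
restored = zlib.decompress(compressed)
decompress_seconds = time.perf_counter() - start

print(f"raw {len(payload)} bytes -> compressed {len(compressed)} bytes")
print(f"compress:   {compress_seconds * 1000:.2f} ms")
print(f"decompress: {decompress_seconds * 1000:.2f} ms")
```

The time spent here is the price paid for the smaller size.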
Which Compression Type Should You Choose?
As discussed above, Kafka offers four compression types: gzip, snappy, lz4, and zstd. snappy or lz4 are commonly recommended because both offer a good balance of speed and compression ratio. gzip typically achieves a higher compression ratio than those two, but it is not very fast. So choose one and test it; this is simple, because you change a single setting and everything else keeps working. No one algorithm works for everyone, so try them against the kind of data you have and see which works best for you. Finally, it is highly recommended to always use compression in production, especially if you have high throughput.
Advantages of snappy over other message compressions:
- snappy is very useful if your messages are text-based, for example, JSON documents or logs
- snappy has a good balance of compression ratio and CPU usage.
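The single setting mentioned above is the producer's compression.type. The sketch below builds a producer configuration as a plain dictionary; build_producer_config is a hypothetical helper, the broker address is a placeholder, and the linger.ms and batch.size values are illustrative assumptions, not recommendations. Passing such a dictionary to a client library (for example a Python client that accepts Kafka-style configuration keys) is assumed and not shown here.

```python
# Compression types named in this article.
VALID_CODECS = {"none", "gzip", "snappy", "lz4", "zstd"}


def build_producer_config(bootstrap_servers, compression_type="snappy"):
    """Return a Kafka-style producer config dict (hypothetical helper)."""
    if compression_type not in VALID_CODECS:
        raise ValueError(f"unsupported compression type: {compression_type!r}")
    return {
        "bootstrap.servers": bootstrap_servers,
        # The one setting that turns compression on.
        "compression.type": compression_type,
        # Illustrative values: waiting briefly and allowing larger batches
        # gives the codec more data to work with per batch.
        "linger.ms": 20,
        "batch.size": 32768,
    }


config = build_producer_config("localhost:9092", compression_type="lz4")
print(config["compression.type"])  # lz4
```

Switching codecs for a test run is then a one-argument change, which makes it easy to compare them on your own data.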