Mind the App
How to monitor your Kafka Streams applications
Bruno Cadonna, Kafka Summit 2021 Europe
Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
About me
Bruno Cadonna
Contributor to Apache Kafka &
Software Developer at Confluent
Content
• Basics about metrics in Kafka
• Metrics in Kafka Streams
• KIP-444: Improving Kafka Streams’ metrics
• KIP-471 and KIP-607: RocksDB metrics
• KIP-613: End-to-end latency metrics
• Takeaways
Basics about metrics in Kafka
A metric in Kafka
• consists of a name, a value, and a configuration
• a metric name is composed of
• name
• group
• tags
• description
• a metric value is a Java Object, e.g. an integral number, a decimal number, a string, …
• metric config contains the recording level which can be INFO, DEBUG, TRACE
• example:
• name: process-rate
• group: stream-thread-metrics
• tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1
• description: The average number of processed records per second
• value: 123456.78
• recording level: INFO
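The recording level decides whether a metric is recorded at all. A minimal sketch of that rule in plain Java, assuming the ordering INFO < DEBUG < TRACE listed above. The class and enum below are illustrative stand-ins, not Kafka's own classes.

```java
final class RecordingLevelSketch {
    // Illustrative model of the recording levels, ordered from least to most verbose.
    enum Level {
        INFO, DEBUG, TRACE;

        // A metric registered at this level is recorded only when the
        // configured level is at least as verbose as this level.
        boolean shouldRecord(Level configured) {
            return this.compareTo(configured) <= 0;
        }
    }

    public static void main(String[] args) {
        // With the application configured at DEBUG:
        System.out.println(Level.INFO.shouldRecord(Level.DEBUG));  // true: INFO metrics are recorded
        System.out.println(Level.TRACE.shouldRecord(Level.DEBUG)); // false: TRACE metrics are skipped
    }
}
```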
A sensor in Kafka
• maintains a sequence of recorded values
• maintains a set of metrics
• each metric specifies an aggregation for the recorded values
• each time a value is recorded all metrics in a sensor are updated
• example:
• process-rate and process-total are recorded by the same sensor
• process-rate computes the number of processed records over time
• process-total computes the total number of processed records
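The relationship between a sensor and its metrics can be sketched in plain Java. This is a simplified model, not Kafka's implementation: the classes Sensor, Total, and Rate below are hypothetical stand-ins, and the rate is computed over an explicitly given time span instead of sampled time windows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleConsumer;

final class SensorSketch {
    // Aggregation 1: running total of all recorded values.
    static final class Total implements DoubleConsumer {
        private double sum;
        @Override public void accept(double value) { sum += value; }
        double value() { return sum; }
    }

    // Aggregation 2: recorded amount per second over a given time span.
    static final class Rate implements DoubleConsumer {
        private double sum;
        @Override public void accept(double value) { sum += value; }
        double value(double elapsedSeconds) { return sum / elapsedSeconds; }
    }

    // A sensor passes every recorded value on to all of its metrics.
    static final class Sensor {
        private final List<DoubleConsumer> metrics = new ArrayList<>();
        void add(DoubleConsumer metric) { metrics.add(metric); }
        void record(double value) {
            for (DoubleConsumer metric : metrics) {
                metric.accept(value);
            }
        }
    }

    public static void main(String[] args) {
        Total processTotal = new Total();
        Rate processRate = new Rate();
        Sensor sensor = new Sensor();
        sensor.add(processTotal);
        sensor.add(processRate);

        for (int i = 0; i < 3; i++) {
            sensor.record(1.0); // one processed record
        }

        System.out.println(processTotal.value());   // 3.0
        System.out.println(processRate.value(2.0)); // 1.5 records per second over 2 seconds
    }
}
```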
Metrics in Kafka Streams
Anatomy of a Kafka Streams application
[Diagram: a Kafka Streams client runs stream threads (stream thread 1, stream thread 2); the stream threads execute tasks (task 1 to task 5); a task contains processor nodes, state stores, and caches]
How does Kafka Streams report metrics?
• KafkaStreams#metrics() returns a read-only map of metrics
• metrics reporters implement MetricsReporter
• JMX reporter: registered by default, no need to set
• my reporter: registered through the Kafka Streams config metric.reporters
• excerpt of the MetricsReporter interface:
interface MetricsReporter {
    // called when a metric is added or updated
    void metricChange(KafkaMetric metric);
    // called when a metric is removed
    void metricRemoval(KafkaMetric metric);
}
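Registering a custom reporter is a matter of configuration. A minimal sketch using plain java.util.Properties follows. The config keys `metric.reporters` (plural) and `metrics.recording.level` are Apache Kafka client configs; the reporter class name com.example.MyReporter and the broker address are assumptions made up for illustration.

```java
import java.util.Properties;

final class ReporterConfigSketch {
    static Properties streamsProperties() {
        Properties props = new Properties();
        props.put("application.id", "myapp");             // application id, as in the slide examples
        props.put("bootstrap.servers", "localhost:9092"); // assumption: a local broker
        // Register an additional reporter; the JMX reporter stays registered by default.
        props.put("metric.reporters", "com.example.MyReporter"); // hypothetical MetricsReporter implementation
        // Also record DEBUG-level metrics; the default recording level is INFO.
        props.put("metrics.recording.level", "DEBUG");
        return props;
    }

    public static void main(String[] args) {
        // These properties would be handed to the Kafka Streams client on construction.
        streamsProperties().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```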
jconsole
[Screenshot: jconsole, annotated with the metric name, the thread-id tag, the metric group, the metric description, and the metric value]
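What jconsole displays can also be read programmatically through JMX. The sketch below assumes it runs inside the same JVM as the Kafka Streams client and that the JMX reporter exposes stream-thread metrics under the domain kafka.streams, with the metric group as the type key and the metric name as the attribute name. Without a running client the query simply returns no MBeans.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

final class JmxLookupSketch {
    // Pattern for all stream-thread MBeans: fixed metric group, any thread-id tag.
    static ObjectName streamThreadPattern() {
        try {
            return new ObjectName("kafka.streams:type=stream-thread-metrics,thread-id=*");
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e);
        }
    }

    // Helper to check whether a given MBean name belongs to a stream thread.
    static boolean matchesStreamThread(String mbeanName) {
        try {
            return streamThreadPattern().apply(new ObjectName(mbeanName));
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Set<ObjectName> names = server.queryNames(streamThreadPattern(), null);
        for (ObjectName name : names) {
            // The metric name is the MBean attribute, the tag is a key property.
            Object rate = server.getAttribute(name, "process-rate");
            System.out.println(name.getKeyProperty("thread-id") + " process-rate=" + rate);
        }
    }
}
```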
Datadog
[Screenshot: Datadog, annotated with the metric group, the tags, and the metric name]
What metrics does Kafka Streams expose?
• Kafka Streams client level:
• name: state
• group: stream-metrics
• tags: client-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003
• stream thread level:
• name: process-rate
• group: stream-thread-metrics
• tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1
• task level:
• name: process-latency-avg
• group: stream-task-metrics
• tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1,
task-id = 0_1
…some more metrics
• processor node level
• name: process-rate
• group: stream-processor-node-metrics
• tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1,
task-id = 0_1,
processor-node-id = KSTREAM-SINK-0000000004
• state store level
• name: put-rate
• group: stream-state-metrics
• tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1,
task-id = 0_1,
rocksdb-state-id = count-items
• cache level
• name: hit-ratio-avg
• group: stream-record-cache-metrics
• tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1,
task-id = 0_1,
record-cache-id = 0_1-count-items
… and finally
• all metrics of embedded consumers, producers, and admin client
• name: last-rebalance-seconds-ago
• group: consumer-coordinator-metrics
• tags: client-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1-consumer
KIP-444: Improving Kafka Streams’ metrics
New metrics
• introduces client-level metrics
• version,
• commit-id,
• application-id,
• topology-description,
• state,
• alive-stream-threads
• introduces new task level metrics
• active-process-ratio,
• standby-process-ratio (not yet implemented),
• dropped-records
Refactorings
• renames some metric names and some metric tags
• records client-level and stream thread-level metrics on INFO and most metrics on lower levels on DEBUG
• removes all parent metrics except one and lets users do the roll-up themselves
• removes overlapping metrics
• dropped-records (task-level, INFO) replaces
• late-records-drop (processor node, INFO),
• skipped-records (processor node, INFO),
• expired-window-record-drop (state store, DEBUG)
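Since most parent metrics are gone, a roll-up has to be computed by the user. A minimal sketch in plain Java, assuming the values of one task-level metric have already been read into memory together with their thread-id and task-id tags. The Sample class and the shortened thread ids are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class RollUpSketch {
    // One reading of a task-level metric with its identifying tags.
    static final class Sample {
        final String threadId;
        final String taskId;
        final double value;

        Sample(String threadId, String taskId, double value) {
            this.threadId = threadId;
            this.taskId = taskId;
            this.value = value;
        }
    }

    // Roll task-level values up to the stream thread level by summing per thread-id.
    static Map<String, Double> sumByThread(List<Sample> samples) {
        return samples.stream().collect(
            Collectors.groupingBy(s -> s.threadId, TreeMap::new, Collectors.summingDouble(s -> s.value)));
    }

    public static void main(String[] args) {
        List<Sample> samples = Arrays.asList(
            new Sample("StreamThread-1", "0_0", 100.0),
            new Sample("StreamThread-1", "0_1", 50.0),
            new Sample("StreamThread-2", "0_2", 25.0));
        System.out.println(sumByThread(samples)); // {StreamThread-1=150.0, StreamThread-2=25.0}
    }
}
```

Summing suits totals and rates; for averages or ratios a weighted aggregation would be needed instead.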
Improving custom metrics
• Sensor addLatencyRateTotalSensor(final String scopeName,
final String entityName,
final String operationName,
final Sensor.RecordingLevel recordingLevel,
final String... tags);
• Sensor addRateTotalSensor(final String scopeName,
final String entityName,
final String operationName,
final Sensor.RecordingLevel recordingLevel,
final String... tags);
• only available where you have access to the ProcessorContext
• you can add additional metrics to the sensor with Sensor#add()
Example of custom metrics
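import java.util.Locale;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;

// Counts words and tracks empty records with a custom sensor.
// The class name is chosen for illustration.
public class WordCountProcessor implements Processor<String, String, String, String> {
    private ProcessorContext<String, String> context;
    private KeyValueStore<String, Integer> kvStore;
    private Sensor countEmptyRecords;

    @Override
    public void init(final ProcessorContext<String, String> context) {
        this.context = context;
        // Adds a sensor with a rate and a total metric for empty messages.
        countEmptyRecords = context.metrics().addRateTotalSensor(
            "word-counter",
            "word-counter" + context.taskId(),
            "count-empty-messages",
            Sensor.RecordingLevel.INFO
        );
        kvStore = context.getStateStore("Counts");
    }

    @Override
    public void process(final Record<String, String> record) {
        final String value = record.value();
        // "".split(" ") yields one empty string, so test for emptiness before splitting.
        if (value == null || value.trim().isEmpty()) {
            countEmptyRecords.record();
            return;
        }
        for (final String word : value.toLowerCase(Locale.getDefault()).split(" ")) {
            final Integer oldValue = kvStore.get(word);
            if (oldValue == null) {
                kvStore.put(word, 1);
            } else {
                kvStore.put(word, oldValue + 1);
            }
        }
    }
}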
KIP-471 and KIP-607: RocksDB metrics
RocksDB metrics
• RocksDB is the default state store in Kafka Streams
• statistics-based metrics (KIP-471, AK 2.4): cumulative measurements over time collected by RocksDB
• name: bytes-written-rate
• group: stream-state-metrics
• tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1,
task-id = 0_1,
rocksdb-state-id = count-items
• properties-based metrics (KIP-607, AK 2.7): properties exposed by RocksDB providing current measurements
• name: block-cache-usage
• group: stream-state-metrics
• tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1,
task-id = 0_1,
rocksdb-state-id = count-items
Recording RocksDB metrics
• statistics-based metrics
• collecting statistics-based metrics may have an impact on performance
• recording metrics during state store operations might be costly
• instead each state store has a metric recorder
• all metric recorders are triggered once per minute by one dedicated thread that is started at Kafka Streams client start-up
• properties-based metrics
• all properties-based metrics are gauges
• a gauge executes some given code each time the metric is queried
• properties-based metrics query RocksDB properties
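The gauge behavior described above can be sketched in plain Java. The Gauge class here is a simplified stand-in, not Kafka's Gauge interface, and the AtomicLong stands in for a RocksDB property such as block-cache-usage.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

final class GaugeSketch {
    // A gauge stores no value; it runs the given code on every query.
    static final class Gauge<T> {
        private final Supplier<T> valueProvider;

        Gauge(Supplier<T> valueProvider) {
            this.valueProvider = valueProvider;
        }

        T value() {
            return valueProvider.get();
        }
    }

    public static void main(String[] args) {
        AtomicLong blockCacheUsage = new AtomicLong(1024); // stand-in for a RocksDB property
        Gauge<Long> gauge = new Gauge<>(blockCacheUsage::get);

        System.out.println(gauge.value()); // 1024
        blockCacheUsage.set(4096);         // the underlying property changes
        System.out.println(gauge.value()); // 4096, read at query time
    }
}
```

The cost of reading the property is paid only when the metric is queried, not during state store operations.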
When to look at RocksDB metrics?
• high memory usage (properties-based metrics)
• size-all-mem-tables
• block-cache-usage
• block-cache-pinned-usage
• estimate-table-readers-mem
• high disk usage (properties-based metrics)
• total-sst-files-size
• high disk I/O and write stalls (statistics-based metrics)
• memtable-bytes-flushed-[rate | total]
• bytes-[read | written]-compaction-rate
• write-stall-duration-[avg | total]
• memtable-hit-ratio
• block-cache-[data | index | filter]-hit-ratio
• too many open files (statistics-based metrics)
• number-open-files
• for more details, check out the blog post "How to Tune RocksDB for Your Kafka Streams Application":
https://www.confluent.io/blog/how-to-tune-rocksdb-kafka-streams-state-stores-performance/
KIP-613: End-to-end latency metrics
End-to-end-latency metrics
[Diagram: topology of source node, filter, aggregation, and sink node; each latency below spans from a record's event time to the processing time at the respective point in the topology]
• consumption latency (INFO), measured at the source node
• name: record-e2e-latency-[min | max | avg]
• group: stream-processor-node-metrics
• tags: thread-id = myapp-…,
task-id = 0_1,
processor-node-id = KSTREAM-SOURCE-0000000004
• begin-to-state latency (TRACE), measured at the state store
• name: record-e2e-latency-[min | max | avg]
• group: stream-state-metrics
• tags: thread-id = myapp-…,
task-id = 0_1,
rocksdb-state-id = count-items
• full end-to-end latency (INFO), measured at the sink node
• name: record-e2e-latency-[min | max | avg]
• group: stream-processor-node-metrics
• tags: thread-id = myapp-…,
task-id = 0_1,
processor-node-id = KSTREAM-SINK-0000000004
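The arithmetic behind record-e2e-latency-[min | max | avg] can be sketched in plain Java. This is a simplified model under the assumption that the latency of a record is the wall-clock time at which a node processes it minus the record's event timestamp; the LatencyStats class is hypothetical and not part of Kafka Streams.

```java
final class E2eLatencySketch {
    // Tracks min, max, and average latency over all recorded records.
    static final class LatencyStats {
        private long min = Long.MAX_VALUE;
        private long max = Long.MIN_VALUE;
        private long sum;
        private long count;

        // Latency is the distance between event time and processing time.
        void record(long eventTimeMs, long processingTimeMs) {
            long latency = processingTimeMs - eventTimeMs;
            min = Math.min(min, latency);
            max = Math.max(max, latency);
            sum += latency;
            count++;
        }

        long min() { return min; }
        long max() { return max; }
        double avg() { return (double) sum / count; }
    }

    public static void main(String[] args) {
        LatencyStats stats = new LatencyStats();
        stats.record(1_000L, 1_040L); // 40 ms
        stats.record(2_000L, 2_010L); // 10 ms
        stats.record(3_000L, 3_100L); // 100 ms

        System.out.println(stats.min()); // 10
        System.out.println(stats.max()); // 100
        System.out.println(stats.avg()); // 50.0
    }
}
```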
End-to-end-latency metrics (advanced)
[Diagram: task 1 and task 2, each consisting of source node, filter, aggregation, and sink node; event time and processing time are marked for both tasks, and one span is labeled "processing delay of task 2"]
Takeaways
• Kafka Streams exposes various metrics on different levels
• metrics were consolidated recently (KIP-444)
• RocksDB metrics let you gain insight into state stores
• Kafka Streams allows monitoring record end-to-end latencies
Thank you!
bruno@confluent.io
cnfl.io/slack
cnfl.io/blog
cnfl.io/meetups
cnfl.io/forum

More Related Content

PDF
From Zero to Hero with Kafka Connect
ODP
Stream processing using Kafka
PDF
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
PPTX
No data loss pipeline with apache kafka
PDF
Performance Tuning RocksDB for Kafka Streams’ State Stores
PDF
Kafka Streams State Stores Being Persistent
PDF
Introduction to Apache Kafka and Confluent... and why they matter
PDF
Apache Kafka Fundamentals for Architects, Admins and Developers
From Zero to Hero with Kafka Connect
Stream processing using Kafka
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
No data loss pipeline with apache kafka
Performance Tuning RocksDB for Kafka Streams’ State Stores
Kafka Streams State Stores Being Persistent
Introduction to Apache Kafka and Confluent... and why they matter
Apache Kafka Fundamentals for Architects, Admins and Developers

What's hot (20)

PPTX
Evening out the uneven: dealing with skew in Flink
PDF
Apache Kafka® Security Overview
PDF
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
PDF
Building a fully managed stream processing platform on Flink at scale for Lin...
PDF
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
PDF
Apache Kafka Architecture & Fundamentals Explained
PDF
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
PPTX
Deep Dive into Apache Kafka
PDF
Introduction to Kafka Streams
PPTX
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
PPTX
Extending Flink SQL for stream processing use cases
PPTX
Apache Kafka
PDF
Kafka Streams: What it is, and how to use it?
PDF
ksqlDB - Stream Processing simplified!
PPTX
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
PDF
Apache flink
PDF
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
PDF
Handle Large Messages In Apache Kafka
PPTX
Kafka 101
PPTX
Introduction to Apache Flink
Evening out the uneven: dealing with skew in Flink
Apache Kafka® Security Overview
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Building a fully managed stream processing platform on Flink at scale for Lin...
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
Apache Kafka Architecture & Fundamentals Explained
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Deep Dive into Apache Kafka
Introduction to Kafka Streams
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Extending Flink SQL for stream processing use cases
Apache Kafka
Kafka Streams: What it is, and how to use it?
ksqlDB - Stream Processing simplified!
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Apache flink
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Handle Large Messages In Apache Kafka
Kafka 101
Introduction to Apache Flink
Ad

Similar to Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna, Confluent (20)

PPTX
Data Pipelines with Kafka Connect
PDF
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
PDF
Introducing Kafka's Streams API
PPTX
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
PPTX
What’s new in Apache Spark 2.3
PDF
Confluent kafka meetupseattle jan2017
PPT
Kubernetes for Cloud-Native Environments
PDF
Deploying Kafka Streams Applications with Docker and Kubernetes
PDF
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PDF
dA Platform Overview
PDF
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
PDF
Web Scale Reasoning and the LarKC Project
PDF
Au delà des brokers, un tour de l’environnement Kafka | Florent Ramière
PPTX
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
PDF
Concepts and Patterns for Streaming Services with Kafka
PDF
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
PDF
Apache spark 2.4 and beyond
PDF
Resume2015
PDF
Presentación11.pdf
PDF
Load Balancing in the Cloud using Nginx & Kubernetes
Data Pipelines with Kafka Connect
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
Introducing Kafka's Streams API
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
What’s new in Apache Spark 2.3
Confluent kafka meetupseattle jan2017
Kubernetes for Cloud-Native Environments
Deploying Kafka Streams Applications with Docker and Kubernetes
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
dA Platform Overview
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
Web Scale Reasoning and the LarKC Project
Au delà des brokers, un tour de l’environnement Kafka | Florent Ramière
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Concepts and Patterns for Streaming Services with Kafka
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
Apache spark 2.4 and beyond
Resume2015
Presentación11.pdf
Load Balancing in the Cloud using Nginx & Kubernetes
Ad



Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna, Confluent

  • 1. Mind the App: How to monitor your Kafka Streams applications. Bruno Cadonna, Kafka Summit 2021 Europe
  • 2. About me: Bruno Cadonna, Contributor to Apache Kafka & Software Developer at Confluent
  • 3. Content
      • Basics about metrics in Kafka
      • Metrics in Kafka Streams
      • KIP-444: Improving Kafka Streams’ metrics
      • KIP-471 and KIP-607: RocksDB metrics
      • KIP-613: End-to-end latency metrics
      • Takeaways
  • 5–9. A metric in Kafka
      • consists of a name, a value, and a configuration
      • a metric name is composed of
          • name
          • group
          • tags
          • description
      • a metric value inherits from the Object class, e.g. integral number, decimal number, string, …
      • metric config contains the recording level, which can be INFO, DEBUG, or TRACE
      • example:
          • name: process-rate
          • group: stream-thread-metrics
          • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1
          • description: The average number of processed records per second
          • value: 123456.78
          • recording level: INFO
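The parts of the example metric can be modelled in a few lines. The sketch below is a toy model, not Kafka's own `MetricName` class: `ToyMetricName`, `mbeanName`, and `slideExample` are hypothetical names, and the `domain:type=<group>,<tag>=<value>` layout with the `kafka.streams` domain is an assumption about how such a metric is usually addressed in JMX tools.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the parts of a metric name; NOT Kafka's MetricName class.
final class ToyMetricName {
    final String name;
    final String group;
    final Map<String, String> tags;
    final String description;

    ToyMetricName(String name, String group, Map<String, String> tags, String description) {
        this.name = name;
        this.group = group;
        this.tags = new LinkedHashMap<>(tags); // keep the tag order stable
        this.description = description;
    }

    // Builds "<domain>:type=<group>,<tag>=<value>,..." (assumed layout);
    // the metric name is then the attribute to look up on that bean.
    String mbeanName(String domain) {
        StringBuilder sb = new StringBuilder(domain).append(":type=").append(group);
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            sb.append(',').append(tag.getKey()).append('=').append(tag.getValue());
        }
        return sb.toString();
    }

    // The example metric from the slide.
    static ToyMetricName slideExample() {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("thread-id", "myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1");
        return new ToyMetricName("process-rate", "stream-thread-metrics", tags,
            "The average number of processed records per second");
    }

    public static void main(String[] args) {
        ToyMetricName m = slideExample();
        System.out.println(m.mbeanName("kafka.streams") + " -> attribute " + m.name);
    }
}
```

Keeping the tags separate from the name is what lets a monitoring tool filter or group the same metric by thread, task, or store.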
  • 10–14. A sensor in Kafka
      • maintains a sequence of recorded values
      • maintains a set of metrics
      • each metric specifies an aggregation on the recorded values
      • each time a value is recorded, all metrics in a sensor are updated
      • example:
          • process-rate and process-total are recorded by the same sensor
          • process-rate computes the number of processed records over time
          • process-total computes the total number of processed records
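The relationship between a sensor and its metrics can be sketched as follows. This is a simplified model under stated assumptions: `ToySensor`, `Stat`, `Total`, and `Rate` are hypothetical names, and the rate is a plain sum divided by a caller-supplied elapsed time rather than a windowed computation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy sensor: one record() call updates every attached metric.
final class ToySensor {
    // An aggregation over the recorded values.
    interface Stat {
        void record(double value);
        double measure(double elapsedSeconds);
    }

    // Sum of everything recorded so far.
    static final class Total implements Stat {
        private double sum;
        public void record(double value) { sum += value; }
        public double measure(double elapsedSeconds) { return sum; }
    }

    // Sum divided by the elapsed time (simplified, no sampling windows).
    static final class Rate implements Stat {
        private double sum;
        public void record(double value) { sum += value; }
        public double measure(double elapsedSeconds) { return sum / elapsedSeconds; }
    }

    private final Map<String, Stat> metrics = new LinkedHashMap<>();

    void add(String metricName, Stat stat) { metrics.put(metricName, stat); }

    // Recording a value updates all metrics of this sensor.
    void record(double value) {
        for (Stat stat : metrics.values()) {
            stat.record(value);
        }
    }

    double value(String metricName, double elapsedSeconds) {
        return metrics.get(metricName).measure(elapsedSeconds);
    }

    public static void main(String[] args) {
        ToySensor sensor = new ToySensor();
        sensor.add("process-total", new Total());
        sensor.add("process-rate", new Rate());
        for (int i = 0; i < 10; i++) {
            sensor.record(1.0); // one processed record
        }
        System.out.println(sensor.value("process-total", 5.0));
        System.out.println(sensor.value("process-rate", 5.0));
    }
}
```

Because both metrics hang off one sensor, the processing code calls `record` once per record and never needs to know which aggregations are attached.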
  • 15. Metrics in Kafka Streams
  • 16–18. Anatomy of a Kafka Streams application (diagram; labels: Kafka Streams client, stream thread 1, stream thread 2, task 1 to task 5, processor node, state store, cache)
  • 19–21. How does Kafka Streams report metrics?
      • Kafka Streams client: metrics() returns a read-only map of metrics
      • JMX reporter: implements MetricsReporter; registered by default, no need to set
      • my reporter: implements MetricsReporter; registered through the Kafka Streams config metric.reporters
      • the two callbacks of MetricsReporter shown on the slide:
          interface MetricsReporter {
              // called when a metric is added or updated
              void metricChange(KafkaMetric metric);
              // called when a metric is removed
              void metricRemoval(KafkaMetric metric);
          }
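The callback pattern behind a custom reporter can be sketched without any Kafka dependency. The sketch below is an assumption-laden stand-in: `ToyReporter` and `InMemoryReporter` are hypothetical, and the callbacks take a plain name and value instead of a `KafkaMetric`. A real reporter would implement Kafka's `MetricsReporter` interface instead.

```java
import java.util.Map;
import java.util.TreeMap;

final class ToyReporterDemo {
    // Mirrors the two callbacks on the slide, with simplified parameter types.
    interface ToyReporter {
        void metricChange(String metricName, Object value);
        void metricRemoval(String metricName);
    }

    // Keeps the latest value of every live metric in memory.
    static final class InMemoryReporter implements ToyReporter {
        final Map<String, Object> current = new TreeMap<>();

        public void metricChange(String metricName, Object value) {
            current.put(metricName, value); // added or updated
        }

        public void metricRemoval(String metricName) {
            current.remove(metricName); // no longer reported
        }
    }

    public static void main(String[] args) {
        InMemoryReporter reporter = new InMemoryReporter();
        reporter.metricChange("process-rate", 1.0);
        reporter.metricChange("state", "RUNNING");
        reporter.metricChange("process-rate", 2.0);
        reporter.metricRemoval("state");
        System.out.println(reporter.current);
    }
}
```

A reporter that forwards to an external system would replace the map operations with calls to that system's client.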
  • 22–24. jconsole (screenshot with callouts: metric name, tag: thread-id, metric group, metric description, metric value)
  • 25–27. Datadog (screenshot with callouts: metric group, tags, metric name)
  • 28–30. What metrics does Kafka Streams expose?
      • Kafka Streams client level:
          • name: state
          • group: stream-metrics
          • tags: client-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003
      • stream thread level:
          • name: process-rate
          • group: stream-thread-metrics
          • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1
      • task level:
          • name: process-latency-avg
          • group: stream-task-metrics
          • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1
  • 31–33. …some more metrics
      • processor node level:
          • name: process-rate
          • group: stream-processor-node-metrics
          • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1, processor-node-id = KSTREAM-SINK-0000000004
      • state store level:
          • name: put-rate
          • group: stream-state-metrics
          • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1, rocksdb-state-id = count-items
      • cache level:
          • name: hit-ratio-avg
          • group: stream-record-cache-metrics
          • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1, record-cache-id = 0_1-count-items
  • 34. … and finally
      • all metrics of the embedded consumers, producers, and admin client, for example:
          • name: last-rebalance-seconds-ago
          • group: consumer-coordinator-metrics
          • tags: client-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1-consumer
  • 36–37. New metrics (KIP-444)
      • introduces client-level metrics:
          • version
          • commit-id
          • application-id
          • topology-description
          • state
          • alive-stream-threads
      • introduces new task-level metrics:
          • active-process-ratio
          • standby-process-ratio (not yet implemented)
          • dropped-records
  • 38. Refactorings
      • renames some metric names and some metric tags
      • records client-level and stream-thread-level metrics at INFO and most lower-level metrics at DEBUG
      • removes all parent metrics except one and lets users do the roll-up themselves
      • removes overlapping metrics
          • dropped-records (task level, INFO) replaces
              • late-records-drop (processor node, INFO)
              • skipped-records (processor node, INFO)
              • expired-window-record-drop (state store, DEBUG)
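Doing the roll-up yourself amounts to grouping the lower-level values by one of their tags and aggregating. The sketch below sums task-level values per thread; `ToyRollUp` and `Sample` are hypothetical names, and the sample values are made up for illustration rather than taken from a real application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ToyRollUp {
    // One task-level reading, identified by its thread-id and task-id tags.
    static final class Sample {
        final String threadId;
        final String taskId;
        final double value;

        Sample(String threadId, String taskId, double value) {
            this.threadId = threadId;
            this.taskId = taskId;
            this.value = value;
        }
    }

    // Rolls task-level values up to thread level by summing per thread-id.
    static Map<String, Double> sumByThread(List<Sample> samples) {
        Map<String, Double> perThread = new TreeMap<>();
        for (Sample sample : samples) {
            perThread.merge(sample.threadId, sample.value, Double::sum);
        }
        return perThread;
    }

    public static void main(String[] args) {
        List<Sample> dropped = new ArrayList<>();
        dropped.add(new Sample("StreamThread-1", "0_0", 3.0)); // illustrative values
        dropped.add(new Sample("StreamThread-1", "0_1", 4.0));
        dropped.add(new Sample("StreamThread-2", "0_2", 5.0));
        System.out.println(sumByThread(dropped));
    }
}
```

Summing suits totals such as dropped-records; averages and ratios need a weighted aggregation instead, which is one reason the roll-up is left to the monitoring system.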
  • 39–41. Improving custom metrics
      • Sensor addLatencyRateTotalSensor(final String scopeName,
                                         final String entityName,
                                         final String operationName,
                                         final Sensor.RecordingLevel recordingLevel,
                                         final String... tags);
      • Sensor addRateTotalSensor(final String scopeName,
                                  final String entityName,
                                  final String operationName,
                                  final Sensor.RecordingLevel recordingLevel,
                                  final String... tags);
      • only available where you have access to the ProcessorContext
      • you can add additional metrics to the sensor with Sensor#add()
  • 42. Example of custom metrics
        import java.util.Locale;
        import org.apache.kafka.common.metrics.Sensor;
        import org.apache.kafka.common.metrics.Sensor.RecordingLevel;
        import org.apache.kafka.streams.processor.api.Processor;
        import org.apache.kafka.streams.processor.api.ProcessorContext;
        import org.apache.kafka.streams.processor.api.Record;
        import org.apache.kafka.streams.state.KeyValueStore;

        // The class name is chosen for illustration; the slide does not name the processor.
        public class WordCountProcessor implements Processor<String, String, String, String> {
            private ProcessorContext<String, String> context;
            private KeyValueStore<String, Integer> kvStore;
            private Sensor countEmptyRecords;

            @Override
            public void init(final ProcessorContext<String, String> context) {
                this.context = context;
                // Custom sensor that exposes a rate and a total metric.
                countEmptyRecords = context.metrics().addRateTotalSensor(
                    "word-counter",
                    "word-counter" + context.taskId(),
                    "count-empty-messages",
                    RecordingLevel.INFO
                );
                kvStore = context.getStateStore("Counts");
            }

            @Override
            public void process(final Record<String, String> record) {
                final String value = record.value().toLowerCase(Locale.getDefault()).trim();
                // "".split(" ") yields one empty string, so test for emptiness before splitting.
                if (value.isEmpty()) {
                    countEmptyRecords.record();
                    return;
                }
                for (final String word : value.split(" ")) {
                    final Integer oldValue = kvStore.get(word);
                    if (oldValue == null) {
                        kvStore.put(word, 1);
                    } else {
                        kvStore.put(word, oldValue + 1);
                    }
                }
            }
        }
  • 44–46. RocksDB metrics
      • RocksDB is the default state store in Kafka Streams
      • statistics-based metrics (KIP-471, AK 2.4): cumulative measurements over time collected by RocksDB
          • name: bytes-written-rate
          • group: stream-state-metrics
          • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1, rocksdb-state-id = count-items
      • properties-based metrics (KIP-607, AK 2.7): properties exposed by RocksDB providing current measurements
          • name: block-cache-usage
          • group: stream-state-metrics
          • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1, rocksdb-state-id = count-items
  • 47–48. Recording RocksDB metrics
      • statistics-based metrics
          • collecting statistics-based metrics may have an impact on performance
          • recording metrics during state store operations might be costly
          • instead, each state store has a metric recorder
          • all metric recorders are triggered once per minute by one dedicated thread that is started at Kafka Streams client start-up
      • properties-based metrics
          • all properties-based metrics are gauges
          • a gauge executes some given code each time the metric is queried
          • properties-based metrics query RocksDB properties
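The difference between the two recording styles can be illustrated with a small stand-in for a state store. Everything here is hypothetical (`ToyStoreMetrics` and its methods are not part of Kafka Streams or RocksDB), and the recorder is triggered by hand, whereas the slide describes a dedicated thread that triggers it once per minute.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

final class ToyStoreMetrics {
    // Stand-ins for values that live inside the store.
    private final AtomicLong bytesWritten = new AtomicLong();    // cumulative statistic
    private final AtomicLong blockCacheUsage = new AtomicLong(); // current property

    // Snapshot of the statistic; only the recorder updates it.
    private volatile long recordedBytesWritten;

    // Properties-based style: a gauge that runs each time the metric is read.
    final LongSupplier blockCacheUsageGauge = blockCacheUsage::get;

    void write(long bytes) { bytesWritten.addAndGet(bytes); }

    void cache(long bytes) { blockCacheUsage.addAndGet(bytes); }

    // Statistics-based style: called periodically, not on every store operation.
    void triggerRecorder() { recordedBytesWritten = bytesWritten.get(); }

    long bytesWrittenTotal() { return recordedBytesWritten; }

    public static void main(String[] args) {
        ToyStoreMetrics store = new ToyStoreMetrics();
        store.write(100);
        store.cache(40);
        System.out.println(store.bytesWrittenTotal());               // stale until triggered
        System.out.println(store.blockCacheUsageGauge.getAsLong());  // always current
        store.triggerRecorder();
        System.out.println(store.bytesWrittenTotal());
    }
}
```

The trade-off is visible in the sketch: the recorded statistic can lag behind the store by up to one trigger interval, while the gauge is always current but does its work on every read.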
  • 49–53. When to look at RocksDB metrics? (the slide color-codes each metric as statistics-based or properties-based)
      • high memory usage
          • size-all-mem-tables
          • block-cache-usage
          • block-cache-pinned-usage
          • estimate-table-readers-mem
      • high disk usage
          • total-sst-files-size
      • high disk I/O and write stalls
          • memtable-bytes-flushed-[rate | total]
          • bytes-[read | written]-compaction-rate
          • write-stall-duration-[avg | total]
          • memtable-hit-ratio
          • block-cache-[data | index | filter]-hit-ratio
      • too many open files
          • number-open-files
      • for more details, check out the blog post "How to Tune RocksDB for Your Kafka Streams Application": https://www.confluent.io/blog/how-to-tune-rocksdb-kafka-streams-state-stores-performance/
  • 55–58. End-to-end latency metrics (diagram: source node, filter, aggregation, sink node; each latency spans from event time to processing time)
      • consumption latency (INFO), measured at the source node
          • name: record-e2e-latency-[min | max | avg]
          • group: stream-processor-node-metrics
          • tags: thread-id = myapp-…, task-id = 0_1, processor-node-id = KSTREAM-SOURCE-0000000004
      • full end-to-end latency (INFO), measured at the sink node
          • name: record-e2e-latency-[min | max | avg]
          • group: stream-processor-node-metrics
          • tags: thread-id = myapp-…, task-id = 0_1, processor-node-id = KSTREAM-SINK-0000000004
      • begin-to-state latency (TRACE), measured at the state store
          • name: record-e2e-latency-[min | max | avg]
          • group: stream-state-metrics
          • tags: thread-id = myapp-…, task-id = 0_1, rocksdb-state-id = count-items
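Each of these latencies is the gap between a record's event time and the processing time at the node that measures it. A minimal sketch of the min, max, and avg aggregation follows; `ToyE2eLatency` is a hypothetical class, and the timestamps are made-up millisecond values.

```java
// Toy aggregation of end-to-end latency samples; not Kafka Streams code.
final class ToyE2eLatency {
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private long sum;
    private long count;

    // Latency of one record: processing time at the node minus the record's event time.
    void record(long eventTimeMs, long processingTimeMs) {
        long latency = processingTimeMs - eventTimeMs;
        min = Math.min(min, latency);
        max = Math.max(max, latency);
        sum += latency;
        count++;
    }

    long min() { return min; }

    long max() { return max; }

    double avg() { return (double) sum / count; }

    public static void main(String[] args) {
        ToyE2eLatency atSink = new ToyE2eLatency();
        atSink.record(1_000L, 1_010L); // illustrative timestamps in milliseconds
        atSink.record(1_000L, 1_030L);
        atSink.record(2_000L, 2_020L);
        System.out.println(atSink.min() + " " + atSink.max() + " " + atSink.avg());
    }
}
```

Since the starting point is the event time carried by the record, the measured latency also includes any delay that occurred before the record reached the application.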
  • 59–60. End-to-end latency metrics (advanced) (diagram: task 1 and task 2, each with source node, filter, aggregation, sink node; event time and processing time are marked for each task, together with the processing delay of task 2)
  • 62. Takeaways
      • Kafka Streams exposes various metrics on different levels
      • metrics were consolidated recently-ish
      • RocksDB metrics let you gain insight into state stores
      • Kafka Streams allows monitoring record end-to-end latencies