1) Apache Kafka is a distributed system with many moving parts to monitor, including brokers, topics, partitions, and the applications that use Kafka. It is critical to monitor Kafka performance to ensure high availability and catch problems early.
2) Key metrics to monitor include partition replication, broker resource usage, request latencies, and end-to-end message delivery. Monitoring message rates and comparing the production rate with the consumption rate helps identify issues such as under-consumption, where consumers fall behind producers, or over-consumption.
3) Identifying performance bottlenecks such as slow request handling or network saturation helps optimize the Kafka cluster. Drilling down on request latency metrics, stage by stage, shows where in the request lifecycle the time is being spent.
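
Points 2 and 3 can be illustrated with a minimal sketch. It assumes the offsets and per-stage timings have already been collected by some metrics pipeline and are passed in as plain dictionaries. The function names `partition_lag` and `slowest_stage` are hypothetical helpers, not part of any Kafka client. The stage names follow the broker's per-request timing metrics (`kafka.network:type=RequestMetrics`). Treating a partition with no committed offset as committed at 0 is a simplifying assumption.

```python
from typing import Dict

# Stages of the broker request lifecycle, named after the broker's
# per-request timing metrics (kafka.network:type=RequestMetrics,name=<stage>).
STAGES = (
    "RequestQueueTimeMs",   # waiting in the request queue for a handler thread
    "LocalTimeMs",          # processing on the partition leader
    "RemoteTimeMs",         # waiting on other brokers (e.g. follower acks)
    "ResponseQueueTimeMs",  # waiting in the response queue
    "ResponseSendTimeMs",   # sending the response to the client
)


def partition_lag(log_end_offsets: Dict[int, int],
                  committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Hypothetical helper: lag per partition = log end offset - committed offset.

    Assumption: a partition with no committed offset is treated as committed
    at 0. Negative values (stale readings) are clamped to 0.
    """
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in log_end_offsets.items()
    }


def slowest_stage(stage_times_ms: Dict[str, float]) -> str:
    """Hypothetical helper: return the lifecycle stage with the largest time.

    Stages missing from the input are treated as 0 ms.
    """
    known = {stage: stage_times_ms.get(stage, 0.0) for stage in STAGES}
    return max(known, key=known.get)


# Example with made-up numbers: partition 0 is 10 messages behind,
# and most of the request time is spent waiting on other brokers.
lag = partition_lag({0: 100, 1: 250}, {0: 90, 1: 250})
bottleneck = slowest_stage({
    "RequestQueueTimeMs": 2.0,
    "LocalTimeMs": 1.5,
    "RemoteTimeMs": 40.0,
    "ResponseQueueTimeMs": 0.3,
    "ResponseSendTimeMs": 0.2,
})
```

Lag that keeps growing on a partition suggests consumers are not keeping up with producers. A dominant stage is only a starting point for investigation: a large request queue time, for example, tends to point at busy request handler threads, while a large send time tends to point at the network or a slow client.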