Setting Up Service Alerting
In the previous chapter, we covered the topic of service telemetry and described various types of telemetry data, such as logs, metrics, and traces. We also provided some examples of setting up telemetry data collection, which allows us to troubleshoot service performance issues and use the collected data to improve the reliability of our services.
In this chapter, we will illustrate how to use telemetry data to automatically detect incidents by setting up alerts for our microservices. You will learn which types of service metrics to collect, how to define the conditions for various incidents, and how to build a complete alerting pipeline for your microservices using Prometheus, a popular monitoring and alerting tool.
We will cover the following topics:
- Alerting basics
- Introduction to Prometheus
- Setting up Prometheus alerting for our microservices
- Alerting best practices
Now, we are going to proceed to the overview...