How to monitor your micro-service with
Prometheus?
How to design the metrics?
WOJCIECH BARCZYŃSKI - SMACC.IO | 2 OCTOBER 2018
ABOUT ME
Lead Software Developer - SMACC (FinTech/AI)
Before:
System Engineer and Developer - Lyke
Before:
1000+ nodes, 20 data centers with Openstack
Point of view:
Startups, fast-moving environment
WHY?
MONOLITH ;)
WHY?
MICROSERVICES ;)
OBSERVABILITY
Monitoring
Logging
Tracing
OBSERVABILITY
Go for Industrial Programming by Peter Bourgon
NOT A SILVER-BULLET
but:
Easy to setup
Immediate value
Surprisingly: the last one implemented
CENTRALIZED LOGGING
Usually much too late
Post-mortem
Hard to find the needle
Like debugging vs. testing
MONITORING
Numbers
Trends
Dependencies
+ Actions
METRIC
Name: traefik_requests_total
Labels: code="200", method="GET"
Value: 3001
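A minimal sketch of how the name, labels, and value above combine into one line of the plain-text format that Prometheus scrapes. The helper names `format_sample` and `_escape` are hypothetical, not part of any client library.

```python
from typing import Dict


def _escape(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: Dict[str, str], value: float) -> str:
    """Render one sample as `name{label="value",...} value`."""
    if not labels:
        return f"{name} {value}"
    # Sort the labels so the output is stable between calls.
    pairs = ",".join(
        f'{key}="{_escape(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


line = format_sample(
    "traefik_requests_total", {"code": "200", "method": "GET"}, 3001
)
print(line)
```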
MONITORING
Demo app
MONITORING
Example from couchbase blog
HOW TO FIND THE RIGHT METRIC?
HOW TO FIND THE RIGHT METRIC?
USE
RED
USE
Utilization: the average time that the resource was
busy servicing work
Saturation: extra work which it can't service, often
queued
Errors: the count of error events
Documented and promoted by Brendan Gregg
USE
Utilization: as a percent over a time interval: "one
disk is running at 90% utilization".
Saturation:
Errors:
USE
Utilization:
Saturation: as a queue length. eg, "the CPUs have
an average run queue length of four".
Errors:
USE
utilization:
saturation:
errors: scalar counts. eg, "this network interface
drops packets".
USE
traditionally more instance oriented
still useful in the microservices world
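The three USE numbers can be sketched as plain arithmetic over one measurement window. The `ResourceWindow` type and the sample figures are hypothetical, chosen to reproduce the "90% utilization" and "run queue length of four" examples above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResourceWindow:
    """Hypothetical measurements for one resource over one time interval."""
    interval_seconds: float
    busy_seconds: float
    queue_length_samples: List[int]
    error_events: int


def use_summary(window: ResourceWindow) -> Dict[str, float]:
    """Compute Utilization, Saturation and Errors for one window."""
    # Utilization: share of the interval the resource was busy, in percent.
    utilization = 100.0 * window.busy_seconds / window.interval_seconds
    # Saturation: average length of the queue of work that had to wait.
    samples = window.queue_length_samples
    saturation = sum(samples) / len(samples) if samples else 0.0
    # Errors: a plain count of error events.
    return {
        "utilization_percent": utilization,
        "saturation_avg_queue": saturation,
        "errors": float(window.error_events),
    }


disk = ResourceWindow(
    interval_seconds=60.0,
    busy_seconds=54.0,
    queue_length_samples=[3, 5, 4, 4],
    error_events=2,
)
summary = use_summary(disk)
print(summary)
```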
RED
Rate How busy is your service?
Error Errors
Duration What is the latency of my service?
Tom Wilkie's guideline for instrumenting applications.
RED
Rate - how many requests per second are handled
Error
Duration (distribution)
RED
Rate
Error - how many requests per second failed
Duration
RED
Rate
Error
Duration - how long the requests took
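A rough sketch of Rate, Error, and Duration computed from a window of request records. The `Request` type, the nearest-rank percentile, and the choice to count status codes of 500 and above as errors are assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Request(NamedTuple):
    """Hypothetical record of one handled request."""
    duration_seconds: float
    status_code: int


def red_summary(requests: List[Request], window_seconds: float) -> Dict[str, Optional[float]]:
    """Rate, Error and Duration figures for one observation window."""
    durations = sorted(r.duration_seconds for r in requests)
    # Assumption: a 5xx response counts as a failed request.
    failed = sum(1 for r in requests if r.status_code >= 500)

    def percentile(percent: int) -> Optional[float]:
        """Nearest-rank percentile, integer arithmetic only."""
        if not durations:
            return None
        rank = max(1, (percent * len(durations) + 99) // 100)
        return durations[rank - 1]

    return {
        "rate_per_second": len(requests) / window_seconds,
        "errors_per_second": failed / window_seconds,
        "duration_p50": percentile(50),
        "duration_p90": percentile(90),
    }


# Ten requests in a ten-second window, two of them failed.
sample_durations = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
window = [
    Request(duration, 500 if index < 2 else 200)
    for index, duration in enumerate(sample_durations)
]
red = red_summary(window, window_seconds=10.0)
print(red)
```

Duration is reported as percentiles rather than an average because the slide asks for a distribution.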
RED
Follow Four Golden Signals by Google SREs [1]
Focus on what matters for end-users
[1] Latency, Traffic, Errors, Saturation (src)
RED
Not recommended for:
batch-oriented
streaming services
PROMETHEUS
WHAT IS PROMETHEUS?
Aggregation of time-series data
Not an event-based system
PROMETHEUS STACK
Prometheus - collect
Alertmanager - alerts
Grafana - visualize
PROMETHEUS
Wide support for languages
Metrics collected over HTTP
Pull model (see scrape time), push-mode possible
Integration with k8s
PromQL
metrics/
METRICS IN PLAIN TEXT
# HELP order_mgmt_audit_duration_seconds Multiprocess metric
# TYPE order_mgmt_audit_duration_seconds summary
order_mgmt_audit_duration_seconds_count{status_code="200"} 41.
order_mgmt_audit_duration_seconds_sum{status_code="200"} 27.44
order_mgmt_audit_duration_seconds_count{status_code="500"} 1.0
order_mgmt_audit_duration_seconds_sum{status_code="500"} 0.716
# HELP order_mgmt_duration_seconds Multiprocess metric
# TYPE order_mgmt_duration_seconds summary
order_mgmt_duration_seconds_count{method="GET",path="/complex"
order_mgmt_duration_seconds_sum{method="GET",path="/complex",s
order_mgmt_duration_seconds_count{method="GET",path="/",status
order_mgmt_duration_seconds_sum{method="GET",path="/",status_c
order_mgmt_duration_seconds_count{method="GET",path="/complex"
order_mgmt_duration_seconds_sum{method="GET",path="/complex",s
METRICS IN PLAIN TEXT
# HELP go_gc_duration_seconds A summary of the GC invocation d
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 9.01e-05
go_gc_duration_seconds{quantile="0.25"} 0.000141101
go_gc_duration_seconds{quantile="0.5"} 0.000178902
go_gc_duration_seconds{quantile="0.75"} 0.000226903
go_gc_duration_seconds{quantile="1"} 0.006099658
go_gc_duration_seconds_sum 18.749046756
go_gc_duration_seconds_count 89273
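A small sketch of reading this plain-text output back, for example in a test. It parses the `go_gc_duration_seconds` samples shown above and derives the mean from `_sum` and `_count`. The parser is deliberately simplified (no timestamps, no escaped characters inside label values) and the function name `parse_samples` is hypothetical.

```python
import re
from typing import Dict, Tuple

SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"   # metric name
    r"(?:\{(?P<labels>.*)\})?"               # optional {label="value",...}
    r"\s+(?P<value>\S+)$"                    # sample value
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

SampleKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def parse_samples(text: str) -> Dict[SampleKey, float]:
    """Map (metric name, sorted label pairs) to the sample value."""
    samples: Dict[SampleKey, float] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        labels = tuple(sorted(LABEL_RE.findall(match.group("labels") or "")))
        samples[(match.group("name"), labels)] = float(match.group("value"))
    return samples


EXPOSITION = """\
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 9.01e-05
go_gc_duration_seconds{quantile="0.5"} 0.000178902
go_gc_duration_seconds{quantile="1"} 0.006099658
go_gc_duration_seconds_sum 18.749046756
go_gc_duration_seconds_count 89273
"""

parsed = parse_samples(EXPOSITION)
total = parsed[("go_gc_duration_seconds_sum", ())]
count = parsed[("go_gc_duration_seconds_count", ())]
mean_gc_seconds = total / count  # average duration of one GC invocation
print(mean_gc_seconds)
```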
EXPORTERS
Mongodb
Mysql
Postgresql
rabbitmq
...
also Blackbox exporter
examples: memcached, psql
CLOUD-NATIVE PROJECTS INTEGRATION
[Figure: architecture diagram. Labels: INTERNET; API.DOMAIN.COM, DOMAIN.COM/WEB, BACKOFFICE.DOMAIN.COM; PRIVATE NETWORK; API, WEB, ADMIN, DATA, BACKOFFICE 1-3; LISTEN to the ORCHESTRATOR (Docker, Swarm, Mesos...)]
- --web.metrics.prometheus
PROMETHEUS PromQL
working with histograms:
histogram_quantile(0.9,
  rate(http_req_duration_seconds_bucket[10m]))
rates:
rate(http_requests_total{job="api-server"}[5
irate(http_requests_total{job="api-server"}
more complex:
predict_linear()
holt_winters()
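To make the idea behind `rate()` concrete, here is a simplified sketch: the per-second increase of a counter over a window, treating any drop in value as a counter reset. This is not Prometheus's implementation (the real function also extrapolates to the window boundaries), and the name `simple_rate` is hypothetical.

```python
from typing import List, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> float:
    """Per-second increase of a counter over (timestamp, value) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter went down: the process restarted and the counter
            # started again from zero, so count everything seen since then.
            increase += current
    return increase / elapsed


# Scrapes every 15 seconds; the service restarts between t=30 and t=45.
scrapes = [(0.0, 100.0), (15.0, 130.0), (30.0, 160.0), (45.0, 10.0), (60.0, 40.0)]
requests_per_second = simple_rate(scrapes)
print(requests_per_second)
```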
PROMETHEUS PromQL
Alerting:
ALERT ProductionAppServiceInstanceDown
IF up { environment = "production", app =~ ".+"} == 0
FOR 4m
ANNOTATIONS {
summary = "Instance of {{$labels.app}} is down",
description = " Instance {{$labels.instance}} of app
}
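The effect of the `FOR 4m` clause can be sketched as a small state machine: the alert is pending while the condition has held for less than the given duration and fires only after that. This is an illustration of the idea, not Prometheus's rule evaluator; `alert_states` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def alert_states(observations: List[Tuple[float, bool]], for_seconds: float) -> List[str]:
    """Return inactive/pending/firing for each (timestamp, condition) pair."""
    states: List[str] = []
    active_since: Optional[float] = None
    for timestamp, condition_holds in observations:
        if not condition_holds:
            active_since = None  # the condition cleared, start over
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        if timestamp - active_since >= for_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states


# One evaluation per minute; `up == 0` holds from t=60s to t=300s.
evaluations = [
    (0.0, False), (60.0, True), (120.0, True), (180.0, True),
    (240.0, True), (300.0, True), (360.0, False),
]
states = alert_states(evaluations, for_seconds=4 * 60)
print(states)
```

A short blip of one or two failed scrapes therefore never leaves the pending state.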
METRICS
Counter - only goes up
Gauge - goes up and down
Histogram
Summary
HISTOGRAM
traefik_duration_seconds_bucket
{method="GET",code="200"}
{le="0.1"} 2229
{le="0.3"} 107
{le="1.2"} 100
{le="5"} 4
{le="+Inf"} 2
_sum
_count 2342
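In the Prometheus exposition format each `le` bucket is cumulative: it counts every observation less than or equal to its upper bound. The sketch below builds such buckets from hypothetical latencies and estimates a quantile by linear interpolation inside the matching bucket, which is the general idea behind `histogram_quantile()`. It is a simplified illustration, not the server's code.

```python
from typing import List, Sequence, Tuple

Bucket = Tuple[float, int]  # (upper bound "le", cumulative count)


def cumulative_buckets(observations: Sequence[float], upper_bounds: Sequence[float]) -> List[Bucket]:
    """Count observations <= each bound, plus a final +Inf bucket."""
    bounds = list(upper_bounds) + [float("inf")]
    return [(bound, sum(1 for o in observations if o <= bound)) for bound in bounds]


def estimate_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile from cumulative buckets."""
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    target = q * total
    lower_bound, lower_count = 0.0, 0  # the first bucket is assumed to start at 0
    for upper_bound, count in buckets:
        if count >= target:
            if upper_bound == float("inf"):
                return lower_bound  # cannot interpolate into +Inf
            if count == lower_count:
                return upper_bound
            fraction = (target - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


latencies = [0.05, 0.05, 0.2, 0.2, 0.25, 0.5, 0.9, 1.0, 2.0, 7.0]  # hypothetical
buckets = cumulative_buckets(latencies, upper_bounds=(0.1, 0.3, 1.2, 5.0))
median = estimate_quantile(0.5, buckets)
print(buckets, median)
```

The estimate can only be as precise as the bucket layout, so bucket bounds should bracket the latencies that matter.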
SUMMARY
http_request_duration_seconds
{quantile="0.5"} 4
{quantile="0.9"} 5
http_request_duration_seconds_sum 9
http_request_duration_seconds_count 3
HISTOGRAM / SUMMARY:
Latency of services
Request or response size
Histograms recommended
RED
Metric + PromQL:
sum(irate(order_mgmt_duration_seconds_count
{job=~".*"}[1m])) by (status_code)
METRIC AND LABEL NAMING
Best practices on metric names:
service name is your prefix, e.g. user_
state the base unit, e.g. _seconds and _bytes
PROMETHEUS + PYTHON
PYTHON CLIENT
client_python
Counter
Gauge
Summary
Histogram
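A minimal sketch of the four metric types with the `prometheus_client` package (client_python), assuming it is installed. The `demo_` metric names and the `handle_request` function are hypothetical and are not taken from the demo repository.

```python
from prometheus_client import (
    CollectorRegistry,
    Counter,
    Gauge,
    Histogram,
    Summary,
    generate_latest,
)

# A private registry keeps the example isolated from the global default one.
registry = CollectorRegistry()

REQUESTS = Counter(
    "demo_requests_total", "Total handled requests.",
    ["method", "status_code"], registry=registry,
)
IN_PROGRESS = Gauge(
    "demo_requests_in_progress", "Requests currently being handled.",
    registry=registry,
)
LATENCY = Histogram(
    "demo_request_duration_seconds", "Request latency in seconds.",
    buckets=(0.1, 0.3, 1.2, 5.0), registry=registry,
)
PAYLOAD = Summary(
    "demo_request_size_bytes", "Request payload size in bytes.",
    registry=registry,
)


def handle_request(method: str, payload: bytes) -> str:
    """Pretend to serve one request and record all four metric types."""
    IN_PROGRESS.inc()
    try:
        with LATENCY.time():  # observes the elapsed time on exit
            PAYLOAD.observe(len(payload))
            status_code = "200"
        REQUESTS.labels(method=method, status_code=status_code).inc()
        return status_code
    finally:
        IN_PROGRESS.dec()


handle_request("GET", b"hello")
exposition = generate_latest(registry).decode("utf-8")
print(exposition)
```

In a real service the same text would be served on the metrics endpoint instead of being printed.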
DEMO: SIMPLE REST SERVICE
----------- ---------------
| App | ----->| Audit Service |
| OrderMgmt | | |
----------- ---------------
|
| ---------------
-------->| Database |
---------------
DEMO:
- service: http://127.0.0.1:8080 (metrics: http://127.0.0.1:8080/metrics/)
- prometheus: http://127.0.0.1:9090
- grafana: http://127.0.0.1:3000
- alertmanager: http://127.0.0.1:9093
DEMO
☁ src ⚡ make docker_run
☁ src ⚡ docker ps
CONTAINER ID IMAGE PORTS
5f824d1bc789 grafana/grafana:5.2.2 0.0.0.0:3000->3
d681a414a8b6 prom/prometheus:v2.1.0 0.0.0.0:9090->9
ea0d9233e159 prom/alertmanager:v0.15.1 0.0.0.0:9093->9
DEMO: GENERATE CALLS
With error injection
☁ src ⚡ make srv_wrk_random
GRAFANA
PROMETHEUS
PROMETHEUS
PROMETHEUS
KILL THE SERVICE
☁ src ⚡ docker stop pycode-prom-flask_order-manager_1
PROMETHEUS
PROMETHEUS
ALERTMANAGER
GRAFANA
GRAFANA
GITHUB
DEMO: PYTHON CODE
Metric Definition
Metric Collection
DEMO: SIMULATING CALLS
make docker_build
make docker_run
DEMO: SIMULATING CALLS
curl 127.0.0.1:8080/hello
curl 127.0.0.1:8080/world
curl 127.0.0.1:8080/complex
DEMO: SIMULATING CALLS
curl "127.0.0.1:8080/complex?is_srv_error=True"
curl "127.0.0.1:8080/complex?is_db_error=True"
curl "127.0.0.1:8080/complex?db_sleep=3&srv_sleep=2"
# load generator
make srv_wrk_random
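A standard-library stand-in for a random load generator, in case the make target is not at hand. It mirrors the curl calls above; the function names are hypothetical, it is not what `make srv_wrk_random` runs, and `run()` needs the demo service listening on 127.0.0.1:8080.

```python
import random
import urllib.error
import urllib.request
from typing import Dict, List
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8080"

# (path, query parameters) pairs mirroring the curl calls of the demo.
SCENARIOS = [
    ("/hello", {}),
    ("/world", {}),
    ("/complex", {}),
    ("/complex", {"is_srv_error": "True"}),
    ("/complex", {"is_db_error": "True"}),
    ("/complex", {"db_sleep": "3", "srv_sleep": "2"}),
]


def build_url(path: str, params: Dict[str, str]) -> str:
    """Join base URL, path and an optional query string."""
    query = urlencode(params)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


def fire(url: str, timeout: float = 10.0) -> int:
    """Send one GET; return the HTTP status, or 0 if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # injected 5xx errors end up here
    except OSError:
        return 0


def run(calls: int, seed: int = 0) -> List[int]:
    """Fire `calls` randomly chosen scenarios and collect the status codes."""
    rng = random.Random(seed)
    return [fire(build_url(*rng.choice(SCENARIOS))) for _ in range(calls)]
```

With the stack running, `run(50)` produces a mix of successful and failed calls to watch in Grafana.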
DEMO: PROM STACK
Prometheus dashboard and config
AlertManager dashboard and config
Simulate the successful and failed calls
Simple Queries for rate
PromQL
sum(irate(order_mgmt_duration_seconds_count{job=~".*"}[1m]))
by (status_code)
PromQL
order_mgmt_duration_seconds_sum{job=~".*"} or
order_mgmt_database_duration_seconds_sum{job=~".*"} or
order_mgmt_audit_duration_seconds_sum{job=~".*"}
BEST PRACTICES
Py: higher load requires multiprocessing
Start simple (up/down), later add more complex
rules
Summing or averaging Summary quantiles leads to
incorrect results, see the Prometheus docs
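Why aggregating quantiles across instances goes wrong can be shown with a few numbers. The sketch below uses hypothetical latencies for two instances and a nearest-rank percentile; the averaged p90 lands far from the p90 of the combined traffic.

```python
from typing import List


def percentile(values: List[float], percent: int) -> float:
    """Nearest-rank percentile using integer arithmetic."""
    ordered = sorted(values)
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical latencies in seconds.
busy_fast_instance = [0.1] * 90 + [0.2] * 10   # 100 requests, all quick
idle_slow_instance = [1.0] * 5 + [3.0] * 5     # 10 requests, all slow

p90_fast = percentile(busy_fast_instance, 90)
p90_slow = percentile(idle_slow_instance, 90)

# What averaging the two per-instance quantiles reports.
averaged_p90 = (p90_fast + p90_slow) / 2

# What users actually experienced across both instances.
true_p90 = percentile(busy_fast_instance + idle_slow_instance, 90)

print(averaged_p90, true_p90)
```

The average ignores that the slow instance served only a small share of the requests. Histogram buckets do not have this problem because bucket counts can be summed before the quantile is estimated.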
SUMMARY
Monitoring saves your time
Checking logs in Kibana vs. watching Grafana is like
debugging vs. having tests
Logging -> high TCO
SUMMARY
Testing
Testing in Production
Smoke tests / Acceptance Tests
Monitoring Simple
(up/down + KPI)
Monitoring
Explorations / Logs
THANK YOU
QUESTIONS?
ps. We are hiring.
BACKUP SLIDES
PROMETHEUS - LABELS IN ALERT RULES
The labels are propagated to alert rules:
see ../src/prometheus/etc/alert.rules
ALERT ProductionAppServiceInstanceDown
IF up { environment = "production", app =~ ".+"} == 0
FOR 4m
ANNOTATIONS {
summary = "Instance of {{$labels.app}} is down",
description = " Instance {{$labels.instance}} of app
}
ALERTMANAGER - LABELS IN ALERTMANAGER
Call somebody if the label is severity=page:
see ../src/alertmanager/*.conf
---
route:
  group_by: [cluster]
  # If an alert isn't caught by a route, send it to the pager.
  receiver: team-pager
  routes:
  - match:
      severity: page
    receiver: team-pager
receivers:
- name: team-pager
  opsgenie_configs:
  - api_key: $API_KEY
    teams: example_team
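How label-based routing picks a receiver can be sketched in a few lines. This is a simplified model of the routing tree (exact `match` only, first matching child wins, no regex matchers or `continue`), not Alertmanager's code. The `team-mail` default receiver is hypothetical, added so the two outcomes differ.

```python
from typing import Any, Dict


def pick_receiver(labels: Dict[str, str], route: Dict[str, Any]) -> str:
    """Return the receiver of the first matching child route, else the route's own."""
    for child in route.get("routes", []):
        matchers = child.get("match", {})
        if all(labels.get(key) == value for key, value in matchers.items()):
            # A child without its own receiver inherits the parent's.
            merged = {"receiver": route["receiver"], **child}
            return pick_receiver(labels, merged)
    return route["receiver"]


ROUTE = {
    "receiver": "team-mail",  # hypothetical default receiver
    "routes": [
        {"match": {"severity": "page"}, "receiver": "team-pager"},
    ],
}

paged = pick_receiver({"severity": "page", "app": "order-manager"}, ROUTE)
mailed = pick_receiver({"severity": "warning", "app": "order-manager"}, ROUTE)
print(paged, mailed)
```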
PROMETHEUS - PUSH MODEL
Good for short-lived jobs in your cluster.
See: https://prometheus.io/docs/instrumenting/pushing/
DESIGNING METRIC NAMES
Which one is better?
request_duration{app="my_app"}
my_app_request_duration
see documentation on best practices for metric naming and instrumentation
DESIGNING METRIC NAMES
Which one is better?
order_mgmt_db_duration_seconds_sum
order_mgmt_duration_seconds_sum{dep_name='db
PROMETHEUS + K8S = <3
LABELS ARE PROPAGATED FROM K8S TO
PROMETHEUS
INTEGRATION WITH PROMETHEUS
cat memcached-0-service.yaml
https://github.com/skarab7/kubernetes-memcached
---
apiVersion: v1
kind: Service
metadata:
  name: memcached-0
  labels:
    app: memcached
    kubernetes.io/name: "memcached"
    role: shard-0
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/scheme: "http"
    prometheus.io/path: "metrics"
    prometheus.io/port: "9150"
spec:
