Ticketmaster confidential. Do not distribute.
Prometheus + Thanos
Thanos: long-term storage for Prometheus
About me...
• Rocking it at Ticketmaster
• Student in Master of Business Administration (MBA) Strategic project management
• Bachelor of Applied Science (B.A.Sc.) in Computer Science
• DevOps, Cloud, Kubernetes, Kafka, etc.
• Sports lover (Hockey, Dek Hockey, Ultimate Frisbee, Football…)
Where Prometheus & Thanos are in the CNCF landscape...
What is Prometheus?
Prometheus was the second project to graduate from the Cloud Native Computing Foundation (CNCF), after Kubernetes.
Prometheus Architecture
How are we using Prometheus at Ticketmaster?
• We use the Prometheus Operator, created by CoreOS (now part of Red Hat, which is part of IBM)
• We created a Helm chart that sets up common scrape jobs, settings, and exporters
• Scrape jobs for EC2 instances (discovered by tags) and for Kubernetes
• Exporters like cloudwatch-metrics, kafka, blackbox
• Ingresses
• Thanos
• Federations
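As a rough illustration of the tag-based EC2 scraping mentioned above, the sketch below builds one scrape job as a Python dictionary and prints it as JSON. The `ec2_sd_configs` fields and the `__meta_ec2_tag_<key>` meta label come from Prometheus's EC2 service discovery; the helper name `ec2_scrape_job`, the job name, region, tag, and port are hypothetical values chosen for the example, not the settings used at Ticketmaster.

```python
import json


def ec2_scrape_job(job_name, region, tag_key, tag_value, port=9100):
    """Build one Prometheus scrape job that discovers EC2 instances by tag.

    Assumes tag_key is alphanumeric, so the meta label name needs no sanitizing.
    """
    return {
        "job_name": job_name,
        "ec2_sd_configs": [
            {
                "region": region,
                "port": port,
                # Ask the EC2 API only for instances carrying this tag.
                "filters": [{"name": f"tag:{tag_key}", "values": [tag_value]}],
            }
        ],
        "relabel_configs": [
            # Copy the EC2 tag into a regular label on every scraped series.
            {
                "source_labels": [f"__meta_ec2_tag_{tag_key}"],
                "target_label": tag_key.lower(),
            }
        ],
    }


# Hypothetical values, for illustration only.
job = ec2_scrape_job("node", "us-east-1", "Team", "platform")
print(json.dumps({"scrape_configs": [job]}, indent=2))
```

In a Helm chart, a fragment like this would typically be templated into the Prometheus configuration rather than generated at runtime.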
Why move away from federation?
• Calculating the right ingestion rate for our scrape jobs is not easy
• When the disk is full
• Prometheus stops working
• We had to delete the PVC and recreate it with a larger size
• Long-term storage is costly: SSD == $$$$
• Single point of failure
• Availability
• Operator error
• Hardware failure
• Rollout
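To make the disk-sizing problem concrete, here is a minimal sketch of the capacity rule of thumb from the Prometheus storage documentation: needed disk space is roughly retention time in seconds × ingested samples per second × bytes per sample, with about 1 to 2 bytes per sample. The series count, scrape interval, and retention below are made-up inputs, not Ticketmaster figures.

```python
def samples_per_second(active_series, scrape_interval_seconds):
    """Approximate ingestion rate if every series is scraped once per interval."""
    return active_series / scrape_interval_seconds


def needed_disk_bytes(retention_seconds, ingestion_rate, bytes_per_sample=2.0):
    """Rule-of-thumb local storage estimate; 2 bytes/sample is a conservative assumption."""
    return retention_seconds * ingestion_rate * bytes_per_sample


# Hypothetical workload: 1 million active series, 30s scrape interval, 15 days retention.
rate = samples_per_second(1_000_000, 30)
size = needed_disk_bytes(15 * 24 * 3600, rate)
print(f"{rate:.0f} samples/s -> about {size / 1e9:.1f} GB")
```

The estimate is only as good as the ingestion rate fed into it, which is exactly the number the first bullet calls hard to get right.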
Solution
Goals
• Have a global view
• Have HA in place
• Increase retention
• Seamless integration with Prometheus
- Easy deployment model
- Minimal number of dependencies
- Minimal baseline cost
Global view
Global view + HA
Increase retention (Persist data)
Increase retention (Querying)
• A series is made up of one or more “chunks”
• Each chunk contains ~120 samples
• Chunks can be retrieved through HTTP byte range queries
Example:
• 1000 series @ 30s scrape interval
• Query 1 year
• 8.7 million chunks/range queries
• Chunks of the same series are aligned
• Similar series are aligned due to same metric name
This reduces the request count by 4 to 6 orders of magnitude.
8.7 million requests turned into O(20) requests
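The arithmetic behind the example can be checked with a short sketch. It assumes a 365-day year and exactly 120 samples per chunk (the slide says ~120); the function name `chunk_count` is ours.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600


def chunk_count(series, scrape_interval_s, range_s, samples_per_chunk=120):
    """Number of chunks touched when reading `range_s` seconds of every series."""
    samples_per_series = range_s // scrape_interval_s
    # Ceiling division: a partially filled chunk still has to be fetched.
    chunks_per_series = -(-samples_per_series // samples_per_chunk)
    return series * chunks_per_series


total = chunk_count(series=1000, scrape_interval_s=30, range_s=SECONDS_PER_YEAR)
print(total)  # 8760000, i.e. the ~8.7 million on the slide

# Going from one request per chunk to about 20 requests:
print(round(math.log10(total / 20), 1))  # 5.6 orders of magnitude
```

One request per chunk would mean millions of object-storage round trips for a single query, which is why the alignment of chunks matters.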
Increase retention (Querying)
[Diagram: chunks of the same series are grouped together; series are grouped by metric name]
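Because chunks of the same series, and series with the same metric name, sit next to each other in a block, many small reads can be merged into a few large byte-range requests. The sketch below shows the general idea of coalescing adjacent ranges; it is an illustration only, not Thanos's actual implementation, and the offsets and the `max_gap` parameter are invented for the example.

```python
def coalesce_ranges(ranges, max_gap=0):
    """Merge (start, end) byte ranges (end exclusive) that touch or lie within max_gap bytes."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Extend the previous range instead of issuing a new request.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Eight contiguous 100-byte chunks of one series, plus two chunks stored elsewhere.
chunks = [(i * 100, (i + 1) * 100) for i in range(8)]
chunks += [(5000, 5100), (5100, 5200)]

print(coalesce_ranges(chunks))  # [(0, 800), (5000, 5200)]
```

Ten chunk reads become two range requests here; a non-zero `max_gap` trades a few wasted bytes for even fewer requests.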
Compaction
Full Architecture
Deployment Model (Example)
• Federation through Store API
Cost
• Cost of Store + Query nodes + Compactor ≈ savings on the Prometheus side (net +/- 0)
• Less SSD space needed on the Prometheus side (savings)
• Basically, we only pay for the data stored in S3/GCS/etc. plus requests
Cortex
Cortex: reasons we didn’t choose it
• Documentation is not very good
• Requires maintaining another datastore (NoSQL DB)
• Interferes with the data path
Questions?