Ben Sigelman (@el_bhs, bhs@lightstep.com)
Co-founder & CEO: LightStep
Co-creator: OpenTracing, OpenTelemetry, Google Dapper, Google Monarch
Architectures that Scale Deep:
Regaining Control in Deep Systems
QCon SF, November 2019
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/properties-deep-systems/
Presented at QCon San Francisco
www.qconsf.com
Part I
Scaling, and Deep Systems
What is scale, anyway?
Scaling wide
Scaling deep
How does this look
for software?
Software: Scaling wide
Software: Scaling deep
How do real-world
systems look?
Microservices at scale aren’t
just wide systems, they’re
deep systems
Deep Systems
Architectures with ≥ 4 layers of
independently operated services
(including external/cloud dependencies)
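The definition above can be checked mechanically once a call graph is written down. The following is a minimal sketch, assuming an acyclic call graph in which every node is an independently operated service; the service names and the `CALLS` mapping are hypothetical and are not taken from the talk.

```python
from functools import lru_cache

# Hypothetical call graph: each service maps to the services it calls.
# All names are illustrative assumptions, not from the talk.
CALLS = {
    "web-frontend": ["checkout-api"],
    "checkout-api": ["payments", "inventory"],
    "payments": ["cloud-kms"],      # external/cloud dependency
    "inventory": ["managed-db"],    # external/cloud dependency
    "cloud-kms": [],
    "managed-db": [],
}

DEEP_THRESHOLD = 4  # ">= 4 layers", per the slide's definition


def layers_below(service, calls):
    """Count layers on the longest call chain starting at `service`, inclusive.

    Assumes the call graph has no cycles.
    """
    @lru_cache(maxsize=None)
    def depth(name):
        children = calls.get(name, [])
        return 1 + max((depth(child) for child in children), default=0)

    return depth(service)


def is_deep_system(entry_point, calls, threshold=DEEP_THRESHOLD):
    """True when the longest chain from `entry_point` has at least `threshold` layers."""
    return layers_below(entry_point, calls) >= threshold


print(layers_below("web-frontend", CALLS))    # 4
print(is_deep_system("web-frontend", CALLS))  # True
```

Note that the external dependencies count as layers here, which matches the parenthetical in the definition.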
What do deep systems sound like?
“Don’t deploy on Fridays”
“Where’s Chris?! I’m dealing with
a P0 and they’re the only one
who knows how to debug this.”
“It can’t be our fault, our
dashboard says we’re healthy”
“Kafka is on fire”
“I need 100% availability
from your team.
One hundred percent.”
“I didn’t know I depended on
that region”
“That was on a dashboard but I
can’t find it”
Lots of challenges:
- People-management
- Security
- Multi-tenancy
- “Big-customer” success
- Performance
- Observability
Part II
Control Theory: TL;DR Edition
Why do we care so much
about observability, anyway?
Inputs → [A System] → Outputs
… and its state vector
Observability
How well can you infer
internal state using only
the outputs?
Controllability
How well can you
control internal state
using only the inputs?
Controllability
is the dual of
Observability
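For the special case of linear time-invariant systems (an assumption; the slides do not restrict themselves to these), control theory makes both questions precise through the Kalman rank conditions. With state x, inputs u and outputs y, where x' = Ax + Bu and y = Cx, the pair (A, B) is controllable when [B, AB, …, Aⁿ⁻¹B] has rank n, and (A, C) is observable when the stacked matrix [C; CA; …; CAⁿ⁻¹] has rank n. Duality means (A, B) is controllable exactly when (Aᵀ, Bᵀ) is observable. The sketch below checks this on small illustrative matrices that are not from the talk.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def rank(m):
    """Matrix rank via Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in m]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[found])]
        found += 1
    return found


def controllability_matrix(a, b):
    """Blocks side by side: [B, AB, ..., A^(n-1) B]."""
    n = len(a)
    blocks, block = [], b
    for _ in range(n):
        blocks.append(block)
        block = matmul(a, block)
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]


def observability_matrix(a, c):
    """Blocks stacked vertically: [C; CA; ...; C A^(n-1)]."""
    rows, block = [], c
    for _ in range(len(a)):
        rows.extend(block)
        block = matmul(block, a)
    return rows


def is_controllable(a, b):
    return rank(controllability_matrix(a, b)) == len(a)


def is_observable(a, c):
    return rank(observability_matrix(a, c)) == len(a)


# Illustrative two-state system (values are assumptions for the example).
A = [[0, 1], [-2, -3]]
B = [[0], [1]]
C = [[1, 0]]

print(is_controllable(A, B), is_observable(A, C))                    # True True
print(is_controllable(A, B) == is_observable(transpose(A), transpose(B)))  # True

# A system whose second state never reaches the output is not observable.
A_HIDDEN = [[1, 0], [0, 2]]
print(is_observable(A_HIDDEN, [[1, 0]]))                              # False
```

Software systems are not linear state-space models, so this is only the textbook analogy behind the slide, not a recipe for production services.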
Part III
What Deep Systems
Mean for Observability
[Figure: axes "# of services" and "developers per service"; labels "Pure Monoliths", "Architectural evolution", "Deep Systems"]
Stress (n): responsibility without control
[Diagram labels: "Stress", "what you can control", "what you are responsible for"]
Observability:
Shrink This Gap
Mental models
A System
Managing Deep Systems
Services must have SLOs
(“Service Level Objectives”: latency, errors, etc)
For effective service management, only
three things matter:
0. Releasing service functionality
1. Gradually improving SLOs
2. Rapidly restoring SLOs
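Improving or restoring an SLO presupposes a way to tell whether it is currently met. The sketch below is one possible shape for that check, assuming an SLO made of a latency percentile target and a maximum error rate. The `SLO` and `Request` types, the thresholds, and the sample data are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    latency_percentile: float  # e.g. 99.0 for p99
    latency_target_ms: float
    max_error_rate: float      # fraction of requests, e.g. 0.005 for 0.5%


@dataclass(frozen=True)
class Request:
    latency_ms: float
    ok: bool


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate(slo, requests):
    """Compare observed latency and error rate against the SLO."""
    latency = percentile([r.latency_ms for r in requests], slo.latency_percentile)
    error_rate = sum(not r.ok for r in requests) / len(requests)
    return {
        "latency_ms": latency,
        "error_rate": error_rate,
        "latency_ok": latency <= slo.latency_target_ms,
        "errors_ok": error_rate <= slo.max_error_rate,
    }


# Hypothetical sample: 98 fast requests, one slow success, one failure.
SAMPLE = (
    [Request(latency_ms=100.0, ok=True)] * 98
    + [Request(latency_ms=900.0, ok=True)]
    + [Request(latency_ms=120.0, ok=False)]
)
CHECKOUT_SLO = SLO(latency_percentile=99.0, latency_target_ms=200.0, max_error_rate=0.005)

REPORT = evaluate(CHECKOUT_SLO, SAMPLE)
print(REPORT["latency_ok"], REPORT["errors_ok"])  # True False
```

In this sample the latency objective holds while the error objective is violated, which is the kind of signal "rapidly restoring SLOs" depends on.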
In a deep system, we must control the
entire “triangle” to maintain our SLOs
Controllability == Observability
There’s that word again…
Observability: “The Conventional Wisdom”
Observing microservices is hard
Google and Facebook solved this (right???)
They used Metrics, Logging, and Distributed Tracing…
… So we should, too.
3 Pillars, 3 Experiences
Metrics
Logs
Traces
Three Pillars? Two giant pipes…
Logs
Metrics
Without Traces:
Cognitive Load ≈ O(depth²)
Traces
Traces provide Context
And context rules out
invalid hypotheses
Two giant pipes and a filter
Logs
Metrics
Context
(from traces)
Context reduces cognitive load
With Traces:
Cognitive Load ≈ O(depth)
Relevant Metrics
Relevant Logs
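The "filter" idea can be illustrated with a small sketch: the spans of one trace say which services were involved and when, and that context is used to discard log lines that cannot be relevant. The `Span` and `LogLine` types and the data are hypothetical. Matching by service and time window is a simplifying assumption; tracing systems commonly attach a trace ID to each log line instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    trace_id: str
    service: str
    start_ms: int
    end_ms: int


@dataclass(frozen=True)
class LogLine:
    service: str
    timestamp_ms: int
    message: str


def relevant_logs(trace_id, spans, logs):
    """Keep log lines emitted by a service while it was working on the trace."""
    windows = [(s.service, s.start_ms, s.end_ms) for s in spans if s.trace_id == trace_id]
    return [
        line for line in logs
        if any(svc == line.service and start <= line.timestamp_ms <= end
               for svc, start, end in windows)
    ]


# Hypothetical data: trace "t1" touches three services; "t2" touches another.
SPANS = [
    Span("t1", "frontend", 0, 100),
    Span("t1", "checkout", 10, 90),
    Span("t1", "payments", 20, 80),
    Span("t2", "inventory", 0, 50),
]
LOGS = [
    LogLine("frontend", 5, "request received"),
    LogLine("inventory", 30, "cache miss"),          # different trace
    LogLine("payments", 50, "upstream timeout"),
    LogLine("payments", 500, "nightly job started"), # outside the span window
]

for line in relevant_logs("t1", SPANS, LOGS):
    print(line.service, line.message)
# frontend request received
# payments upstream timeout
```

Only the lines that fall on the traced request's path survive, so the reader follows one chain of services instead of scanning every service's output.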
Observability:
Shrink This Gap
Let’s Review
Microservices don’t just scale wide,
they scale deep
Recognize deep systems
Stress (n): responsibility without control
[Diagram labels: "Stress", "what you can control", "what you are responsible for"]
“Controllability” (of SLOs)
depends on observability
… and traces are not sprinkles
“The Three Pillars of Observability”
is a lousy metaphor
Tracing can reduce
cognitive load from
O(depth²) to O(depth)
Tracing is the backbone of
simple observability
in deep systems
Thank You
Feedback always
welcome:
twitter → @el_bhs
the emails → bhs@lightstep.com
Play with LightStep,
for free, anytime:
(no email address required!)
lightstep.com/play
