OpenTSDB Update
Distributed, Scalable Time Series Database
Chris Larsen clarsen@yahoo-inc.com
Who Am I? (no really, who am I?)
Chris Larsen
Current lead for OpenTSDB
Software Engineer @ Yahoo!
Monitoring Team
What Is OpenTSDB?
Open Source Time Series Database
Store trillions of data points
Sucks up all data and keeps going
Never lose precision
Scales using HBase, Cassandra or Bigtable
What good is it?
Systems Monitoring & Measurement
Servers
Networks
Sensor Data
The Internet of Things
SCADA
Financial Data
Scientific Experiment Results
Use Cases
Backing store for Argus:
Open source monitoring
and alerting system
15 HBase Servers
6 month retention
10M writes per minute
95th-percentile query latency (queries spanning < 30 days): 200 ms
Moving to a 200-node cluster writing at 100M/min
Use Cases
●Monitoring system, network and application
performance and statistics
110 region servers, 10M writes/s ~ 2PB
Multi-tenant and Kerberos secure HBase
~200k writes per second per TSD
Central monitoring for all Yahoo properties
Over 2 billion time series served
Some Other Users
What Are Time Series?
Time Series: data points for an identity
over time
Typical Identity:
Dotted string: web01.sys.cpu.user.0
OpenTSDB Identity:
Metric: sys.cpu.user
Tags (name/value pairs):
host=web01 cpu=0
What Are Time Series?
Data Point:
Metric + Tags
+ Value: 42
+ Timestamp: 1234567890
sys.cpu.user 1234567890 42 host=web01 cpu=0
^ a data point ^
How it Works
Writing Data
1) Open Telnet style socket, write:
put sys.cpu.user 1234567890 42 host=web01 cpu=0
2) ...or post JSON to:
http://<host>:<port>/api/put
3) ...or import big files with the CLI
No schema definition
No RRD file creation
Just write!
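The first two write paths above can be sketched in a few lines of Python. This is only an illustration of the two payload formats: the helper names format_put_line and format_put_json are hypothetical, and nothing is sent over the network.

```python
import json


def format_put_line(metric, timestamp, value, tags):
    """Build one telnet-style line: put <metric> <timestamp> <value> <tagk=tagv> ..."""
    tag_str = " ".join(f"{k}={v}" for k, v in tags.items())
    return f"put {metric} {timestamp} {value} {tag_str}"


def format_put_json(metric, timestamp, value, tags):
    """Build a JSON body carrying the same data point for the HTTP /api/put endpoint."""
    return json.dumps(
        {"metric": metric, "timestamp": timestamp, "value": value, "tags": tags}
    )


# The data point used throughout the slides.
tags = {"host": "web01", "cpu": "0"}
line = format_put_line("sys.cpu.user", 1234567890, 42, tags)
body = format_put_json("sys.cpu.user", 1234567890, 42, tags)
print(line)
print(body)
```

The line would be written to the TSD's socket followed by a newline; the JSON body would be POSTed to http://<host>:<port>/api/put.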
Querying Data
Graph with the GUI
CLI tools
HTTP API
Aggregate multiple series
Simple query language
To average all CPUs on host:
start=1h-ago
avg sys.cpu.user
host=web01
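As a rough sketch, the query above could be expressed as an HTTP API request like this. The helper name build_query_url and the base URL http://localhost:4242 are assumptions for illustration; the code only builds the URL and does not contact a TSD.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_query_url(base_url, start, aggregator, metric, tags):
    """Build an /api/query URL with one sub-query of the form <agg>:<metric>{tagk=tagv,...}."""
    tag_filter = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    m = f"{aggregator}:{metric}"
    if tag_filter:
        m += "{" + tag_filter + "}"
    # urlencode percent-escapes the ':', '{', '=' and '}' characters in the sub-query.
    return f"{base_url}/api/query?" + urlencode({"start": start, "m": m})


# Average sys.cpu.user over all CPUs on web01 for the last hour.
# The base URL is a placeholder, not a value from the slides.
url = build_query_url(
    "http://localhost:4242", "1h-ago", "avg", "sys.cpu.user", {"host": "web01"}
)
params = parse_qs(urlparse(url).query)
print(url)
print(params["m"][0])
```

Leaving the cpu tag out of the filter is what lets the avg aggregator combine the per-CPU series for the host.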
HBase Data Tables
tsdb - Data point table. Massive
tsdb-uid - Name to UID and UID to
name mappings
tsdb-meta - Time series index and
meta-data
tsdb-tree - Config and index for
hierarchical naming schema
Data Table Schema
Row key is a concatenation of UIDs and time:
metric + timestamp + tagk1 + tagv1… + tagkN + tagvN
sys.cpu.user 1234567890 42 host=web01 cpu=0
\x00\x00\x01\x49\x95\xFB\x70\x00\x00\x01\x00\x00\x01\x00\x00\x02\x00\x00\x02
Timestamp normalized on 1 hour boundaries
All data points for an hour are stored in one row
Enables fast scans of all time series for a metric
…or pass a row key filter for specific time series with
particular tags
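A minimal sketch of how such a row key could be assembled, assuming 3-byte UIDs, no salting, and illustrative UID assignments (sys.cpu.user = 1, host = 1, web01 = 1, cpu = 2, 0 = 2) chosen to reproduce the example key on this slide. The function names are hypothetical.

```python
import struct

UID_WIDTH = 3  # bytes per UID; assumed here to match the example key


def base_time(timestamp):
    """Normalize a Unix timestamp (seconds) down to its 1-hour boundary."""
    return timestamp - (timestamp % 3600)


def row_key(metric_uid, timestamp, tag_uids):
    """Concatenate metric UID + hour-aligned time + (tagk UID, tagv UID) pairs."""
    key = metric_uid.to_bytes(UID_WIDTH, "big")
    key += struct.pack(">I", base_time(timestamp))  # 4-byte big-endian timestamp
    for tagk_uid, tagv_uid in sorted(tag_uids):     # tag pairs ordered by tagk UID
        key += tagk_uid.to_bytes(UID_WIDTH, "big")
        key += tagv_uid.to_bytes(UID_WIDTH, "big")
    return key


# Assumed UIDs: metric sys.cpu.user=1, host=1/web01=1, cpu=2/"0"=2.
key = row_key(1, 1234567890, [(1, 1), (2, 2)])
offset = 1234567890 - base_time(1234567890)  # seconds into the hour for this point
print(key.hex())
print(offset)
```

Because every data point in the same hour shares this key, only the offset within the hour (1890 seconds here) has to be stored alongside the value.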
New for OpenTSDB 2.2
● Append writes (no more need for TSD
Compactions)
● Row salting and random metric IDs
● Downsampling Fill Policies
● Query filters (wildcard, regex, group by or not)
● Storage Exception plugin for retrying writes
● Released February 2016
New for OpenTSDB 2.3
● Graphite style expressions
● Cross-metric expressions
● Calendar based downsampling
● New data stores
● UID assignment plugin interface
● Datapoint write filter plugin interface
● RC1 released May 2016
Fuzzy Row Filter
How do you find a single time
series out of 1 million?
For a day?
For a month?
Fuzzy Row Filter
Instead of running a regex
string comparator over each
byte array formatted key…
(?s)^.{9}(?:.{8})*\Q\x00\x00\x00\x02
\E(?:\Q)\x00\x0F‡\x42\x2B\E)(?:.{8})*$
TSDB query takes 1.6 seconds
for 89,726 rows
KEY
Match -> m t1 tagk tagv1
No Match -> m t1 tagk tagv2
No Match -> m t1 tagk tagv3
No Match -> m t1 tagk tagv4
No Match -> m t1 tagk tagv5
No Match -> m t1 tagk tagv6
Match -> m t2 tagk tagv1
No Match -> m t2 tagk tagv2
Fuzzy Row Filter
Use a byte mask!
● Use the bloom filter to skip-scan
to the next candidate row.
● Combine with regex (after fuzzy
filter) to filter further.
FuzzyFilter{[FuzzyFilterPair{row_key=[18, 68,
-3, -82, 120, 87, 56, -15, 96, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0],
mask=[0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0,
1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]}]}
Now it takes 0.239 seconds
KEY
Match -> m t1 tagk tagv1
Skip -> m t1 tagk tagv2
m t1 tagk tagv3
m t1 tagk tagv4
m t1 tagk tagv5
m t1 tagk tagv6
Match -> m t2 tagk tagv1
Skip -> m t2 tagk tagv2
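The per-key test behind a byte mask can be sketched as follows, using the pattern and mask from this slide. In the mask, 0 marks a byte that must equal the pattern and 1 marks a byte that may be anything. This sketch only covers the match decision for one key; it does not model how the server-side filter skips ahead to the next candidate row. The helper names are hypothetical.

```python
def to_unsigned(signed_bytes):
    """Convert Java-style signed byte values (-128..127) to a Python bytes object."""
    return bytes(b & 0xFF for b in signed_bytes)


def fuzzy_match(row_key, pattern, mask):
    """Return True if row_key equals pattern at every position where mask is 0."""
    if len(row_key) != len(pattern):
        return False  # the filter only applies to fixed-length keys
    return all(m == 1 or k == p for k, p, m in zip(row_key, pattern, mask))


PATTERN = to_unsigned([18, 68, -3, -82, 120, 87, 56, -15, 96, 0, 0, 0, 1, 0,
                       0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0])
MASK = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0,
        1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]

# Same fixed bytes, different values in the masked (wildcard) positions.
matching = bytearray(PATTERN)
matching[5:9] = b"\x00\x00\x00\x00"
matching[13:17] = b"\x00\x00\x00\x07"

# One fixed byte changed, so the key must be rejected.
non_matching = bytearray(PATTERN)
non_matching[12] = 9

print(fuzzy_match(bytes(matching), PATTERN, MASK))
print(fuzzy_match(bytes(non_matching), PATTERN, MASK))
```

Comparing fixed byte positions is much cheaper than running a regular expression over every key, which is consistent with the latency difference reported on these slides.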
Fuzzy Row Filter
Pros:
● Can improve scan latency by orders of magnitude
● Combines nicely with other filters
Cons:
● All row keys for the match have to be a fixed length
● Doesn’t help much when matching the majority of a set
● Doesn’t support bitmasks, only byte masks
AsyncHBase
AsyncHBase is a fully asynchronous,
multi-threaded HBase client
Supports HBase 0.90 to 1.x
Faster and less resource intensive than the
native HBase client
Support for scanner filters, META prefetch,
“fail-fast” RPCs
AsyncHBase in YCSB
AsyncHBase in YCSB
Upcoming in AsyncHBase 1.8
●Reverse Scanning
●New Yahoo! Cloud Serving Benchmark
(YCSB) module for testing
●Lots of bug fixes
OpenTSDB on Bigtable
● Bigtable
○Hosted Google Service
○Client uses HTTP/2 and gRPC for communication
● OpenTSDB heads home
○Based on a time series store on Bigtable at Google
○Same schema as HBase
○Same filter support (fuzzy filters are coming)
OpenTSDB on Bigtable
● AsyncBigtable
○Implementation of AsyncHBase’s API for drop-in use
○https://github.com/OpenTSDB/asyncbigtable
○Uses HTable API
○Moving to native Bigtable API
● Thanks to Christos of Pythian, Solomon, Carter, Misha,
and the rest of the Google Bigtable team
● https://www.pythian.com/blog/run-opentsdb-google-bigtable/#
OpenTSDB on Cassandra
● AsyncCassandra - Implementation of AsyncHBase’s
API for drop-in use
● Wraps Netflix’s Astyanax for asynchronous calls
● Requires the ByteOrderedPartitioner and legacy
API
● Same schema as HBase/Bigtable
● Scan filtering performed client side
● May not work with future Cassandra versions
if they drop the API
Community
Salesforce Argus
●Time series monitoring
and alerting
●Multi-series annotations
●Dashboards
Thanks to Tom Valine and the Salesforce engineers
https://medium.com/salesforce-open-source/argus-time-series-monitoring-and-alerting-d2941f67864#.ez7mbo3ek
https://github.com/SalesforceEng/Argus
Community
Turn Splicer
●API to shard TSDB queries
●Locality advantage hosting
TSDs on region servers
●Query caching
Thanks to Jonathan Creasy and the Turn engineers
https://github.com/turn/splicer
The Future of OpenTSDB
The Future
Reworked query pipeline for selective ordering
of operations
Histogram support
Flexible query caching framework
Distributed queries
Greater data store abstraction
More Information
Thank you to everyone who has helped test, debug and add to OpenTSDB
2.3 including, but not limited to:
TODO
Contribute at github.com/OpenTSDB/opentsdb
Website: opentsdb.net
Documentation: opentsdb.net/docs/build/html
Mailing List: groups.google.com/group/opentsdb
Images
http://photos.jdhancock.com/photo/2013-06-04-212438-the-lonely-vacuum-of-space.html
http://en.wikipedia.org/wiki/File:Semi-automated-external-monitor-defibrillator.jpg
http://upload.wikimedia.org/wikipedia/commons/1/17/Dining_table_for_two.jpg
http://upload.wikimedia.org/wikipedia/commons/9/92/Easy_button.JPG
https://www.flickr.com/photos/verbeeldingskr8/15563333617
http://www.flickr.com/photos/ladydragonflyherworld/4845314274/
http://lego.cuusoo.com/ideas/view/96

Update on OpenTSDB and AsyncHBase
