RedisConf17- Using Redis at scale @ Twitter

Nighthawk
Distributed caching with Redis @ Twitter
Rashmi Ramesh
@rashmi_ur

Agenda
What is Nighthawk?
How does it work?
Scaling out
High availability
Current challenges

Nighthawk - cache-as-a-service
Runs Redis at its core
> 10M QPS
Largest cluster runs ~3K Redis nodes
> 10TB of data

Who uses Nighthawk?
Some of our biggest customers:
Analytics services - Ads, Video
Ad serving
Ad Exchange
Direct Messaging
Mobile app conversion tracking

Design Goals
Scalable: scale vertically and horizontally
Elastic: add / remove instances without violating SLA
High throughput and low latencies
High availability in the event of machine failures
Topology agnostic client

Nighthawk Architecture
[Diagram: Client; Proxy/Routing layer; Backend 0 ... Backend N, each running Redis 0 ... Redis N; Topology; Cluster manager]

Cache backend
[Diagram: Mesos container holding Redis nodes (1, 2, 3, ...) and a topology watcher and announcer]
Proxy/Router:
Replica 1 -> Redis1
Replica 2 -> Redis2
Replica 3 -> Redis3
Topology:
Redis1(dc, host, port1, capacity)
Redis2(dc, host, port2, capacity)
Redis3(dc, host, port3, capacity)

Cluster manager
Manages topology membership and changes
- (Re)Balances replicas
- Reacts to topology changes, e.g. a dead node
- Replicated cache - ensures the 2 replicas of a partition are on separate failure domains
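
A minimal sketch of the placement rule above, assuming a failure domain means "same host or same rack". The `Node` record, its `rack` field and the least-loaded tie-breaking are illustrative assumptions, not Nighthawk's actual cluster manager logic.

```python
from collections import namedtuple

# Hypothetical node record, loosely modeled on the Redis(dc, host, port, capacity)
# topology entries; the rack field is an assumption added for this sketch.
Node = namedtuple("Node", "name host rack")

def place_replicas(partition_count, nodes, rf=2):
    """Assign rf replicas per partition, never sharing a host or rack within a partition."""
    placement = {}
    load = {n.name: 0 for n in nodes}
    for p in range(partition_count):
        chosen = []
        for _ in range(rf):
            # Only nodes in a different host AND rack from every replica chosen so far.
            candidates = [n for n in nodes
                          if all(n.host != c.host and n.rack != c.rack for c in chosen)]
            if not candidates:
                raise ValueError("not enough failure domains for rf=%d" % rf)
            # Prefer the least-loaded candidate to keep replicas balanced.
            best = min(candidates, key=lambda n: (load[n.name], n.name))
            chosen.append(best)
            load[best.name] += 1
        placement[p] = [n.name for n in chosen]
    return placement

nodes = [Node("redis1", "host1", "rackA"), Node("redis2", "host2", "rackA"),
         Node("redis3", "host3", "rackB"), Node("redis4", "host4", "rackB")]
placement = place_replicas(6, nodes)
```

When a node dies, the same function could be re-run over the surviving nodes to find a new home for the lost replicas.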

Redis databases for partitions
Partition -> Redis DB
Granular key remapping
Logical data isolation
Enumerating - SCAN on the Redis DB
Deletion - FLUSHDB
Enables replica rehydration
[Diagram: keys K1, K2, K3, K4 spread across Partition X and Partition Y, labeled 1 and 2]
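
The partition-to-DB idea can be sketched as follows. The hash function (CRC32), the partition count and the helper names are assumptions for illustration; only the Redis commands themselves (`SELECT`, `SCAN`, `FLUSHDB`) are real, and each acts on the currently selected database.

```python
import zlib

PARTITION_COUNT = 100  # assumed; matches the partition count quoted in the scaling slides

def partition_for(key):
    """Hash a key to a partition id (CRC32 is an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % PARTITION_COUNT

# Hypothetical routing table: partition id -> (redis node, db index on that node).
# Remapping a partition only changes one entry, which is what makes remapping granular.
partition_table = {5: ("redis1", 0), 9: ("redis1", 1), 12: ("redis2", 0)}

def enumerate_commands(partition, cursor=0):
    """Commands that would list one partition's keys: select its DB, then SCAN it."""
    _node, db = partition_table[partition]
    return [("SELECT", db), ("SCAN", cursor)]

def delete_commands(partition):
    """Commands that would drop one partition without touching the node's other DBs."""
    _node, db = partition_table[partition]
    return [("SELECT", db), ("FLUSHDB",)]
```

Because `FLUSHDB` clears only the selected database, dropping a moved partition leaves the node's other partitions intact.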

Scaling out with Client/Proxy managed partitioning
Key count: 1.5M keys
[Diagram: Client; 3 Redis nodes holding 500K keys each]

Scaling out with Client/Proxy managed partitioning
Key count: 1.5M keys
Remapped keys: 600K
[Diagram: Client; 5 Redis nodes holding 300K keys each; Persistent storage]
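
The 600K figure is consistent with a simple calculation, assuming keys are spread evenly and only the keys destined for the new nodes move (as a consistent-hashing scheme would aim for). The function name is hypothetical.

```python
def keys_remapped(total_keys, old_nodes, new_nodes):
    """Fewest keys that must move so every node ends up with an equal share."""
    per_node_after = total_keys // new_nodes
    # Each newly added node has to receive a full share from the existing nodes.
    return per_node_after * (new_nodes - old_nodes)

moved = keys_remapped(1_500_000, old_nodes=3, new_nodes=5)  # 600000
```

Going from 3 to 5 nodes, each node's share drops from 500K to 300K, so the two new nodes take 300K keys each, 600K in total.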

Scaling out with Cluster manager
Key count: 1.5M keys
Partition count: 100
Keys/Partition: 15K
[Diagram: Client; Proxy; Topology and cluster manager; Persistent storage; 3 Redis nodes holding 500K keys each]

Keys/Partition: 15K
[Diagram: Client; Proxy; Topology and cluster manager; Persistent storage; Redis nodes holding 500K, 485K, 500K and 15K keys]

Keys/Partition: 15K
[Diagram: Client; Proxy; Topology and cluster manager; Persistent storage; Redis nodes holding 485K, 485K, 500K, 15K and 15K keys]

Scaling out with Cluster manager - Post balancing
Key count: 1.5M keys
[Diagram: Client; Proxy; Topology and cluster manager; Persistent storage; Redis nodes holding 250K, 250K, 250K, 250K and 500K keys]
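
The partition-at-a-time movement shown in these slides can be sketched as a loop that shifts one partition (about 15K keys) per step. This version balances evenly, which is a simplifying assumption: it does not reproduce the slide's final distribution, and the real cluster manager supports custom balancing logic. All names are hypothetical.

```python
KEYS_PER_PARTITION = 15_000  # from the slides: 1.5M keys / 100 partitions

def rebalance_steps(assignment):
    """Yield (partition, source, target) moves until node loads differ by at most one partition.

    assignment maps node name -> list of partition ids and is mutated in place.
    """
    while True:
        source = max(assignment, key=lambda n: (len(assignment[n]), n))
        target = min(assignment, key=lambda n: (len(assignment[n]), n))
        if len(assignment[source]) - len(assignment[target]) <= 1:
            return
        partition = assignment[source].pop()
        assignment[target].append(partition)
        yield partition, source, target

# Three existing nodes share 100 partitions; an empty fourth node joins.
assignment = {
    "redis0": list(range(0, 34)),
    "redis1": list(range(34, 67)),
    "redis2": list(range(67, 100)),
    "redis3": [],
}
moves = list(rebalance_steps(assignment))
keys_moved = len(moves) * KEYS_PER_PARTITION
```

Each move touches only one partition's keys, so the rest of the cluster keeps serving while the new node fills up.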

Advantages over Client managed partitioning
- Thin client - simple and oblivious to topology
- Clients, proxy layer and backends scale independently
- Pluggable custom load balancing logic through cluster manager
- No cluster downtime during scaling out/up/back

High Availability with Replication
Synchronous, best effort
RF = 2, Intra DC
Supports idempotent operations only - get, put, remove, count, scan
Copies of a partition are never placed on the same host or rack
Passive warming for failed/restarted replicas

High Availability with Replication
[Diagram: Client; Proxy/Routing layer; Topology; Cluster manager; Pool A; Pool B. "GetKey in Partition 5" goes to Backend 0 (partitions 2, 5, 9), state SERVING. Backend N (partitions 12, 5, 10) goes from SERVING to FAILED. Its replacement Backend N* (partitions 12, 5, 10), state WARMING, receives "SetKey in partition 5".]
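
One way to read the replication slides is as routing by replica state: reads go only to SERVING replicas, while writes also reach WARMING replicas so they fill passively. The sketch below models that reading; the class, the state handling and the "succeed if any replica took the write" interpretation of best effort are assumptions, not Nighthawk's confirmed behavior.

```python
SERVING, WARMING, FAILED = "SERVING", "WARMING", "FAILED"

class Replica:
    """Hypothetical in-memory stand-in for one copy of a partition."""
    def __init__(self, name, state):
        self.name = name
        self.state = state
        self.data = {}

def put(replicas, key, value):
    """Write to every live replica in turn; succeed if at least one accepted the write."""
    written = 0
    for replica in replicas:
        if replica.state in (SERVING, WARMING):
            replica.data[key] = value  # put is idempotent, so retries are safe
            written += 1
    return written > 0

def get(replicas, key):
    """Read from the first SERVING replica; WARMING replicas never serve reads."""
    for replica in replicas:
        if replica.state == SERVING:
            return replica.data.get(key)
    return None

backend_0 = Replica("backend-0", SERVING)
backend_n_star = Replica("backend-N*", WARMING)
ok = put([backend_0, backend_n_star], "k1", "v1")
```

Restricting the API to idempotent operations is what makes it safe to replay or duplicate a write against a replica that may already have it.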

Remember this?
The most retweeted
Tweet of 2014!

Hot key symptom
Significantly high QPS to a single cache server

Hot Key Mitigation
Server side diagnostics:
Sampling a small % of requests and logging
Post processing the logs to identify high frequency keys
Client side solution:
Client side hot key detection and caching
Better to have:
Redis tracks the hot keys
Protocol support to send feedback to client if a key is hot
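
The server-side diagnostic can be approximated offline in a few lines: sample a small fraction of requests, then count keys in the sample. The 1% default rate and the function name are assumptions for illustration.

```python
import random
from collections import Counter

def find_hot_keys(request_keys, sample_rate=0.01, top_n=3, seed=None):
    """Sample a fraction of requests and return the most frequent keys in the sample."""
    rng = random.Random(seed)
    sampled = Counter(key for key in request_keys if rng.random() < sample_rate)
    return sampled.most_common(top_n)

# Synthetic traffic: one key receives half of all requests.
requests = ["tweet:hot"] * 50_000 + ["tweet:%d" % i for i in range(50_000)]
random.Random(7).shuffle(requests)
hot = find_hot_keys(requests, sample_rate=0.01, top_n=1, seed=1)
```

Sampling keeps the logging overhead low, and a genuinely hot key still dominates the sample because its share of traffic is preserved.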

Active warming of replicas
[Diagram: Client; Proxy/Routing layer; Topology; Cluster manager; Pool A; Pool B. Backend A (partitions 2, 5, 9) is SERVING. Backend B* (partitions 12, 5, 10) is WARMING and receives writes from a Bootstrapper.]
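
A minimal sketch of what a bootstrapper might do, assuming it copies a partition from the SERVING replica into the WARMING one and must not overwrite values that live writes have already placed there. The slide does not describe the mechanism, so the function and the skip-if-present rule are assumptions.

```python
def bootstrap_partition(source, target):
    """Copy entries from the serving copy into the warming copy.

    Keys already present in target are skipped, on the assumption that they
    came from newer live writes. Returns the number of keys copied.
    """
    copied = 0
    for key, value in source.items():
        if key not in target:
            target[key] = value
            copied += 1
    return copied

serving = {"k1": "v1", "k2": "v2"}
warming = {"k2": "v2-new"}  # written by live traffic during warming
copied = bootstrap_partition(serving, warming)
```

Compared with passive warming, this fills the replica without waiting for client traffic to touch every key.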
