Facebook Style Notifications Using HBase and Event Streams
github.com/regunathb RegunathB
Serving User Intent (eCommerce)
• Mass targeted (Low relevance)
– User Intent Captured from: Browse, Buy, Register
• Quantified, Time-bound (Improved relevance)
– User Intent Derived from: Category Affinity, Recommendations
Serving User Intent (social)
Image Source : http://allfacebook.com/
• Near real-time
– Quick updates about friends’ actions that most affect you
• Relevant Actions
– Likes, Comments etc.
• Personalized
– Content only from social circle
• Non-invasive
– Users therefore tolerate less relevant content as compared to email
Notifications on Flipkart
User Intent derived from:
• Search, Browse
• Add to Wish List
• Add to Cart
• Checkout/Buy
Price Drop Notification
[Figure: iPhone 5C price plotted against time (t0, t1, t2), with price points 42K, 44K and 39K]
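A minimal sketch of the check behind a price-drop notification, assuming a notification is warranted when the current price is lower than the price in effect when the user expressed interest. The function names, the `(time, price)` tuple layout, and the assignment of the three prices to t0, t1, t2 are illustrative assumptions, not taken from the slides.

```python
from typing import List, Optional, Tuple

# (time, price) observations for one product; the values are illustrative only.
PriceHistory = List[Tuple[int, int]]

def price_at(history: PriceHistory, t: int) -> Optional[int]:
    """Return the most recent price recorded at or before time t."""
    known = [price for (ts, price) in sorted(history) if ts <= t]
    return known[-1] if known else None

def price_drop(history: PriceHistory, intent_time: int, now: int) -> Optional[int]:
    """Return the drop amount if the price now is lower than at intent time."""
    then = price_at(history, intent_time)
    current = price_at(history, now)
    if then is None or current is None or current >= then:
        return None  # no data, or the price did not fall
    return then - current

# Assumed mapping: 42K at t0, 44K at t1, 39K at t2.
history = [(0, 42_000), (1, 44_000), (2, 39_000)]
drop_since_t1 = price_drop(history, intent_time=1, now=2)
drop_since_t0 = price_drop(history, intent_time=0, now=2)
rise_t0_to_t1 = price_drop(history, intent_time=0, now=1)
```

Note that the comparison needs the price as of the intent time, which is why the on-demand approach requires versioned product data.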
Solution 1 : Generate Notifications on Demand
[Diagram: Gather User Intent → Retrieve Current, Past Data (Intents Data store, Product Data store) → Create Notifications on Visits]
• Pros
– Perceived optimal resource utilization
• Cons
– Gathering, Processing and Serving coupled
– Read path is computationally expensive
– High latency
– Need versioning support on Product data
– Repeated computations
Solution 2 : Pre-create in Real-time, Serve on Demand
What Leads to a Notification?
Intent (interest expressed by the user) ∩ Event (price changes) => Notification
(Intersection of millions of User Intent and Product Changes)
[Diagram: Intent Event Stream and Change Event Stream intersect to produce Notifications]
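The intersection of intents and change events can be sketched as an in-memory index of interested users per product that is consulted whenever a change event arrives. This is only an illustration under assumed names (`Intent`, `PriceChange`, `Matcher`); the slides do not describe the actual data model, and the real system matches events with a CEP engine rather than a dictionary lookup.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set

class Intent(NamedTuple):
    user_id: str
    product_id: str

class PriceChange(NamedTuple):
    product_id: str
    old_price: int
    new_price: int

class Notification(NamedTuple):
    user_id: str
    product_id: str
    message: str

class Matcher:
    """Hypothetical matcher: joins the intent stream with the change stream."""

    def __init__(self) -> None:
        # product_id -> users who expressed interest in that product
        self._interested: Dict[str, Set[str]] = defaultdict(set)

    def on_intent(self, intent: Intent) -> None:
        self._interested[intent.product_id].add(intent.user_id)

    def on_change(self, event: PriceChange) -> List[Notification]:
        if event.new_price >= event.old_price:
            return []  # only price drops produce notifications in this sketch
        message = f"Price dropped from {event.old_price} to {event.new_price}"
        users = sorted(self._interested.get(event.product_id, ()))
        return [Notification(u, event.product_id, message) for u in users]

matcher = Matcher()
matcher.on_intent(Intent("USERID_A", "MOBDSGU2ZMDYENQ"))
matcher.on_intent(Intent("USERID_B", "MOBDSGU2ZMDYENQ"))
matcher.on_intent(Intent("USERID_C", "MOBDQ9VXXXX6NF8V"))
created = matcher.on_change(PriceChange("MOBDSGU2ZMDYENQ", 44_000, 39_000))
```

Only users A and B receive a notification here, because user C's intent is for a different product.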
Pre-create, Serve on Demand
[Diagram: Intent Capturing System, Event Processing System and Notification Serving System around a shared store of Intents and Notifications, with Product changes as input. Store operations shown: append, create, update, expire, read. Phases: Event based Pre-processing, Near real-time Serving]
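The store operations named on this slide (create, update, expire, read) can be illustrated with a small in-memory stand-in for the notification store. The class, its fields, and the choice to key a notification by user and product are assumptions made for the sketch; the slides do not specify how updates or expiry are implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Notification:
    user_id: str
    product_id: str
    text: str
    created_at: int   # logical time of creation
    expires_at: int   # logical time after which it must not be served

class NotificationStore:
    """Hypothetical in-memory stand-in for the Notifications table."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], Notification] = {}

    def create_or_update(self, n: Notification) -> None:
        # Assumption: one live notification per (user, product); newer replaces older.
        self._rows[(n.user_id, n.product_id)] = n

    def expire(self, now: int) -> int:
        """Delete notifications whose expiry time has passed; return the count."""
        stale = [key for key, n in self._rows.items() if n.expires_at <= now]
        for key in stale:
            del self._rows[key]
        return len(stale)

    def read(self, user_id: str, now: int, limit: int = 10) -> List[Notification]:
        """Serve a user's live notifications, newest first."""
        live = [n for n in self._rows.values()
                if n.user_id == user_id and n.expires_at > now]
        return sorted(live, key=lambda n: n.created_at, reverse=True)[:limit]

store = NotificationStore()
store.create_or_update(Notification("USERID_B", "P1", "Price drop", 10, 100))
store.create_or_update(Notification("USERID_B", "P2", "Price drop", 20, 50))
store.create_or_update(Notification("USERID_B", "P1", "Bigger price drop", 30, 100))
served = store.read("USERID_B", now=40)
removed = store.expire(now=60)
```

Because the work happens on the write side, the read path is a simple lookup, which is the point of pre-creating notifications.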
SEDA, Filtering using CEP
[Diagram: Intents pass through intermediate stages that extract unique interests; Product changes pass through intermediate stages into the CEP Engine; filtered event processing produces Facts and Notifications]
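A staged, queue-connected pipeline in the spirit of SEDA can be sketched with the standard library alone. This is an illustration only: the deck's stack uses Trooper SEDA, RabbitMQ and Esper, none of which appear here, and the stage functions, event dictionaries and the `unique_interests` set are hypothetical stand-ins for the CEP filtering step.

```python
import queue
import threading
from typing import Callable, Iterable, List, Optional

Event = dict
Handler = Callable[[Event], Optional[Event]]
_STOP = object()  # sentinel that tells a stage to shut down

def start_stage(handler: Handler, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Run handler on each event from inbox; forward results that are not None."""
    def run() -> None:
        while True:
            event = inbox.get()
            if event is _STOP:
                outbox.put(_STOP)  # propagate shutdown to the next stage
                return
            result = handler(event)
            if result is not None:
                outbox.put(result)
    threading.Thread(target=run, daemon=True).start()

def run_pipeline(events: Iterable[Event], handlers: List[Handler]) -> List[Event]:
    """Wire one queue between consecutive stages and collect the final output."""
    queues = [queue.Queue() for _ in range(len(handlers) + 1)]
    for handler, inbox, outbox in zip(handlers, queues, queues[1:]):
        start_stage(handler, inbox, outbox)
    for event in events:
        queues[0].put(event)
    queues[0].put(_STOP)
    results: List[Event] = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            return results
        results.append(item)

# Assumed output of the "extract unique interests" stage.
unique_interests = {"MOBDSGU2ZMDYENQ"}

def is_price_drop(event: Event) -> Optional[Event]:
    return event if event["new"] < event["old"] else None

def has_interest(event: Event) -> Optional[Event]:
    return event if event["product"] in unique_interests else None

changes = [
    {"product": "MOBDSGU2ZMDYENQ", "old": 44_000, "new": 39_000},
    {"product": "MOBDSGU2ZMDYENQ", "old": 39_000, "new": 41_000},
    {"product": "MOBDQ9VXXXX6NF8V", "old": 20_000, "new": 18_000},
]
filtered = run_pipeline(changes, [is_price_drop, has_interest])
```

Each stage has its own queue and thread, so a slow stage backs up only its own inbox; filtering early keeps later stages from processing changes nobody is interested in.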
The Data Store
• Store large sets of data
– Products (P): 10s of millions
– Users (U): 10s of millions
– Activity (I = U × P): 100s of millions
– Events/day (E = P + U): 10s of millions
– Notifications (N = E ∩ I): >100 million (in total)
• High write throughput
• High read throughput for sets of data
– Intents: user pivoted, Facts: product pivoted
• Low latency reads
– Notifications – user pivoted, ordered by recency
The Data Store - HBase
Row key design for Notifications table:
U:USERID_A:TIMESTAMP:PRICE_DROP:MOBDSGU2ZMDYENQ
U:USERID_B:TIMESTAMP:PRICE_DROP:MOBDSGU2ZMDYENQ
U:USERID_B:TIMESTAMP:PRICE_DROP:MOBDQ9VXXXX6NF8V
U:USERID_B:TIMESTAMP:PRICE_DROP:MOBDP6W6MCUWCF
U:USERID_C:TIMESTAMP:PRICE_DROP:MOBDQ9VXXXX6NF8V
[Figure: LSM Tree]
Image Sources : http://blog.sematext.com/, http://dailyjs.com/
• Benefits of keeping related data together
– Minimizes disk seeks for the rows read
– Rows may be returned from Block cache, MemStore
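The effect of this row key layout can be sketched with a sorted list standing in for HBase's lexicographically ordered rows: all of a user's notifications are adjacent, so one prefix scan returns them. The reversed, zero-padded timestamp used here to make the newest row sort first is an assumption (a common HBase idiom); the slides only show a `TIMESTAMP` component and do not say how recency ordering is achieved. `SortedTable` and `row_key` are hypothetical names, not HBase APIs.

```python
import bisect
from typing import List

MAX_TS = 10**13 - 1  # assumed upper bound for 13-digit millisecond timestamps

def row_key(user_id: str, ts_millis: int, kind: str, product_id: str) -> str:
    """Build a key that sorts by user, then newest-first within the user."""
    reversed_ts = MAX_TS - ts_millis  # assumption: larger time -> smaller key part
    return f"U:{user_id}:{reversed_ts:013d}:{kind}:{product_id}"

class SortedTable:
    """Toy stand-in for a table whose rows are kept in key order."""

    def __init__(self) -> None:
        self._keys: List[str] = []

    def put(self, key: str) -> None:
        bisect.insort(self._keys, key)

    def scan_prefix(self, prefix: str, limit: int) -> List[str]:
        """Return up to `limit` consecutive keys that start with `prefix`."""
        start = bisect.bisect_left(self._keys, prefix)
        out: List[str] = []
        for key in self._keys[start:]:
            if not key.startswith(prefix) or len(out) >= limit:
                break
            out.append(key)
        return out

table = SortedTable()
table.put(row_key("USERID_A", 1500, "PRICE_DROP", "MOBDSGU2ZMDYENQ"))
table.put(row_key("USERID_B", 1000, "PRICE_DROP", "MOBDSGU2ZMDYENQ"))
table.put(row_key("USERID_B", 3000, "PRICE_DROP", "MOBDP6W6MCUWCF"))
table.put(row_key("USERID_B", 2000, "PRICE_DROP", "MOBDQ9VXXXX6NF8V"))

# The trailing ":" keeps USERID_B from also matching a longer id such as USERID_BB.
latest_two = table.scan_prefix("U:USERID_B:", limit=2)
latest_products = [key.rsplit(":", 1)[1] for key in latest_two]
```

A scan that starts at the user's prefix and stops after a few rows touches one contiguous range, which is what keeps seeks low and makes cache hits likely.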
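Tech Stack
[Diagram: the Pre-create, Serve on Demand architecture annotated with technologies. Systems: Intent Capturing System, Event Processing System, Notification Serving System, HBase (Intents, Notifications), with Product changes as input. Store operations: append, create, update, expire, read. Phases: Event based Pre-processing, Near real-time Serving. Technology labels: Trooper Batch; W3 via Phantom; Trooper SEDA (RabbitMQ, Mule), CEP (Esper); Phantom; Flipcast; Ceryx; Tomcat; CDN; Memcached]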
Tech Stack
• Phantom – Reverse proxy for latency sensitive user actions
• Trooper Batch – Cron jobs
• Trooper SEDA – Distributed, Event processing
• FlipCast – Platform agnostic multi-cast notifications
• RabbitMQ – Integration, Work distribution
• Esper – Complex Event Processing (Filtering/Matching)
• HBase – Data store
• Tomcat – REST services container for Notifications
• Ceryx – Target Group generation, User preferences
[Legend (colour-coded on the slide): Flipkart OSS, Public domain OSS, Closed source]
Operating Notifications
[Figure labels: A/B framework; Phantom: Intent Capture; Phantom: Serve Notifications; Trooper Batch: Jobs]
• Monitoring consoles
– RabbitMQ queues
– FQ service
– Graphite
– Nagios
– Omniture tracking
– Trooper SEDA & Batch consoles
Tweeple Reactions
Recap
• Pros
– Low latency read-path, resilience to failure (ok to not show notifications for some users)
– Scales well (LSM trees, KV store, SEDA, CDN for images)
– Immutable Facts, Change Events stored in append-only data store provide ability to re-compute notifications
• Cons
– Consistency challenges
  • HBase has strong consistency (single write master) but Notification source data can change, leading to Eventual Consistency
– Pre-creating Notifications that may never be seen (cost of storage)
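The ability to re-compute notifications from immutable, append-only data can be sketched as a pure replay over an ordered log. The log entry shapes (`("intent", user, product)` and `("price", product, price)`) and the `replay` function are assumptions for illustration; the slides do not describe the stored format.

```python
from typing import Dict, List, Set, Tuple

def replay(log: List[tuple]) -> List[Tuple[str, str, int, int]]:
    """Rebuild (user, product, old_price, new_price) notifications from the log.

    Entries are processed in append order, so replaying the same log always
    yields the same notifications.
    """
    interested: Dict[str, Set[str]] = {}
    last_price: Dict[str, int] = {}
    notifications: List[Tuple[str, str, int, int]] = []
    for entry in log:
        if entry[0] == "intent":
            _, user, product = entry
            interested.setdefault(product, set()).add(user)
        elif entry[0] == "price":
            _, product, price = entry
            previous = last_price.get(product)
            last_price[product] = price
            if previous is not None and price < previous:
                # Only users whose intent precedes the drop are notified.
                for user in sorted(interested.get(product, ())):
                    notifications.append((user, product, previous, price))
    return notifications

log = [
    ("price", "MOBDSGU2ZMDYENQ", 42_000),
    ("intent", "USERID_A", "MOBDSGU2ZMDYENQ"),
    ("price", "MOBDSGU2ZMDYENQ", 44_000),
    ("price", "MOBDSGU2ZMDYENQ", 39_000),
    ("intent", "USERID_B", "MOBDSGU2ZMDYENQ"),
]
rebuilt = replay(log)
```

If the derived Notifications table is lost or a matching rule changes, the same log can be replayed to regenerate it, at the cost of storing every fact and change event.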
References
• HBase: The Definitive Guide (http://www.flipkart.com/hbase-definitive-guide/p/itmd36cuhzdfq4za?pid=DGBDTYAYB3PNSGYN)
• Block cache 101 (http://hortonworks.com/blog/hbase-blockcache-101/)
• Trooper (https://github.com/regunathb/Trooper)
• Flipkart Phantom (https://github.com/Flipkart/phantom)
• Facebook messages & HBase (http://www.slideshare.net/brizzzdotcom/facebook-messages-hbase)
