Punch clock for Apache Storm
<just an idea>
Punch clock (a.k.a. time clock)
● Each person has a card.
● The person punches IN with the card when entering the office.
● The person punches OUT with the card when leaving the office.
● The punch clock records the time of entry/exit on the card.
Motivation
To find out …
1. When did the person enter/exit the office?
2. Who is still in the office?
Change of Context …
“Apache Storm”
Tuples going In & Out
of Spouts/Bolts
Motivation
Debugging Apache Storm*
* Debugging Storm Transactional Topologies
Debugging Transactional Topologies
1. The spout emits a batch of data (tuples), which forms a transaction.
2. Every bolt in the topology processes that batch of data (tuples).
Motivation
To find out …
1. When did the batch enter/exit the spout/bolt?
2. Which batch is still in the spout/bolt, i.e. are any batches STUCK?
a. On which host are they stuck?
b. In which spout/bolt are they stuck?
Possible Solution(s):
Add a log statement before and after the critical section.
log.info("Inserting data into database ..."); // ← entering
datasource.insert(table, tuples); // ← the real work
log.info("Inserted data into database."); // ← exiting
------------------------------------------------------------------
Cons: The logs are spread over multiple hosts, so they must be aggregated first, which takes some work (Elasticsearch + Kibana?).
Possible Solution(s):
Use http://riemann.io/index.html
This was suggested by my friend Angad. I have not looked at it, though.
My Idea
A batch of tuples punches IN and punches OUT in a bolt/spout.
Punch In - put into a hashmap (or any other suitable data structure)
Punch Out - remove from the hashmap (or any other suitable data structure)
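The put/remove idea above can be sketched as a small per-JVM singleton. This is only an illustration, not the code from the author's repository: the names PunchClock, getInstance, punchIn and punchOut appear in the slides, while stillPunchedIn, the ConcurrentHashMap and the stored timestamp are assumptions made here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a per-JVM punch clock. */
final class PunchClock {
    private static final PunchClock INSTANCE = new PunchClock();

    // punch card id -> punch-in time in epoch milliseconds (assumed layout)
    private final Map<String, Long> punchedIn = new ConcurrentHashMap<>();

    private PunchClock() {}

    static PunchClock getInstance() {
        return INSTANCE;
    }

    /** Punch in: remember the card; a repeated punch-in keeps the first time. */
    void punchIn(String punchCardId) {
        punchedIn.putIfAbsent(punchCardId, System.currentTimeMillis());
    }

    /** Punch out: forget the card; returns true if it was punched in. */
    boolean punchOut(String punchCardId) {
        return punchedIn.remove(punchCardId) != null;
    }

    /** Read-only copy of the cards that are still punched in. */
    Map<String, Long> stillPunchedIn() {
        return Collections.unmodifiableMap(new HashMap<>(punchedIn));
    }

    public static void main(String[] args) {
        PunchClock clock = PunchClock.getInstance();
        clock.punchIn("Spout__txn-1");
        clock.punchIn("Bolt__txn-1");
        clock.punchOut("Spout__txn-1");
        // Only the bolt's card remains, so that batch is still inside the bolt.
        System.out.println(clock.stillPunchedIn().keySet());
    }
}
```

A concurrent map is used because several threads in one worker JVM may punch in and out at the same time.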
My Idea:
A batch of tuples punches IN and punches OUT in a spout.
In the emitBatch method of the Transactional Spout:
PunchClock.getInstance().punchIn(punchCardId); // ← Punch In
collector.emit(tuples); // ← Emit tuple(s)
PunchClock.getInstance().punchOut(punchCardId); // ← Punch Out
A batch of tuples punches IN and punches OUT in a bolt.
In the prepare method of the Transactional Bolt:
// ← Create Punch Card for txn
punchCardId = "Bolt__" + Thread.currentThread().getId() + "__" + System.currentTimeMillis();
In the execute method of the Transactional Bolt:
PunchClock.getInstance().punchIn(punchCardId); // ← Punch In
In the finishBatch method of the Transactional Bolt:
PunchClock.getInstance().punchOut(punchCardId); // ← Punch Out
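Once cards are punched in and out like this, the question "are any batches stuck?" could be answered with an age check over the cards that are still punched in. A minimal sketch, assuming the punch clock keeps a punch-in time per card; StuckBatchFinder, findStuck and the 30-second limit are hypothetical names and values, not taken from the slides.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: reports punch cards that have been punched in for too long. */
final class StuckBatchFinder {
    private StuckBatchFinder() {}

    /**
     * @param punchedIn    punch card id -> punch-in time (epoch millis)
     * @param nowMillis    current time, passed in so the check is easy to test
     * @param maxAgeMillis how long a batch may stay punched in before it counts as stuck
     * @return ids of the stuck cards, sorted alphabetically
     */
    static List<String> findStuck(Map<String, Long> punchedIn, long nowMillis, long maxAgeMillis) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Long> card : punchedIn.entrySet()) {
            if (nowMillis - card.getValue() > maxAgeMillis) {
                stuck.add(card.getKey());
            }
        }
        Collections.sort(stuck);
        return stuck;
    }

    public static void main(String[] args) {
        Map<String, Long> punchedIn = new LinkedHashMap<>();
        punchedIn.put("Bolt__12__1000", 1_000L);   // punched in at t = 1000 ms
        punchedIn.put("Bolt__13__58000", 58_000L); // punched in at t = 58000 ms
        // At t = 60000 ms with a 30 s limit, only the first card is overdue.
        System.out.println(findStuck(punchedIn, 60_000L, 30_000L)); // prints [Bolt__12__1000]
    }
}
```

Because the card id in the slides already encodes the component type, the thread id and a timestamp, the id alone tells you which kind of component a stuck batch is sitting in.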
My Idea:
Is it intrusive?
Yes, but it's a simple put/remove call on a hashmap.
Compared to logging, it's cheaper.
Punch Clocks
● Spouts/bolts are housed in a Storm worker JVM.
● One Punch Clock per JVM.
● Since we have multiple JVMs, we have multiple Punch Clocks.
● Batches move across Storm workers and we have multiple JVMs, so:
○ We need to aggregate the data across Punch Clocks.
○ Expose each Punch Clock via JMX.
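Exposing a punch clock over JMX could look roughly like the sketch below, which uses only the standard javax.management API. The interface PunchClockMXBean, its two attributes, the class JmxPunchClock and the object name punchclock.demo:type=PunchClock are all assumptions made for illustration; the author's repository may do this differently.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class PunchClockJmxDemo {
    /** Management interface; the MXBean suffix marks it as a JMX MXBean. */
    public interface PunchClockMXBean {
        int getPunchedInCount();
        String[] getPunchedInCards();
    }

    /** Hypothetical punch clock that can be registered with an MBean server. */
    static final class JmxPunchClock implements PunchClockMXBean {
        private final Set<String> punchedIn = ConcurrentHashMap.newKeySet();

        void punchIn(String punchCardId) { punchedIn.add(punchCardId); }

        void punchOut(String punchCardId) { punchedIn.remove(punchCardId); }

        @Override
        public int getPunchedInCount() { return punchedIn.size(); }

        @Override
        public String[] getPunchedInCards() {
            String[] cards = punchedIn.toArray(new String[0]);
            Arrays.sort(cards); // stable order for whoever reads the attribute
            return cards;
        }
    }

    public static void main(String[] args) throws Exception {
        JmxPunchClock clock = new JmxPunchClock();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("punchclock.demo:type=PunchClock");
        server.registerMBean(clock, name);

        clock.punchIn("Bolt__12__1000");
        // Read the attribute back through JMX, as a monitoring client would.
        System.out.println(server.getAttribute(name, "PunchedInCount"));
    }
}
```

An aggregator would then connect to each worker JVM as a JMX client and merge the PunchedInCards attributes; how the workers' JMX ports are opened and discovered is left out of this sketch.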
demo:
thank you
jaihind213@gmail.com
https://github.com/jaihind213/storm-punch-clock
sweetweet213@twitter
