Realtime Classroom Analytics
Powered By Apache Druid
Karthik Deivasigamani, Chief Architect, Noon - The Social Learning Platform
Agenda
● Who We Are
● Live Online Classroom
● Quality Of Experience
● Why Apache Druid
● Realtime Classroom Monitoring
● Key Lessons
● Q & A
Who Are We?
Noon evolved into a ‘Social Learning’ platform three years ago to craft the most engaging learning experience.
● Our mission is to radically change the way people learn.
● Make learning more social and fun.
● 10M+ users from over 5 countries
● 1M+ MAU, with 50+ minutes per active day per student
Live Online Classroom
Students spend a significant amount of their
time on Noon learning from their teacher
within the online classrooms.
Classroom Features
● Video, Audio, Chat and Whiteboard
● Breakouts, Raise Hand
● Peak 10K students / session
Live Classroom - Challenges
Audio
Voice is broken
● Teacher’s uplink quality
● Issues with microphone
● Student’s downlink quality
● ISP policies
Whiteboard
Lag in whiteboard
● Loss of drawing events due to unstable networks
● Heavy CPU usage on mobile devices
● Software bugs
Quality Of Experience
“Quality of experience is a measure of the delight or annoyance of a customer's experiences with a service.” - Wikipedia
Monitoring The Classroom
Metrics
● Uplink/Downlink Network Quality
● Packet Loss
● Remote/Local Audio Quality
● Mic Status
● Jitter Buffer Delay
● frameFrozenRate
● Uplink/Downlink BitRate
Dimensions
● Country
● Region
● City
● Session
● User
● ISP
● Network Type
Aggregations
● Percentile
● Count
● Average
● Distinct Count
● Standard Deviation
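The split into metrics, dimensions and aggregations can be illustrated locally. The sketch below groups hypothetical per-interval stats events by one dimension (ISP) and computes each aggregation listed on the slide. The field names are assumptions, and the percentile here is exact; in Druid these aggregations run server-side, with percentiles and distinct counts usually approximated by sketches.

```python
import statistics
from collections import defaultdict

# Hypothetical per-interval QoE samples; field names are illustrative assumptions.
EVENTS = [
    {"isp": "isp-a", "user": "u1", "packet_loss_pct": 0.5},
    {"isp": "isp-a", "user": "u2", "packet_loss_pct": 1.5},
    {"isp": "isp-a", "user": "u1", "packet_loss_pct": 4.0},
    {"isp": "isp-b", "user": "u3", "packet_loss_pct": 0.0},
    {"isp": "isp-b", "user": "u4", "packet_loss_pct": 2.0},
]


def p90(values):
    """Exact 90th percentile using linear interpolation between closest ranks."""
    if len(values) == 1:
        return values[0]
    return statistics.quantiles(values, n=10, method="inclusive")[8]


def aggregate_by(events, dimension, metric, distinct_field="user"):
    """Group events by one dimension; compute count, average, p90, distinct count, stddev."""
    groups = defaultdict(list)
    for event in events:
        groups[event[dimension]].append(event)

    result = {}
    for key, rows in groups.items():
        values = [row[metric] for row in rows]
        result[key] = {
            "count": len(values),
            "average": statistics.fmean(values),
            "p90": p90(values),
            "distinct_users": len({row[distinct_field] for row in rows}),
            "stddev": statistics.pstdev(values),  # population standard deviation
        }
    return result


if __name__ == "__main__":
    for isp, stats in aggregate_by(EVENTS, "isp", "packet_loss_pct").items():
        print(isp, stats)
```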
System Characteristics
● Real Time Ingestion
● Scale Horizontally
● High Cardinality Data
● Subsecond Query Latency
● Fast Aggregation
● Zoom In & Zoom Out
● Highly Available
Why Apache Druid
● Real Time Ingestion From Kafka Through Spec Files
● Data & Query Nodes Allow For Horizontal Scaling
● Sketches For High Cardinality Columns
● Low-Latency Querying
● Rich Built-In Capabilities For Exact & Approximate Aggregation
● Data Rollups
● Fault Tolerance At Multiple Levels
Data Collection - Network & Audio
WebRTC Stats
Sent BitRate
Received BitRate
Audio Packet Loss
Audio Level
Bytes Sent/Received
Audio Frame Freeze Rate
Network Quality
Audio Quality
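Most of these values are cumulative counters, so per-interval rates have to be derived from consecutive snapshots. A minimal sketch, assuming the client periodically calls `RTCPeerConnection.getStats()` and forwards the inbound/outbound RTP entries as plain dictionaries. The field names follow the W3C WebRTC statistics spec; the slides do not say how Noon derived its metrics, and the sample snapshots are invented.

```python
def _elapsed_seconds(prev, curr):
    # getStats() timestamps are in milliseconds
    return (curr["timestamp"] - prev["timestamp"]) / 1000.0


def bitrate_kbps(prev, curr, bytes_field="bytesReceived"):
    """Average bitrate between two snapshots; pass bytes_field='bytesSent' for the uplink."""
    elapsed = _elapsed_seconds(prev, curr)
    if elapsed <= 0:
        return 0.0
    return (curr[bytes_field] - prev[bytes_field]) * 8 / elapsed / 1000.0


def packet_loss_pct(prev, curr):
    """Share of expected packets in the interval that were lost."""
    lost = curr["packetsLost"] - prev["packetsLost"]
    received = curr["packetsReceived"] - prev["packetsReceived"]
    expected = lost + received
    return 100.0 * lost / expected if expected > 0 else 0.0


def avg_jitter_buffer_delay_ms(prev, curr):
    """jitterBufferDelay is a cumulative sum in seconds over jitterBufferEmittedCount samples."""
    emitted = curr["jitterBufferEmittedCount"] - prev["jitterBufferEmittedCount"]
    if emitted <= 0:
        return 0.0
    return 1000.0 * (curr["jitterBufferDelay"] - prev["jitterBufferDelay"]) / emitted


# Two hypothetical inbound-rtp audio snapshots taken two seconds apart.
PREV = {"timestamp": 1000, "bytesReceived": 100_000, "packetsReceived": 1000,
        "packetsLost": 10, "jitterBufferDelay": 0.0, "jitterBufferEmittedCount": 0}
CURR = {"timestamp": 3000, "bytesReceived": 150_000, "packetsReceived": 1196,
        "packetsLost": 14, "jitterBufferDelay": 5760.0, "jitterBufferEmittedCount": 96_000}

if __name__ == "__main__":
    print(bitrate_kbps(PREV, CURR), packet_loss_pct(PREV, CURR), avg_jitter_buffer_delay_ms(PREV, CURR))
```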
Data Collection - Whiteboard
Whiteboard Stats
Stroke Difference
Drift Percentage
Ingestion
● All ingestion happens via Kafka in real time
● Flink Topology
● Split & Format to conform with the ingestion spec
● Rollup Enabled At Ingestion Time
● Conditional transformation
● Looking forward to using the Lag-Based AutoScaler.
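The split, conditional-transform and rollup steps can be mimicked in a few lines. This is a plain-Python stand-in for the logic, not Flink's API; the batched message shape, the field names and the one-minute query granularity are assumptions.

```python
from collections import defaultdict

MINUTE_MS = 60_000


def split_batch(message):
    """Split one hypothetical batched client message into flat rows matching the ingestion spec."""
    for sample in message["samples"]:
        yield {
            "ts": sample["ts"],
            "session_id": message["session_id"],
            "country": message["country"],
            # conditional transformation: fill in a missing network type
            "network_type": sample.get("network_type") or "unknown",
            "packet_loss_pct": sample["packet_loss_pct"],
        }


def rollup(rows, dimensions, granularity_ms=MINUTE_MS):
    """Collapse rows sharing a truncated timestamp and identical dimension values.

    As with Druid rollup, only sums and a row count are kept; averages are derived
    later as sum / count.
    """
    rolled = defaultdict(lambda: {"count": 0, "packet_loss_pct_sum": 0.0})
    for row in rows:
        bucket = row["ts"] - row["ts"] % granularity_ms
        key = (bucket,) + tuple(row[d] for d in dimensions)
        rolled[key]["count"] += 1
        rolled[key]["packet_loss_pct_sum"] += row["packet_loss_pct"]
    return dict(rolled)


MESSAGE = {
    "session_id": "s1",
    "country": "AE",
    "samples": [
        {"ts": 0, "network_type": "wifi", "packet_loss_pct": 1.0},
        {"ts": 20_000, "network_type": "wifi", "packet_loss_pct": 2.0},
        {"ts": 70_000, "packet_loss_pct": 3.0},
    ],
}

if __name__ == "__main__":
    print(rollup(split_batch(MESSAGE), ["session_id", "country", "network_type"]))
```

Here three input rows are stored as two, because the first two samples fall in the same minute with identical dimensions. The more dimensions (and the higher their cardinality), the less rollup can compress.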
Making Ingestion Easy
● Well-defined event (ProtoBuf) schema, serialized as JSON.
● JsonPath-based DSL defining transformers & the ingestion spec.
● Parsing & transformation in a Flink topology, driven by the configuration file.
● Ingestion Spec Auto-Generated From The JSON Configuration File.
● Automated Deployments Via Jenkins
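A sketch of generating a Druid Kafka supervisor spec from a small configuration, assuming a hypothetical config layout that maps each output column to a JsonPath. The spec keys (`dataSchema`, `ioConfig.inputFormat.flattenSpec` with `path` fields, `granularitySpec`) follow Druid's documented Kafka ingestion format; the datasource, topic, paths and granularities are invented for illustration.

```python
import json

# Hypothetical configuration: output column -> JsonPath into the serialized event.
CONFIG = {
    "datasource": "classroom_qoe",
    "topic": "classroom-stats",
    "timestamp": {"column": "ts", "format": "millis"},
    "dimensions": {
        "country": "$.geo.country",
        "session_id": "$.session.id",
        "user_id": "$.user.id",
        "isp": "$.network.isp",
    },
    "metrics": {"packet_loss_pct": "$.audio.packetLoss"},
}


def build_supervisor_spec(cfg, bootstrap_servers):
    """Turn the config into a Kafka supervisor spec with a JsonPath flattenSpec and rollup."""
    fields = [
        {"type": "path", "name": name, "expr": expr}
        for name, expr in {**cfg["dimensions"], **cfg["metrics"]}.items()
    ]
    metrics_spec = [{"type": "count", "name": "count"}] + [
        {"type": "doubleSum", "name": f"{m}_sum", "fieldName": m} for m in cfg["metrics"]
    ]
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": cfg["datasource"],
                "timestampSpec": cfg["timestamp"],
                "dimensionsSpec": {"dimensions": list(cfg["dimensions"])},
                "metricsSpec": metrics_spec,
                "granularitySpec": {
                    "segmentGranularity": "hour",
                    "queryGranularity": "minute",
                    "rollup": True,
                },
            },
            "ioConfig": {
                "topic": cfg["topic"],
                "inputFormat": {
                    "type": "json",
                    # root-level fields such as "ts" are picked up by field discovery
                    "flattenSpec": {"useFieldDiscovery": True, "fields": fields},
                },
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_supervisor_spec(CONFIG, "kafka:9092"), indent=2))
```

The generated JSON is the kind of document submitted to the Overlord with `POST /druid/indexer/v1/supervisor`; a CI job such as the Jenkins deployment mentioned above could perform that step.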
Schema Design
● Always start from your use cases.
● Identify Dimensions & Metrics
● Aggregations & Approximation (HyperLogLog, quantile sketches)
● Query Granularity
● Partitions
● Deep Storage
● Data Retention
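HyperLogLog is what makes distinct counts over high-cardinality columns (users, sessions) cheap: each value updates one small register, and two sketches merge by taking register-wise maxima, which is why sketches survive rollup. The following is a minimal textbook HyperLogLog for illustration only; it is not the Druid or Apache DataSketches implementation, and the SHA-1-based hash and the precision are assumptions.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog with 2**p registers (standard error about 1.04 / sqrt(2**p))."""

    def __init__(self, p=10):
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        # 64-bit hash taken from the first 8 bytes of SHA-1
        h = int.from_bytes(hashlib.sha1(str(value).encode("utf-8")).digest()[:8], "big")
        width = 64 - self.p
        index = h >> width                          # top p bits choose the register
        remainder = h & ((1 << width) - 1)          # remaining 64 - p bits
        rank = width - remainder.bit_length() + 1   # position of the leftmost 1-bit
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other):
        """Union of two sketches: register-wise maximum (same precision required)."""
        if other.p != self.p:
            raise ValueError("cannot merge sketches with different precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # small-range correction: linear counting on empty registers
            return self.m * math.log(self.m / zeros)
        return raw
```

With p = 10 the sketch is 1,024 small registers regardless of how many distinct users are added, and re-adding a value never changes the estimate.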
Self Serve Dashboard - Zoom Out & Zoom In
Country Level View
Sessions Inside A Country
Session Level View
Students Inside A Session View
Student Session Level View
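Each zoom level can map onto the same Druid SQL query with a different grouping column and a growing set of filters. A sketch, assuming a hypothetical rolled-up datasource `classroom_qoe` with `country`, `session_id` and `user_id` dimensions plus `count` and `packet_loss_pct_sum` metrics. The request body shape (`query` plus `parameters` bound to `?` placeholders) is what Druid's SQL HTTP endpoint `/druid/v2/sql` accepts; the level names are invented.

```python
# Hypothetical zoom levels: the grouping expression and the filters each level needs.
LEVELS = {
    "country": {"bucket": '"country"', "filters": []},
    "sessions_in_country": {"bucket": '"session_id"', "filters": ["country"]},
    "students_in_session": {"bucket": '"user_id"', "filters": ["session_id"]},
    "student_session": {"bucket": "TIME_FLOOR(__time, 'PT1M')", "filters": ["session_id", "user_id"]},
}


def build_drilldown_query(level, filters, lookback_hours=1):
    """Build a Druid SQL request body for one zoom level; filter values are bound as parameters."""
    spec = LEVELS[level]
    where = [f"__time >= CURRENT_TIMESTAMP - INTERVAL '{int(lookback_hours)}' HOUR"]
    parameters = []
    for column in spec["filters"]:
        where.append(f'"{column}" = ?')
        parameters.append({"type": "VARCHAR", "value": filters[column]})
    # With rollup, the event count is SUM("count"); COUNT(*) would count stored rows.
    query = (
        f"SELECT {spec['bucket']} AS \"group_key\", "
        'SUM("count") AS events, '
        'SUM("packet_loss_pct_sum") / SUM("count") AS avg_packet_loss_pct, '
        'APPROX_COUNT_DISTINCT("user_id") AS students '
        'FROM "classroom_qoe" '
        f"WHERE {' AND '.join(where)} "
        'GROUP BY 1 ORDER BY "group_key" LIMIT 1000'
    )
    return {"query": query, "parameters": parameters}
```

Binding the session and user IDs as parameters, rather than splicing them into the SQL text, keeps dashboard-supplied values out of the query string.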
Our Druid Cluster
Topology
● Master (m5.2xl)
● Data Node (i3.2xl)
○ Tiered
○ 24 slots
● Query Node (m5.2xl)
● External ZooKeeper, MySQL metadata store, S3 deep storage
Monitoring
● Datadog-Druid
● System Resources
● Ingestion Lag
● Number of Segments
● Query Time
● JVM Memory Usage
Numbers
● 15+ dims, 50+ metrics
● 105M events per day
● 2B rows @ avg row size ~1 KB
● 4k-5k segments
● p90 latency ~850 ms
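These numbers can be sanity-checked with quick arithmetic. Assuming "1K" means roughly 1,000 bytes per row (the slide does not say whether that is raw or on-disk size), 2B rows come to about 2 TB; spread over 4k-5k segments that is roughly 400-500 MB per segment, inside the 300-700 MB range that Druid's segment-sizing guidance recommends.

```python
SECONDS_PER_DAY = 86_400


def cluster_math(rows, avg_row_bytes, segments, events_per_day):
    """Back-of-the-envelope sizing from the cluster numbers (decimal units)."""
    total_bytes = rows * avg_row_bytes
    return {
        "total_tb": total_bytes / 1e12,
        "avg_segment_mb": total_bytes / segments / 1e6,
        "rows_per_segment": rows / segments,
        "avg_events_per_sec": events_per_day / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    for segment_count in (4_000, 5_000):
        print(segment_count, cluster_math(2e9, 1_000, segment_count, 105e6))
```

105M events per day averages about 1,215 events per second; the daily average says nothing about peaks during large live sessions.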
Putting It Together
Business Impact
● Quickly Identify Problems
● Validation of fixes made to improve quality
● Self-serve tool, reducing the burden on developers
● Improved transparency & trust between Ops and developers
● Student NPS improved
Challenges & Key Lessons
● Rollups are your best friend
● Ingestion-Time Transformation > Query-Time Transformation
● Approximation: HyperLogLog, DataSketches
● Late Arrival Of Messages & Compaction
● Query Performance depends on your data model
● Setup takes time to stabilize.
● druid-user group is super helpful!
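Late-arriving events interact with two Druid settings: the Kafka supervisor's `lateMessageRejectionPeriod`, which drops events timestamped earlier than the task's creation time minus that period, and auto-compaction's `skipOffsetFromLatest`, which keeps compaction away from the most recent intervals. The sketch below mirrors the rejection rule and builds both fragments; the six-hour values and the datasource name are assumptions, not Noon's settings.

```python
from datetime import datetime, timedelta, timezone


def would_be_rejected(event_time, task_created, late_rejection=timedelta(hours=6)):
    """Mirror of the lateMessageRejectionPeriod rule: events older than the cutoff are dropped."""
    return event_time < task_created - late_rejection


def late_data_configs(datasource, late_period="PT6H", skip_offset="PT6H"):
    """Fragments for the supervisor ioConfig and the coordinator auto-compaction config."""
    io_config_fragment = {"lateMessageRejectionPeriod": late_period}
    compaction_config = {
        "dataSource": datasource,
        # leave recent intervals, which may still receive late events, uncompacted
        "skipOffsetFromLatest": skip_offset,
    }
    return io_config_fragment, compaction_config


if __name__ == "__main__":
    created = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(would_be_rejected(created - timedelta(hours=7), created))  # True
```

Accepted late events land in intervals that were already published, typically as small extra segments; auto-compaction (configured with `POST /druid/coordinator/v1/config/compaction`) later merges them, which is why late arrival and compaction appear together in the lessons above.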
Questions?
Thank you
Contact: karthik@noonacademy.com
