Norikra: 
SQL Stream Processing 
In Ruby 
2014/11/19 
RubyConf 2014 DAY 3 
Satoshi Tagomori (@tagomoris)
Topics 
Why I wrote Norikra 
Norikra overview 
Norikra queries 
Use cases in production 
JRuby for me
Satoshi Tagomori (@tagomoris) 
Tokyo, Japan 
LINE Corporation
Monitoring/Data Analytics Overview 
collect → parse → clean up → process → visualize 
store → process 
(sources: access logs, application logs, ...)
Fluentd stream aggregation: 
Good for simple data/calculation 
Our services: 
More and more different services 
Many changes in a day (including logging) 
Many kinds of logs for each service 
Many different metrics for each service
Fluentd stream aggregation: 
Not good for complex/fragile environments... 
We want to: 
add/remove queries anytime we want 
write many queries for a service log stream 
ignore events that lack the data we want 
let our service directors / growth hackers write their own queries!
break.
Norikra: 
Schema-less Stream Processing with SQL 
Server software, written in JRuby, runs on JVM 
Open source software (GPLv2) 
http://norikra.github.io/ 
https://github.com/norikra/norikra
How To Setup Norikra: 
Install JRuby 
download jruby.tar.gz, extract it, and add its bin directory to $PATH 
use rbenv 
rbenv install jruby-1.7.xx 
rbenv shell jruby-.. 
Install Norikra 
gem install norikra 
Execute Norikra server 
norikra start
Norikra Interface: 
CLI client/Client library: norikra-client 
norikra-client target open ... 
norikra-client query add ... 
tail -f ... | norikra-client event send ... 
WebUI 
show status 
show/add/remove queries 
HTTP API 
JSON, MessagePack
Norikra: 
Schema-less event stream: 
Add/Remove data fields whenever you want 
SQL: 
No more restarts to add/remove queries 
w/ JOINs, w/ SubQueries 
w/ UDF (in Java/Ruby as rubygems) 
Truly Complex events: 
Nested Hash/Array, accessible directly from SQL
Norikra Queries: (1) 
SELECT name, age 
FROM events 
("events" is the target name)
Norikra Queries: (1) 
{"name":"tagomoris", 
"age":35, "address":"Tokyo", 
"corp":"LINE", "current":"San Diego"} 
SELECT name, age 
FROM events 
{"name":"tagomoris","age":35}
Norikra Queries: (1) 
{"name":"tagomoris", 
"address":"Tokyo", 
"corp":"LINE", "current":"San Diego"} 
without "age" 
SELECT name, age 
FROM events 
nothing
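A minimal Ruby sketch of the behavior above, assuming the rule the slides show: an event reaches a query only if it carries every field the query references, and is then projected to the selected fields. The names `QUERY1_FIELDS` and `select_fields` are hypothetical; this simulates the semantics and is not Norikra's implementation.

```ruby
# Hypothetical simulation of query (1): SELECT name, age FROM events
# Assumption: an event is delivered only if every referenced field exists.
QUERY1_FIELDS = %w[name age].freeze

def select_fields(event, fields)
  return nil unless fields.all? { |f| event.key?(f) } # a field is missing: no output
  event.slice(*fields)                                # projection: SELECT name, age
end

full   = { "name" => "tagomoris", "age" => 35, "address" => "Tokyo",
           "corp" => "LINE", "current" => "San Diego" }
no_age = full.reject { |k, _| k == "age" }

p select_fields(full, QUERY1_FIELDS)   # only "name" and "age"
p select_fields(no_age, QUERY1_FIELDS) # nil, because "age" is absent
```

Extra fields such as "address" are simply ignored, which is what lets producers add fields without breaking existing queries.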
Norikra Queries: (2) 
{"name":"tagomoris", 
"age":35, "address":"Tokyo", 
"corp":"LINE", "current":"San Diego"} 
SELECT name, age 
FROM events 
WHERE current="San Diego" 
{"name":"tagomoris","age":35}
Norikra Queries: (2) 
{"name":"nobu", 
"age":0, "address":"Somewhere", 
"corp":"Heroku", "current":"SAN"} 
current is not "San Diego" 
SELECT name, age 
FROM events 
WHERE current="San Diego" 
nothing
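The WHERE clause adds a second filter on top of schema-less matching. A minimal sketch, assuming that fields referenced only in WHERE (here `current`) must also be present; `query2` and `QUERY2_FIELDS` are hypothetical names used for illustration.

```ruby
# Hypothetical simulation of query (2):
#   SELECT name, age FROM events WHERE current="San Diego"
# Assumption: every field used anywhere in the query must exist in the event.
QUERY2_FIELDS = %w[name age current].freeze

def query2(event)
  return nil unless QUERY2_FIELDS.all? { |f| event.key?(f) } # schema-less matching
  return nil unless event["current"] == "San Diego"          # WHERE clause
  event.slice("name", "age")                                 # SELECT name, age
end

tagomoris = { "name" => "tagomoris", "age" => 35, "address" => "Tokyo",
              "corp" => "LINE", "current" => "San Diego" }
nobu      = { "name" => "nobu", "age" => 0, "address" => "Somewhere",
              "corp" => "Heroku", "current" => "SAN" }

p query2(tagomoris) # name and age of tagomoris
p query2(nobu)      # nil: current is not "San Diego"
```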
Norikra Queries: (3) 
SELECT age, COUNT(*) as cnt 
FROM events.win:time_batch(5 min) 
GROUP BY age
Norikra Queries: (3) 
{"name":"tagomoris", 
"age":35, "address":"Tokyo", 
"corp":"LINE", "current":"San Diego"} 
SELECT age, COUNT(*) as cnt 
FROM events.win:time_batch(5 min) 
GROUP BY age 
every 5 mins 
{"age":35,"cnt":3}, {"age":33,"cnt":1}, ...
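To make the windowing concrete, here is a sketch of a 5-minute tumbling batch with GROUP BY and COUNT. It assumes batches are aligned to t = 0 and cover [k*300, (k+1)*300) seconds; the real engine's batch alignment may differ. `time_batch_count` and `BATCH_SECONDS` are hypothetical names.

```ruby
# Hypothetical simulation of query (3):
#   SELECT age, COUNT(*) as cnt FROM events.win:time_batch(...) GROUP BY age
# Assumption: batches aligned to t = 0, each covering 300 seconds.
BATCH_SECONDS = 5 * 60

# events: array of [timestamp_in_seconds, event_hash]
# returns one array of {key => value, "cnt" => n} rows per non-empty batch
def time_batch_count(events, key)
  batches = Hash.new { |h, k| h[k] = Hash.new(0) }
  events.each do |ts, ev|
    next unless ev.key?(key)                  # events without the key are ignored
    batches[ts / BATCH_SECONDS][ev[key]] += 1 # integer division picks the batch
  end
  batches.keys.sort.map do |i|
    batches[i].map { |value, cnt| { key => value, "cnt" => cnt } }
  end
end

stream = [
  [10,  { "age" => 35 }], [20,  { "age" => 35 }], [100, { "age" => 33 }],
  [200, { "age" => 35 }], [310, { "age" => 35 }], [320, { "name" => "x" }]
]
p time_batch_count(stream, "age")
```

Only one row per group leaves each batch, so the output volume depends on the number of distinct ages rather than on the event rate.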
Norikra Queries: (4) 
{"name":"tagomoris", 
"age":35, "address":"Tokyo", 
"corp":"LINE", "current":"San Diego"} 
SELECT age, COUNT(*) as cnt 
FROM events.win:time_batch(5 min) 
GROUP BY age 
{"age":35,"cnt":3}, 
{"age":33,"cnt":1}, 
... 
SELECT max(age) as max 
FROM events.win:time_batch(5 min) 
{"max":51} 
every 5 mins
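Several queries can read the same target at once. A sketch of that idea, modeling each registered query as a Ruby lambda applied independently to the same batch; the `QUERIES` table and `run_batch` are hypothetical and only illustrate that queries do not interfere with each other.

```ruby
# Hypothetical simulation of query (4): two queries over the same target.
# Each query is a lambda applied to the same batch, so adding or removing
# one query does not change what the other one sees.
QUERIES = {
  "count_by_age" => ->(batch) {
    batch.select { |e| e.key?("age") }
         .group_by { |e| e["age"] }
         .map { |age, evs| { "age" => age, "cnt" => evs.size } }
  },
  "max_age" => ->(batch) {
    ages = batch.map { |e| e["age"] }.compact # events without "age" are ignored
    ages.empty? ? [] : [{ "max" => ages.max }]
  }
}.freeze

def run_batch(queries, batch)
  queries.transform_values { |q| q.call(batch) }
end

sample_batch = [{ "age" => 35 }, { "age" => 35 }, { "age" => 51 },
                { "age" => 33 }, { "name" => "x" }]
p run_batch(QUERIES, sample_batch)
```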
Norikra Queries: (5) 
{"name":"tagomoris", 
"user":{"age":35, "corp":"LINE", 
"address":"Tokyo"}, 
"current":"San Diego", 
"speaker":true, 
"attend":[true,true,false, ...] 
} 
SELECT age, COUNT(*) as cnt 
FROM events.win:time_batch(5 min) 
GROUP BY age
Norikra Queries: (5) 
{"name":"tagomoris", 
"user":{"age":35, "corp":"LINE", 
"address":"Tokyo"}, 
"current":"San Diego", 
"speaker":true, 
"attend":[true,true,false, ...] 
} 
SELECT user.age, COUNT(*) as cnt 
FROM events.win:time_batch(5 min) 
GROUP BY user.age
Norikra Queries: (5) 
{"name":"tagomoris", 
"user":{"age":35, "corp":"LINE", 
"address":"Tokyo"}, 
"current":"San Diego", 
"speaker":true, 
"attend":[true,true,false, ...] 
} 
SELECT user.age, COUNT(*) as cnt 
FROM events.win:time_batch(5 min) 
WHERE current="San Diego" 
AND attend.$0 AND attend.$1 
GROUP BY user.age
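Nested fields are addressed by dotted paths, with `$N` indexing into arrays. Below is a hypothetical resolver for such paths plus the per-event part of the last query. The `MISSING` sentinel, `resolve`, `query5_key` and `count_by_key` are illustrative names, not Norikra internals.

```ruby
# Hypothetical resolver for field paths such as "user.age" (nested Hash)
# or "attend.$0" (Array index). A sketch of the idea only.
MISSING = Object.new.freeze # distinguishes "no such field" from false/nil values

def resolve(event, path)
  path.split(".").reduce(event) do |node, part|
    if part.start_with?("$") && node.is_a?(Array)
      idx = Integer(part[1..-1])
      return MISSING if idx.negative? || idx >= node.size
      node[idx]
    elsif node.is_a?(Hash) && node.key?(part)
      node[part]
    else
      return MISSING
    end
  end
end

# Per-event part of: WHERE current="San Diego" AND attend.$0 AND attend.$1
#                    GROUP BY user.age
def query5_key(event)
  values = %w[user.age current attend.$0 attend.$1].map { |path| resolve(event, path) }
  return nil if values.any? { |v| v.equal?(MISSING) } # a referenced field is absent
  age, current, first, second = values
  return nil unless current == "San Diego" && first && second
  age # the GROUP BY value this event is counted under
end

def count_by_key(events)
  events.map { |e| query5_key(e) }.compact
        .each_with_object(Hash.new(0)) { |age, h| h[age] += 1 }
end

speaker = { "name" => "tagomoris",
            "user" => { "age" => 35, "corp" => "LINE", "address" => "Tokyo" },
            "current" => "San Diego", "speaker" => true,
            "attend" => [true, true, false] }
p query5_key(speaker)
```

A plain `age` path returns `MISSING` for this event, which matches the earlier slide where `GROUP BY age` does not see the nested value.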
break. 
next: use cases
Use case 1: 
External API call reports for partners (LINE) 
External API call for LINE Business Connect 
LINE backend sends requests to partner’s API 
endpoint using users’ messages 
http://developers.linecorp.com/blog/?p=3386
Use case 1: 
External API call reports for partners (LINE) 
(diagram) channel gateway → partner's server 
logs → query → query results → MySQL, Mail 
SELECT channelId AS channel_id, reason, detail,
       count(*) AS error_count,
       min(timestamp) AS first_timestamp,
       max(timestamp) AS last_timestamp
FROM api_error_log.win:time_batch(60 sec)
GROUP BY channelId, reason, detail
HAVING count(*) > 0
http://developers.linecorp.com/blog/?p=3386
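A sketch of what this query computes for one 60-second batch: group error events by (channelId, reason, detail), then count them and take the first/last timestamps. Field names come from the query; the sample values and the `summarize_errors` helper are made up for illustration. In this simplified simulation every group has at least one event, so the HAVING filter never removes a row; it is kept only for parity with the query.

```ruby
# Hypothetical simulation of the Use case 1 query over one 60-second batch
# of api_error_log events.
def summarize_errors(batch)
  groups = batch.group_by { |e| e.values_at("channelId", "reason", "detail") }
  rows = groups.map do |(channel_id, reason, detail), evs|
    stamps = evs.map { |e| e["timestamp"] }
    { "channel_id" => channel_id, "reason" => reason, "detail" => detail,
      "error_count" => evs.size,       # count(*)
      "first_timestamp" => stamps.min, # min(timestamp)
      "last_timestamp" => stamps.max } # max(timestamp)
  end
  rows.select { |r| r["error_count"] > 0 } # HAVING count(*) > 0
end

error_batch = [
  { "channelId" => 1, "reason" => "timeout", "detail" => "read", "timestamp" => 100 },
  { "channelId" => 1, "reason" => "timeout", "detail" => "read", "timestamp" => 130 },
  { "channelId" => 2, "reason" => "status",  "detail" => "500",  "timestamp" => 110 }
]
summarize_errors(error_batch).each { |row| p row }
```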
Use case 1: 
External API call reports for partners (LINE) 
API error response summaries 
http://developers.linecorp.com/blog/?p=3386
Use case 2: Lambda architecture 
Prompt reports for Ad service console 
Prompt reports with Norikra + Fixed reports with Hive 
(diagram) app servers → Fluentd → HDFS (impression logs) 
console service: execute hive query (daily), fetch query results (frequently)
Use case 2: 
Prompt reports for Ad service console 
SELECT yyyymmdd, hh, campaign_id, region, lang,
       COUNT(*) AS click,
       COUNT(DISTINCT member_id) AS uu
FROM (
  SELECT yyyymmdd, hh,
         get_json_object(log, '$.campaign.id') AS campaign_id,
         get_json_object(log, '$.member.region') AS region,
         get_json_object(log, '$.member.lang') AS lang,
         get_json_object(log, '$.member.id') AS member_id
  FROM applog
  WHERE service='myservice'
    AND yyyymmdd='20140913'
    AND get_json_object(log, '$.type')='click'
) x
GROUP BY yyyymmdd, hh, campaign_id, region, lang
Hive query for fixed reports
Use case 2: 
Prompt reports for Ad service console 
Norikra query for prompt reports 
SELECT campaign.id AS campaign_id,
       member.region AS region,
       member.lang AS lang,
       COUNT(*) AS click,
       COUNT(DISTINCT member.id) AS uu
FROM myservice.win:time_batch(1 hours)
WHERE type="click"
GROUP BY campaign.id, member.region, member.lang
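Both report queries compute a click count and a unique-user count (`uu`, via COUNT(DISTINCT ...)) per (campaign, region, lang). Below is a sketch of that aggregation for one batch, using a Set per group for the distinct count and `Hash#dig` for the nested fields (the role `get_json_object` plays in the Hive version). `prompt_report` and the sample events are hypothetical.

```ruby
require "set"

# Hypothetical simulation of the prompt-report aggregation over one batch.
def prompt_report(batch)
  groups = Hash.new { |h, k| h[k] = { clicks: 0, members: Set.new } }
  batch.each do |e|
    next unless e["type"] == "click"                  # WHERE type="click"
    key = [e.dig("campaign", "id"), e.dig("member", "region"), e.dig("member", "lang")]
    member_id = e.dig("member", "id")
    next if key.include?(nil) || member_id.nil?       # missing field: event skipped
    groups[key][:clicks] += 1                         # COUNT(*)
    groups[key][:members] << member_id                # COUNT(DISTINCT member.id)
  end
  groups.map do |(campaign_id, region, lang), g|
    { "campaign_id" => campaign_id, "region" => region, "lang" => lang,
      "click" => g[:clicks], "uu" => g[:members].size }
  end
end

member = { "region" => "JP", "lang" => "ja" }
click_batch = [
  { "type" => "click", "campaign" => { "id" => 7 }, "member" => member.merge("id" => 1) },
  { "type" => "click", "campaign" => { "id" => 7 }, "member" => member.merge("id" => 1) },
  { "type" => "click", "campaign" => { "id" => 7 }, "member" => member.merge("id" => 2) },
  { "type" => "impression", "campaign" => { "id" => 7 }, "member" => member.merge("id" => 3) }
]
p prompt_report(click_batch)
```

The Set grows with the number of distinct members per group, which is the main memory cost of COUNT(DISTINCT ...) over an hour-long batch.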
Use case 3: 
Realtime access dashboard on Google Platform 
Access log visualization 
Count using Norikra (2-step), Store on Google BigQuery 
Dashboard on Google Spreadsheet + Apps Script 
http://qiita.com/kazunori279/items/6329df57635799405547 
https://www.youtube.com/watch?v=EZkw5TDcCGw
Use case 3: 
Realtime access dashboard on Google Platform 
http://qiita.com/kazunori279/items/6329df57635799405547 
https://www.youtube.com/watch?v=EZkw5TDcCGw 
(diagram) each server: nginx access log → Fluentd 
access logs → BigQuery 
Norikra query to aggregate locally → query results → Norikra query on the aggregate node
Use case 3: 
Realtime access dashboard on Google Platform 
70 servers, 120,000 requests/sec (or more!) 
http://qiita.com/kazunori279/items/6329df57635799405547 
https://www.youtube.com/watch?v=EZkw5TDcCGw 
(diagram) many nginx servers → Fluentd 
logs to store → Google BigQuery 
counts per host → total count → Google Spreadsheet + Apps Script
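The 2-step count keeps the central node's load small. If the 120,000 requests/sec were spread evenly over the 70 servers (an assumption), each server would see roughly 120,000 / 70 ≈ 1,714 requests/sec, and the aggregate node would receive one count per host per window instead of every request. A hypothetical sketch of the two steps follows; `local_count` and `aggregate` are illustrative names, not part of Norikra or Fluentd.

```ruby
# Hypothetical sketch of the 2-step count.

# Step 1 (on each server): count this window's access-log lines.
def local_count(host, log_lines)
  { "host" => host, "count" => log_lines.size }
end

# Step 2 (on the aggregate node): keep per-host counts and add up a total.
def aggregate(per_host_rows)
  {
    "per_host" => per_host_rows.map { |r| [r["host"], r["count"]] }.to_h,
    "total"    => per_host_rows.sum { |r| r["count"] }
  }
end

host_rows = [local_count("web01", %w[GET GET POST]), local_count("web02", %w[GET GET])]
p aggregate(host_rows)
```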
Why Norikra is written in JRuby 
Esper 
CEP (Complex Event Processing) library, written in Java 
Rubygems.org 
Open repository for public Norikra UDF plugins, distributed as gems
JRuby for me 
Ruby! (by great JRuby developer team!) 
makes developing Norikra dramatically faster 
with rubygems and rubygems.org for easy deployment/installation 
with Java libraries, e.g. Jetty, Esper, ... 
There are not so many JRuby users in Tokyo :(
More queries, more simplicity 
and less latency 
in data processing 
Thanks! 
photo: by my co-workers 
http://norikra.github.io/ 
https://github.com/norikra/norikra
See also: 
http://norikra.github.io/ 
“Lambda Architecture Platform Using SQL” 
http://www.slideshare.net/tagomoris/lambda-architecture-using-sql-hadoopcon-2014-taiwan 
“Stream processing and Norikra” 
http://www.slideshare.net/tagomoris/stream-processing-and-norikra 
“Batch processing and Stream processing by SQL” 
http://www.slideshare.net/tagomoris/hcj2014-sql 
“Norikra in Action” 
http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring 
http://www.slideshare.net/tagomoris/presentations
Storm or Norikra? 
Simple and fixed workload for huge traffic 
Use Storm! 
Complex and fragile workload for non-huge traffic 
Use Norikra!
Scalability? 
10,000 to 100,000 events/sec 
on a 2CPU 8Core server
HA? Distributed? 
NO! 
I have some ideas, but no time to implement them 
We have no need for HA/distributed processing
Data flow & API? 
Use Fluentd!
