Fluentd and Kafka
Hadoop / Spark Conference Japan 2016

Feb 8, 2016
Who are you?
• Masahiro Nakagawa
• github: @repeatedly
• Treasure Data Inc.
• Fluentd / td-agent developer
• Fluentd Enterprise support
• I love OSS :)
• D Language, MessagePack, The organizer of several meetups, etc…
Fluentd
• Pluggable streaming event collector
• Lightweight, robust and flexible
• Lots of plugins on rubygems
• Used by AWS, GCP, MS and more companies
• Resources
• http://www.fluentd.org/
• Webinar: https://www.youtube.com/watch?v=6uPB_M7cbYk
Popular case
[Diagram: App → Forwarder → Aggregator → Destination, with data pushed between stages]
Apache Kafka
• Distributed messaging system
• Producer - Broker - Consumer pattern
• Pull model, replication, etc
[Diagram: App → Producer → Broker → Consumer → Destination; the producer pushes to the broker and the consumer pulls from it]
Push vs Pull
• Push:
• Easy to transfer data to multiple destinations
• Hard to control the flow ratio across multiple streams
• Pull:
• Easy to control stream flow / ratio
• Consumers must be managed correctly
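The trade-off above can be illustrated with a small in-memory sketch. It is not Fluentd's or Kafka's API: the class names `PushForwarder` and `PullBroker`, and the use of plain lists as sinks, are hypothetical and only model the two delivery styles.

```python
from collections import defaultdict


class PushForwarder:
    """Push model: the sender decides when every destination receives data."""

    def __init__(self, destinations):
        self.destinations = destinations  # plain lists acting as sinks

    def emit(self, event):
        # Fan-out is trivial: each destination gets the event immediately,
        # whether or not it is ready for more data.
        for sink in self.destinations:
            sink.append(event)


class PullBroker:
    """Pull model: events wait in a log until a consumer asks for them."""

    def __init__(self):
        self.log = []
        self.offsets = defaultdict(int)  # consumer name -> next position

    def publish(self, event):
        self.log.append(event)

    def poll(self, consumer, max_events):
        # Each consumer sets its own rate through max_events, but its
        # offset has to be tracked correctly or events are lost or repeated.
        start = self.offsets[consumer]
        batch = self.log[start:start + max_events]
        self.offsets[consumer] = start + len(batch)
        return batch


fast, slow = [], []
pusher = PushForwarder([fast, slow])
broker = PullBroker()
for i in range(5):
    pusher.emit(i)
    broker.publish(i)

print(fast, slow)              # both sinks were handed every event
print(broker.poll("slow", 2))  # the slow consumer takes two events at a time
print(broker.poll("fast", 5))  # the fast consumer drains the log in one call
```

In the push half, neither sink can slow the sender down. In the pull half, the per-consumer offset is the state that "managing consumers correctly" refers to.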
There are 2 ways
• fluent-plugin-kafka
• kafka-fluentd-consumer
fluent-plugin-kafka
• Input / Output plugin for Kafka
• https://github.com/htgc/fluent-plugin-kafka
• in_kafka, in_kafka_group, out_kafka, out_kafka_buffered
• Pros
• Easy to use; supports output as well as input
• Cons
• Performance is not the primary goal
Configuration example
<source>
  @type kafka
  topics web,system
  format json
  add_prefix kafka.
  # more options
</source>
<match kafka.**>
  @type kafka_buffered
  output_data_type msgpack
  default_topic metrics
  compression_codec gzip
  required_acks 1
</match>
https://github.com/htgc/fluent-plugin-kafka#usage
kafka-fluentd-consumer
• Stand-alone Kafka consumer for Fluentd
• https://github.com/treasure-data/kafka-fluentd-consumer
• Sends consumed events to Fluentd's in_forward
• Pros
• High performance and Java API features
• Cons
• Need Java runtime
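As a rough illustration of what "sending consumed events to in_forward" involves, the sketch below builds one `[tag, time, record]` entry, which is the message shape of Fluentd's forward protocol. The function name `to_forward_entry`, the default prefix, the prefix + topic tagging rule, and the JSON message body are assumptions made for this example and are not taken from the consumer's source.

```python
import json
import time


def to_forward_entry(topic, payload, tag_prefix="kafka.", now=None):
    """Build one [tag, time, record] entry for a consumed Kafka message.

    The tagging rule (prefix + topic) and the JSON payload are assumptions
    of this sketch.
    """
    record = json.loads(payload)  # consumed message body as a dict
    timestamp = int(time.time()) if now is None else now
    return [tag_prefix + topic, timestamp, record]


entry = to_forward_entry("web", '{"path": "/", "code": 200}', now=1454889600)
# The forward protocol serializes entries with MessagePack; JSON is used
# here only so the sketch needs nothing outside the standard library.
print(json.dumps(entry))
```

Because the tag carries the topic name, a `<match>` pattern on the Fluentd side can route each topic separately.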
Run consumer
• Edit log4j and fluentd-consumer properties
• Run the following command:
$ java \
    -Dlog4j.configuration=file:///path/to/log4j.properties \
    -jar path/to/kafka-fluentd-consumer-0.2.1-all.jar \
    path/to/fluentd-consumer.properties
Properties example
fluentd.tag.prefix=kafka.event.
# default is json
fluentd.record.format=regexp
# for regexp format
fluentd.record.pattern=(?<text>.*)
# can use Java regex
fluentd.consumer.topics=app.*
# default is whitelist
fluentd.consumer.topics.pattern=blacklist
fluentd.consumer.threads=5
https://github.com/treasure-data/kafka-fluentd-consumer/blob/master/config/fluentd-consumer.properties
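To make the regexp format and the topic filter more concrete, here is a minimal sketch of the two ideas. It is not the consumer's implementation: `parse_record` and `select_topics` are hypothetical helpers, and full-string matching is an assumption about how the patterns are applied.

```python
import re

# Python spells named groups (?P<name>...), whereas the Java-style pattern
# in the properties file is (?<text>.*); the meaning is the same.
RECORD_PATTERN = re.compile(r"(?P<text>.*)")


def parse_record(line):
    """Turn one raw message into a record keyed by the named groups."""
    match = RECORD_PATTERN.fullmatch(line)
    return match.groupdict() if match else None


def select_topics(all_topics, pattern, mode="whitelist"):
    """Keep topics that match the regex (whitelist) or that do not (blacklist)."""
    regex = re.compile(pattern)
    keep_matching = mode == "whitelist"
    return [t for t in all_topics
            if (regex.fullmatch(t) is not None) == keep_matching]


topics = ["app.web", "app.db", "system"]
print(parse_record("GET /index.html 200"))
print(select_topics(topics, r"app.*", "whitelist"))
print(select_topics(topics, r"app.*", "blacklist"))
```

Each named group in the pattern becomes a key of the record, so a richer pattern yields a structured record instead of a single `text` field.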
With Fluentd example
<source>
  @type forward
</source>
<source>
  @type exec
  command java -Dlog4j.configuration=file:///path/to/log4j.properties -jar /path/to/kafka-fluentd-consumer-0.2.1-all.jar /path/to/config/fluentd-consumer.properties
  tag dummy
  format json
</source>
https://github.com/treasure-data/kafka-fluentd-consumer#run-kafka-consumer-for-fluentd-via-in_exec
Conclusion
• Kafka is becoming an important component of data platforms
• Fluentd can communicate with Kafka
• Via the Fluentd plugin or the Kafka consumer
• Build a reliable and flexible data pipeline with Fluentd and Kafka

