Streaming using Kafka Flink & Elasticsearch
Keira Zhou
May 11, 2016
§ Install on your laptop
§ Kafka 0.9
§ Flink 1.0.2
§ Elasticsearch 2.3.2 (start-up commands sketched below)
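§ A minimal sketch for starting each service locally (paths assume the stock Kafka and Elasticsearch distributions; setting cluster.name: es_keira in config/elasticsearch.yml is assumed so it matches the sink config later):
§ bin/zookeeper-server-start.sh config/zookeeper.properties
§ bin/kafka-server-start.sh config/server.properties
§ bin/elasticsearch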
§ Create a topic
§ bin/kafka-topics.sh \
--create \
--zookeeper localhost:2181 \
--replication-factor 1 \
--partitions 1 \
--topic viper_test
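§ Optional check that the topic exists (same local ZooKeeper as above):
§ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic viper_test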
§ Create an index
§ curl -XPUT 'http://localhost:9200/viper-test/' -d '{
"settings" :{
"index" :{
"number_of_shards" :1,
"number_of_replicas" :0
}
}
}'
§ Put mapping of a doc type within the index
§ curl -XPUT 'localhost:9200/viper-test/_mapping/viper-log' -d '{
"properties":{
"ip":{ "type":"string","index":"not_analyzed" },
"info":{ "type":"string" }
}
}'
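§ Optional check that the index and mapping were created (standard Elasticsearch API):
§ curl 'localhost:9200/viper-test/_mapping?pretty'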
§ More info:
§ https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/connectors/kafka.html
§ Maven dependency
§ <dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.9_2.10</artifactId>
<version>1.0.2</version>
</dependency>
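§ The example below also assumes the core streaming API is on the classpath; a hedged pom sketch (artifact name follows the Flink 1.0.x convention, verify against your build):
§ <dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_2.10</artifactId>
<version>1.0.2</version>
</dependency>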
§ Example Java code
§ public static DataStream<String> readFromKafka(StreamExecutionEnvironment env) {
// enable checkpointing so the Kafka consumer commits its offsets
env.enableCheckpointing(5000);
// Kafka consumer configuration
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("group.id", "test");
// read from the viper_test topic created earlier
DataStream<String> stream = env.addSource(
new FlinkKafkaConsumer09<>("viper_test", new SimpleStringSchema(), properties));
return stream;
}
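§ A minimal driver to sanity-check the source on its own (a sketch; everything except readFromKafka is illustrative):
§ public static void main(String[] args) throws Exception {
// a local execution environment is enough for the demo
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> stream = readFromKafka(env);
stream.print(); // dump each Kafka record to stdout
env.execute("Kafka read sanity check");
}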
§ More info:
§ https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/connectors/elasticsearch2.html
§ Maven dependency
§ <dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-elasticsearch2_2.10</artifactId>
<version>1.1-SNAPSHOT</version>
</dependency>
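§ 1.1-SNAPSHOT is not on Maven Central; a hedged sketch of the Apache snapshot repository it is assumed to come from:
§ <repositories>
<repository>
<id>apache.snapshots</id>
<url>https://repository.apache.org/content/repositories/snapshots/</url>
<snapshots><enabled>true</enabled></snapshots>
</repository>
</repositories>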
§ Example Java code
§ Next page…
§ Example Java code
§ public static void writeElastic(DataStream<String> input) {
Map<String, String> config = new HashMap<>();
// This instructs the sink to emit after every element, otherwise they would be buffered
config.put("bulk.flush.max.actions", "1");
config.put("cluster.name", "es_keira");
try {
// Add elasticsearch hosts on startup
List<InetSocketAddress> transports = new ArrayList<>();
transports.add(new InetSocketAddress("127.0.0.1", 9300)); // port is 9300 not 9200 for ES TransportClient
ElasticsearchSinkFunction<String> indexLog = new ElasticsearchSinkFunction<String>() {
public IndexRequest createIndexRequest(String element) {
String[] logContent = element.trim().split("\t"); // records are tab-separated: IP<TAB>info
Map<String, String> esJson = new HashMap<>();
esJson.put("IP", logContent[0]);
esJson.put("info", logContent[1]);
return Requests.indexRequest()
.index("viper-test")
.type("viper-log")
.source(esJson);
}
@Override
public void process(String element, RuntimeContext ctx, RequestIndexer indexer) {
indexer.add(createIndexRequest(element));
}
};
ElasticsearchSink<String> esSink = new ElasticsearchSink<>(config, transports, indexLog);
input.addSink(esSink);
} catch (Exception e) {
System.out.println(e);
}
}
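§ Wiring source and sink together (a sketch of the driver; the full version is in the GitHub repo linked below):
§ public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> stream = readFromKafka(env); // Kafka source from the earlier slide
writeElastic(stream); // Elasticsearch sink defined above
env.execute("Kafka -> Flink -> Elasticsearch");
}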
§ https://github.com/keiraqz/KafkaFlinkElastic/blob/master/src/main/java/viper/KafkaFlinkElastic.java
§ Start your Flink program in your IDE
§ Start the Kafka console producer CLI
§ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic viper_test
§ In your terminal, type (it's tab-separated):
§ 10.20.30.40 test
§ Afterwards, in Elasticsearch:
§ curl 'localhost:9200/viper-test/viper-log/_search?pretty'
§ You should see:
§ { "took" :1,
"timed_out" : false,
"_shards" : { "total" : 1, "successful" : 1, "failed" : 0 },
"hits" : { "total" : 1, "max_score" : 1.0,
"hits" : [ { "_index" : "viper-test", "_type" : "viper-log", "_id" : ”SOMETHING","_score" : 1.0,
"_source" : { "IP" : "10.20.30.40", "info" : "test" } } ]
}
}
§ https://github.com/keiraqz/KafkaFlinkElastic