Jan 2014, HAPPIEST MINDS TECHNOLOGIES

Innovation @Work
Log Management with Logstash
and ElasticSearch
Rishav Rohit

SHARING. MINDFUL. INTEGRITY. LEARNING. EXCELLENCE. SOCIAL RESPONSIBILITY.

Copyright Information

This document is the exclusive property of Happiest Minds Technologies Pvt. Ltd. It is
intended for limited circulation.

© 2013 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved

Contents
Copyright Information
Abstract
Introduction
Problem Definition
High Level Solution
Solution Details
Solution Benefits
Solution extend-ability
Deliverables
Conclusion
References
Happiest Mind Innovators

Abstract
Gathering logs from a wide array of servers and applications so that they can be collected,
searched, and analyzed centrally, in real time, is a challenging task. Once we overcome this
challenge, we can draw a wealth of insights from these logs, identify problems, and arrive at
a solution or corrective measure much more quickly. In this paper, we build a highly scalable,
real-time log collection, search, visualization and analysis application using Logstash,
ElasticSearch and Kibana.

Introduction
Recent compliance mandates require not only that organizations collect all logs, but also that
the logs be reviewed regularly, be searchable, and be stored in their original, unaltered, raw
form for mandate-specific timeframes. Log management solutions address these collection and
retention needs by allowing organizations to collect, store and manage large amounts of log
data inexpensively.
To solve this problem we can build a highly scalable solution with real-time analysis using
Logstash, ElasticSearch and Kibana.
Logstash: Logstash is a lightweight, highly integrable tool for managing events and logs.
It can collect logs, parse them and store them in a central location. It is free and open
source under the Apache license.
ElasticSearch: Elasticsearch is a search server based on Lucene. It provides a distributed,
multitenant-capable full-text search engine with a RESTful web interface and schema-free
JSON documents. Elasticsearch is free and open source under the Apache license.
Kibana: Kibana is a web-based, highly scalable dashboard solution that integrates seamlessly
with ElasticSearch and provides real-time analysis of streaming data. It is also a free and
open source product.

Problem Definition
Logs are extremely useful in identifying security incidents, policy violations, fraudulent
activity, and operational problems. They are also valuable when performing audits, forensic
analysis, internal investigations and identifying operational trends and long-term problems.
However, the sheer variety of log data formats makes it impractical to use the data
without first normalizing it.
As organizations grow, the variety of log data sources and the volume of data will increase.
Compounding this challenge is the variability of data formats and distributed nature of these
sources; in addition, every network infrastructure is in a constant state of change, with new
systems, applications, users, and devices being added every day of the year.

All these challenges can be handled in a cost-effective and efficient manner by a log
management solution that offers these features:
Centralized
Highly reliable
Searchable
Scalable
Secure

High Level Solution
Given below is a brief overview of the technologies used for the Log Management solution.
Logstash is a tool for managing events and logs. It is capable of filtering, modifying and
shipping out events and logs. Logstash natively offers plugins for a variety of sources such as
ElasticSearch, RabbitMQ, Redis, S3, Twitter, ZeroMQ, etc. Apart from single-line logs, it can
also handle JSON and multi-line logs. It offers a wide range of filters such as grok, csv, date,
geoip, kv, etc., and it can ship out the parsed logs to ElasticSearch, S3, Redis, ZeroMQ,
MongoDB, etc. A complete list of Logstash input, output and filter plugins is available at
http://logstash.net/docs/latest/.
Alternatives to Logstash include Splunk, Chukwa, Flume and Graylog, but none of these
combines all of the following: being free and open source, high flexibility, low memory
consumption, and native plugins for a range of inputs, codecs, filters and outputs.
ElasticSearch is a rapidly growing open source search solution used by thousands of
enterprises in virtually every industry. It is used in production at companies like
Mozilla, StackOverflow, GitHub, Clout, McGraw-Hill, etc.
ElasticSearch provides features such as faceted search, auto-complete, routing and
sharding, and it scales easily. It returns search results in near real time (on the order of
milliseconds).
Kibana is a lightweight, web-based dashboard and analysis application capable of real-time
analysis of streaming data. It provides dashboard components such as maps, histograms, trends
and many other basic components.
The high level architecture for this solution is given in the diagram below:

Diagram – HLD of Log Management Solution
In the above architecture we have three components:
Logstash Agent
ElasticSearch Cluster
Kibana UI
The Logstash agent is a lightweight Java application running on each server that produces logs.
It filters and parses the logs and then ships them out as JSON documents to the ElasticSearch
cluster.
ElasticSearch cluster acts as a persistent store for logs and offers real-time search
capabilities. Using its distributed architecture ElasticSearch can scale massively without
compromising on performance.
Kibana is a UI dashboard and analysis tool. It offers both pre-configured dashboards and
on-demand dashboards. Kibana makes use of REST APIs to interact with ElasticSearch.
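
To make the REST interaction concrete, the following is a minimal Python sketch of the kind of search request a client could send to ElasticSearch. It only builds the request and does not send it. The index name logstash-2004.02.01, the query text type:weblog and the helper name build_search_request are assumptions made for illustration; they are not taken from Kibana's source.

```python
import json
from urllib.request import Request


def build_search_request(base_url: str, index: str, query_text: str, size: int = 10) -> Request:
    """Build (but do not send) a POST to the ElasticSearch _search endpoint."""
    # query_string lets the caller use Lucene-style "field:value" syntax.
    body = {"query": {"query_string": {"query": query_text}}, "size": size}
    return Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: 9200 is the default ElasticSearch HTTP port; the index
# name and the query text are illustrative assumptions.
req = build_search_request("http://127.0.0.1:9200", "logstash-2004.02.01", "type:weblog")
print(req.get_method(), req.full_url)
```

Passing the resulting object to urllib.request.urlopen would execute the search against a running cluster.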

Solution Details
To demonstrate this solution I have used clickstream logs from the ECML/PKDD 2005
Discovery Challenge. Some sample log lines are shown below:
12;1075658406;195.146.109.248;05aa4f4db0162e5723331042eb9ce8a7;/ct/?c=153;http://www.shop3.cz/
12;1075658407;212.65.194.144;86140090a2e102f1644f29e5ddadad9b;/ls/?id=34;http://www.shop3.cz/ct/?c=155
12;1075658409;62.24.70.41;851f20e644eb8bf82bfdbe4379050e2e;/txt/?c=734;http://www.shop3.cz/onakupu/

Each log line is delimited by semicolons (;) and contains the following fields, in order:
shop_id
unixtime
client ip
session
visited page
referrer
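
As a quick illustration of this layout, the Python sketch below splits one sample line into the six fields. It is not part of the Logstash pipeline; the key names client_ip and page and the function name parse_click are assumptions chosen for this example.

```python
from typing import Dict

# Field order as listed above; the key names are illustrative assumptions.
FIELDS = ["shop_id", "unixtime", "client_ip", "session", "page", "referrer"]


def parse_click(line: str) -> Dict[str, str]:
    """Split one clickstream line on ';' and label the six fields."""
    # Limiting the number of splits keeps any ';' inside the referrer URL
    # in the last field instead of producing extra columns.
    values = line.rstrip("\n").split(";", len(FIELDS) - 1)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


sample = "12;1075658406;195.146.109.248;05aa4f4db0162e5723331042eb9ce8a7;/ct/?c=153;http://www.shop3.cz/"
record = parse_click(sample)
print(record["client_ip"], record["page"])
```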
For the demo we need to create a Logstash configuration file (clickstream.conf) that
specifies the inputs, filters and outputs.
The clickstream.conf file looks like this:

input {
  file {
    # path for clickstream log
    path => "/path/to/_2004_02_01_19_click_stream.log"
    # define a type for all events handled by this input
    type => "weblog"
    start_position => "beginning"
    # the clickstream log is in character set ISO-8859-1
    codec => plain { charset => "ISO-8859-1" }
  }
}
filter {
  csv {
    # define columns present in weblog
    columns => ["shop_id", "unixtime", "client_ip", "session", "page", "referrer"]
    separator => ";"
  }
  grok {
    # get visited page and page parameters
    match => ["page", "%{URIPATH:page_visited}(?:%{URIPARAM:page_params})?"]
    remove_field => ["page"]
  }
  date {
    # the unixtime field is in epoch seconds; convert it to a normal timestamp
    match => ["unixtime", "UNIX"]
  }
  geoip {
    # convert ip to longitude-latitude using the GeoLiteCity database from MaxMind
    source => "client_ip"
    fields => ["latitude", "longitude"]
    target => "geoip"
    add_field => ["[geoip][coordinates]", "%{[geoip][longitude]}"]
    add_field => ["[geoip][coordinates]", "%{[geoip][latitude]}"]
  }
  mutate {
    # convert geoip.coordinates to float values
    convert => ["[geoip][coordinates]", "float"]
  }
}
output {
  # store output in local elasticsearch cluster
  elasticsearch {
    host => "127.0.0.1"
  }
}

In the above Logstash configuration file we have defined the input to be a log file and given
the absolute path to the log. In the filter section we parse the individual fields, convert
epoch seconds to a date-time format, and convert each IP address to a latitude-longitude
pair so that it can be plotted on a map. Finally, we store the parsed logs in a local
ElasticSearch cluster.
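
To show what these filter steps do to a single event, here is a rough Python sketch that mimics them outside Logstash. It is an illustration under stated assumptions, not how Logstash is implemented: the function name enrich is hypothetical, the coordinates are passed in by hand rather than looked up in a GeoIP database, and the timestamp format differs slightly from the one Logstash writes.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit


def enrich(event: dict, longitude: float, latitude: float) -> dict:
    """Mimic the date, grok and geoip/mutate steps on one parsed event."""
    out = dict(event)
    # date filter: epoch seconds become a UTC timestamp.
    seconds = int(event["unixtime"])
    out["@timestamp"] = datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()
    # grok filter: split "page" into the path and the query parameters,
    # then drop the original field (as remove_field does).
    parts = urlsplit(out.pop("page"))
    out["page_visited"] = parts.path
    if parts.query:
        out["page_params"] = "?" + parts.query
    # geoip + mutate: coordinates stored as [longitude, latitude] floats.
    out["geoip"] = {"coordinates": [float(longitude), float(latitude)]}
    return out


event = {"unixtime": "1075658406", "client_ip": "195.146.109.248", "page": "/ct/?c=153"}
# The coordinates below are made-up illustrative values, not a real lookup result.
enriched = enrich(event, longitude=14.42, latitude=50.09)
print(enriched["@timestamp"], enriched["page_visited"], enriched.get("page_params"))
```

Note the longitude-first order of the coordinates, which matches the order of the two add_field lines in the geoip block.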
To start the Logstash agent on the server, run the command below:
java -jar logstash-1.3.2-flatjar.jar agent -f clickstream.conf --web
This command starts a Logstash JVM process that parses the logs, indexes them into
ElasticSearch and also starts the Kibana UI at http://localhost:9292/. By building a few simple
dashboards in the Kibana UI we can visualize the logs.
Some sample screenshots from Kibana UI are given below:

Screenshot 1 – Histogram showing page landing counts for different time intervals.

Screenshot 2 – Map showing geographical distribution of users.

Screenshot 3 – Table showing different fields of logs.

Solution Benefits
The benefits offered by this solution are listed below:
1. All the tools used in this solution are free and open source, so this is a very
cost-effective solution.
2. The development effort required is very low: on the coding side only a Logstash
configuration file needs to be written, and for the UI only Kibana dashboards need to be
designed.
3. This solution is highly scalable. Logstash has been tested to process around 25,000
events per second per node, and ElasticSearch is used in production by many web-scale
companies.
4. All the tools are open source and are actively contributed to by a large
developer community.
5. Logstash consumes very little memory, around 150 MB.
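
The throughput figure in item 3 can be turned into a rough sizing estimate. The Python sketch below is a back-of-the-envelope calculation only: the 50% headroom default and the function name nodes_needed are assumptions for illustration, and real throughput depends on the filters used and on the hardware.

```python
import math

EVENTS_PER_NODE_PER_SEC = 25_000  # figure quoted in the benefits list above


def nodes_needed(peak_events_per_sec: int, headroom: float = 0.5) -> int:
    """Estimate Logstash nodes for a peak rate, reserving a fraction as headroom."""
    # Only (1 - headroom) of each node's tested capacity is counted as usable.
    usable = EVENTS_PER_NODE_PER_SEC * (1.0 - headroom)
    return max(1, math.ceil(peak_events_per_sec / usable))


# 100,000 events/s at 12,500 usable events/s per node
print(nodes_needed(100_000))
```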

Solution extend-ability
Logstash not only manages logs; it can also handle different types of events from sources
such as JSON, ActiveMQ, RabbitMQ, ZeroMQ, Twitter feeds, etc. It can also output aggregated
counts of different events, and it can ship events out to a variety of tools such as
Riak, Redis, S3, Graphite, etc.
Apart from being used as a search engine, ElasticSearch can be used as a NoSQL database, a
historical archive and a real-time analytics tool.

The above-mentioned features of Logstash and ElasticSearch make this solution applicable
in practice to many business problems.

Deliverables
Presentation of the solution with a focus on architecture, design and use cases.

Conclusion
The Log Management solution proposed using Logstash, ElasticSearch and Kibana is a
cost-effective, efficient, reliable and highly scalable solution.
These products are backed by an active user community which keeps adding value and new
functionality to them. They are also backed and supported by the ElasticSearch company.

References
Logstash - http://www.elasticsearch.org/overview/logstash/
ElasticSearch - http://www.elasticsearch.org/overview/
Kibana - http://www.elasticsearch.org/overview/kibana/
ElasticSearch Users - http://www.elasticsearch.com/case-studies/
Logstash Performance Test - https://gist.github.com/paulczar/4513552
Logstash Memory Consumption - http://blog.sematext.com/2013/11/05/logstash-performance-monitoring/
ECML/PKDD 2005 Discovery Challenge - http://lisp.vse.cz/challenge/ecmlpkdd2005/

Happiest Mind Innovators
Number of contributors - 1
Names of the contributors – Rishav Rohit
Role of the contributor – Solution design and development
