Introduction to Elasticsearch
By Melvyn Peignon
What will I cover?
- Company and products presentation
- Elasticsearch architecture
- Presentation of Kibana
- Presentation of the search API
- Analyzer
- TF/IDF and relevance
- Elasticsearch use case
- Conclusion
Elastic
- Founded in 2012
- The company behind:
- Kibana
- Elasticsearch
- Logstash
- Beats
What is elasticsearch?
- Full text search engine
- Based on Lucene
- Highly available
- Distributed
- Scalable
- RESTful
- Open Source
Created by Shay Banon
Search-engine popularity trends compared (Elasticsearch is in blue)
How do they make money?
CRUD
CREATE
READ
UPDATE
DELETE
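As a rough sketch, the four CRUD operations map onto HTTP calls against Elasticsearch's REST API. The helper below only builds the (method, path, body) triple and never contacts a cluster. The `crud_request` helper, the `recipes` index name and the sample document are illustrative assumptions, and the paths follow the typeless `_doc` and `_update` endpoints of recent Elasticsearch versions (older versions put a type name in the path).

```python
import json


def crud_request(op, index, doc_id, body=None):
    """Return (HTTP method, path, JSON body or None) for one CRUD operation.

    Hypothetical helper for illustration only; nothing is sent over the network.
    """
    routes = {
        "create": ("PUT", f"/{index}/_doc/{doc_id}"),
        "read": ("GET", f"/{index}/_doc/{doc_id}"),
        "update": ("POST", f"/{index}/_update/{doc_id}"),
        "delete": ("DELETE", f"/{index}/_doc/{doc_id}"),
    }
    method, path = routes[op]
    # The update API expects partial documents wrapped in a "doc" object.
    if op == "update" and body is not None:
        body = {"doc": body}
    payload = json.dumps(body) if body is not None else None
    return method, path, payload


examples = [
    crud_request("create", "recipes", 1, {"cuisine": "greek"}),
    crud_request("read", "recipes", 1),
    crud_request("update", "recipes", 1, {"cuisine": "italian"}),
    crud_request("delete", "recipes", 1),
]
for method, path, payload in examples:
    print(method, path, payload or "")
```

The printed lines can be pasted into the Kibana console or sent with any HTTP client.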
Some concepts to know
- Near real time (NRT)
- Cluster
- Node
- Index
- Document
- Shards and Replicas
Documents, Types, Indexes
- An index is a collection of documents that share similar properties.
- A document is the basic unit of information that can be indexed.
- A type is a logical partition of the data in your index (mapping types were deprecated in Elasticsearch 6.x and removed in later versions).
Cluster, Nodes, Shards and Replicas
[Diagram sequence showing a cluster as it grows]
- One node: Node 1 holds shards S1, S2, S3 and S4.
- Two nodes: the four shards are split between Node 1 and Node 2, two shards each.
- Four nodes: each node holds one primary shard and the replica of another shard (Node 1: S1, R2; Node 2: S2, R1; Node 3: S3, R4; Node 4: S4, R3).
- The nodes exchange Ping/Pong messages with each other.
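The point of the four-node layout can be checked with a small sketch: as long as a replica never sits on the same node as its primary, losing any single node leaves at least one copy of every shard. The layout below is my reading of the slide (one primary and one replica per node) and should be treated as an assumption, as should the helper names.

```python
# Layout as read off the four-node slide (an assumption): each node holds
# one primary (S) and the replica (R) of a different shard.
layout = {
    "node1": ["S1", "R2"],
    "node2": ["S2", "R1"],
    "node3": ["S3", "R4"],
    "node4": ["S4", "R3"],
}


def shard_number(copy):
    """'S3' and 'R3' are both copies of shard 3."""
    return int(copy[1:])


def replicas_separated(layout):
    """True if no node holds two copies of the same shard."""
    for copies in layout.values():
        numbers = [shard_number(c) for c in copies]
        if len(numbers) != len(set(numbers)):
            return False
    return True


def shards_available(layout, failed_node):
    """Shard numbers that still have at least one copy after a node fails."""
    return {
        shard_number(c)
        for node, copies in layout.items()
        if node != failed_node
        for c in copies
    }


print("replicas separated:", replicas_separated(layout))
for node in layout:
    print(node, "down ->", sorted(shards_available(layout, node)))
```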
Responsibilities of the master
- Tracking cluster health
- Creating indexes
- Allocating shards to nodes
- Allocating replicas to nodes
Cluster recommendations
- Keep your servers in the same data center
- Put your machines on different racks
- Keep at least 3 master-eligible nodes (with only 2, the quorum is still 2, so losing one node prevents a master election)
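The quorum remark can be made concrete with a little arithmetic. The sketch below assumes the usual majority rule, quorum = floor(n / 2) + 1; the function names are made up for illustration.

```python
def quorum(master_eligible_nodes):
    """Smallest majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes):
    """How many master-eligible nodes can be lost while keeping a quorum."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


for n in (1, 2, 3, 5):
    print(f"{n} nodes: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Two master-eligible nodes tolerate no failure at all, which is why three is the practical minimum.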
What’s Kibana?
- Another Elastic product
- A tool that lets you talk to your Elasticsearch cluster in a more “human” way
- A product for building dashboards and data visualizations
Introduction to elasticsearch
Let’s go for a demonstration
Demonstration done in Kibana
The queries can be found on GitHub:
The analyzer
Inverted index after the first document (Standard Analyzer):
{"a": [id_0], "walk": [id_0], "in": [id_0], "the": [id_0], "wood": [id_0]}
Inverted index after the second document (Standard Analyzer):
{"a": [id_0, id_1], "walk": [id_0], "in": [id_0], "the": [id_0], "wood": [id_0], "probability": [id_1], "complete": [id_1], "guide": [id_1]}
Lookups against this index: one returns [id_0, id_1], another returns []
The English analyzer
Inverted index (English Analyzer):
{"walk": [id_0], "wood": [id_0]}
Lookup against this index: returns []
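The inverted indexes above can be reproduced with a minimal sketch. The two document texts are assumptions reconstructed from the tokens in the slides, the stop-word list is a short illustrative one, and the real English analyzer also stems tokens, which is left out here. All function names are hypothetical.

```python
import re
from collections import defaultdict

# Short illustrative stop-word list (assumption); the real analyzer's list is longer.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def standard_tokens(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def english_tokens(text):
    """Standard tokens with stop words removed (no stemming in this sketch)."""
    return [t for t in standard_tokens(text) if t not in STOP_WORDS]


def build_index(docs, analyzer):
    """Map each token to the list of document ids that contain it."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for token in analyzer(text):
            if doc_id not in index[token]:
                index[token].append(doc_id)
    return dict(index)


def lookup(index, query, analyzer):
    """Ids of documents containing any analyzed query token."""
    ids = set()
    for token in analyzer(query):
        ids.update(index.get(token, []))
    return sorted(ids)


# Assumed document texts, chosen to match the tokens shown in the slides.
docs = {"id_0": "A walk in the wood", "id_1": "Probability: a complete guide"}

standard_index = build_index(docs, standard_tokens)
english_index = build_index(docs, english_tokens)

print(standard_index)
print(lookup(standard_index, "a", standard_tokens))
print(lookup(english_index, "a", english_tokens))
```

Searching for a stop word such as "a" is one way to get a hit with the standard analyzer and an empty result with the English one, because the query token is dropped before the lookup.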
What is relevance?
Two theories to know:
- Boolean model
- Vector space model
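Boolean model
Original documents:
O0 = "Eric is ... always feeding"
O1 = "Jherez is ... with the friends"
…
O6 = "Manage Idea… to Melvyn)"
Query terms and operator:
QT = {"lab", "manager"}, QO = "OR"
Vocabulary and tokenized documents:
T = {t1: "lab", t2: "manager", t3: "Idea", …, t4: "feeding"}
D = {D0, D1, …, D6}
D0 = {Eric, is, …, feeding}
D1 = {Jherez, is, …, friends}
D6 = {Manage, idea, …, Melvyn}
Documents matching each query term:
S1 = {D0, D1, D6} (documents containing "lab")
S2 = {D0, D6} (documents containing "manager")
Final result:
SF = S1 ∪ S2 = S1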
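The Boolean model is plain set algebra, which a few lines of Python can show. The slides elide the full document texts, so the token sets below are made-up stand-ins chosen only so that the sets for "lab" and "manager" come out as S1 and S2; `D2` and the function names are likewise hypothetical.

```python
# Hypothetical token sets: stand-ins for the elided documents, chosen only
# to reproduce S1 = {D0, D1, D6} and S2 = {D0, D6}.
documents = {
    "D0": {"eric", "lab", "manager", "feeding"},
    "D1": {"jherez", "lab", "friends"},
    "D2": {"unrelated", "tokens"},
    "D6": {"manage", "idea", "lab", "manager", "melvyn"},
}


def matching(term):
    """Names of the documents whose token set contains the term."""
    return {name for name, tokens in documents.items() if term in tokens}


def boolean_search(terms, operator):
    """Combine the per-term result sets with OR (union) or AND (intersection)."""
    sets = [matching(t) for t in terms]
    if not sets:
        return set()
    if operator == "OR":
        return set.union(*sets)
    if operator == "AND":
        return set.intersection(*sets)
    raise ValueError(f"unsupported operator: {operator}")


s1 = matching("lab")
s2 = matching("manager")
print(sorted(s1), sorted(s2))
print(sorted(boolean_search(["lab", "manager"], "OR")))
```

The Boolean model only says which documents match; it gives no ordering among them, which is what the vector space model adds.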
Vector space model
S1 = {D0, D1, D6}
T0 = D0 ∩ QT = ("lab", "manager") ⇒ V0 = (L0, M0)
T1 = D1 ∩ QT = ("lab") ⇒ V1 = (L1, 0)
T6 = D6 ∩ QT = ("lab", "manager") ⇒ V6 = (L6, M6)
Weight of a token in a document
- Term frequency
TF = √Frequency
- Inverse document frequency
IDF = 1 + log(numDocs / (docFrequency + 1)), where numDocs is the total number of documents in the index
- Field length
FL = 1 / √TokenInField
Weight = TF × IDF × FL
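The three factors can be sketched directly in Python. The inverse document frequency here uses the form from Lucene's classic similarity, 1 + ln(numDocs / (docFreq + 1)), with numDocs the total number of documents in the index; the function and parameter names are my own, and the sample numbers are arbitrary.

```python
import math


def term_frequency(freq):
    """Square root of how often the token occurs in the field."""
    return math.sqrt(freq)


def inverse_document_frequency(num_docs, doc_freq):
    """Rarer tokens (low doc_freq) get a higher score."""
    return 1 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(tokens_in_field):
    """Shorter fields weigh more: 1 / sqrt(number of tokens)."""
    return 1 / math.sqrt(tokens_in_field)


def weight(freq, num_docs, doc_freq, tokens_in_field):
    """Product of the three factors."""
    return (
        term_frequency(freq)
        * inverse_document_frequency(num_docs, doc_freq)
        * field_length_norm(tokens_in_field)
    )


# Arbitrary example: a token seen 4 times in a 16-token field,
# present in 2 of 7 documents.
print(round(weight(freq=4, num_docs=7, doc_freq=2, tokens_in_field=16), 3))
```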
Relevance
Vq = [1, 1.47]
V0 = [0.81, 0.85]
V1 = [0.37, 0]
V6 = [0.8, 1.2]
Relevance(Vq, Vx) = cos(Vq, Vx) = (Vq · Vx) / (‖Vq‖ ‖Vx‖)
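Plugging the vectors from this slide into the cosine formula gives a ranking of the three documents. This is a minimal sketch; the `cosine` helper and variable names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


query = [1, 1.47]
vectors = {"D0": [0.81, 0.85], "D1": [0.37, 0], "D6": [0.8, 1.2]}

scores = {name: cosine(query, vec) for name, vec in vectors.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, round(scores[name], 4))
```

D6 comes out on top because its vector points in almost the same direction as the query, while D1, which lacks the second term, scores lowest.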
Let’s Kaggle with elasticsearch
https://www.kaggle.com/c/whats-cooking
Results of our “Classifier”
Explanation of the methodology:
http://melvyn.pythonanywhere.com/posts/1/
Final advice
- Mapping (I highly recommend defining a mapping. You cannot change the type of a field once it is defined in the mapping.)
- Elasticsearch alongside a database (I prefer having both: reindexing is easier, I have a backup, and I can run search and analytics on ES while using my database for identification, etc.)
- Elasticsearch as a NoSQL database (I wouldn’t do it on a serious project, but it is nice to have for a quick proof-of-concept implementation)
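To illustrate the mapping advice, here is a hedged sketch of an explicit mapping written as a Python dictionary. The `recipes` index name and the `cuisine` and `ingredients` fields are assumptions modeled on the Kaggle recipe data, and the body uses the typeless layout and the `keyword` and `text` field types of recent Elasticsearch versions; older versions nest the properties under a type name.

```python
import json

# Hypothetical mapping for a recipe index; field names are assumptions.
mapping_body = {
    "mappings": {
        "properties": {
            # Exact-value field, suited to filtering and aggregations.
            "cuisine": {"type": "keyword"},
            # Full-text field analyzed with the built-in English analyzer.
            "ingredients": {"type": "text", "analyzer": "english"},
        }
    }
}

# This JSON would be the body of a "PUT /recipes" request that creates the index.
print(json.dumps(mapping_body, indent=2))
```

Because a field's type cannot be changed afterwards, a wrong choice here means creating a new index and reindexing the data.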
Hope you enjoyed the presentation!
Thank you for your attention!
Questions?




