Introduction to elasticsearch

By Melvyn Peignon

What will I cover?
- Company and products presentation
- Elasticsearch architecture
- Presentation of Kibana
- Presentation of the search API
- Analyzer
- TF/IDF and relevance
- Elasticsearch use case
- Conclusion

Elastic
- Company founded in 2012
- Is behind Kibana, Elasticsearch, Logstash and Beats

What is elasticsearch?
- Full-text search engine
- Based on Lucene
- Highly available
- Distributed
- Scalable
- RESTful
- Open Source
- Created by Shay Banon

Search engine popularity trends (Elasticsearch is the blue line)

CRUD
CREATE
READ
UPDATE
DELETE
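
As a rough illustration, the four operations map onto HTTP verbs of the REST API. The sketch below only builds the (method, path) pairs and sends nothing. The index name `recipes` and the document id `1` are made up, and the paths assume the typeless `_doc` endpoints of recent Elasticsearch versions (older releases used `/index/type/id` instead).

```python
def crud_request(operation, index, doc_id):
    """Return the (HTTP method, path) pair for one CRUD operation."""
    # Assumption: typeless endpoints (Elasticsearch 7 and later).
    routes = {
        "create": ("PUT", f"/{index}/_doc/{doc_id}"),
        "read": ("GET", f"/{index}/_doc/{doc_id}"),
        "update": ("POST", f"/{index}/_update/{doc_id}"),
        "delete": ("DELETE", f"/{index}/_doc/{doc_id}"),
    }
    return routes[operation]


# Hypothetical index name and document id, for illustration only.
for op in ("create", "read", "update", "delete"):
    method, path = crud_request(op, "recipes", 1)
    print(f"{op:<6} -> {method} {path}")
```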

Some concepts to know
- Near real time (NRT)
- Cluster
- Node
- Index
- Document
- Shards and Replicas

Documents, Types, Indexes
- An index is a collection of documents that share similar properties.
- A document is the basic unit of information that can be indexed.
- A type is a logical partition of the data in your index.

Cluster, Nodes, Shards and Replicas
Diagram: a cluster with a single node (Node 1) holding four shards: S1, S2, S3, S4.

Diagram: a cluster with two nodes (Node 1, Node 2); the four shards are split between them, two per node.

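Diagram: a cluster with four nodes, each holding one primary shard (S) and the replica (R) of another shard:
- Node 1: S1, R2
- Node 2: S2, R1
- Node 3: S3, R4
- Node 4: S4, R3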

Diagram: the same four-node cluster, with nodes exchanging Ping / Pong messages.
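
One way to see why replicas sit on a different node than their primary is to check what happens when a node disappears. The sketch below hard-codes the four-node layout, read here as Node 1 holding S1 and R2, Node 2 holding S2 and R1, and so on (an interpretation of the diagram), and checks that every shard survives the loss of any single node. It is not how Elasticsearch allocates shards, only a check of the resulting layout.

```python
# Layout as read off the four-node diagram: S = primary shard, R = replica.
layout = {
    "node1": {"S1", "R2"},
    "node2": {"S2", "R1"},
    "node3": {"S3", "R4"},
    "node4": {"S4", "R3"},
}


def shard_ids(copies):
    """Strip the S/R prefix, e.g. {"S1", "R2"} becomes {"1", "2"}."""
    return {copy[1:] for copy in copies}


def survives_loss_of(failed_node):
    """True if every shard still has a primary or a replica on another node."""
    remaining = set()
    for node, copies in layout.items():
        if node != failed_node:
            remaining |= shard_ids(copies)
    return remaining == {"1", "2", "3", "4"}


for node in layout:
    print(node, "can fail safely:", survives_loss_of(node))
```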

Responsibilities of the master
- Monitoring cluster health
- Handling all index creation
- Distributing the shards across nodes
- Distributing the replicas across nodes

Cluster recommendations
- Keep your servers in the same data center
- Put your machines on different racks
- Keep at least 3 master-eligible nodes (with only 2, the quorum is also 2, so losing either node prevents a master election)
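
The quorum remark can be checked with a little arithmetic. This is a minimal sketch assuming the usual majority rule (half of the master-eligible nodes, rounded down, plus one); the function name is made up for the example.

```python
def quorum(master_eligible_nodes):
    """Smallest majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


for n in (1, 2, 3, 5):
    tolerated = n - quorum(n)
    print(f"{n} master-eligible node(s): quorum {quorum(n)}, "
          f"tolerates {tolerated} failure(s)")
```

With 2 nodes the quorum is 2, so no failure is tolerated; 3 nodes is the smallest setup that can lose one node and still elect a master.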

What’s Kibana?
- Another Elastic product
- A tool that lets you talk to your elasticsearch cluster in a more “human” way
- A product that lets you build dashboards and data visualizations

Let’s go for a demonstration

Demonstration done in Kibana
The queries can be found on GitHub:

The analyzer
{“a”: [id_0], “walk”: [id_0], “in”: [id_0], “the”: [id_0], “wood”: [id_0]}
Standard Analyzer

The analyzer
{“a”: [id_0, id_1], “walk”: [id_0], “in”: [id_0], “the”: [id_0],
“wood”: [id_0], “probability”:[id_1], “complete”:[id_1],
“guide”:[id_1]}
Standard Analyzer

The analyzer
{“a”: [id_0, id_1], “walk”: [id_0], “in”: [id_0],
“the”: [id_0], “wood”: [id_0], “probability”:[id_1],
“complete”:[id_1], “guide”:[id_1]}
Matching documents: [id_0, id_1]

The analyzer
{“a”: [id_0, id_1], “walk”: [id_0], “in”: [id_0],
“the”: [id_0], “wood”: [id_0],
“probability”:[id_1], “complete”:[id_1],
“guide”:[id_1]}
Matching documents: []
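
The inverted index on these slides can be rebuilt with a few lines of code. This is a simplified sketch: the regex tokenizer (lowercase, split on non-word characters) is only a stand-in for the standard analyzer, the two sentences are guesses chosen so that their tokens match the slides, and the `search` helper requires every query token to be present, which is an assumption of this example.

```python
import re
from collections import defaultdict


def standard_like_analyze(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each token to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in standard_like_analyze(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def search(index, query):
    """Return ids of documents containing every token of the analyzed query."""
    postings = [set(index.get(token, [])) for token in standard_like_analyze(query)]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical sentences, chosen only so the tokens match the slides.
docs = {
    "id_0": "A walk in the wood",
    "id_1": "Probability: a complete guide",
}
index = build_inverted_index(docs)
print(search(index, "a"))        # "a" appears in both documents
print(search(index, "walking"))  # the index holds "walk", not "walking"
```

Because the index stores exact tokens, a query token that was never indexed (such as "walking" here) matches nothing.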

The english analyzer
English Analyzer
{“walk”: [id_0], “wood”: [id_0]}

The english analyzer
{ “walk”: [id_0], “wood”: [id_0]}
Matching documents: []
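
Part of what the english analyzer does differently is dropping stop words. The sketch below shows only that step: the stop-word set is a small hand-picked subset, the sentence is the same guessed example as before, and stemming (which the real english analyzer also applies) is left out.

```python
import re

# Small subset of an English stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def english_like_analyze(text):
    """Lowercase, tokenize, and drop stop words (no stemming in this sketch)."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


print(english_like_analyze("A walk in the wood"))
print(english_like_analyze("the"))  # a stop-word-only query analyzes to nothing
```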

What is relevance?
Two models to know:
- Boolean model
- Vector space model

Boolean model
Documents (O = original text, D = its set of tokens):
O0 = “Eric is ... always feeding” ⇒ D0 = {Eric, is, …, feeding}
O1 = “Jherez is ... with the friends” ⇒ D1 = {Jherez, is, …, friends}
….
O6 = “Manage Idea… to Melvyn)” ⇒ D6 = {Manage, idea, …, Melvyn}
D = {D0, D1, …, D6}
T = {t1: “lab”, t2: “manager”, t3: “Idea”, …, t4: “feeding”}
Query: QT = {“lab”, “manager”}, QO = “OR”
S1 = {D0, D1, D6} (documents containing “lab”)
S2 = {D0, D6} (documents containing “manager”)
SF = S1 ∪ S2 = S1
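
The Boolean model boils down to set operations. In the sketch below, only the presence of "lab" and "manager" in D0, D1 and D6 follows the slide; the other tokens, and document D2, are placeholders invented for the example.

```python
# Placeholder token sets: only the membership of "lab" and "manager" follows
# the slide; the other tokens and document D2 are invented for the example.
documents = {
    "D0": {"eric", "lab", "manager", "feeding"},
    "D1": {"jherez", "lab", "friends"},
    "D2": {"kibana", "dashboard"},
    "D6": {"manage", "idea", "lab", "manager", "melvyn"},
}


def matching(term):
    """Set of ids of the documents whose token set contains the term."""
    return {doc_id for doc_id, tokens in documents.items() if term in tokens}


def boolean_query(terms, operator):
    """Combine per-term result sets with OR (union) or AND (intersection)."""
    result_sets = [matching(term) for term in terms]
    if operator == "OR":
        return set().union(*result_sets)
    return set.intersection(*result_sets)


s1 = matching("lab")
s2 = matching("manager")
sf = boolean_query(["lab", "manager"], "OR")
print(sorted(s1), sorted(s2), sorted(sf))
```

The model only says which documents match; it gives no order among them, which is what the vector space model adds.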

Vector space model
S1 = {D0, D1, D6}
T0 = D0 ∩ QT (“lab”, “manager”) ⇒ V0 = (L0, M0)
T1 = D1 ∩ QT (“lab”) ⇒ V1 = (L1, 0)
T6 = D6 ∩ QT (“lab”, “manager”) ⇒ V6 = (L6, M6)

Weight of a token in a document
- Term frequency
TF = √frequency
- Inverse document frequency
IDF = 1 + log(numDocs / (docFrequency + 1))
- Field length
FL = 1 / √tokensInField
Weight = TF × IDF × FL
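
A small worked example may help here. The sketch below multiplies the three factors, taking the IDF in the classic Lucene form with the number of documents in the numerator, `1 + log(numDocs / (docFrequency + 1))`, and a natural logarithm. The counts in the call are made up.

```python
import math


def term_weight(frequency, doc_frequency, num_docs, tokens_in_field):
    """TF x IDF x FL, using the classic Lucene-style IDF (numDocs in the numerator)."""
    tf = math.sqrt(frequency)
    idf = 1 + math.log(num_docs / (doc_frequency + 1))
    fl = 1 / math.sqrt(tokens_in_field)
    return tf * idf * fl


# Made-up counts: the term occurs once in a 4-token field and appears in
# 2 of the 7 documents of the index.
print(round(term_weight(1, 2, 7, 4), 3))
```

A term gets a higher weight when it is frequent in the field, rare across the index, and when the field is short.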

Relevance
Vq = [1, 1.47]
V0 = [0.81, 0.85]
V1 = [0.37, 0]
V6 = [0.8, 1.2]
Relevance(Vq, Vx) = cos(Vq, Vx) = (Vq · Vx) / (‖Vq‖ · ‖Vx‖)
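
Plugging the vectors from this slide into the cosine formula gives the ranking. This is a plain Python sketch; the `cosine` helper is written for the example, and the document labels D0, D1 and D6 are taken from the vector indices.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


vq = [1, 1.47]
vectors = {"D0": [0.81, 0.85], "D1": [0.37, 0], "D6": [0.8, 1.2]}

scores = {doc: cosine(vq, v) for doc, v in vectors.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for doc in ranking:
    print(doc, round(scores[doc], 3))
```

D6 comes first because its vector points in almost the same direction as the query; D1 comes last because it has no weight at all on the second term.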

Let’s Kaggle with elasticsearch
https://www.kaggle.com/c/whats-cooking

Results of our “Classifier”
Explanation of the methodology:
http://melvyn.pythonanywhere.com/posts/1/

Final advice
- Mapping (I highly recommend defining a mapping. You cannot change the type of a field once it is defined in the mapping)
- Elasticsearch alongside a database (I prefer having both: it makes reindexing easier and gives me a backup; I run my search and analytics on ES and use my database for identification, etc.)
- Elasticsearch as a NoSQL database (I wouldn’t do it on a serious project, but it is nice to have if you want a quick implementation for a POC)

Hope you enjoyed the presentation!
Thank you for your attention!
Questions?
