You know, for search
  February 18th, 2011, RivieraJUG
About me
●   Lukáš Vlček ( @lukasvlcek )
●   Java developer since 2001
●   Joined Red Hat (JBoss division) in 2010
●   Member of JBoss.org team, focusing on search
In the beginning there was...
Elastic Search
Zzzzz...




...Shay Banon is not sleeping but dreaming!
Highly-available, {JSON}, RESTful, Search Engine, Distributed, CLOUD,
Asynchronous, Open Sourced (ASL2)
... buzzword dreaming!
... and @ElasticSearch was born!
Dreams do come true




https://github.com/elasticsearch/elasticsearch
http://www.elasticsearch.org
What is ElasticSearch?
●   Distributed
●   Highly-available
●   RESTful search engine (on top of Lucene)
●   Designed to speak JSON (JSON in, JSON out)
●   and more...


    ... but first, let's check simple examples.
Demo #1




RESTful JSON teaser
RESTful
●   Network interface for data indexing, searching
    and administration.

curl -XGET 'http://localhost:9200/index1,index2/typeA,typeB/_search' -d '{
  "query" : { "match_all" : {} }
}'




You can query one or more indices.
Indices can have aliases; you can also
use _all for all indices.


Each index has one or more types, roughly
like tables in a database.
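As a minimal sketch of how the URL in the curl example is put together, the Python below joins index and type names with commas and serializes a match_all query. The helper name build_search_request and the default host are assumptions for illustration; nothing is sent over the network.

```python
import json

def build_search_request(indices, types=(), host="http://localhost:9200"):
    """Return (url, body) for a match_all search. Host is an assumed default."""
    # "_all" targets every index when no index name is given.
    index_part = ",".join(indices) if indices else "_all"
    parts = [host, index_part]
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    body = json.dumps({"query": {"match_all": {}}})
    return "/".join(parts), body

url, body = build_search_request(["index1", "index2"], ["typeA", "typeB"])
print(url)
print(body)
```

The same helper with an empty index list produces a URL that starts with /_all/, matching the note about querying all indices.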
Highly available
●   For each index you can specify:
    ●   Number of shards
        –   Each index has a fixed number of shards
    ●   Number of replicas
        –   Each shard can have zero or more replicas; the count can be
            changed dynamically
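A rough sketch of the two settings bodies involved, assuming the usual number_of_shards and number_of_replicas index settings. The function names are hypothetical, and the exact endpoints (typically a PUT to the index and to its _settings resource) should be checked against the documentation for the version in use.

```python
import json

def create_index_body(shards, replicas):
    """Settings sent when an index is created; the shard count is fixed afterwards."""
    return {"settings": {"index": {"number_of_shards": shards,
                                   "number_of_replicas": replicas}}}

def update_replicas_body(replicas):
    """Settings sent to an existing index to change only the replica count."""
    return {"index": {"number_of_replicas": replicas}}

# Index "A" from the next slides: 3 shards, 2 replicas per shard.
print(json.dumps(create_index_body(3, 2)))
# Later, raise the replica count without touching the shard count.
print(json.dumps(update_replicas_body(4)))
```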
Distributed
●   Check next slides...
ZEN Discovery




      Node 1                    Node 2              Node 3              Node 4



                      A                    B                        C
A: { shards: 3, replicas: 2 }       B: { shards: 2, replicas: 3 }       C: { shards: 1, replicas: 0 }

 A1      A1      A1
                                     B1      B1      B1      B1
 A2      A2      A2                                                      C1
                                     B2      B2      B2      B2
 A3      A3      A3



        Gateway (long-term persistence of cluster data & metadata)
ZEN Discovery




      Node 1                    Node 2              Node 3              Node 4



                      A                    B                        C
A: { shards: 3, replicas: 2 }       B: { shards: 2, replicas: 3 }       C: { shards: 1, replicas: 4 }

 A1      A1      A1
                                     B1      B1      B1      B1
 A2      A2      A2                                                      C1     C1     C1      C1       C1
                                     B2      B2      B2      B2
 A3      A3      A3
                                                                                Cannot allocate all replicas!
                                                                                Check Health API


        Gateway (long-term persistence of cluster data & metadata)
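The arithmetic behind the two diagrams can be sketched as follows, assuming that a node holds at most one copy (primary or replica) of a given shard. The function names are made up for this example, and the health value is a simplification that ignores the red state.

```python
def unassigned_copies(shards, replicas, nodes):
    """Shard copies that cannot be placed, assuming nodes >= 1 and at most
    one copy of a given shard per node."""
    copies_per_shard = 1 + replicas              # one primary plus its replicas
    missing_per_shard = max(0, copies_per_shard - nodes)
    return shards * missing_per_shard

def health(shards, replicas, nodes):
    """Simplified status: green when everything is placed, yellow otherwise."""
    return "green" if unassigned_copies(shards, replicas, nodes) == 0 else "yellow"

# The three indices from the second diagram, on a 4-node cluster.
for name, (s, r) in {"A": (3, 2), "B": (2, 3), "C": (1, 4)}.items():
    print(name, unassigned_copies(s, r, nodes=4), health(s, r, nodes=4))
```

Index C asks for 5 copies of its single shard but only 4 nodes exist, so one replica stays unassigned until another node joins.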
Talking to the cluster
●   Native client in Java and Groovy

          Client




                    Node 1   Node 2    Node 3


●   Client type:
    ● Node client
    ● Transport Client
Talking to the cluster
●   REST client

             Client




                       Node 1      Node 2      Node 3


●   Many clients built on top of REST API
    ●   Perl, PHP, Python, Ruby, Erlang, etc.
Nodes do not have to be equal
●   Can be a master
●   Can be a data node
●   Can allow for REST transport interface
    ●     HTTP, memcached, thrift
●   Index store (file, memory)
●   JMX enabled
●   Thread pool type
●   ...
Gateway
●   Long-term persistence allows for whole (and
    partial) cluster backup and recovery.

    Types:
    ●   Local (default)
    ●   NFS
    ●   HDFS
    ●   AWS: S3
Distributed queries
●   You can control the type of the search query
    per search request:
    ●   Query and Fetch
    ●   Query then Fetch
    ●   Dfs, Query and Fetch
    ●   Dfs, Query then Fetch
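A small sketch of selecting the search type per request, assuming it is passed as a search_type URL parameter with the lower-case, underscore-separated names listed below. The helper name and host are hypothetical.

```python
from urllib.parse import urlencode

# Parameter values assumed to correspond to the four types on the slide.
SEARCH_TYPES = {
    "query_and_fetch",
    "query_then_fetch",
    "dfs_query_and_fetch",
    "dfs_query_then_fetch",
}

def search_url(index, search_type, host="http://localhost:9200"):
    """Build a search URL that asks for a specific distributed search type."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unknown search type: {search_type}")
    return f"{host}/{index}/_search?" + urlencode({"search_type": search_type})

print(search_url("index1", "dfs_query_then_fetch"))
```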
Demo #2




 Dynamic allocation of indices,
shards, replicas and Health API
Admin API
●   Indices
       –   Status
       –   CRUD operation
       –   Mapping, Open/Close, Update settings
       –   Flush, Refresh, Snapshot, Optimize
●   Cluster
       –   Health
       –   State
       –   Node Info and stats
       –   Shutdown
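The operations above map onto REST paths. The table below is a partial sketch based on the endpoint names used by 0.x-era releases (some were renamed or removed later), so treat the paths and HTTP methods as assumptions to verify; admin_request is a hypothetical helper.

```python
# Operation name -> (HTTP method, path template); a partial, assumed mapping.
ADMIN_ENDPOINTS = {
    "health": ("GET", "/_cluster/health"),
    "state": ("GET", "/_cluster/state"),
    "refresh": ("POST", "/{index}/_refresh"),
    "flush": ("POST", "/{index}/_flush"),
    "optimize": ("POST", "/{index}/_optimize"),
    "close": ("POST", "/{index}/_close"),
    "open": ("POST", "/{index}/_open"),
}

def admin_request(operation, index=None):
    """Return (method, path) for an admin operation."""
    method, template = ADMIN_ENDPOINTS[operation]
    if "{index}" in template and index is None:
        raise ValueError(f"operation {operation!r} needs an index name")
    return method, template.format(index=index)

print(admin_request("health"))
print(admin_request("refresh", "index1"))
```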
Demo #3




Admin API: getting JVM and OS stats
Rich query API
●   There is a rich Query DSL for search, which includes:
    ●   Queries
         –   Boolean, Fuzzy, MLT, Prefix, DisMax, ...
    ●   Filters
         –   And/Or/Not, Boolean, Geo, Missing, Exists, ...
    ●   Highlighting
    ●   Sort
    ●   Facets
         –   on the next slide...
Facets
●   Facets provide aggregated data alongside the
    results of a search request.
    ●   query
    ●   filter
    ●   terms
    ●   range
    ●   (date) histogram
    ●   statistical
    ●   geo distance
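A minimal sketch of a search body that carries a terms facet next to the query, assuming the facets section sits at the top level of the request. The field name "tags" and the helper name are illustrative only.

```python
import json

def faceted_search_body(text, facet_field, size=10):
    """Search body with a query plus a terms facet over one field."""
    return {
        "query": {"query_string": {"query": text}},
        "facets": {
            # The facet is named after the field it aggregates.
            facet_field: {"terms": {"field": facet_field, "size": size}}
        },
    }

print(json.dumps(faceted_search_body("search engine", "tags"), indent=2))
```

The response would then contain the hits for the query and, per facet name, the most frequent terms with their counts.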
Scripting support
●   There is support for using scripting languages
    in many places (for example for custom scoring,
    script fields, script key in facets ...)
    ●   mvel (default)
    ●   JS
    ●   Groovy
    ●   Python
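For custom scoring, a request body might look like the sketch below. It assumes the custom_score query of that era, which wraps another query and takes a script plus optional params; the field name "popularity" and the helper name are hypothetical, and later releases handle scripted scoring differently.

```python
import json

def custom_score_body(inner_query, script, params=None):
    """Wrap a query so that each hit's score is computed by a script."""
    custom = {"query": inner_query, "script": script}
    if params:
        custom["params"] = params
    return {"query": {"custom_score": custom}}

body = custom_score_body(
    {"match_all": {}},
    "_score * doc['popularity'].value * boost",   # 'popularity' is an assumed field
    {"boost": 2},
)
print(json.dumps(body, indent=2))
```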
Demo #4




      Java API: Indexing data
REST API: Faceted search, Highlighting
Parent / Child
●   Parent/child support lets you define a parent
    relationship from a child type to a parent type.
    ●   has_child (query, filter)
    ●   top_children (query)
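A sketch of a has_child query body: it returns parent documents that have at least one child matching the inner query. The type name "comment" and the field "author" are invented for illustration.

```python
import json

def has_child_query(child_type, child_query):
    """Match parent documents that have a child of child_type matching child_query."""
    return {"query": {"has_child": {"type": child_type, "query": child_query}}}

# Hypothetical example: find posts that have a comment written by "john".
body = has_child_query("comment", {"term": {"author": "john"}})
print(json.dumps(body))
```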
River
●   Listen to a stream of changes and index the
    data...
    ●   CouchDB
    ●   RabbitMQ
    ●   Twitter
    ●   Wikipedia
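A river is typically registered by indexing a _meta document under the _river index. The sketch below builds such a request for a CouchDB river; the path layout and the body keys follow the river plugins of that era as far as I recall and should be treated as assumptions, as should the names my_db and couchdb_river_request.

```python
import json

def couchdb_river_request(river_name, db, host="localhost", port=5984):
    """Return (method, path, body) that would register a CouchDB river."""
    path = f"/_river/{river_name}/_meta"
    body = {
        "type": "couchdb",
        "couchdb": {"host": host, "port": port, "db": db},
    }
    return "PUT", path, json.dumps(body)

method, path, body = couchdb_river_request("my_db", "my_db")
print(method, path)
print(body)
```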
Versioning (new in 0.15)
●   “update if current” functionality
●   i.e., I can get a document, change it, and then put
    it back (referencing the version I fetched);
    the write will either be indexed or fail (if the
    document has been modified in the interim)
●   Completely real-time
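The "update if current" idea is optimistic concurrency control. The in-memory sketch below illustrates the behaviour only; it is not how ElasticSearch implements it, and VersionedStore and VersionConflict are names made up for this example.

```python
class VersionConflict(Exception):
    """Raised when a write references a stale version."""

class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        """Return (version, source) for a stored document."""
        return self._docs[doc_id]

    def put(self, doc_id, source, expected_version=None):
        """Write a document; fail if expected_version is given and is stale."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, current is {current_version}")
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, source)
        return new_version

store = VersionedStore()
store.put("1", {"title": "draft"})
version, doc = store.get("1")
store.put("1", {"title": "final"}, expected_version=version)      # succeeds
try:
    store.put("1", {"title": "stale"}, expected_version=version)  # stale version
except VersionConflict as err:
    print("conflict:", err)
```

The second writer has to fetch the document again, reapply its change, and retry with the new version.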
Percolator (new in 0.15)
●   The percolator API lets you register queries
    against an index and then send a percolate
    request that includes a document; the response
    lists which of the registered queries match
    that document.
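Percolation reverses the usual search: queries are stored, and a document is matched against them. The toy sketch below shows the idea with simple term queries on whitespace-split text; it is an illustration under that assumption, not the engine's implementation, and the Percolator class is hypothetical.

```python
class Percolator:
    def __init__(self):
        self._queries = {}  # query name -> {field: required lower-case term}

    def register(self, name, term_query):
        """Store a query under a name, e.g. {"message": "tree"}."""
        self._queries[name] = dict(term_query)

    def percolate(self, doc):
        """Return the sorted names of registered queries that match the document."""
        matches = []
        for name, terms in self._queries.items():
            if all(term in str(doc.get(field, "")).lower().split()
                   for field, term in terms.items()):
                matches.append(name)
        return sorted(matches)

p = Percolator()
p.register("trees", {"message": "tree"})
p.register("cars", {"message": "car"})
print(p.percolate({"message": "A new bonsai tree"}))
```

A typical use is alerting: register the queries users care about, then percolate each incoming document to find out whom to notify.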
Q&A
Thank you!
