Introduction to Lucene & Solr and Use-cases
October Solr/Lucene Meetup
Rahul Jain
@rahuldausa
Who am I?
• Software Engineer
• 7 years of programming experience
• Areas of expertise/interest
– High traffic web applications
– JAVA/J2EE
– Big data, NoSQL
– Information-Retrieval, Machine learning
Agenda
• Overview
• Information Retrieval
• Lucene
• Solr
• Use-cases
• Solr In Action (demo)
• Q&A
Information Retrieval (IR)
“Information retrieval is the activity of obtaining information resources (in the form of documents) relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text (or other content-based) indexing.”
- Wikipedia
Inverted Index


Credit: https://developer.apple.com/library/mac/documentation/userexperience/conceptual/SearchKitConcepts/searchKit_basics/searchKit_basics.html
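As a rough illustration of the idea behind an inverted index, the sketch below maps each term to the documents that contain it. It is a minimal sketch and not how Lucene stores its index; the function name `build_inverted_index` and the sample documents are assumptions made for this example, and tokenization is simply lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis: lowercase and split on whitespace (an assumption).
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical sample documents, used only for illustration.
docs = {
    1: "Lucene is a search library",
    2: "Solr is a search platform built on Lucene",
    3: "Hadoop stores big data",
}
index = build_inverted_index(docs)
print(index["lucene"])  # ids of the documents containing "lucene"
print(index["hadoop"])  # ids of the documents containing "hadoop"
```

A lookup is then a dictionary access per query term, which is why searching does not need to scan every document.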
Basic Concepts
• tf (t in d) : term frequency in a document
– measure of how often a term appears in the document
– the number of times term t appears in the currently scored document d

• idf (t) : inverse document frequency
– measure of whether the term is common or rare across all documents, i.e. how often the term appears across the index
– obtained by dividing the total number of documents by the number of documents containing the term, and then taking the logarithm of that quotient

• coord : coordinate-level matching
– number of terms in the query that were found in the document
– e.g. terms ‘x’ and ‘y’ are both found in doc1 but only term ‘x’ is found in doc2, so for a query of ‘x’ OR ‘y’ doc1 will receive a higher score

• boost (index) : boost of the field at index-time
• boost (query) : boost of the field at query-time
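The tf, idf and coord definitions above can be sketched in a few lines. This is a simplified illustration that follows the wording of the slide (raw term count, log of N/df, fraction of query terms matched); Lucene's actual TFIDFSimilarity applies further factors such as norms and boosts, which are left out here. The function names and the sample documents are assumptions for this example.

```python
import math


def tf(term, doc_tokens):
    """Number of times the term appears in the scored document."""
    return doc_tokens.count(term)


def idf(term, all_docs):
    """log(total documents / documents containing the term); 0 if the term is absent."""
    df = sum(1 for tokens in all_docs if term in tokens)
    return math.log(len(all_docs) / df) if df else 0.0


def coord(query_terms, doc_tokens):
    """Fraction of the query terms that were found in the document."""
    matched = sum(1 for t in query_terms if t in doc_tokens)
    return matched / len(query_terms)


def score(query_terms, doc_tokens, all_docs):
    """Simplified score: coord multiplied by the sum of tf * idf over the query terms."""
    tfidf = sum(tf(t, doc_tokens) * idf(t, all_docs) for t in query_terms)
    return coord(query_terms, doc_tokens) * tfidf


# Hypothetical documents mirroring the 'x' OR 'y' example above.
docs = [
    ["x", "y", "z"],  # doc1 contains both query terms
    ["x", "z", "z"],  # doc2 contains only 'x'
    ["z", "w"],       # doc3 contains neither
]
query = ["x", "y"]
scores = [score(query, d, docs) for d in docs]
print(scores)
```

With these inputs doc1 scores highest, doc2 lower and doc3 zero, which matches the coord example on the slide.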

Apache Lucene

• Information Retrieval library
• Open source
• Initially developed by Doug Cutting (also author of Hadoop)
• Indexing and Searching
• Inverted Index of documents
• High performance, scalable
• Provides advanced search options such as synonyms, stopwords, similarity-based and proximity search
Apache Solr

• Initially Developed by Yonik Seeley
• Enterprise Search platform for Apache Lucene
• Open source
• Highly reliable, scalable, fault tolerant

• Supports distributed indexing (SolrCloud), replication, and load-balanced querying
Apache Solr - Features
• full-text search
• hit highlighting
• faceted search (similar to GroupBy clause in RDBMS)
• near real-time indexing
• dynamic clustering (e.g. cluster of most frequent words, tag cloud)
• database integration
• rich document (e.g., Word, PDF) handling
• geospatial search
Solr – schema.xml
• Types with index and query Analyzers (similar to data types)
• Fields with name, type and options
• Unique Key
• Dynamic Fields
• Copy Fields

Solr – Content Analysis
• Defines the document model
• An index contains documents.
• Each document consists of fields.
• Each field has attributes.
– What is the data type (FieldType)
– How to handle the content (Analyzers, Filters)
– Is it a stored field (stored="true") or an indexed field (indexed="true")
Solr – Content Analysis
• Field Attributes
– Name : Name of the field
– Type : Data-type (FieldType) of the field
– Indexed : Should it be indexed (indexed="true/false")
– Stored : Should it be stored (stored="true/false")
– Required : Is it a mandatory field (required="true/false")
– Multi-Valued : Will it contain multiple values, e.g. text: pizza, food (multiValued="true/false")
e.g. <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
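To make the meaning of the required and multiValued attributes concrete, here is a small sketch that checks a document against a list of field definitions. It only mirrors the attribute meanings listed above and is not Solr's own validation code; the `Field` class, the `validate` function and the second field (`text` of type `text_general`) are assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """Hypothetical stand-in for a <field .../> entry in schema.xml."""
    name: str
    type: str
    indexed: bool = True
    stored: bool = True
    required: bool = False
    multi_valued: bool = False


def validate(doc, fields):
    """Return a list of problems found when checking doc against the field definitions."""
    errors = []
    for f in fields:
        value = doc.get(f.name)
        if value is None:
            if f.required:
                errors.append(f"missing required field: {f.name}")
            continue
        # A list value is only acceptable when the field is multi-valued.
        if isinstance(value, list) and not f.multi_valued:
            errors.append(f"field is not multiValued: {f.name}")
    return errors


schema = [
    Field("id", "string", required=True),
    Field("text", "text_general", multi_valued=True),  # hypothetical field
]
ok = validate({"id": "1", "text": ["pizza", "food"]}, schema)
bad = validate({"text": "pizza"}, schema)
multi = validate({"id": ["1", "2"]}, schema)
print(ok, bad, multi)
```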
Solr – Content Analysis
• FieldType can be
– StrField : String Field
– TextField : Similar to StrField but can be analyzed
– BoolField : Boolean Field
– IntField : Integer Field
– Trie Based
  • TrieIntField
  • TrieLongField
  • TrieDateField
  • TrieDoubleField
– Few more…
e.g.
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0" omitNorms="true"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0" omitNorms="true"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0" omitNorms="true"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0" omitNorms="true"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0" omitNorms="true"/>
Check for more Field Types @ https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr

Indexing Pipeline

• Analyzer : creates tokens using a Tokenizer and/or by applying Filters (Token Filters)
• Each field can define an Analyzer for index time, for query time, or for both.
Credit : http://www.slideshare.net/otisg/lucene-introduction
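The tokenizer-plus-filters chain described above can be imitated with plain functions. This is a conceptual sketch only: the function names are hypothetical and merely echo the roles of a whitespace tokenizer, a lowercase filter and a stop filter, and the stopword set is an assumption. It also shows a field using one chain at index time and a different one at query time.

```python
def whitespace_tokenizer(text):
    """Split the raw text into tokens on whitespace."""
    return text.split()


def lowercase_filter(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def stop_filter(tokens, stopwords=frozenset({"a", "an", "the", "is", "of"})):
    """Token filter: drop tokens that are in the (assumed) stopword set."""
    return [t for t in tokens if t not in stopwords]


def make_analyzer(tokenizer, *filters):
    """Build an analyzer that runs one tokenizer followed by the filters in order."""
    def analyze(text):
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze


# Separate chains for index time and query time, as a field may define.
index_analyzer = make_analyzer(whitespace_tokenizer, lowercase_filter, stop_filter)
query_analyzer = make_analyzer(whitespace_tokenizer, lowercase_filter)

print(index_analyzer("The Power of Apache Lucene"))
print(query_analyzer("Apache LUCENE"))
```

The order of the filters matters: lowercasing runs before the stop filter here so that "The" is recognised as a stopword.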

Solr – Content Analysis
• Commonly used tokenizers:
– StandardTokenizerFactory
– WhitespaceTokenizerFactory
– KeywordTokenizerFactory
– LowerCaseTokenizerFactory
– PatternTokenizerFactory
– LetterTokenizerFactory
– ClassicTokenizerFactory
– UAX29URLEmailTokenizerFactory
Solr – Content Analysis
• Commonly used filters:
– ClassicFilterFactory
– LowerCaseFilterFactory
– CommonGramsFilterFactory
– EdgeNGramFilterFactory
– TrimFilterFactory
– StopFilterFactory
– TypeTokenFilterFactory
– PatternCaptureGroupFilterFactory
– PatternReplaceFilterFactory
Solr – solrconfig.xml
• Data dir: where all index data will be stored
• Index configuration: ramBufferSizeMB, mergePolicy etc.
• Cache configurations: document, query result, filter, field value cache
• Query Component
• Spell checker component
Query Types
• Single and multi term queries
– e.g. fieldname:value or title:(software engineer)

• +, -, AND, OR, NOT operators
– e.g. title:(software AND engineer)

• Range queries on date or numeric fields
– e.g. timestamp:[* TO NOW] or price:[1 TO 100]

• Boost queries
– e.g. title:Engineer^1.5 OR text:Engineer

• Fuzzy search : a search for words that are similar in spelling
– e.g. roam~ matches similarly spelled terms such as foam and roams; an optional value, as in roam~0.8, sets the required similarity

• Proximity Search : with a sloppy phrase query. The closer together the two terms appear, the higher the score.
– e.g. "apache lucene"~20 : will look for all documents where the word “apache” occurs within 20 words of “lucene”
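Fuzzy and proximity matching can be pictured with two small helpers. This is a conceptual sketch, not Lucene's implementation: fuzzy search is approximated with plain Levenshtein edit distance, and proximity follows the slide's wording ("within N words") rather than Lucene's exact slop counting. The function names and the sample sentence are assumptions for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character of a
                            curr[j - 1] + 1,      # insert a character of b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def within_proximity(tokens, first, second, slop):
    """True if some occurrence of `first` is at most `slop` positions from `second`."""
    pos_first = [i for i, t in enumerate(tokens) if t == first]
    pos_second = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(i - j) <= slop for i in pos_first for j in pos_second)


# "roam" is one edit away from both "foam" and "roams".
print(edit_distance("roam", "foam"), edit_distance("roam", "roams"))

# Hypothetical sentence for the proximity check.
tokens = "apache solr is built on top of lucene".split()
print(within_proximity(tokens, "apache", "lucene", 20))
print(within_proximity(tokens, "apache", "lucene", 3))
```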
Solr/Lucene Use-cases

• Search
• Analytics
• NoSQL datastore
• Auto-suggestion / Auto-correction
• Recommendation Engine (MoreLikeThis)
• Relevancy Engine
• Solr as a White-List
• Spatial based Search
Search
• Application
– Eclipse, Hibernate search

• E-Commerce :
– Flipkart.com, Infibeam.com, Buy.com, Netflix.com, ebay.com

• Jobs
– Indeed.com, Simplyhired.com, Naukri.com, Shine.com

• Auto
– AOL.com

• Travel
– Cleartrip.com

• Social Network
– Twitter.com, LinkedIn.com, mylife.com
Search (Contd.)
• Search Engine
– Yandex.ru, DuckDuckGo.com

• Newspaper
– Guardian.co.uk

• Music/Movies
– Apple.com, Netflix.com

• Events
– Stubhub.com, Eventbrite.com

• Cloud Log Management
– Loggly.com

• Others
– Whitehouse.gov
Results Grouping (using facet)

Source: www.career9.com, www.indeed.com
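Facet counts of the kind shown on job sites (for example, the number of matching jobs per city) amount to counting documents per field value. The sketch below illustrates that idea only and is not Solr's faceting code; the `facet_counts` function and the sample job records are assumptions made for this example.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many documents carry each value of `field` (values may be lists)."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        # Use a set so a document is counted once per distinct value.
        counts.update(set(values))
    return counts


# Hypothetical search results, used only for illustration.
jobs = [
    {"title": "Software Engineer", "city": "Hyderabad"},
    {"title": "Search Engineer", "city": "Bangalore"},
    {"title": "Data Engineer", "city": "Hyderabad"},
]
city_facets = facet_counts(jobs, "city")
print(city_facets.most_common())
```

In a search application the counts would be computed over the documents matching the current query, so they change as the user narrows the search.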

Analytics

Analytics source : Kibana.org based on ElasticSearch and Logstash
Image Source : http://semicomplete.com/presentations/logstash-monitorama-2013/#/8
Autosuggestion

Source: www.drupal.org , www.yelp.com

Integration
• Clustering (Solr – Carrot2)
• Named Entity extraction (Solr-UIMA)
• SolrCloud (Solr-Zookeeper)
• Stanbol EntityHub
• Parsing of many different file formats (Solr-Tika)
References
• http://en.wikipedia.org/wiki/Tf%E2%80%93idf
• http://lucene.apache.org/core/4_5_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
• http://www.quora.com/Which-major-companies-are-using-Solr-for-search
• http://marc.info/?l=solr-user&m=137271228610366&w=2
• http://java.dzone.com/articles/apache-solr-get-started-get
Thanks!
@rahuldausa on twitter and slideshare
http://www.linkedin.com/in/rahuldausa

Found it interesting?
Join us @ http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/

