IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 
__________________________________________________________________________________________ 
Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org 531 
USER SEARCH GOAL INFERENCE AND FEEDBACK SESSION USING FAST GENERALIZED – FUZZY C MEANS ALGORITHM
R. Priyanka1, N. Rajkumar2
1PG Scholar, Department of Software Engineering (M.E.), Sri Ramakrishna Engineering College, Coimbatore, India
2Head of the Department, Department of Software Engineering (M.E.), Sri Ramakrishna Engineering College, Coimbatore, India
Search engines are under heavy strain because of the overload of information on the internet. The search query submitted by the user represents the user's requirements, but a query cannot always express the user's particular goal. A long list of search results is not always relevant to those requirements, and many existing search engines, which rely on keyword matching, return irrelevant documents. Both users and search engine developers therefore need to reduce the amount of information that the user has to sift through. In this paper, we aim to infer the user search goal by considering the clicked URLs and to reorganize the web search results accordingly. We use FG-FCM based clustering to group semantically similar search results, which further enhances the reorganized search results.
Keywords: Ambiguous Query, Broad-topic Query, Feedback session, Semantics.
----------------------------------------------------------------------***--------------------------------------------------------------------
1. INTRODUCTION
Dependence on search engines has grown in recent years, and users can obtain plenty of information from the internet by submitting a query to a search engine. The search query represents the requirements of the user. Finding the right information with a search engine can nevertheless be difficult. Search engines present results based on the ranking of websites and not according to user interests, so the result list is the same for all users even though different users have different interests. For a broad-topic or ambiguous query, different users will have different search goals. For example, when the query “jaguar” is submitted to a search engine, some users may want information about the car while others may want information about the animal. The query alone may therefore not capture the user's particular information need, and it becomes necessary to infer the exact user search goal in order to satisfy that need. In this paper, we aim to improve search engine relevance by identifying the various goals behind a user search query and restructuring the web search results. Inference of the user search goal can also be used to recommend a list of related queries [8] for the query submitted by the user.
The user search goal for a submitted query is inferred by clustering the feedback sessions. Feedback sessions can be represented in various ways, such as binary vectors or pseudo-documents. In this paper, we use pseudo-documents, which contain keywords, to represent the feedback sessions. Each feedback session is mapped to its corresponding pseudo-document. The keywords that are semantically similar to the given query are then found. The search results that are semantically similar are clustered by FG-FCM clustering according to the search goal, so that each cluster represents one search goal. The FG-FCM algorithm allows one document to belong to two or more clusters. The algorithm also restructures and enhances the original search results by inferring the search goal of the user, which reduces the time the user spends searching for the information they need. The performance of the restructured search results can be evaluated with the Classified Average Precision (CAP) criterion.
2. RELATED WORKS
To date, many studies have investigated how to obtain user search goals and query types. We examine some of this previous work to study the clustering problem. It is important to discover the different search goals behind a given query in order to fulfill the needs of the user. Long lists of search results can be restructured [2], [7], [9] according to the user requirements. The analysis of user search goals can be divided into three classes: search result reorganization, session boundary detection and query classification. In the first class, authors tried to reorganize web search results. Wang and Zhai [7] analyzed click-through logs and grouped the search results according to the clicked URLs. In the second class, Jones and Klinkner [3] considered session boundaries to identify whether the queries and the goal match. In the third class, user goals and queries were categorized into specific classes. Lee et al. [4] categorized user queries as “Navigational” or “Informational” and inferred the search goals automatically. The search goal can be used to improve the quality of a search engine's results. They also
discussed how to automate the goal identification process. The goal-identification task was based on two types of features: user-click behavior and anchor-link distribution. Li et al. [5] defined the objective of a query in terms of classes such as “Product intent” and “Job intent” and categorized search queries accordingly. Today’s web search engines provide a very user-friendly interface. Users submit queries in the form of keywords, as in an information retrieval system. A query may be a simple keyword, or it may be a broad-topic or ambiguous one. Search engines also list related queries when a query is submitted. Ricardo Baeza-Yates et al. [6] discussed how these associated queries are based on previously issued queries and are offered to the user to redirect the search process. Semantically similar queries were also identified by clustering the contents of the search engine's query log. Restructuring the search results according to the user search goal has many advantages. Joachims used implicit feedback, in the form of click-through logs, to optimize search engines. Zheng Lu et al. [10] restructured the search results by clustering pseudo-documents with K-means. However, K-means requires the value of K to be chosen in advance, which is difficult, and it does not work well when the clusters in the original data differ in size and density. Some prior works treated click-through logs as user feedback and restructured the search results accordingly. Other works did not consider user feedback and used the entire result list returned by the search engine, including links that the user never clicked. Works of this type produced noisy results. In this paper, we infer the user search goal by considering the feedback session, and use it to restructure and enhance the search results in order to satisfy the user's needs.
We use the FG-FCM algorithm, a fast and robust algorithm, to cluster the semantically similar links in the feedback sessions, which provides better results than the previous works.
3. RESEARCH METHODOLOGY
The approach followed in this paper for inferring the user search goal is shown in Fig. 1.
3.1 User Search Query Analysis
The search query submitted by the user has to be analyzed. The click-through logs are consulted to examine the user search queries and to define the feedback sessions. A query submitted to the search engine may be simple or ambiguous. For an ambiguous query, it is necessary to analyze its different meanings and restructure the search results into different clusters so that the user's needs are satisfied. The search results returned for the query must be collected before they can be restructured.
3.2 Feedback Session 
The first step in reorganizing the search results is to represent the feedback session. A feedback session consists of the list of URLs up to and including the last URL clicked by the user in a single session. All the unclicked URLs before the last clicked URL are also included, because the user has browsed and evaluated them as well. Within a feedback session, the clicked URLs indicate what information the user requires and the unclicked URLs indicate what information the user does not require. The URLs that appear after the last clicked URL are not part of the feedback session because it is not certain whether the user scanned them. A feedback session cannot be used directly for user search goal inference because feedback sessions vary widely across click-through logs. It should therefore be represented in some other form in order to infer the user search goals efficiently. Several representations are possible. The binary vector is one popular representation: it consists of 0’s and 1’s, where “0” represents an unclicked URL and “1” represents a clicked URL in a single session. This method cannot be used when many feedback sessions are considered, because different feedback sessions can differ widely in the URLs they contain. Keywords could in principle represent the user's interests for a query, but they cannot be used to represent the feedback session directly because they are usually latent and not expressed explicitly. Therefore, pseudo-documents are used to infer the goals of the user. Each feedback session is mapped to pseudo-documents, which are formed by enriching the URLs in the feedback session. A URL is enriched by combining its title and its short snippet into a small text paragraph.
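The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Result` record, the function names and the example URLs are hypothetical and do not come from the paper, and the pseudo-document is reduced to a lower-cased token list built from the title and snippet.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One search result (hypothetical record used for illustration)."""
    url: str
    title: str
    snippet: str


def feedback_session(results, clicked_urls):
    """Return the ranked results up to and including the last clicked one."""
    last = -1
    for idx, result in enumerate(results):
        if result.url in clicked_urls:
            last = idx
    return results[: last + 1]  # empty list when nothing was clicked


def binary_vector(session, clicked_urls):
    """1 for a clicked URL, 0 for an unclicked URL in the session."""
    return [1 if r.url in clicked_urls else 0 for r in session]


def pseudo_document(result):
    """Enrich a URL with its title and snippet, as a list of tokens."""
    return f"{result.title} {result.snippet}".lower().split()


# Made-up result list for the ambiguous query "jaguar".
results = [
    Result("a.example/cars", "Jaguar Cars", "Official site of the car maker"),
    Result("b.example/cat", "Jaguar animal", "The jaguar is a large cat"),
    Result("c.example/os", "Mac OS X Jaguar", "An operating system release"),
    Result("d.example/team", "Jacksonville Jaguars", "Football team news"),
]
clicked = {"a.example/cars", "c.example/os"}

session = feedback_session(results, clicked)
vector = binary_vector(session, clicked)
docs = [pseudo_document(r) for r in session]
```

In this example the fourth result lies after the last click and is therefore left out of the session, while the unclicked second result stays in as negative feedback.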
3.3 Semantic Similarity and Fast Generalized-Fuzzy C Means Clustering
The semantics of the query submitted by the user must be analyzed and the results restructured accordingly. Semantically similar words can be identified with the WordNet tool, from which the words semantically similar to the user search query are extracted. Then the FG-FCM clustering process begins. This algorithm is a variation of the FCM algorithm that differs by applying mathematical exponentiation to the result obtained using FCM. The number of clusters need not be specified in advance.
The advantage of using this algorithm is that the same data
element can belong to more than one cluster, and the
clustering process is more efficient than with the existing
algorithm. The titles in the feedback session are grouped
based on the similarity between the semantic keywords and the
titles. The similarity matrix Ui,j is also used in the
clustering process. Both the rows and the columns of this
matrix represent the titles of the search results, arranged
in the same order, and each entry represents the similarity
between two titles. Various similarity measures can be used
to calculate the similarity. In this paper, cosine similarity
is used. The cosine score for two titles is computed as

\[ \mathrm{Similarity}(T_i, T_j) = \cos(T_i, T_j) \tag{1} \]

where Ti and Tj represent titles of the search results.
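As a rough illustration of Equation (1), the sketch below builds a title-by-title cosine similarity matrix. It assumes that each title is represented by raw term counts after whitespace tokenization, which the paper does not specify. The function names and sample titles are made up for the example.

```python
import math
from collections import Counter


def cosine(title_a, title_b):
    """Cosine similarity between two titles using raw term counts."""
    va = Counter(title_a.lower().split())
    vb = Counter(title_b.lower().split())
    dot = sum(va[term] * vb[term] for term in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty title is similar to nothing
    return dot / (norm_a * norm_b)


def similarity_matrix(titles):
    """Rows and columns follow the same title order."""
    return [[cosine(ti, tj) for tj in titles] for ti in titles]


titles = ["jaguar car dealer", "jaguar car price", "jaguar animal habitat"]
S = similarity_matrix(titles)
```

Here the two car-related titles share two of their three terms and score higher with each other than either does with the animal-related title.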
Fig.1. Search result restructuring process 
The matrix can be computed by

\[ U_{ij} = \frac{1}{\sum_{k} \left( \dfrac{|x_i - C_j|}{|x_i - C_k|} \right)^{2}} \tag{2} \]
where xi represents the keyword count, that is, the total
number of semantic keywords in the ith URL of the feedback
session. Cj and Ck are the cluster values (centers), which
are obtained from the computation

\[ C_j = \frac{\sum_{i} U_{ij}\, x_i}{\sum_{i} U_{ij}} \tag{3} \]

where Ui,j represents the value of the similarity matrix at
position (i, j), and the summation Σ runs over every title in
the feedback session.
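To make the alternation between Equations (2) and (3) concrete, here is a minimal sketch of a plain fuzzy C-means loop on one-dimensional keyword counts. It is not the FG-FCM variant itself: it assumes an exponent of 2 in the membership update and unweighted memberships in the center update. The function names, the iteration count and the sample data are hypothetical.

```python
def memberships(xs, centers):
    """U[i][j] = 1 / sum_k (|x_i - C_j| / |x_i - C_k|)^2."""
    U = []
    for x in xs:
        d = [abs(x - c) for c in centers]
        if 0.0 in d:
            # x sits exactly on one or more centers: share full membership there
            hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
            total = sum(hits)
            U.append([h / total for h in hits])
        else:
            U.append([1.0 / sum((dj / dk) ** 2 for dk in d) for dj in d])
    return U


def update_centers(xs, U, centers):
    """C_j = sum_i U[i][j] * x_i / sum_i U[i][j]."""
    new_centers = []
    for j, old in enumerate(centers):
        weight = sum(row[j] for row in U)
        if weight == 0.0:
            new_centers.append(old)  # keep the old center if the cluster is empty
        else:
            new_centers.append(sum(row[j] * x for row, x in zip(U, xs)) / weight)
    return new_centers


def fcm(xs, centers, iterations=20):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iterations):
        U = memberships(xs, centers)
        centers = update_centers(xs, U, centers)
    return centers, memberships(xs, centers)


# Made-up keyword counts for six URLs, with two initial cluster values.
xs = [1.0, 2.0, 1.5, 8.0, 9.0, 8.5]
centers, U = fcm(xs, [0.0, 10.0])
```

Each row of `U` sums to 1, so a URL can belong partly to both clusters, which is the soft-assignment property the text attributes to the algorithm.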
The search results are grouped based on the clustered values.
Thus, the search results are restructured and reorganized
into groups based on semantic similarity. Assembling the
semantically similar URLs into different clusters and
restructuring the search results improves the users’ browsing
experience. It also satisfies the requirements of the user
and reduces the time spent browsing the contents. This
process is particularly useful for ambiguous queries, which
have more than one meaning.
4. PERFORMANCE EVALUATION 
The performance of the approach we have followed can be 
evaluated from the Classified Average Precision (CAP) 
criterion. It can be computed using Voted Average Precision 
(VAP) and risk as, 
\[ \mathrm{CAP} = \mathrm{VAP} \times (1 - \mathrm{Risk})^{\gamma} \tag{4} \]
Where, γ is a parameter to adjust the value of risk. Risk can be 
calculated as 
\[ \mathrm{Risk} = \frac{\sum_{i,j=1\,(i<j)}^{m} d_{ij}}{C_m^2} \tag{5} \]
The risk term is included to penalize classification errors.
The value di,j is either 0 or 1: it is 0 if the ith and jth
clicked URLs are classified into the same class, and 1
otherwise. C_m^2 = m(m-1)/2 is the number of pairs of URLs
clicked in a single session, where m is the number of clicked
URLs. The Voted Average Precision (VAP) is calculated from
the Average Precision (AP). AP is calculated as

\[ \mathrm{AP} = \frac{1}{N^{+}} \sum_{r=1}^{N} \mathrm{rel}(r)\, \frac{R_r}{r} \tag{6} \]

where N+ is the number of clicked documents in the retrieved
search results, r is the rank of the URL, N is the total
number of retrieved search results, rel(r) is a binary
function that indicates whether the result at rank r was
clicked, and Rr is the number of clicked results at rank r or
less.
In Voted Average Precision (VAP), clicks are treated as
votes. After the search results are reorganized into
different clusters, AP is computed for each cluster using
Equation (6). The cluster containing the largest number of
clicked URLs is used to calculate VAP. If two clusters
contain the same number of clicked URLs, the higher AP value
among them is chosen as VAP.
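The evaluation criterion can be sketched as follows. This is a minimal illustration that assumes each cluster is given as a ranked list of 0/1 click indicators and that the cluster label of every clicked URL is known. The function names, the sample clusters and the value 0.7 for γ are placeholders, not values from this paper.

```python
from itertools import combinations


def average_precision(clicks):
    """Equation (6): clicks[r-1] is 1 if the result at rank r was clicked."""
    n_plus = sum(clicks)
    if n_plus == 0:
        return 0.0
    total, clicked_so_far = 0.0, 0
    for rank, rel in enumerate(clicks, start=1):
        clicked_so_far += rel  # R_r: clicked results at rank r or less
        total += rel * clicked_so_far / rank
    return total / n_plus


def voted_ap(clusters):
    """AP of the cluster with the most clicks; ties go to the higher AP."""
    return max((sum(c), average_precision(c)) for c in clusters)[1]


def risk(labels):
    """Equation (5): share of clicked-URL pairs placed in different clusters."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def cap(vap, risk_value, gamma):
    """Equation (4): CAP = VAP * (1 - Risk)^gamma."""
    return vap * (1.0 - risk_value) ** gamma


# Two clusters of restructured results; 1 marks a clicked URL.
clusters = [[1, 1, 0], [0, 1]]
clicked_labels = [0, 0, 1]  # cluster label of each clicked URL

vap = voted_ap(clusters)
r = risk(clicked_labels)
score = cap(vap, r, gamma=0.7)  # gamma chosen arbitrarily for the example
```

Because two of the three clicked-URL pairs straddle the two clusters, the risk term lowers CAP well below VAP, which is the intended penalty for scattering one session's clicks across clusters.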
The comparison between the K-Means and FG-FCM algorithms is
shown in Fig. 2.
Fig. 2. FG-FCM vs. K-Means
5. CONCLUSIONS AND FUTURE WORK 
The method used in this paper infers the user search goal
from the feedback session. We analyze and reorganize only the
search results that fall within the feedback session, for
efficient browsing. This avoids the noisy data that unviewed
results would otherwise introduce when restructuring the
search results. We also consider semantically similar
keywords to enhance the restructured search results. The
FG-FCM algorithm is used to cluster the URLs into different
groups according to semantic similarity. Thus, the enhanced
search results improve search engine relevance and satisfy
the user to a greater extent. In future work, we plan to
investigate the feedback session not only for a single query
in the form of keywords, but also for queries submitted in
the form of a sentence.
ACKNOWLEDGEMENTS 
I would like to thank Dr. N. Rajkumar for his innovative
ideas, valuable comments and suggestions, which helped
improve the presentation quality of the paper and led to the
successful completion of the work.
REFERENCES 
[1] D. Beeferman and A. Berger, “Agglomerative Clustering of a Search Engine Query Log,” Proc. Sixth ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining, pp. 407-416, 2000.
[2] H. Chen and S. Dumais, “Bringing Order to the Web: Automatically Categorizing Search Results,” Proc. SIGCHI Conf. Human Factors in Computing Systems, pp. 145-152, 2000.
[3] R. Jones and K.L. Klinkner, “Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs,” Proc. 17th ACM Conf. Information and Knowledge Management, pp. 699-708, 2008.
[4] U. Lee, Z. Liu, and J. Cho, “Automatic Identification of User Goals in Web Search,” Proc. 14th Int’l Conf. World Wide Web, pp. 391-400, 2005.
[5] X. Li, Y.Y. Wang, and A. Acero, “Learning Query Intent from Regularized Click Graphs,” Proc. 31st Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 339-346, 2008.
[6] B. Poblete and R. Baeza-Yates, “Query-Sets: Using Implicit Feedback and Query Patterns to Organize Web Documents,” Proc. 17th Int’l Conf. World Wide Web, pp. 41-50, 2008.
[7] X. Wang and C.X. Zhai, “Learn from Web Search Logs to Organize Search Results,” Proc. 30th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 87-94, 2007.
[8] R. Baeza-Yates, C. Hurtado, and M. Mendoza, “Query Recommendation Using Query Logs in Search Engines,” Proc. Int’l Conf. Current Trends in Database Technology, pp. 588-596, 2004.
[9] H.J. Zeng, Z. Chen, W.Y. Ma, and J. Ma, “Learning to Cluster Web Search Results,” Proc. 27th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 210-217, 2004.
[10] Z. Lu, H. Zha, X. Yang, W. Lin, and Z. Zheng, “A New Algorithm for Inferring User Search Goals with Feedback Sessions,” IEEE Transactions on Knowledge and Data Engineering, Vol. 25, No. 3, 2013.
BIOGRAPHIES 
Dr. N. Rajkumar obtained his Bachelor’s degree in Computer Science and Engineering from Madurai Kamaraj University in 1991 and his Master’s in Engineering in the same stream in 1995 from Jadavpur University, Kolkata. He completed his Master’s in Business Administration from IGNOU in 2003. His doctorate is in the field of Data Mining, which he completed in 2005 at PSG College of Technology, Coimbatore. He is currently the Head of the Department of Computer Science and Engineering at Sri Ramakrishna Engineering College, Coimbatore, Tamilnadu. He has served in the field of education for over 20 years at various technical institutions. He has been instrumental in conducting 30 short-term courses and has attended 20 courses conducted by other institutions and organizations. He has authored 2 books for the benefit of the student community, on Networking and Computer Servicing. He has published as many as 50 papers in international journals and conferences and at the national level in his areas of expertise, namely Data Mining, Networking and Parallel Computing. He has guided 100 project scholars to date. His e-mail id is nrk29@rediffmail.com
R. Priyanka received her B.E. in CSE from Hindusthan College of Engineering and Technology, Coimbatore, affiliated to Anna University, Chennai, in 2012. At present, she is pursuing her M.E. in Software Engineering at Sri Ramakrishna Engineering College, affiliated to Anna University, Chennai. Her e-mail id is iamlavs38@gmail.com

More Related Content

PDF
A New Algorithm for Inferring User Search Goals with Feedback Sessions
PDF
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
PDF
Personalized web search using browsing history and domain knowledge
PDF
IJRET : International Journal of Research in Engineering and TechnologyImprov...
PDF
50120140502013
PDF
50120140506005 2
PDF
IRJET- A Novel Technique for Inferring User Search using Feedback Sessions
PDF
Naresh sharma
A New Algorithm for Inferring User Search Goals with Feedback Sessions
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
Personalized web search using browsing history and domain knowledge
IJRET : International Journal of Research in Engineering and TechnologyImprov...
50120140502013
50120140506005 2
IRJET- A Novel Technique for Inferring User Search using Feedback Sessions
Naresh sharma

What's hot (18)

PDF
Efficient way of user search location in query processing
PDF
Context Sensitive Search String Composition Algorithm using User Intention to...
PDF
UProRevs-User Profile Relevant Results
PDF
Application of fuzzy logic for user
PDF
IRJET- Classification and Filtration of Resources with Collaborative Tagging ...
PDF
Performance Evaluation of Query Processing Techniques in Information Retrieval
PDF
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
PDF
K1803057782
PDF
Semantic web personalization
PDF
Quest Trail: An Effective Approach for Construction of Personalized Search En...
PDF
Classification-based Retrieval Methods to Enhance Information Discovery on th...
PDF
Ac02411221125
PDF
Kp3518241828
PDF
Context Driven Technique for Document Classification
PDF
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
PDF
Framework for web personalization using web mining
PDF
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Efficient way of user search location in query processing
Context Sensitive Search String Composition Algorithm using User Intention to...
UProRevs-User Profile Relevant Results
Application of fuzzy logic for user
IRJET- Classification and Filtration of Resources with Collaborative Tagging ...
Performance Evaluation of Query Processing Techniques in Information Retrieval
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
K1803057782
Semantic web personalization
Quest Trail: An Effective Approach for Construction of Personalized Search En...
Classification-based Retrieval Methods to Enhance Information Discovery on th...
Ac02411221125
Kp3518241828
Context Driven Technique for Document Classification
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Framework for web personalization using web mining
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Ad

Viewers also liked (20)

PDF
Human action recognition using local space time features and adaboost svm
PDF
Matlab simulink based digital protection of
PDF
Insect inspired hexapod robot for terrain navigation
PDF
Design of a modified leaf spring with an integrated damping system for added ...
PDF
Behavior of r.c.c. beam with rectangular opening
PDF
Research issues and priorities in the field of
PDF
Remedy for disease affected iris in iris recognition
PDF
Safety zone determination for wireless cellular
PDF
Removal of chromium (vi) by activated carbon derived from mangifera indica
PDF
Investigation of behaviour of 3 degrees of freedom
PDF
Exposure hazard analysis in cement fiber sheet
PDF
Two level data security using steganography and 2 d cellular automata
PDF
A simplified design of multiplier for multi layer feed forward hardware neura...
PDF
Design and performance analysis of band pass filter
PDF
Investigation and computational analysis of divergent orifice in fuel injecto...
PDF
Outage analysis of simo system over nakagami n fading channel
PDF
Static analysis of master leaf spring
PDF
Performance analysis of new proposed window for
PDF
Securing voip communications in an open network
PDF
Design and operation of synchronized robotic arm
Human action recognition using local space time features and adaboost svm
Matlab simulink based digital protection of
Insect inspired hexapod robot for terrain navigation
Design of a modified leaf spring with an integrated damping system for added ...
Behavior of r.c.c. beam with rectangular opening
Research issues and priorities in the field of
Remedy for disease affected iris in iris recognition
Safety zone determination for wireless cellular
Removal of chromium (vi) by activated carbon derived from mangifera indica
Investigation of behaviour of 3 degrees of freedom
Exposure hazard analysis in cement fiber sheet
Two level data security using steganography and 2 d cellular automata
A simplified design of multiplier for multi layer feed forward hardware neura...
Design and performance analysis of band pass filter
Investigation and computational analysis of divergent orifice in fuel injecto...
Outage analysis of simo system over nakagami n fading channel
Static analysis of master leaf spring
Performance analysis of new proposed window for
Securing voip communications in an open network
Design and operation of synchronized robotic arm
Ad

Similar to User search goal inference and feedback session using fast generalized – fuzzy c means algorithm (20)

PDF
11. Efficient Image Based Searching for Improving User Search Image Goals
DOCX
A new algorithm for inferring user search goals with feedback sessions
DOCX
A new algorithm for inferring user search goals with feedback sessions
PDF
Dynamic Organization of User Historical Queries
PPTX
A New Algorithm for Inferring User Search Goals with Feedback Sessions
PDF
Vol 12 No 1 - April 2014
PDF
Enhanced Performance of Search Engine with Multitype Feature Co-Selection of ...
PPTX
Developing and testing search engine algorithms –
PPTX
Determining Relevance Rankings from Search Click Logs
PDF
pedersen
PDF
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
PDF
A survey on various architectures, models and methodologies for information r...
PDF
`A Survey on approaches of Web Mining in Varied Areas
PDF
A Novel Approach for User Search Results Using Feedback Sessions
PDF
50120140502013
PDF
A New Recommender For The Mobile Web
PDF
Personalization of the Web Search
PDF
Information Retrieval (for beginners)
PPT
Modern information Retrieval-Relevance Feedback
PDF
Beyond search queries
11. Efficient Image Based Searching for Improving User Search Image Goals
A new algorithm for inferring user search goals with feedback sessions
A new algorithm for inferring user search goals with feedback sessions
Dynamic Organization of User Historical Queries
A New Algorithm for Inferring User Search Goals with Feedback Sessions
Vol 12 No 1 - April 2014
Enhanced Performance of Search Engine with Multitype Feature Co-Selection of ...
Developing and testing search engine algorithms –
Determining Relevance Rankings from Search Click Logs
pedersen
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
A survey on various architectures, models and methodologies for information r...
`A Survey on approaches of Web Mining in Varied Areas
A Novel Approach for User Search Results Using Feedback Sessions
50120140502013
A New Recommender For The Mobile Web
Personalization of the Web Search
Information Retrieval (for beginners)
Modern information Retrieval-Relevance Feedback
Beyond search queries

More from eSAT Publishing House (20)

PDF
Likely impacts of hudhud on the environment of visakhapatnam
PDF
Impact of flood disaster in a drought prone area – case study of alampur vill...
PDF
Hudhud cyclone – a severe disaster in visakhapatnam
PDF
Groundwater investigation using geophysical methods a case study of pydibhim...
PDF
Flood related disasters concerned to urban flooding in bangalore, india
PDF
Enhancing post disaster recovery by optimal infrastructure capacity building
PDF
Effect of lintel and lintel band on the global performance of reinforced conc...
PDF
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
PDF
Wind damage to buildings, infrastrucuture and landscape elements along the be...
PDF
Shear strength of rc deep beam panels – a review
PDF
Role of voluntary teams of professional engineers in dissater management – ex...
PDF
Risk analysis and environmental hazard management
PDF
Review study on performance of seismically tested repaired shear walls
PDF
Monitoring and assessment of air quality with reference to dust particles (pm...
PDF
Low cost wireless sensor networks and smartphone applications for disaster ma...
PDF
Coastal zones – seismic vulnerability an analysis from east coast of india
PDF
Can fracture mechanics predict damage due disaster of structures
PDF
Assessment of seismic susceptibility of rc buildings
PDF
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
PDF
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Likely impacts of hudhud on the environment of visakhapatnam
Impact of flood disaster in a drought prone area – case study of alampur vill...
Hudhud cyclone – a severe disaster in visakhapatnam
Groundwater investigation using geophysical methods a case study of pydibhim...
Flood related disasters concerned to urban flooding in bangalore, india
Enhancing post disaster recovery by optimal infrastructure capacity building
Effect of lintel and lintel band on the global performance of reinforced conc...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to buildings, infrastrucuture and landscape elements along the be...
Shear strength of rc deep beam panels – a review
Role of voluntary teams of professional engineers in dissater management – ex...

User search goal inference and feedback session using fast generalized – fuzzy c means algorithm

  • 1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org

USER SEARCH GOAL INFERENCE AND FEEDBACK SESSION USING FAST GENERALIZED – FUZZY C MEANS ALGORITHM
R. Priyanka1, N. Rajkumar2
1PG Scholar, Department of Software Engineering (M.E.), Sri Ramakrishna Engineering College, Coimbatore, India
2Head of the Department, Department of Software Engineering (M.E.), Sri Ramakrishna Engineering College, Coimbatore, India

Search engines are under heavy strain because of the overload of information on the internet. The search query submitted by the user represents the user's requirements, but a query cannot always express the user's particular need. In addition, a long list of search results is not always relevant to the user's requirements, and many existing search engines, which rely on keyword matching, return irrelevant documents. Both users and search engine developers therefore need to reduce the volume of information that has to be dealt with. In this paper, we aim to infer the user search goal by considering the clicked URLs and to reorganize the web search results. We use FG-FCM based clustering to group semantically similar search results, which further enhances the reorganized search results.

Keywords: Ambiguous Query, Broad-topic Query, Feedback session, Semantics.

1. INTRODUCTION
Dependence on search engines has grown recently, and users can obtain plenty of information from the internet by submitting a query to a search engine. The search query represents the requirements of the user.
Finding the right information with a search engine can be difficult. Search engines present results according to the ranking of websites and not according to user interests. Thus, the results are the same for all users even though different users have different interests. For broad-topic and ambiguous queries, different users will have different search goals. For example, when the query “jaguar” is submitted to a search engine, some users may want information about the car while others may want information about the animal. The query alone may not capture the user's particular information need, so the exact user search goal has to be inferred in order to satisfy that need.

In this paper, we aim to improve search engine relevance by identifying the various goals behind a user search query and restructuring the web search results. Inference of user search goals can also be used to recommend a list of related queries [8] for the query submitted by the user. The user search goal for a submitted query is inferred by clustering the feedback sessions. Feedback sessions can be represented in various ways, such as binary vectors or pseudo-documents. In this paper, we use pseudo-documents, which contain keywords, to represent the feedback sessions. Each feedback session is mapped to its corresponding pseudo-document, and semantically similar keywords are found for the given query. Search results that are semantically similar are then clustered by FG-FCM according to the search goal, so that each cluster represents one search goal. The FG-FCM algorithm allows one document to belong to two or more clusters.
The algorithm also restructures and enhances the original search results by inferring the search goal of the user, which reduces the time the user spends searching for the required information. In this way, user needs are satisfied. The performance of the restructured search results can be evaluated with the Classified Average Precision (CAP) criterion.

2. RELATED WORKS
To date, many studies have investigated how to obtain user search goals and query types. We examine some of the previous works to study the problem of clustering. It is important to discover the different search goals behind a given query in order to fulfill the needs of the user. Long lists of search results can be restructured [2], [7], [9] according to the user requirements. Work on user search goals can be divided into three categories: search result reorganization, session boundary detection and query classification. In the first category, authors tried to reorganize web search results. Wang and Zhai [7] analyzed click-through logs and grouped the search results according to the clicked URLs. In the second category, Jones and Klinkner [3] considered session boundaries to identify whether the queries and the goal match. In the third category, user goals and queries are classified into specific classes. Lee et al. [4] categorized user queries as “Navigational” or “Informational” and inferred the search goals automatically. The search goal can be used to improve the quality of a search engine's results. They also
  • 2. discussed how to automate the goal identification process. The goal-identification task was based on two types of features: user-click behavior and anchor-link distribution. Li et al. [5] defined the objective of a query as “Product intent” or “Job intent” and categorized the search queries accordingly.

Today's Web search engines provide a very user-friendly interface. Users can submit queries in the form of keywords, as in an information retrieval system. A query may be a simple keyword, a broad-topic term, or anything else. The search engine lists related queries when a query is submitted. Ricardo Baeza-Yates et al. [6] discussed how these associated queries are based on previously issued queries and are offered to the user to redirect the search process. Semantically similar queries were also identified by clustering the contents stored in the query log of the search engine.

There are many advantages to restructuring the search results according to the user search goal. Joachims used implicit feedback to enhance the quality of search engines, relying on click-through logs to optimize the search engine. Zheng Lu et al. [10] restructured the search results by clustering the pseudo-documents with K-means clustering. However, with K-means it is computationally difficult to determine the value of K, and the algorithm does not work well when the clusters in the original data differ in size and density. Some of the prior works treated click-through logs as the user's feedback and restructured the search results accordingly.
Other works did not consider user feedback and used all the search results returned by the search engine, including links that were not clicked by the user. These works produced noisy results. In this paper, we infer the user search goal by considering the feedback session, and we restructure and enhance the search results accordingly in order to satisfy the user's needs. We use the FG-FCM algorithm, a fast and robust algorithm, to cluster the semantically similar links in the feedback sessions, which provides better results than the previous works.

3. RESEARCH METHODOLOGY
The approach followed in this paper for inferring the user search goal is shown in Fig. 1.

3.1 User Search Query Analysis
The search query submitted by the user has to be analyzed. The click-through logs are used to examine the user search queries and to define the feedback sessions. A query submitted to the search engine may be a simple query or an ambiguous query. For an ambiguous query, it is necessary to analyze its different meanings and to restructure the search results into different clusters so that the user's needs are satisfied. The search results obtained for the submitted query must be collected before they can be restructured.

3.2 Feedback Session
The first step in reorganizing the search results is the representation of the feedback session. A feedback session consists of the list of URLs up to and including the last URL clicked by the user in a single session. All the unclicked URLs before the last clicked URL are also included, because the user has browsed and assessed those URLs as well. Within the feedback session, the clicked URLs represent the information the user requires and the unclicked URLs reflect the information the user does not require.
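The definition of a feedback session given above can be illustrated with a minimal Python sketch. The function name `feedback_session`, the placeholder URLs and the representation of the click log as a set of clicked URLs are assumptions made for illustration; the paper does not prescribe a data format.

```python
from typing import List, Set, Tuple


def feedback_session(ranked_urls: List[str], clicked: Set[str]) -> List[Tuple[str, bool]]:
    """Return (url, was_clicked) pairs up to and including the last clicked URL.

    URLs ranked below the last click are dropped, because it is not known
    whether the user looked at them.
    """
    last_click = -1
    for index, url in enumerate(ranked_urls):
        if url in clicked:
            last_click = index
    # With no click at all, last_click stays -1 and the session is empty.
    return [(url, url in clicked) for url in ranked_urls[: last_click + 1]]


# Hypothetical ranked result list and click set for one query.
ranked = ["u1", "u2", "u3", "u4", "u5"]
session = feedback_session(ranked, {"u2", "u4"})
print(session)
```

In this example the unclicked URLs `u1` and `u3` stay in the session as negative feedback, while `u5` is excluded because it lies below the last click.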
The URLs that appear after the last clicked URL cannot be taken as part of the feedback, because it is not certain whether the user has scanned them. The feedback session cannot be used directly for user search goal inference because it differs from the raw click-through logs, so it has to be represented in some other form in order to infer the user search goals efficiently. Several representations are possible. The binary vector is one of the popular ways of representing a feedback session. It consists of 0s and 1s, where “0” represents an unclicked URL and “1” represents a clicked URL in a single session. This method cannot be used when many feedback sessions are considered, because different feedback sessions may have very different characteristics. Keywords can also be used to represent the user's interests for a query, but they cannot be used to represent the feedback session because they are usually latent and not expressed explicitly. Therefore, pseudo-documents are used to infer the goals of the user. The feedback sessions are mapped to pseudo-documents, which are formed by enriching the URLs present in the feedback session. A URL is enriched by adding its title and a short snippet as a small text paragraph.

3.3 Semantic Similarity and Fast Generalized-Fuzzy C Means Clustering
The semantics of the query submitted by the user must be analyzed and the results restructured accordingly. Semantically similar words can be identified with the WordNet tool, from which the words semantically similar to the user search query are extracted. Then the FG-FCM clustering process begins. This algorithm is a variation of the FCM algorithm that differs by applying a mathematical exponentiation to the result obtained using FCM. The number of clusters need not be specified in advance.
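The two feedback-session representations described in Section 3.2 can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say how titles and snippets are tokenized or weighted, so the sketch simply counts lower-cased words over all URLs of the session. The helper names `binary_vector` and `pseudo_document` and the sample data are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

Session = List[Tuple[str, bool]]  # (url, was_clicked) pairs


def binary_vector(session: Session) -> List[int]:
    """1 for a clicked URL, 0 for an unclicked URL, in rank order."""
    return [1 if was_clicked else 0 for _, was_clicked in session]


def pseudo_document(session: Session, page_text: Dict[str, Tuple[str, str]]) -> Counter:
    """Enrich each URL with its title and snippet and count the keywords.

    page_text maps a URL to a (title, snippet) pair. Plain word counting
    is an assumption; the paper does not specify a weighting scheme.
    """
    keywords: Counter = Counter()
    for url, _was_clicked in session:
        title, snippet = page_text[url]
        keywords.update(re.findall(r"[a-z0-9]+", f"{title} {snippet}".lower()))
    return keywords


# Hypothetical session for the ambiguous query "jaguar".
sample_session = [("a.example", False), ("b.example", True)]
sample_text = {
    "a.example": ("Jaguar Cars", "Luxury cars"),
    "b.example": ("Jaguar habitat", "The jaguar is a big cat"),
}
vector = binary_vector(sample_session)
document = pseudo_document(sample_session, sample_text)
print(vector, document.most_common(3))
```

Unlike the binary vector, the keyword counts do not depend on how many URLs a session contains, which makes sessions of different lengths comparable.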
  • 3. The advantage of using this algorithm is that the same data element can belong to more than one cluster, and the clustering process is more efficient than with the existing algorithm. The titles in the feedback session are grouped based on the similarity between the semantic keywords and the titles. The similarity matrix $U_{ij}$ is also used in the clustering process. Both the rows and the columns of the matrix represent the titles of the search results, arranged in the same order, and each entry represents the similarity between two titles. Various similarity measures can be used to calculate the similarity. In this paper, cosine similarity is used. The cosine score for two titles is computed as

$$\mathrm{Similarity}(T_i, T_j) = \cos(T_i, T_j) \qquad (1)$$

where $T_i$ and $T_j$ represent titles of the search results.

Fig. 1. Search result restructuring process

The matrix can be computed by

$$U_{ij} = \frac{1}{\sum_{k} \left( \dfrac{|x_i - C_j|}{|x_i - C_k|} \right)^{2}} \qquad (2)$$

where $x_i$ represents the keyword count, which is the total number of semantic keywords in each URL of the feedback session. $C_j$ and $C_k$ are the cluster values, which are obtained from the computation

$$C_j = \frac{\sum_{i} U_{ij} \, x_i}{\sum_{i} U_{ij}} \qquad (3)$$

where $U_{ij}$ represents the value of the similarity matrix at position $(i, j)$, and $\sum$ implies that the process is repeated for every title in the feedback session. The search results are grouped based on the clustered values. Thus, the search results are restructured and reorganized into groups based on semantic similarity.
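A rough Python sketch of the cosine score of Eq. (1) and of one way to iterate Eqs. (2) and (3) is given below. It is a plain fuzzy C-means loop on scalar keyword counts, not the authors' FG-FCM: the additional exponentiation step of FG-FCM is not specified in enough detail to reproduce. The update of the cluster values follows Eq. (3) as printed, whereas textbook FCM raises the memberships to a fuzzifier power. The function names, the whitespace tokenization, the initial centers and the sample data are all assumptions.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine(title_a: str, title_b: str) -> float:
    """Eq. (1): cosine of the word-count vectors of two titles."""
    va, vb = Counter(title_a.lower().split()), Counter(title_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def fcm_1d(x: List[float], centers: List[float], iterations: int = 50,
           eps: float = 1e-9) -> Tuple[List[List[float]], List[float]]:
    """Alternate Eq. (2) (memberships) and Eq. (3) (cluster values)."""
    n, c = len(x), len(centers)
    u: List[List[float]] = []
    for _ in range(iterations):
        u = []
        for xi in x:
            # eps avoids division by zero when a point sits on a center.
            d = [abs(xi - cj) + eps for cj in centers]
            u.append([1.0 / sum((d[j] / d[k]) ** 2 for k in range(c)) for j in range(c)])
        centers = [sum(u[i][j] * x[i] for i in range(n)) / sum(u[i][j] for i in range(n))
                   for j in range(c)]
    return u, centers


# Hypothetical keyword counts x_i for six URLs and two initial cluster values.
counts = [1, 2, 1, 9, 10, 11]
memberships, final_centers = fcm_1d(counts, [0.0, 12.0])
print(round(cosine("jaguar car price", "jaguar car review"), 3))
print([round(cj, 2) for cj in final_centers])
```

Each row of `memberships` sums to 1, so a URL whose keyword count lies between the two cluster values receives a partial membership in both, which is the soft-assignment property the text attributes to the algorithm.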
By assembling the semantically similar URLs into different clusters and restructuring the search results, the users' browsing experience can be improved. This also satisfies the requirements of the user and reduces the time spent browsing the contents. The process is particularly useful for ambiguous queries, which have more than one meaning.

4. PERFORMANCE EVALUATION
The performance of the approach can be evaluated with the Classified Average Precision (CAP) criterion. It is computed from the Voted Average Precision (VAP) and the risk as

$$\mathrm{CAP} = \mathrm{VAP} \times (1 - \mathrm{Risk})^{\gamma} \qquad (4)$$

where $\gamma$ is a parameter that adjusts the influence of the risk. The risk is calculated as

$$\mathrm{Risk} = \frac{\sum_{i,j=1\,(i<j)}^{m} d_{ij}}{C_m^2} \qquad (5)$$

The risk factor is added to avoid error. The value $d_{ij}$ is either 1 or 0: it is 0 if the $i$th clicked URL and the $j$th clicked URL are classified into the same class, and 1 otherwise. $C_m^2 = m(m-1)/2$ is the number of pairs of clicked URLs in a single session, where $m$ is the number of clicked URLs. The Voted Average Precision (VAP) can be calculated from Average Precision (AP). It can be calculated as
  • 4.
$$\mathrm{AP} = \frac{1}{N^{+}} \sum_{r=1}^{N} \mathrm{rel}(r) \, \frac{R_r}{r} \qquad (6)$$

where $N^{+}$ is the number of clicked documents in the retrieved search results, $r$ is the rank of the URL, $N$ is the total number of retrieved search results, $\mathrm{rel}(r)$ is a binary function that indicates whether the result at rank $r$ was clicked, and $R_r$ is the number of relevant (clicked) retrieved results at rank $r$ or less. In Voted Average Precision (VAP), votes refer to clicks. VAP is calculated after the search results have been reorganized into different clusters, by applying the AP equation to the cluster that contains the most clicked URLs. If two clusters contain the same number of clicked URLs, the higher AP value among them is chosen as the VAP. The comparison between the K-Means and FG-FCM algorithms is shown in the figure.

Fig 1. FG-FCM vs. K-Means

5. CONCLUSIONS AND FUTURE WORK
The method used in this paper can infer the user search goal based on the feedback session. We analyze and reorganize only the search results that appear in the feedback session, for efficient browsing. Therefore, no noisy data is involved in restructuring the search results. We also consider semantically similar keywords to enhance the restructured search results. The FG-FCM algorithm is used to cluster the URLs into different groups according to semantic similarity. Thus, the enhanced search results will improve search engine relevance and satisfy the user to a greater extent.
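The evaluation criterion of Section 4 (Eqs. 4 to 6) can be sketched in Python as follows. The function names and the sample inputs are illustrative assumptions, and the default `gamma=1.0` is only a placeholder because the paper does not report the value it used.

```python
from itertools import combinations
from typing import List, Sequence


def average_precision(clicked_flags: Sequence[int]) -> float:
    """Eq. (6): clicked_flags[r-1] is rel(r), 1 if the result at rank r was clicked."""
    n_plus = sum(clicked_flags)
    if n_plus == 0:
        return 0.0
    relevant_so_far, total = 0, 0.0
    for rank, rel in enumerate(clicked_flags, start=1):
        if rel:
            relevant_so_far += 1              # R_r
            total += relevant_so_far / rank   # rel(r) * R_r / r
    return total / n_plus


def risk(cluster_labels: List[str]) -> float:
    """Eq. (5): share of clicked-URL pairs that fall into different clusters."""
    pairs = list(combinations(cluster_labels, 2))  # C_m^2 pairs
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def cap(vap: float, risk_value: float, gamma: float = 1.0) -> float:
    """Eq. (4): CAP = VAP * (1 - Risk) ** gamma."""
    return vap * (1.0 - risk_value) ** gamma


# Hypothetical cluster in which ranks 1 and 3 were clicked, and a session
# whose three clicked URLs were assigned to clusters "a", "a" and "b".
ap_value = average_precision([1, 0, 1, 0])
risk_value = risk(["a", "a", "b"])
print(round(ap_value, 4), round(risk_value, 4), round(cap(ap_value, risk_value), 4))
```

A clustering that puts all clicked URLs of a session into one cluster has zero risk, so its CAP equals its VAP; scattering the clicks over several clusters lowers the score.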
In future work, we plan to investigate feedback sessions not only for single queries in the form of keywords, but also for queries submitted in the form of a sentence.

ACKNOWLEDGEMENTS
I would like to thank Dr. N. Rajkumar for his innovative ideas, valuable comments and suggestions, which improved the presentation quality of the paper and led to the successful completion of the work.

REFERENCES
[1] D. Beeferman and A. Berger, “Agglomerative Clustering of a Search Engine Query Log,” Proc. Sixth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 407-416, 2000.
[2] H. Chen and S. Dumais, “Bringing Order to the Web: Automatically Categorizing Search Results,” Proc. SIGCHI Conf. Human Factors in Computing Systems, pp. 145-152, 2000.
[3] R. Jones and K.L. Klinkner, “Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs,” Proc. 17th ACM Conf. Information and Knowledge Management, pp. 699-708, 2008.
[4] U. Lee, Z. Liu, and J. Cho, “Automatic Identification of User Goals in Web Search,” Proc. 14th Int'l Conf. World Wide Web, pp. 391-400, 2005.
[5] X. Li, Y.Y. Wang, and A. Acero, “Learning Query Intent from Regularized Click Graphs,” Proc. 31st Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 339-346, 2008.
[6] B. Poblete and R. Baeza-Yates, “Query-Sets: Using Implicit Feedback and Query Patterns to Organize Web Documents,” Proc. 17th Int'l Conf. World Wide Web, pp. 41-50, 2008.
[7] X. Wang and C.X. Zhai, “Learn from Web Search Logs to Organize Search Results,” Proc. 30th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 87-94, 2007.
[8] R. Baeza-Yates, C. Hurtado, and M. Mendoza, “Query Recommendation Using Query Logs in Search Engines,” Proc. Int'l Conf. Current Trends in Database Technology, pp. 588-596, 2004.
[9] H.J. Zeng, Z. Chen, W.Y. Ma, and J. Ma, “Learning to Cluster Web Search Results,” Proc. 27th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 210-217, 2004.
[10] Z. Lu, H. Zha, X. Yang, W. Lin, and Z. Zheng, “A New Algorithm for Inferring User Search Goals with Feedback Sessions,” IEEE Transactions on Knowledge and Data Engineering, Vol. 25, No. 3, 2013.
  • 5. BIOGRAPHIES
Dr. N. Rajkumar obtained his Bachelor's degree in Computer Science and Engineering from Madurai Kamaraj University in 1991 and his Master's in Engineering in the same stream in 1995 from Jadavpur University, Kolkata. He completed his Master's in Business Administration from IGNOU in 2003. His doctorate is in the field of Data Mining, which he completed in 2005 at PSG College of Technology, Coimbatore. He is currently the Head of the Department of Computer Science and Engineering at Sri Ramakrishna Engineering College, Coimbatore, Tamilnadu. He has served in the field of education for over 20 years at various technical institutions. He has been instrumental in the conduct of 30 short-term courses and has attended 20 courses conducted by other institutions and organizations. He has authored 2 books on Networking and Computer Servicing for the benefit of the student community. He has published as many as 50 papers in international journals and conferences and at the national level in his areas of expertise, namely Data Mining, Networking and Parallel Computing. He has guided 100 project scholars to date. His E-mail id is [email protected]

R. Priyanka received her B.E. in CSE from Hindusthan College of Engineering and Technology, Coimbatore, affiliated to Anna University, Chennai, in 2012. At present, she is pursuing her M.E. in Software Engineering at Sri Ramakrishna Engineering College, affiliated to Anna University, Chennai. Her E-mail id is [email protected]