Computing Reviews
Query intent mining with multiple dimensions of web search data
Jiang D., Leung K., Ng W. World Wide Web 19(3): 475-497, 2016. Type: Article
Date Reviewed: Nov 11 2016

How to derive search intent from user queries has been a hot research topic in recent years. Jiang et al. propose new frameworks to improve intent mining. Extensive experiments with large real-life datasets show the effectiveness of their approach.

The authors first propose a collection of multidimensional features that capture the latent relations among queries along three dimensions of the query log: URL, session, and term. These features are used collectively to quantify the hidden information in the query log. The URL dimension is represented by the search queries that lead to click-throughs on the same URL. The authors define the session dimension as “a series of queries that are sequentially submitted by a user to satisfy the same information need.” The term dimension is a collection of search queries that are related through at least one term.

With these three dimensions defined, the authors propose two new frameworks, a result-oriented framework (ROF) and a topic-oriented framework (TOF), to mine search intent from the data that the three dimensions represent in search logs. ROF uses the raw search query log data to compute closeness measures. For example, the matrix QU has one row for each query qi and one column for each unique URL uj, with an entry recording that uj was clicked for query qi. Thus, multiplying QU by its transpose produces a measure of closeness among queries based on the URLs they have in common. However, the matrices resulting from this multiplication are huge. The authors thus propose the second framework, TOF, which uses the latent Dirichlet allocation (LDA) model to mine the topics in the search log. LDA has the advantages of reduced data dimensionality and proven effectiveness.
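The QU example can be made concrete with a small sketch. The click log below is invented for illustration, the function names `build_qu` and `co_click` are hypothetical, and storing click counts in QU (rather than binary flags or some other weighting) is an assumption; the review does not say how the paper weights the entries.

```python
# Hypothetical click-through log: (query, clicked URL) pairs.
clicks = [
    ("jaguar price", "cars.example.com"),
    ("jaguar dealer", "cars.example.com"),
    ("jaguar dealer", "dealers.example.com"),
    ("jaguar habitat", "zoo.example.org"),
]


def build_qu(clicks):
    """Build the query-by-URL matrix QU as a list of rows of click counts."""
    queries = sorted({q for q, _ in clicks})
    urls = sorted({u for _, u in clicks})
    q_index = {q: i for i, q in enumerate(queries)}
    u_index = {u: j for j, u in enumerate(urls)}
    qu = [[0] * len(urls) for _ in queries]
    for q, u in clicks:
        qu[q_index[q]][u_index[u]] += 1
    return queries, urls, qu


def co_click(qu):
    """Compute QU times its transpose: entry (i, k) is the dot product of rows i and k."""
    n = len(qu)
    return [
        [sum(a * b for a, b in zip(qu[i], qu[k])) for k in range(n)]
        for i in range(n)
    ]


queries, urls, qu = build_qu(clicks)
sim = co_click(qu)

# Queries that share a clicked URL get a nonzero closeness score.
for i, qi in enumerate(queries):
    for k in range(i + 1, len(queries)):
        print(qi, "|", queries[k], "->", sim[i][k])
```

The product has one row and one column per distinct query, so its size grows quadratically with the number of queries, which is the scaling problem that motivates the move to a topic model.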

To assess the performance of the proposed model, the authors ran extensive experiments using large real datasets. They used query log data from Yahoo, containing 506,515 search queries from 2,158 search engine users over a four-month period. Five human judges manually labeled the intents of these search queries according to 546 Open Directory Project (ODP) categories. The experiments compared precision, recall, F-measure, scalability, and memory usage across different approaches. Results show that the proposed model performs significantly better on many measures than state-of-the-art methods such as click graph and graph summarization.
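For readers unfamiliar with the accuracy measures named above, the following is a minimal sketch of set-based precision, recall, and balanced F-measure for a single query. It assumes that both the mined intents and the judges' labels are sets of category names; the review does not describe the paper's exact evaluation protocol, and the function name and category strings are hypothetical.

```python
def precision_recall_f1(predicted, relevant):
    """Set-based precision, recall, and balanced F-measure (F1) for one query."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels: categories mined for one query vs. categories chosen by judges.
mined = {"Recreation/Autos", "Shopping/Vehicles", "Science/Biology"}
judged = {"Recreation/Autos", "Shopping/Vehicles"}

p, r, f = precision_recall_f1(mined, judged)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Here two of the three mined categories match the judged labels, so precision is 2/3 while recall is 1.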

Reviewer:  Xiannong Meng Review #: CR144920 (1702-0153)
Search Process (H.3.3 ... )
Data Mining (H.2.8 ... )
