Computing Reviews
Leveraging social bookmarks from partially tagged corpus for improved web page clustering
Trivedi A., Rai P., Daumé, III H., Duvall S.  ACM Transactions on Intelligent Systems and Technology (TIST) 3 (4): 1-18, 2012. Type: Article
Date Reviewed: Jan 25 2013

In information retrieval, clustering aims to improve the user's search experience by grouping related objects. For example, it can be used to group search results so that users can quickly retrieve interrelated items together. Recently, largely due to social bookmarking, user-assigned descriptive tags have become available for some web pages in addition to the page text. Such user-generated content can enrich the description of pages and improve clustering performance, so that clusters become more cohesive.

In this study, the authors aim to obtain highly discriminative features from page text and user tags, and thereby improve clustering performance. For this purpose, they develop a method based on the multiview learning concept and a technique called kernel canonical correlation analysis. Their approach aims to show that, by considering the correlation between page text and tags, it is possible to obtain better features that improve clustering performance. The authors consider both partially and fully tagged corpora, and in their experiments employ various combinations of page text and tag information (such as page text only, tags only, and their combination). They experiment with 2,000 tagged Open Directory Project (ODP) web pages and show that their approach improves clustering performance.
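The idea of correlating the two views can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps in plain regularized linear canonical correlation analysis for the kernel variant, uses synthetic numbers in place of real page text and tags, and assumes that the projection is learned on the tagged pages and then applied to the text of every page. The names `fit_linear_cca`, `inv_sqrt`, `text`, `tags`, and `tagged` are hypothetical and exist only for this illustration.

```python
import numpy as np

def inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(mat)
    return (v / np.sqrt(w)) @ v.T

def fit_linear_cca(text_view, tag_view, n_components=1, reg=1e-3):
    """Regularized linear CCA (a stand-in for kernel CCA, not the paper's method).

    Returns the text-view mean, the text-view projection matrix, and the
    leading canonical correlations.
    """
    x_mean = text_view.mean(axis=0)
    X = text_view - x_mean
    Y = tag_view - tag_view.mean(axis=0)
    n = X.shape[0]
    # Regularized within-view covariances and the cross-view covariance.
    cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    cxy = X.T @ Y / n
    # Whiten both views; the SVD of the whitened cross-covariance gives
    # the canonical directions and correlations.
    kx, ky = inv_sqrt(cxx), inv_sqrt(cyy)
    u, s, _ = np.linalg.svd(kx @ cxy @ ky)
    return x_mean, kx @ u[:, :n_components], s[:n_components]

# Synthetic stand-in data (an assumption): 100 "pages" from two hidden topics,
# with a 6-dimensional text view and a 4-dimensional tag view.
rng = np.random.default_rng(0)
topic = np.repeat([0, 1], 50)
signal = np.where(topic == 0, -1.0, 1.0)[:, None]
text = signal + 0.5 * rng.normal(size=(100, 6))
tags = signal + 0.5 * rng.normal(size=(100, 4))

# Partially tagged corpus (an assumption): only every other page has tags.
tagged = np.arange(100) % 2 == 0

# Learn the projection from tagged pages, then project the text of all pages.
x_mean, w_text, corrs = fit_linear_cca(text[tagged], tags[tagged])
features = (text - x_mean) @ w_text

# Crude two-way grouping by the sign of the first canonical feature;
# a real pipeline would pass `features` to a clustering algorithm instead.
guess = (features[:, 0] > 0).astype(int)
agreement = max(np.mean(guess == topic), np.mean(guess != topic))
print("top canonical correlation:", corrs[0])
print("agreement with hidden topics:", agreement)
```

In this toy setting, the text-only projection is shaped by what the tags agree with, which is the intuition behind using tag information even when only part of the corpus is tagged.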

The study is useful for clustering objects defined by more than one set of features. The authors provide several pointers for future research, such as clustering medical records and multilingual data; however, understanding and using the method requires knowledge of the relevant domains and languages.

Reviewer: F. Can
Review #: CR140871 (1305-0408)
Clustering (H.3.3 ... )
Other reviews under "Clustering": Date
A study of the integration of passage-, document-, and cluster-based information for re-ranking search results
Krikon E., Kurland O.  Information Retrieval 14(6): 593-616, 2011. Type: Article
May 15 2012
Embracing semantics in zoomable user interface
Panza D., Vitali A., Sentinelli A., Celetto L.  ICMR 2011 (Proceedings of the 1st ACM International Conference on Multimedia Retrieval, Trento, Italy,  Apr 18-20, 2011) 1-2, 2011. Type: Proceedings
Jul 19 2011
Document clustering using NMF and fuzzy relation
Park S., An D., Yoo H.  ICUIMC 2011 (Proceedings of the 5th International Conference on Ubiquitous Information Management and Communication, Seoul, Korea,  Feb 21-23, 2011) 1-5, 2011. Type: Proceedings
Jun 28 2011
