Computing Reviews
Leveraging social bookmarks from partially tagged corpus for improved web page clustering
Trivedi A., Rai P., Daumé, III H., Duvall S. ACM Transactions on Intelligent Systems and Technology 3(4): 1-18, 2012. Type: Article
Date Reviewed: Jan 25 2013

In information retrieval, clustering aims to improve the user's search experience by grouping similar objects. For example, it can group search results so that users can quickly retrieve related items together. Recently, largely thanks to social bookmarking, some web pages have come to carry user-assigned descriptive tags in addition to their page text. Such user-generated content can enrich page descriptions and improve clustering performance, yielding more cohesive clusters.

In this study, the authors aim to obtain highly discriminative features from page text and user tags, and thereby improve clustering performance. For this purpose, they develop a method based on the multiview learning concept and a technique called kernel canonical correlation analysis. The underlying idea is that, by exploiting the correlation between page text and tags, it is possible to obtain better features that improve clustering performance. The authors consider both partially and fully tagged corpora, and in the experiments employ various combinations of page text and tag information (page text only, tags only, and both combined). They experiment with 2,000 tagged Open Directory Project (ODP) web pages and show that their approach improves clustering performance.
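To make the idea of correlating two views concrete, here is a minimal sketch of plain linear canonical correlation analysis on synthetic data. It is not the authors' method: the paper uses the kernel variant and real ODP pages, whereas this example assumes a regularized linear CCA, and the names linear_cca, text_view, and tag_view, as well as the generated data, are hypothetical and chosen only for illustration.

```python
import numpy as np


def linear_cca(X, Y, n_components, reg=1e-3):
    """Regularized linear CCA (illustrative; the paper uses kernel CCA).

    Returns the projected X, the projected Y, and the canonical correlations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Covariance blocks, with a small ridge term for numerical stability.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)

    # Singular values of the whitened cross-covariance are the
    # canonical correlations, sorted in descending order.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    A = Kx @ U[:, :n_components]
    B = Ky @ Vt.T[:, :n_components]
    return Xc @ A, Yc @ B, s[:n_components]


# Synthetic stand-ins for the two views of 200 "pages" (assumed data):
# both views are noisy linear functions of the same latent cluster signal.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
latent = np.column_stack([
    labels * 4.0 + rng.normal(size=200),
    rng.normal(size=200),
])
text_view = latent @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(200, 6))
tag_view = latent @ rng.normal(size=(2, 4)) + 0.5 * rng.normal(size=(200, 4))

text_proj, tag_proj, corrs = linear_cca(text_view, tag_view, n_components=2)
print("canonical correlations:", corrs)
```

The projected features (text_proj, or a combination of both projections) could then be passed to any standard clustering algorithm, such as k-means, in place of the raw page-text features.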

The study is useful for clustering objects described by more than one set of features. The authors provide several pointers for future research, such as clustering medical records and multilingual data; however, understanding and applying the method in those settings requires knowledge of the relevant domains and languages.

Reviewer:  F. Can Review #: CR140871 (1305-0408)
Clustering (H.3.3 ... )