Computing Reviews
On graph-based name disambiguation
Fan X., Wang J., Pu X., Zhou L., Lv B. Journal of Data and Information Quality 2(2): 1-23, 2011. Type: Article
Date Reviewed: Mar 19 2012

Researchers often search digital library systems by author name. In the search results, however, papers by different authors who share the same name are mixed together. While existing work on this problem often employs several attributes, such as the title, abstract, and author affiliation, Fan et al. developed GHOST, a system that employs co-authorship information only and groups the papers by the same author into a single cluster.

GHOST consists of five steps: graph construction, valid path selection, similarity computation, name clustering, and user feedback. In the similarity computation step, the authors define similarity as "the confidence of two nodes corresponding to the same author" and compute it with the formula for the total resistance of a parallel circuit, which follows from Ohm's law. This idea is unique and will be helpful for researchers who work on social network analysis. In the name clustering step, the authors employ affinity propagation, which provides almost the same accuracy as classical agglomerative hierarchical clustering, so the affinity propagation scheme needs further improvement. They evaluate the clustering scheme using precision, recall, and F-score; however, they should also evaluate it with measures that are often used in clustering, such as purity, inverse purity, and the B-cubed measure. In addition, this is the first work that employs user feedback in the name disambiguation problem. The achieved runtime of 0.5 to 2 seconds to disambiguate an author name is reasonable. GHOST has also been applied in CDBLP (http://www.cdblp.cn), a real database system.
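
To make two of the points above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the formula or code from the paper. The first function treats each valid path between two name nodes as a resistor and combines the paths as in a parallel circuit; setting a path's resistance equal to its length is an assumption made here for illustration only. The second function computes B-cubed precision, recall, and F-score, one of the clustering measures suggested above. The function names and input formats are hypothetical.

```python
def parallel_similarity(path_lengths):
    """Similarity as total conductance of parallel paths.

    Assumption (not from the paper): each valid path is a resistor whose
    resistance equals its length. For a parallel circuit,
    1 / R_total = sum(1 / R_i), and similarity is taken as 1 / R_total.
    """
    return sum(1.0 / length for length in path_lengths)


def bcubed(predicted, gold):
    """B-cubed precision, recall, and F-score.

    predicted, gold: dicts mapping each paper id to a cluster label.
    Both dicts are assumed to cover the same set of paper ids.
    """
    items = list(gold)
    if not items:
        return 0.0, 0.0, 0.0
    precision_sum = 0.0
    recall_sum = 0.0
    for i in items:
        same_pred = [j for j in items if predicted[j] == predicted[i]]
        same_gold = [j for j in items if gold[j] == gold[i]]
        # Papers that share both the predicted and the gold cluster with i.
        correct = sum(1 for j in same_pred if gold[j] == gold[i])
        precision_sum += correct / len(same_pred)
        recall_sum += correct / len(same_gold)
    precision = precision_sum / len(items)
    recall = recall_sum / len(items)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Two short paths give higher similarity than one path of the same length.
    print(parallel_similarity([2]), parallel_similarity([2, 2]))
    pred = {"p1": 0, "p2": 0, "p3": 0, "p4": 1}
    gold = {"p1": "x", "p2": "x", "p3": "y", "p4": "y"}
    print(bcubed(pred, gold))
```

Under this assumption, more connecting paths and shorter paths both raise the similarity, which matches the intuition that two name nodes with many close shared co-authors are more likely to be the same person.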

The authors’ finding that the affiliation information of each author is important will be helpful to people who develop disambiguation systems in search engines.

Reviewer:  Kazunari Sugiyama Review #: CR139983 (1208-0835)
Data Mining (H.2.8 ...)
Digital Libraries (H.3.7)
Miscellaneous (H.2.m)
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®