Computing Reviews
HITS algorithm improvement using semantic text portion
Hung B., Otsubo M., Hijikata Y., Nishida S. Web Intelligence and Agent Systems 8(2): 149-164, 2010. Type: Article
Date Reviewed: Mar 16 2011

It is a good idea for all Internet users to be aware of the most common approaches to ranking Web pages, databases, and digital documents.

The most popular ranking algorithm is Google’s PageRank, which scores a Web page by its incoming links, each weighted by the rank of the linking page divided by that page’s number of outgoing links (this models a random walk, made by a user clicking the links that should lead directly to the information being searched for). Another such algorithm is Kleinberg’s hypertext-induced topic selection (HITS), which determines a correspondence between pages that link to many other pages (hubs) and those linked by many other pages (authorities). Assuming that “hub pages link to many good pages,” the correspondence changes with each user’s random walk. Such an approach relies on the judgment of humans who walk randomly from one page to another, making the pages visited most often (authority pages) “good pages,” as the authors call them.
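The mutual reinforcement between hubs and authorities can be illustrated with a short sketch. This is a minimal, generic version of the HITS iteration, not the implementation evaluated in the paper; the function names, the `links` dictionary format, and the fixed iteration count are assumptions made for illustration.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Generic HITS iteration (illustrative sketch).

    links: dict mapping each page to the pages it links to (assumed format).
    Returns (hub, authority) score dicts.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub score: sum of authority scores of the pages linked to.
        new_hub = {
            p: sum(new_auth[dst] for dst in links.get(p, ()))
            for p in pages
        }
        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Hypothetical toy graph: three pages link to "a", one of them also to "b".
example_links = {"h1": ["a", "b"], "h2": ["a"], "h3": ["a"]}
hub_scores, auth_scores = hits(example_links)
```

In this toy graph, "a" receives the most incoming links and ends up as the strongest authority, while "h1", which links to both targets, ends up as the strongest hub.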

Other researchers have previously worked on this algorithm. For example, Li et al. [1] proposed an improvement to a HITS-based algorithm for Web documents by combining the weighting of the incoming links with relevance methods, such as the vector space model, Okapi similarity, cover density ranking, and three-level scoring. Although weighting increased the ranking precision significantly, none of the combinations with the methods mentioned above made a dramatic improvement.

Similarly, the authors’ experiment aims to improve HITS performance, specifically by tackling the topic drift problem. In this problem, the top-ranking pages, both authorities and hubs, are not necessarily those most relevant to the query, but are often those most commonly retrieved by search engines such as Google or Yahoo!. The authors therefore follow Chakrabarti’s method, which determines the information relevance (authority) based on a query matching any of the 50 words around the anchor link. Their contribution is the extraction of text fragments that have some semantic properties of the anchor link. Evaluators were invited to assess the semantics of 50 official and 50 personal target pages from the Open Directory (http://dmoz.org), located in paragraphs, tables, lists, and DIV objects. Thirteen pooling methods were used to compare ten hubs and authority pages each. Although most of the results achieved over 90 percent accuracy, ten queries are hardly a sufficiently large sample, particularly when the measures are based on the judgment of the authors’ students, acting as evaluators.
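The anchor-window idea described above can be sketched as follows. This is only an illustration of the general approach as the review describes it, not the formula from Chakrabarti's work or from the paper under review. The function name, the "1 plus number of matches" weighting, and the reading of the window as 50 words on each side of the anchor are all assumptions.

```python
def anchor_window_weight(tokens, anchor_index, query_terms, window=50):
    """Weight a link by query-term matches near its anchor (illustrative).

    tokens: the page text split into words (assumed pre-tokenized).
    anchor_index: position of the anchor link within tokens.
    window: words inspected on each side of the anchor (an assumption;
            the source only says "50 words around the anchor link").
    """
    start = max(0, anchor_index - window)
    end = min(len(tokens), anchor_index + window + 1)
    terms = {term.lower() for term in query_terms}
    matches = sum(1 for tok in tokens[start:end] if tok.lower() in terms)
    # Assumed weighting: every link counts once, plus one per nearby match.
    return 1 + matches


# Hypothetical example: the anchor is the word "link" at position 5.
page_tokens = "best jaguar car reviews here link more text".split()
weight_wide = anchor_window_weight(page_tokens, 5, ["jaguar", "car"])
weight_narrow = anchor_window_weight(page_tokens, 5, ["jaguar", "car"], window=1)
```

A weight of this kind could replace the uniform link weight in a HITS-style iteration, so that links whose surrounding text matches the query contribute more to the target's authority score. Links surrounded by off-topic text then contribute less, which is the intuition behind using anchor context against topic drift.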

Despite some sentence repetitions and an imprecise description of the raw data usage, the paper is quite well written and interesting, especially for students and small text-based search engine developers. Most of the references provided come from the mid- to late-1990s, so the work is not really novel; however, the topic itself is interesting. The experiment is presented in detail, which can serve as a good example to other researchers, particularly those working in this specific field.

Reviewer: Jolanta Mizera-Pietraszko
Review #: CR138907 (1109-0957)

[1] Li, L.; Shang, Y.; Zhang, W. Improvement of HITS-based algorithms on Web documents. In Proc. WWW 2002, ACM, 2002, 527–535.
Algorithm Design And Analysis (G.4 ... )
 