Computing Reviews
Google’s PageRank and beyond : the science of search engine rankings
Langville A., Meyer C., PRINCETON UNIVERSITY PRESS, Princeton, NJ, 2006. 234 pp. Type: Book (9780691122021)
Date Reviewed: Feb 5 2007

Two mathematicians’ view of what is most interesting about Web search engines is presented in this book. Because the authors teach linear algebra, they are most interested in the matrix formulations of algorithms that determine which of a search engine’s hits are most important and therefore deserve to be presented to the user first. For context, the authors begin with 33 pages of general discussion of search engines. They devote 31 of these pages to setting up an abstract framework before saying what they are talking about.

Fortunately, what they are talking about is fascinating and useful. “In short, PageRank’s thesis is that a Web page is important if it is pointed to by other important pages. Sounds circular, doesn’t it? We will see in chapter 4 that this can be formalized in a beautifully simple mathematical formula.” Three pages later, chapter 4 elaborates as follows: “Many of the mathematical terms in each chapter are explained in the mathematics chapter (chapter 15).” Chapter 15 is 47 pages long and may well be sufficiently complete to make chapter 4 sound simple to a person who has had several courses in linear algebra. I have had very few, and none recently, so I cannot tell. In any case, the chapters between 4 and 15 become successively more mathematical. I am also not competent to judge at what level, if any, this book is suitable as a linear algebra textbook.
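For readers who want to see the circular thesis written down, here is one common way to state it as a summation. The notation is an assumption on my part rather than a quotation from chapter 4: r(P_i) is the rank of page P_i, B_{P_i} is the set of pages that link to P_i, and |P_j| is the number of links leaving page P_j.

```latex
r(P_i) = \sum_{P_j \in B_{P_i}} \frac{r(P_j)}{|P_j|}
```

Each page splits its own rank evenly among the pages it links to, so a page's rank is defined in terms of the ranks of the pages pointing at it. The circularity is resolved by iterating from an initial guess until the values settle.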

For those who do not think that even chapter 4 is simple, this book’s saving grace is that it is priced low enough, around $27, to purchase solely for its nonmathematical parts. Of course, because of the authors’ mathematical viewpoint, a reader may need to fill in an appropriate computer science emphasis. For example, the authors correctly note that the grist for Google’s PageRank mill is provided at no cost to Google by the hyperlinks inserted by developers of Web pages. However, the authors do not note that PageRank is therefore the premier instance of the defining characteristic of a Web 2.0 application, extracting most of the application’s enormous value as a free byproduct of others’ day-to-day Web activities. Similarly, on page 97, the authors unexcitedly note, “In April 2002, Google released its Web Application Programming Interface (API). Google suddenly had thousands of free employees, creating new services and applications of Google and offering to give them back to the public.”

The book was clearly edited in expectation of a large sales volume, with internal references specified by page numbers rather than the more common but less convenient section numbers. It also appears to suffer from some overly aggressive copyediting. For instance, on page 112, an editor who apparently has never heard of a GB seems to have changed the amount of storage space that Gmail gives each user to “1KB.”

The book provides many entertaining opportunities to see ourselves as others see us. An aside in section 13.5 reports on an unidentified author’s amazed introduction to blogs in general, and Slashdot in particular. A footnote in section 4.1 mentions that research into the history of search engines turned up the pun embodied in the fact that Larry Page patented PageRank, which ranks Web pages. Page 4 illustrates Google’s wide appeal by quoting Matt Groening, creator of The Simpsons; Michael Powell, then chair of the FCC; and Garry Trudeau, creator of Doonesbury.

The authors obviously enjoyed doing research into the history of search engines. Their joy shows through in their write-up of the Google bomb, in which many pranksters used the anchor text “talentless hack” for their hot links to Andy Pressman’s page. This fooled Google into indexing Andy’s page under “talentless” and “hack,” even though those words never appear on his page. Diligent research determined that there are search engines other than Google. Chapter 11 discusses the HITS algorithm used by Teoma. Getting to the point unusually quickly, the authors summarize HITS as follows: “Good authorities are pointed to by good hubs and good hubs point to good authorities.”
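The mutual reinforcement in that one-sentence summary of HITS can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the book's or Teoma's implementation: the function name `hits`, the dictionary-of-lists graph format, the fixed iteration count, and the sum-to-one normalization are all hypothetical choices made for brevity.

```python
def hits(links, iterations=50):
    """Sketch of HITS-style hub and authority scores.

    links: dict mapping each page to the list of pages it links to
           (a hypothetical input format chosen for this example).
    Returns (hub, authority) dicts, each normalized to sum to 1.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages pointing to it.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]

        # A page's hub score is the sum of the authority scores it points to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}

        # Normalize so the scores stay bounded (assumed: divide by the total).
        auth_total = sum(auth.values()) or 1.0
        hub_total = sum(hub.values()) or 1.0
        auth = {p: v / auth_total for p, v in auth.items()}
        hub = {p: v / hub_total for p, v in hub.items()}

    return hub, auth


if __name__ == "__main__":
    example = {"h1": ["a", "b"], "h2": ["a"], "a": [], "b": []}
    hub_scores, auth_scores = hits(example)
    print(max(auth_scores, key=auth_scores.get))  # page with the top authority score
```

In the small example graph, page "a" is linked from both hubs and so ends up as the strongest authority, while "h1", which points to both authorities, ends up as the strongest hub.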

Computer scientists as well as mathematicians can be entertained by examples of how mathematics is applied. Page 36 notes that

none of their [Brin and Page’s] papers used the phrase ‘Markov Chain,’ not even once. Markov Chain researchers have excitedly and steadily jumped on the PageRank bandwagon, eager to work on what some call the grand application of Markov chains. Brin and Page use the notion of a random surfer.
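The random surfer and the Markov chain are two descriptions of the same computation, which a short Python sketch may make concrete. Everything specific here is my assumption rather than something taken from the book: the function name `pagerank`, the graph format, the damping value of 0.85 (the figure commonly cited for Brin and Page's model), the fixed iteration count, and the choice to spread the rank of pages without outgoing links evenly over all pages.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Sketch of the random-surfer model by repeated redistribution of rank.

    links: dict mapping each page to the list of pages it links to
           (a hypothetical input format chosen for this example).
    damping: assumed probability that the surfer follows a link
             instead of jumping to a random page.
    Returns a dict of ranks that sums to 1.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Otherwise the surfer follows one of the page's links at random.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Assumed handling of dead ends: jump anywhere with equal chance.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank

    return rank


if __name__ == "__main__":
    example = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
    ranks = pagerank(example)
    print(sorted(ranks, key=ranks.get, reverse=True))  # pages from highest to lowest rank
```

Each pass through the loop is one step of the Markov chain: the rank vector is the probability of finding the surfer on each page, and the update applies the transition probabilities once. Repeating the step until the vector stops changing yields the stationary distribution, which is the ranking.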

This book’s audience is not entirely clear; I hope that I have given enough information for you to decide whether it includes you.

Reviewer:  A. Kellerman Review #: CR133887 (0801-0034)
Information Search And Retrieval (H.3.3)
Online Information Services (H.3.5)
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud, Inc.®