Web searching continues to be a popular topic, of both commercial and academic interest. This paper presents an in-depth mathematical analysis of the properties of Google’s PageRank algorithm, which relies on the topological structure of the Web to calculate the relative value of a given Web page. In addition to the rigorous mathematical treatment, the paper provides insights into the implications of these calculations, and how they impact stability, computability, convergence, maintenance, and vulnerability.
The PageRank algorithm exploits the inherent properties of the Markovian matrices that represent the Web’s structure to calculate individual page ranks. The authors go on to show that it is possible to perform an optimal computation, based on the limited precision requirements imposed by the PageRank formula. This result is an important factor for the scalability of the PageRank algorithm. However, as pointed out by Langville and Meyer , “if the holy grail of real-time personalized search is ever to be realized, then drastic speed improvements must be made...”
The authors also introduce the notion of energy, and demonstrate the mathematical properties of this concept through the use of circuit analysis on a subgraph or community of Web pages. This analysis leads to some interesting properties, including the energy loss introduced by “dangling” pages, and “outlinks” from a given community. Following this line, the authors also detail mechanisms by which the PageRank can be increased (or decreased), to influence the relative ranking of the pages within the ranking system.
Indeed, one of the authors’ findings is the fact that the energy of a given target community can be driven to grow linearly with the growth of a “promoting community.” This results in a promotion mechanism that is very difficult to detect.
Another nugget suggested by this research is that hyperlinks to outside pages should be placed in pages with a small PageRank, in addition to a large number of internal links. This property, as well as others discussed by the authors, can be exploited to retain as much energy as possible in a given community, and may provide some insights for site designers.
With this paper, the authors provide a valuable resource for those interested in analyzing the behavior of, and optimizing, large-scale ranking algorithms, such as the one used by Google. The math should keep those interested in the theory busy, while the conclusions provide some practical advice for practitioners in the field.