Computing Reviews

A large-scale study of the evolution of Web pages
Fetterly D., Manasse M., Najork M., Wiener J. Software--Practice & Experience 34(2): 213-237, 2004. Type: Article
Date Reviewed: 07/07/04

It is a commonplace observation that the Web changes rapidly; Cho and Garcia-Molina observed, in 2000, that 40 percent of Web pages changed weekly, and 23 percent of the pages in the “.com” domain changed daily [1].

This new, excellent, and detailed paper reports on a study in which 151 million Web pages were sampled regularly and the changes between samples were analyzed. Measured rates of change are inflated by two factors: first, artificial nonsense pages generated by pornographers and spammers, and second, minor markup changes associated with session identifiers or advertising. Although 40 percent of Web pages in “.com” change weekly, fewer than 30 percent change in their text rather than only in their markup, and fewer than ten percent of the pages in “.edu” change each week.
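The distinction between a text change and a markup-only change can be illustrated with a minimal Python sketch. This is not the method used in the paper under review; it is an assumption-laden illustration in which two snapshots of a page are compared first as raw HTML and then as visible text with tags and attributes stripped. The names `text_of` and `classify_change` are hypothetical, chosen for this example only.

```python
import hashlib
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, ignoring tags, attributes, scripts, and styles."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def text_of(html):
    """Return the page's visible text with whitespace normalized."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def _digest(s):
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


def classify_change(old_html, new_html):
    """Classify a pair of snapshots (hypothetical helper, illustration only)."""
    if _digest(old_html) == _digest(new_html):
        return "unchanged"
    if _digest(text_of(old_html)) == _digest(text_of(new_html)):
        return "markup-only"  # e.g. only a session identifier in a link changed
    return "text"


# Only the session identifier in the link differs between these snapshots.
old = '<p><a href="/x?sid=111">News</a> today</p>'
new = '<p><a href="/x?sid=222">News</a> today</p>'
print(classify_change(old, new))
```

Under this sketch, a crawler that counted every byte-level difference as a change would flag the second snapshot, while a comparison of visible text would not.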

This paper is intended to guide the design of crawling strategies for search engines, and is worth careful study for that purpose.


[1] Cho, J.; Garcia-Molina, H. The evolution of the Web and implications for an incremental crawler. In Proceedings of the 26th International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc., Orlando, FL, 200–209.

Reviewer: Michael Lesk
Review #: CR129854 (0501-0089)
