Computing Reviews
A large-scale study of the evolution of Web pages
Fetterly D., Manasse M., Najork M., Wiener J. Software--Practice & Experience 34(2): 213-237, 2004. Type: Article
Date Reviewed: Jul 7 2004

It is a commonplace observation that the Web changes rapidly; Cho and Garcia-Molina observed, in 2000, that 40 percent of Web pages changed weekly, and 23 percent of the pages in the “.com” domain changed daily [1].

This excellent and detailed new paper reports on a study in which 151 million Web pages were sampled regularly and their changes analyzed. Rates of change are exaggerated by two factors: first, artificial nonsense pages generated by pornographers and spammers, and second, minor markup changes associated with session identifiers or advertising. Although 40 percent of Web pages in “.com” change weekly, fewer than 30 percent change in their text rather than only in their markup, and fewer than ten percent of the pages in “.edu” change each week.
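To make the distinction between text changes and markup-only changes concrete, the following is a minimal sketch of how two snapshots of one page might be compared. It is an illustration only and is not taken from the paper, whose actual comparison method is not described in this review. The names `visible_text`, `digest`, and `classify_change` are hypothetical, and exact hashing is assumed here for simplicity.

```python
import hashlib
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the character data of an HTML document and ignores tags.

    Limitation of this sketch: the contents of <script> and <style>
    elements are also reported as data and are not filtered out.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html: str) -> str:
    """Return the page text with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def digest(s: str) -> str:
    """Hash a string so snapshots can be compared without storing them."""
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


def classify_change(old_html: str, new_html: str) -> str:
    """Classify the difference between two snapshots of the same page.

    Hypothetical labels: "unchanged", "markup-only", or "text".
    """
    if digest(old_html) == digest(new_html):
        return "unchanged"
    if digest(visible_text(old_html)) == digest(visible_text(new_html)):
        return "markup-only"
    return "text"


if __name__ == "__main__":
    # A session identifier changes in a link, but the text stays the same.
    old = '<p>Hello <a href="/x?sid=1">world</a></p>'
    new = '<p>Hello <a href="/x?sid=2">world</a></p>'
    print(classify_change(old, new))
```

Under these assumptions, a page whose only difference is a session identifier inside a link is classified as a markup-only change, which is the kind of change the review says inflates naive change rates.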

This paper is intended to guide the design of crawling strategies for search engines, and is worth careful study for that purpose.

Reviewer:  Michael Lesk Review #: CR129854 (0501-0089)
[1] Cho, J.; Garcia-Molina, H. The evolution of the Web and implications for an incremental crawler. In Proceedings of the 26th International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc., Orlando, FL, 200–209.
Search Process (H.3.3 ... )
HTML (I.7.2 ... )
World Wide Web (WWW) (H.3.4 ... )
Systems And Software (H.3.4 )

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®