Computing Reviews
A large-scale study of the evolution of Web pages
Fetterly D., Manasse M., Najork M., Wiener J.  Software--Practice & Experience 34 (2): 213-237, 2004. Type: Article
Date Reviewed: Jul 7 2004

It is a commonplace observation that the Web changes rapidly; Cho and Garcia-Molina observed, in 2000, that 40 percent of Web pages changed weekly, and 23 percent of the pages in the “.com” domain changed daily [1].

This excellent and detailed new paper reports on a study in which 151 million Web pages were sampled regularly and their changes analyzed. Two factors exaggerate the measured rates of change: first, artificial nonsense pages generated by pornographers and spammers, and second, minor markup changes associated with session identifiers or advertising. Although 40 percent of Web pages in “.com” change weekly, fewer than 30 percent change their text rather than only their markup, and fewer than ten percent of the pages in “.edu” change each week.

This paper is intended to guide the design of crawling strategies for search engines, and is worth careful study for that purpose.

Reviewer: Michael Lesk
Review #: CR129854 (0501-0089)

[1] Cho, J.; Garcia-Molina, H. The evolution of the Web and implications for an incremental crawler. In Proceedings of the 26th International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc., Orlando, FL, 2000, 200–209.
Search Process (H.3.3 ... )
HTML (I.7.2 ... )
World Wide Web (WWW) (H.3.4 ... )
Systems And Software (H.3.4 )