Computing Reviews
A large-scale study of the evolution of Web pages
Fetterly D., Manasse M., Najork M., Wiener J.  Software--Practice & Experience 34 (2): 213-237, 2004. Type: Article
Date Reviewed: Jul 7 2004

It is commonly observed that the Web changes rapidly. In 2000, Cho and Garcia-Molina reported that 40 percent of Web pages changed weekly, and that 23 percent of the pages in the “.com” domain changed daily [1].

This excellent and detailed new paper reports on a study in which 151 million Web pages were sampled at regular intervals and the changes between samples were analyzed. Two factors exaggerate the measured rate of change: first, artificial nonsense pages generated by pornographers and spammers; and second, minor markup changes associated with session identifiers or advertising. Although 40 percent of the pages in “.com” change weekly, fewer than 30 percent change their text rather than only their markup, and fewer than 10 percent of the pages in “.edu” change in a given week.
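The distinction between a change to a page's text and a change to its markup alone can be made concrete with a small example. The sketch below is an illustration under stated assumptions, not the method used by Fetterly et al.: it assumes two fetched versions of a page are available as HTML strings, extracts visible words with Python's standard `html.parser` (skipping the contents of `<script>` and `<style>`), and measures text similarity as the Jaccard overlap of word 3-shingles. The function names `classify_change`, `visible_words`, and `shingles`, and the shingle size `k=3`, are hypothetical choices.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_words(html: str) -> list[str]:
    """Hypothetical helper: the page's visible text as a list of words."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks).split()


def shingles(words: list[str], k: int = 3) -> set[tuple[str, ...]]:
    """All contiguous k-word sequences; short texts form a single shingle."""
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (1.0 if both empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify_change(old_html: str, new_html: str, k: int = 3) -> tuple[str, float]:
    """Hypothetical classifier: 'unchanged', 'markup-only', or 'text' plus a similarity."""
    if old_html == new_html:
        return ("unchanged", 1.0)
    old_words, new_words = visible_words(old_html), visible_words(new_html)
    if old_words == new_words:
        # Only tags, attributes, or script/style contents differ,
        # e.g. a rotated session identifier or advertising code.
        return ("markup-only", 1.0)
    return ("text", jaccard(shingles(old_words, k), shingles(new_words, k)))


old = '<p>Hello world from the web</p><a href="/x?sid=1">link</a>'
new_sid = '<p>Hello world from the web</p><a href="/x?sid=2">link</a>'
new_text = '<p>Hello world from the new web</p><a href="/x?sid=1">link</a>'
print(classify_change(old, new_sid))   # ('markup-only', 1.0)
print(classify_change(old, new_text))  # ('text', 0.2857142857142857)
```

A crawler that counted only the `"text"` outcomes, perhaps below some similarity threshold, would see a lower change rate than one that compared raw bytes. The threshold and shingle size here are illustrative rather than values taken from the paper.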

This paper is intended to guide the design of crawling strategies for search engines, and is worth careful study for that purpose.

Reviewer: Michael Lesk
Review #: CR129854 (0501-0089)
[1] Cho, J.; Garcia-Molina, H. The evolution of the Web and implications for an incremental crawler. In Proceedings of the 26th International Conference on Very Large Data Bases (VLDB 2000). Morgan Kaufmann Publishers Inc., Orlando, FL, 2000, 200–209.

Search Process (H.3.3 ...)
HTML (I.7.2 ...)
Systems And Software (H.3.4)
World Wide Web (WWW) (H.3.4 ...)

Other reviews under "Search Process":
Query intent mining with multiple dimensions of web search data
Jiang D., Leung K., Ng W.  World Wide Web 19(3): 475-497, 2016. Type: Article
Nov 11 2016
Personalized concept-based search on the linked open data
Sah M., Wade V.  Journal of Web Semantics 36: 32-57, 2016. Type: Article
Jun 9 2016
The Mannheim Search Join Engine
Lehmberg O., Ritze D., Ristoski P., Meusel R., Paulheim H., Bizer C.  Journal of Web Semantics 35, Part 3, 159-166, 2015. Type: Article
Mar 14 2016

Reproduction in whole or in part without permission is prohibited.   Copyright © 2000-2017 ThinkLoud, Inc.