It is a commonplace observation that the Web changes rapidly; Cho and Garcia-Molina reported in 2000 that 40 percent of Web pages changed weekly, and that 23 percent of the pages in the “.com” domain changed daily.
This new, excellent, and detailed paper reports on a study in which 151 million Web pages were sampled regularly and their changes studied. Rates of change are exaggerated, first, by artificial nonsense pages generated by pornographers and spammers and, second, by minor markup changes associated with session identifiers or advertising. Although 40 percent of Web pages in “.com” change weekly, fewer than 30 percent change their text rather than only their markup, and fewer than 10 percent of the pages in “.edu” change each week.
This paper is intended to guide the design of crawling strategies for search engines, and is worth careful study for that purpose.