With so many potential users of software packages linked by high-speed networking, it is now feasible to obtain profiling information from a huge number of user sessions. Such information could be used, for example, to improve test suites, to show how software is actually used in practice, and to indicate changes in patterns of use. However, beyond confidentiality and efficiency concerns, the naive use of profiling is quite likely to produce overwhelming amounts of data. This paper reports the results of an experiment measuring the effectiveness of various profiling strategies applied to the Pine mailer, a substantial package consisting of 1,373 functions and totaling 155,000 lines of code. The experiment involved 30 users, and data was collected from 1,193 user sessions.
The analysis of the data provides insight into a number of practical areas that might benefit from profiling in this way: for example, how effective a test suite derived from field data would be at fault detection and statement/function coverage, and what effects various algorithms designed to reduce the amount of profiling data collected and transferred would have.
A variety of data reduction methods are described, and the effects of these algorithms on the amount of useful information obtained in a number of different areas are reported. Given the potential of data collected in this way to improve the quality of widely used software products, this paper provides a good introduction to the problems involved, and quantifies the effects of several previously published approaches.
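The paper's specific algorithms are not reproduced here, but one commonly proposed reduction idea can be sketched: rather than shipping full execution-count profiles from every session, collapse each profile to a coverage bit set and transmit it only when it exercises at least one function not seen in earlier sessions. The function names and session format below are illustrative assumptions, not taken from the paper:

```python
def reduce_profile(counts):
    """Collapse raw per-function execution counts to a coverage set.

    Dropping the counts and keeping only the executed/not-executed bit
    is itself a large reduction for a program like Pine (1,373 functions).
    """
    return frozenset(i for i, c in enumerate(counts) if c > 0)


def collect(sessions):
    """Transfer a session's profile only if it adds previously unseen coverage.

    `sessions` is an iterable of per-session execution-count lists, one
    entry per instrumented function.  Returns the transferred profiles
    and the cumulative coverage they achieve.
    """
    seen = set()
    transferred = []
    for counts in sessions:
        cov = reduce_profile(counts)
        if not cov <= seen:  # session executed at least one new function
            transferred.append(cov)
            seen |= cov
    return transferred, seen
```

Under this policy, sessions whose coverage is already subsumed cost nothing to transfer, at the price of losing per-session frequency information; the trade-offs between such savings and the usefulness of the retained data are exactly what the paper quantifies.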
Since the experiment was conducted on a substantial piece of software, the paper will be of considerable interest to any software engineer considering deploying these techniques in practice, as well as to readers interested in how profiling data may be used to improve the software engineering process.