This paper presents a detailed analysis of the use of correlation and entropy measures to examine the relationship between data and control variables as accessed by different programs. The idea is to see whether, by exhaustively testing only those segments of a program that frequently reference the same data, we can still obtain reasonably trustworthy software. Unfortunately, the answer is shown to be no.
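To give a flavor of what entropy-based dependence measures look like, here is a minimal sketch; the specific formulation (Shannon entropy and empirical mutual information over traces of observed variable values) is an illustrative assumption on my part, not the authors' exact metric:

```python
from collections import Counter
from math import log2

def shannon_entropy(values):
    # Shannon entropy (in bits) of the empirical distribution of observed values.
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def mutual_information(xs, ys):
    # Empirical mutual information between two aligned traces of variable values:
    # I(X; Y) = H(X) + H(Y) - H(X, Y).
    # High MI suggests a strong information flow between the variables; MI near
    # zero suggests an observed dependence that carries little actual information.
    return shannon_entropy(xs) + shannon_entropy(ys) - shannon_entropy(list(zip(xs, ys)))

# Example traces: y copying x exactly (strong flow) vs. y held constant (no flow).
xs = [0, 1, 0, 1, 0, 1, 0, 1]
strong = mutual_information(xs, xs)      # 1.0 bit: y fully determined by x
weak = mutual_information(xs, [7] * 8)   # 0.0 bits: y reveals nothing about x
```

A measure like this can distinguish a dynamic dependence that actually transmits information from one that does not, which is the distinction at the heart of the paper's first conclusion.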
The following conclusions are presented: “dynamic program dependence is not necessarily indicative of actual information flow in real programs”; “the length of an information flow is not indicative of its strength (importance)”; and “long flows are not generally less significant than short flows and should not be dismissed without further (extensive) examination.”
The paper is scholarly and seems complete, with appropriate cautions about the small number of programs examined. The authors use an interesting three-way triangulation to bound their results. Unless you are fascinated by the insightful approach first laid out by software theory giants Dorothy and Peter Denning, you can simply accept the conclusion that you must examine the entire program to make sure it works; there are no shortcuts.
The authors write clearly, and their introduction and conclusion sections are erudite and wonderful. If you are working in this area of software engineering, the paper is worth studying. If you are a practitioner looking for ways to skip software analysis and testing, you will be disappointed. It’s too bad that Masri and Podgurski’s truly professional analysis has not yielded breakthrough results, but it confirms what practitioners already know: namely, that it is vital to examine and test against the test cases derived from the requirements and the use case scenarios. The nature of software execution remains chaotic, and small errors in infrequently referenced shared data or control structures can lead to catastrophic results.