Source code analysis is a well-established area concerned with analyzing programs written in high-level languages to detect defects and to extract useful information about the program.
Balakrishnan and Reps discuss the limitations of source code analysis in finding security vulnerabilities that often occur, or are revealed, only at the machine code level. In addition, source code is not always available for analysis, especially for software developed by third parties. These considerations motivated the authors to study the static analysis of machine code when little additional information is available. Machine-level static analysis is very challenging: even basic information, such as the variables used by the program and its control-flow and data-flow dependences, has to be extracted from code that works directly with memory addresses. The analysis engine must identify which chunks of the address space act as variables before it can perform the analysis.
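To make the idea of identifying chunks of the address space as variables concrete, here is a minimal sketch. It is a toy illustration, not the authors' algorithm: it assumes memory accesses have already been reduced to hypothetical (offset, size) pairs relative to some frame base, and it simply merges overlapping accesses into disjoint regions that could be treated as variable-like entities. The names `Access` and `recover_chunks` are invented for this example.

```python
from typing import List, Tuple

# Hypothetical access record: (offset from an assumed frame base, size in bytes).
Access = Tuple[int, int]


def recover_chunks(accesses: List[Access]) -> List[Access]:
    """Merge overlapping accesses into disjoint (offset, size) chunks."""
    chunks: List[List[int]] = []  # each entry is [start, end), end exclusive
    for offset, size in sorted(set(accesses)):
        start, end = offset, offset + size
        if chunks and start < chunks[-1][1]:
            # Overlaps the previous chunk: extend it instead of starting a new one.
            chunks[-1][1] = max(chunks[-1][1], end)
        else:
            chunks.append([start, end])
    return [(start, end - start) for start, end in chunks]


# Accesses observed in a made-up function: two 4-byte locals and an
# 8-byte region that is also touched through a 4-byte access.
observed = [(-4, 4), (-8, 4), (-16, 8), (-12, 4), (-4, 4)]
print(recover_chunks(observed))  # [(-16, 8), (-8, 4), (-4, 4)]
```

A real analysis has a much harder job, since many addresses are computed at run time and cannot be read off the instructions directly; the sketch only shows the final grouping step under the stated assumptions.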
This paper presents the large body of work in this area carried out by Reps and some of his students over the last several years. As a result, it is voluminous and rich in content, and it should be very useful to researchers interested in static analysis as applied to machine code. The algorithms described include the extraction of an intermediate representation (IR) of the code that is very similar to what is obtained from a high-level language. Value-set analysis, an important and novel technique proposed by the authors, should prove very useful in machine code analysis.
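For readers unfamiliar with value-set analysis, the following sketch may help. The technique over-approximates the set of numeric values and addresses that each location can hold, and a strided interval is one abstraction used for such sets. The code is a simplified illustration under that assumption, not the authors' implementation: the class name, its methods, and the restriction to unbounded Python integers (rather than fixed-width machine words) are choices made for this example.

```python
from dataclasses import dataclass
from math import gcd


@dataclass(frozen=True)
class StridedInterval:
    """Represents the set {lo, lo + stride, ..., hi}; stride 0 means one value."""
    stride: int
    lo: int
    hi: int

    def __post_init__(self) -> None:
        if self.stride < 0 or self.lo > self.hi:
            raise ValueError("need stride >= 0 and lo <= hi")
        if self.stride == 0 and self.lo != self.hi:
            raise ValueError("stride 0 is only valid for a single value")
        if self.stride > 0 and (self.hi - self.lo) % self.stride != 0:
            raise ValueError("hi must be reachable from lo in whole strides")

    def contains(self, value: int) -> bool:
        if not self.lo <= value <= self.hi:
            return False
        if self.stride == 0:
            return value == self.lo
        return (value - self.lo) % self.stride == 0

    def join(self, other: "StridedInterval") -> "StridedInterval":
        """Return a strided interval that covers both operands."""
        # The new stride must divide both strides and the gap between the
        # lower bounds, so that every element of either operand stays on it.
        stride = gcd(gcd(self.stride, other.stride), abs(self.lo - other.lo))
        return StridedInterval(stride, min(self.lo, other.lo), max(self.hi, other.hi))


# Offsets of 4-byte elements reached on two different program paths.
a = StridedInterval(4, 0, 8)   # {0, 4, 8}
b = StridedInterval(4, 2, 10)  # {2, 6, 10}
merged = a.join(b)             # stride 2 over [0, 10]
print(merged, merged.contains(6), merged.contains(3))
```

The join is deliberately imprecise: the merged interval also admits values such as 4 on the second path's side, which is the usual price of keeping the abstraction compact.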
The paper is extensive and clearly describes the various algorithms used for the analysis. It is very well written and can serve as an important resource for anyone interested in the static analysis of machine code. It contains a few examples and some implementation details of the algorithms used to extend their tool, CodeSurfer, to executables.