At least three core issues illustrate why software engineers and architects need to better understand reverse engineering efforts. First, the number of software solutions continues to increase. Second, the technologies available, and the choices made among them, keep changing over time. Third, the development teams involved in both solution creation and technology selection fluctuate in membership as new opportunities arise. In short, software solutions are built with increasingly sophisticated techniques and technologies, while the teams that build them rarely stay together. Software engineers and architects, therefore, grapple with the question of where to begin when reverse engineering an existing solution. Girba, Ducasse, and Lanza propose an approach that examines past class-level code changes to help identify where to begin the early stages of the reverse engineering process. In addition to the academically inclined, software engineers, architects, and development managers are all likely readers of this paper.
The authors contend that, by looking at classes with sizable changes, one finds likely areas of future change and thus identifies appropriate starting points for reverse engineering. The Yesterday’s Weather system leverages the historical tracking of code over time. To illustrate this, the authors introduce a simplified evolution matrix, where the x-axis denotes version and the y-axis represents class. The matrix thus records how each class changes from version to version. Having established the historical record, the authors propose that Yesterday’s Weather use three formulas to identify and predict the classes most likely to change in the near future: evolution of number of methods, latest evolution of number of methods, and earliest evolution of number of methods. Yesterday’s Weather yields a percentage indicating how likely it is that the classes identified from recent history are the ones that will change next, and are thus the best places to begin reverse engineering.
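To make the three measures concrete, here is a minimal Python sketch. It rests on assumptions that go beyond what this review states: the evolution matrix is modeled as a hypothetical dictionary mapping each class to its number of methods per version, the "latest" and "earliest" measures are assumed to weight each change by a power of two that decays with distance from the most recent or the first version, and the final percentage is assumed to be the share of versions at which the top recently changed classes overlap with the top classes that change next. All function names and the sample data are illustrative, not taken from the paper.

```python
from typing import Callable, Dict, List

# Hypothetical evolution matrix: class name -> number of methods (NOM) per version.
History = Dict[str, List[int]]


def enom(noms: List[int]) -> float:
    """Evolution of number of methods: sum of absolute changes between versions."""
    return float(sum(abs(b - a) for a, b in zip(noms, noms[1:])))


def lenom(noms: List[int]) -> float:
    """Latest evolution: recent changes count more (assumed weight 2**(i - k))."""
    k = len(noms) - 1
    return sum(abs(noms[i] - noms[i - 1]) * 2.0 ** (i - k) for i in range(1, k + 1))


def eenom(noms: List[int]) -> float:
    """Earliest evolution: early changes count more (assumed weight 2**(1 - i))."""
    return sum(abs(noms[i] - noms[i - 1]) * 2.0 ** (1 - i) for i in range(1, len(noms)))


def top_classes(history: History, metric: Callable[[List[int]], float], n: int) -> List[str]:
    """Return the n classes with the highest value of the given metric."""
    return sorted(history, key=lambda c: metric(history[c]), reverse=True)[:n]


def yesterdays_weather(history: History, n: int = 1) -> float:
    """Percentage of versions at which the past top-n overlaps the future top-n."""
    versions = len(next(iter(history.values())))
    hits = checks = 0
    for v in range(1, versions - 1):
        past = {c: h[: v + 1] for c, h in history.items()}   # versions 0..v
        future = {c: h[v:] for c, h in history.items()}      # versions v..end
        checks += 1
        if set(top_classes(past, lenom, n)) & set(top_classes(future, eenom, n)):
            hits += 1
    return 100.0 * hits / checks if checks else 0.0


# Illustrative data only: three classes tracked over four versions.
history = {"A": [5, 5, 9, 14], "B": [3, 4, 4, 4], "C": [7, 7, 7, 7]}
print(top_classes(history, lenom, 1))
print(yesterdays_weather(history, n=1))
```

For the sample data, class A ranks first on recent change, and the overlap check succeeds at one of the two versions examined, giving 50.0. A high value would suggest that recent change is a reasonable predictor for that particular system; a low value would suggest looking elsewhere for a starting point.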
To support this premise, the authors have defined and implemented these formulas as part of a system that analyzes method-level code changes over the life of development. They test their solution on two different projects, Jun and CodeCrawler, adequately demonstrating the approach. However, they suggest that the only way to validate the effectiveness of Yesterday’s Weather would be to run the system over a longer period of time with the ability to verify the changes that actually occur next. Compared to other approaches cited by the authors, their system returns finer-grained results based on semantic code changes.
Their approach hinges on accepting the general premise that classes likely to change in the future are more worthwhile to examine than other parts of the system’s code. However, is their approach any better in practice than the alternatives? One ongoing debate in the software evolution and reverse engineering communities concerns the level at which it makes sense to understand the nature of software: low-level code semantics or higher-level engineering [1]. Another approach is to identify patterns for reengineering, as proposed by Stevens and Pooley [2]. Other approaches include code visualization systems [3,4,5]. Regardless, all of these solutions put the burden on human interpreters, who must evaluate the results of the tools to guide the process of understanding an existing software solution.
The whole topic raises the question of why the industry has come so far with so little documentation about software, despite the fact that documentation is perhaps the single easiest way to reduce the burden of software reverse engineering. I am not convinced that looking at a historical summary identifies the best places to begin reverse engineering. However, I do think that this strategy is probably as good as many others. If a human is trying to determine the best place to begin, I suggest that understanding a holistic, higher-level design view of the system will provide guidance as good as, or better than, any other solution. Nonetheless, Girba, Ducasse, and Lanza offer a rational alternative approach to the question of where to begin reverse engineering, one that is sure to pique the interest of readers.