Hill et al. present Dora as a tool that identifies the code relevant to a software maintenance task. The key idea in Dora is to prune away structurally connected but otherwise irrelevant code. Method relevance is scored by applying information retrieval techniques to the user's natural-language query and the terms appearing in each method. Figure 3 suggests that Dora's precision and recall compare favorably with those of other techniques.
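To make the relevance-scoring idea concrete, the sketch below scores methods against a query using TF-IDF weighted term overlap. This is a generic information-retrieval illustration under assumed inputs (a hypothetical bag of terms per method, e.g. from split identifiers), not Dora's actual probability model.

```python
import math
from collections import Counter

def tf_idf_scores(query_terms, method_terms):
    """Score each method by TF-IDF-weighted overlap with the query.

    method_terms maps a method name to the list of terms mined from it
    (hypothetical example data; a real tool would split identifiers,
    comments, etc.).
    """
    n = len(method_terms)
    # Document frequency: how many methods contain each term.
    df = Counter()
    for terms in method_terms.values():
        df.update(set(terms))
    scores = {}
    for name, terms in method_terms.items():
        tf = Counter(terms)  # term frequency within this method
        score = 0.0
        for t in query_terms:
            if t in tf:
                # Rare terms (low df) contribute more via the idf weight.
                score += tf[t] * math.log(n / df[t])
        scores[name] = score
    return scores

# Hypothetical methods and their mined terms.
methods = {
    "saveFile":   ["save", "file", "write", "disk"],
    "openFile":   ["open", "file", "read"],
    "drawButton": ["draw", "button", "render"],
}
scores = tf_idf_scores(["save", "file"], methods)
best = max(scores, key=scores.get)  # "saveFile" matches the query best
```

A pruning step in the spirit of the paper's key idea would then discard structurally reachable methods whose score falls below some threshold.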
The scope of a single maintenance request can vary from a single statement to the entire system. This work, however, lacks an explicit model of a maintenance request. Without such a model, method relevance cannot be judged reliably, either when building a training set to instantiate Dora's probability model of method relevance or when comparing Dora experimentally against other techniques. Ironically, Table 1 clearly demonstrates the difficulty of assessing relevance in the absence of such a model: it shows developers disagreeing over the sets of methods relevant to eight concerns, where a concern is informally described as a high-level idea or feature implemented in code. While Hill et al. do not view this disagreement as a serious threat to the validity of their experiment, others would consider it fatal.
Although many would dismiss the evaluation section of this paper outright, the key idea behind Dora is sound, and the need for a tool that combines structural and lexical identifier information is well argued. As such, I recommend this paper to the software maintenance community.