Computing Reviews

Extracting code clones for refactoring using combinations of clone metrics
Choi E., Yoshida N., Ishio T., Inoue K., Sano T. IWSC 2011 (Proceedings of the 5th International Workshop on Software Clones, Waikiki, Honolulu, HI, May 23, 2011), 7-13, 2011. Type: Proceedings
Date Reviewed: 02/16/12

Many tools to detect code duplication are available. What remains lacking is an understanding of how best to use the results these tools provide. Anecdotal evidence suggests that high values for individual clone metrics are not always indicative of clone sets that can and should be refactored.

The authors argue for using combinations of clone metrics. For example, the length of a clone (LEN) can be used in conjunction with the population (POP), that is, the number of clones in a clone set: LEN can eliminate small-sized clone sets that have high POP values. The paper reports a study in which a developer made refactoring decisions on seven kinds of clone sets. The first three kinds were selected by individual metrics: LEN, ratio of nonrepeated token sequences (RNR), and POP. The next three were selected by paired combinations of LEN, RNR, and POP. The seventh combined all three metrics. Tables 2 and 3 clearly show that combined metrics outperform individual metrics in identifying worthwhile refactoring candidates, often by a factor of two.
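
To make the idea of combining metrics concrete, the following minimal Python sketch filters clone sets by one or more metrics at once. It is an illustration only, not the authors' tool or procedure: the CloneSet record, the threshold values, and the sample data are all hypothetical, and the sketch assumes that "combining" means a clone set must satisfy every selected metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneSet:
    """Hypothetical record for one clone set and its three metric values."""
    name: str
    len_tokens: int  # LEN: length of the clone, assumed here to be in tokens
    pop: int         # POP: number of clones in the clone set
    rnr: float       # RNR: ratio of nonrepeated token sequences (0.0 to 1.0)


# Hypothetical cut-offs chosen for illustration; they are not from the paper.
THRESHOLDS = {"LEN": 50, "POP": 3, "RNR": 0.5}


def metric_value(clone_set, metric):
    """Look up the value of the named metric for a clone set."""
    values = {
        "LEN": clone_set.len_tokens,
        "POP": clone_set.pop,
        "RNR": clone_set.rnr,
    }
    return values[metric]


def select(clone_sets, metrics):
    """Keep only the clone sets that meet the threshold of every listed metric."""
    return [
        cs for cs in clone_sets
        if all(metric_value(cs, m) >= THRESHOLDS[m] for m in metrics)
    ]


# Made-up sample data: B is a short clone that occurs very often.
clone_sets = [
    CloneSet("A", len_tokens=120, pop=2, rnr=0.9),
    CloneSet("B", len_tokens=8, pop=40, rnr=0.3),
    CloneSet("C", len_tokens=75, pop=5, rnr=0.8),
]

pop_only = [cs.name for cs in select(clone_sets, ["POP"])]
len_and_pop = [cs.name for cs in select(clone_sets, ["LEN", "POP"])]
all_three = [cs.name for cs in select(clone_sets, ["LEN", "POP", "RNR"])]
```

With this made-up data, selecting on POP alone keeps the short, frequently repeated clone set B, while adding LEN removes it. That is the kind of filtering effect described above.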

A rule of thumb for tools providing suggestions or warnings is that the false positive rate should be less than 50 percent. According to Table 2, the individual metric LEN satisfies this criterion, so some readers would disagree with the authors’ conclusion that the use of LEN alone is unacceptable. Despite this criticism, I recommend this paper to those involved in software quality assurance and those researching code duplication.

Reviewer:  Andy Brooks Review #: CR139872 (1208-0824)
