Computing Reviews, the leading online review service for computing literature.

CCFinder: a multilinguistic token-based code clone detection system for large scale source code
Kamiya T., Kusumoto S., Inoue K. IEEE Transactions on Software Engineering 28(7): 654-670, 2002. Type: Article

Date Reviewed: Apr 28 2003

This paper describes CCFinder, a token-based code clone detection system. A clone pair is a pair of identical or similar code portions that could be merged into a single routine to reduce the maintenance burden. The paper describes a token-by-token matching algorithm that employs several optimization techniques, making analysis of industrial-strength software practical. Language dependency is limited: developing the Java sub-component took only two person-days. Several metrics are developed, and results are presented for the source code of Java Development Kit (JDK) 1.3.0, FreeBSD 4.0, NetBSD 1.5, and Linux 2.4.0. Many very similar source files are reported in javax/swing/*.java. The paper contains a convincing visualization of strong similarities between FreeBSD and NetBSD (over 25,000 clone pairs). Between FreeBSD and Linux, 252 of 1,091 clone pairs (23 percent) were detected across line breaks, indicating how many clones line-by-line matching algorithms can miss. The transformation, optimization, and other implementation techniques employed by CCFinder implicitly define similarity, and hence what a clone pair is. The paper shows the dramatic effects of disabling various techniques on the number of clone pairs detected. However, these results are not related to the metric values for clones, so it remains unclear which set of techniques is optimal. The paper also does not report application of the tool to itself. Considerable insights might have emerged had such an investigation been undertaken, and had the CCFinder code been refactored to merge clone pairs. Nevertheless, this paper represents a major contribution to code clone detection, and is highly recommended to specialists in software maintenance.
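
The point about line breaks can be illustrated with a minimal Python sketch. It is not CCFinder's algorithm: the tokenizer, the keyword list, the `$id` placeholder for identifiers, and the naive quadratic comparison are all assumptions chosen for illustration, and the paper's own transformation rules and optimizations are not reproduced here. The sketch only shows why two fragments that differ in layout and identifier names share no line, yet match token for token.

```python
import re

# Hypothetical, simplified tokenizer: identifiers, integers, or any single
# non-whitespace character. Whitespace and line breaks are discarded.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")

# Assumed keyword list for this toy example only.
KEYWORDS = {"int", "return", "if", "else", "for", "while"}


def tokenize(source):
    """Return the token sequence, with non-keyword identifiers
    replaced by the placeholder '$id' (an illustrative normalization)."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if IDENT_RE.fullmatch(tok) and tok not in KEYWORDS:
            tokens.append("$id")
        else:
            tokens.append(tok)
    return tokens


def common_lines(a, b):
    """Line-by-line matching: stripped, non-empty lines present in both."""
    lines_a = {line.strip() for line in a.splitlines() if line.strip()}
    lines_b = {line.strip() for line in b.splitlines() if line.strip()}
    return lines_a & lines_b


def longest_common_token_run(a, b):
    """Length of the longest contiguous token run shared by a and b.
    Naive dynamic programming, used here only as a stand-in."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    best = 0
    prev = [0] * (len(tokens_b) + 1)
    for x in tokens_a:
        cur = [0] * (len(tokens_b) + 1)
        for j, y in enumerate(tokens_b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Same logic, different identifier names, and a call split across lines.
snippet_a = "int total = add(left, right);\nreturn total;"
snippet_b = "int sum = add(x,\n              y);\nreturn sum;"

print(common_lines(snippet_a, snippet_b))              # no shared lines
print(longest_common_token_run(snippet_a, snippet_b))  # whole fragment matches
```

In this toy case, line matching finds nothing in common, while the normalized token sequences of the two fragments are identical.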

Reviewer: Andy Brooks	Review #: CR127548 (0308-0784)

Reuse Models (D.2.13 ... )

Restructuring, Reverse Engineering, And Reengineering (D.2.7 ... )

Text Processing (I.5.4 ... )

Distribution, Maintenance, and Enhancement (D.2.7)

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®