Computing Reviews
CCFinder: a multilinguistic token-based code clone detection system for large scale source code
Kamiya T., Kusumoto S., Inoue K. IEEE Transactions on Software Engineering 28(7): 654-670, 2002. Type: Article
Date Reviewed: Apr 28 2003

This paper describes CCFinder, a token-based code clone detection system. A clone pair is a pair of identical or similar code portions that could be merged into a single routine to reduce the maintenance burden. The paper describes a token-by-token matching algorithm that uses several optimization techniques, which makes it practical to analyze industrial-strength software. Language dependence is kept small: developing the Java subcomponent took only two person-days. Several metrics are developed. Results are presented for the source code of Java Development Kit 1.3.0, FreeBSD 4.0, NetBSD 1.5, and Linux 2.4.0.
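A deliberately naive sketch can show what token-by-token matching compares. It is not the paper's optimized algorithm. The lexer (`TOKEN_RE`), the function names, and the `min_len` threshold are hypothetical, and the sketch applies no token transformation. Tokenization discards whitespace and line breaks, so the second fragment below still matches the first even though its layout is different.

```python
import re

# Hypothetical lexer: identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def tokenize(src):
    """Return the token sequence of src; whitespace and line breaks are discarded."""
    return TOKEN_RE.findall(src)

def clone_pairs(a, b, min_len=10):
    """Naive token-by-token matching between token lists a and b.

    Returns (i, j, length) triples such that a[i:i+length] == b[j:j+length],
    the run cannot be extended to the left or right, and length >= min_len.
    """
    pairs = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Skip start positions that merely continue a run begun one token earlier.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                pairs.append((i, j, k))
    return pairs

A = "void swap(int *p, int *q) { int t = *p; *p = *q; *q = t; }"
B = "static void swap(int *p,\n    int *q)\n{ int t = *p; *p = *q; *q = t; }"
print(clone_pairs(tokenize(A), tokenize(B)))  # [(0, 1, 30)]
```

The nested loops cost time proportional to the product of the two token counts. That cost is why a practical tool needs optimizations of the kind the paper describes before it can handle systems the size of FreeBSD or Linux.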

According to the paper, many very similar source files are found in javax/swing/*.java. The paper also contains a convincing visualization of the strong similarity between FreeBSD and NetBSD (over 25,000 clone pairs). Between FreeBSD and Linux, 252 of the 1,091 clone pairs (23 percent) were detected across line breaks. This suggests how many clones line-by-line matching algorithms can miss.
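A small illustration can show why line-by-line matching misses clones that token matching finds. The two fragments and both helper functions are hypothetical. They contain the same statement, split at different points. The last line checks the review's percentage arithmetic.

```python
import re

def lines_of(src):
    """Whitespace-trimmed, non-empty lines: the unit a line-by-line matcher compares."""
    return [ln.strip() for ln in src.splitlines() if ln.strip()]

def tokens_of(src):
    """Tokens with all whitespace, including line breaks, discarded."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", src)

original = "x = compute(a,\n            b);\n"
reformatted = "x = compute(\n    a, b);\n"

print(lines_of(original) == lines_of(reformatted))    # False: the lines differ
print(tokens_of(original) == tokens_of(reformatted))  # True: same token sequence

# Share of FreeBSD/Linux clone pairs reported as detected across line breaks.
print(f"{252 / 1091:.1%}")  # 23.1%
```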

The transformation, optimization, and other implementation techniques that CCFinder uses implicitly define similarity, and therefore what counts as a clone pair. The paper shows that disabling individual techniques changes the number of clone pairs detected dramatically. However, these results are not related to the clone metric values, so it remains unclear which combination of techniques is optimal.
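A toy experiment can show how one transformation changes the clone-pair count. The transformation here, identifier normalization that rewrites every non-keyword identifier as `$id`, is a hypothetical stand-in chosen for illustration. It is not a claim about CCFinder's actual rule set. The keyword set, the window length of 8, and all function names are also assumptions. The sketch counts identical fixed-length token windows that occur in two different fragments.

```python
import re
from collections import defaultdict
from itertools import combinations

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")
KEYWORDS = {"for", "if", "while", "return", "int"}  # tiny assumed keyword set

def tokens(src, normalize_ids):
    toks = TOKEN_RE.findall(src)
    if normalize_ids:
        # Hypothetical transformation: every non-keyword identifier becomes "$id".
        toks = ["$id" if IDENT_RE.fullmatch(t) and t not in KEYWORDS else t
                for t in toks]
    return toks

def count_clone_pairs(fragments, min_len, normalize_ids):
    """Count cross-fragment pairs of identical min_len-token windows."""
    index = defaultdict(list)  # window contents -> [(fragment id, start offset)]
    for fid, src in enumerate(fragments):
        toks = tokens(src, normalize_ids)
        for i in range(len(toks) - min_len + 1):
            index[tuple(toks[i:i + min_len])].append((fid, i))
    return sum(1 for locs in index.values()
               for a, b in combinations(locs, 2) if a[0] != b[0])

fragments = [
    "for (i = 0; i < n; i++) total += v[i];",
    "for (j = 0; j < m; j++) sum += w[j];",
]
print(count_clone_pairs(fragments, 8, normalize_ids=False))  # 0
print(count_clone_pairs(fragments, 8, normalize_ids=True))   # 15
```

Turning off the single transformation takes the count from 15 to 0. Neither number says whether these loops are clones worth merging, which is the review's point: a count alone does not show which settings are best.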

The paper does not report applying the tool to its own source code. Considerable insight might have come from such an investigation, particularly if the CCFinder code had then been refactored to merge its clone pairs. Nevertheless, this paper is a major contribution to code clone detection, and I highly recommend it to specialists in software maintenance.

Reviewer: Andy Brooks. Review #: CR127548 (0308-0784)
Reuse Models (D.2.13 ... )
Restructuring, Reverse Engineering, And Reengineering (D.2.7 ... )
Text Processing (I.5.4 ... )
Distribution, Maintenance, and Enhancement (D.2.7 )
Other reviews under "Reuse Models": Date
Retrieval of software components using a distributed web system: unnamed functions in C++
Behle A., Kirchhof M., Nagl M., Welter R. Journal of Network and Computer Applications 25(3): 197-222, 2002. Type: Article
Jul 23 2003
Engineering software reuse for on-board embedded real-time systems: unnamed functions in C++
Vardanega T., Caspersen G. Software--Practice & Experience 32(3): 233-264, 2002. Type: Article
Jul 22 2003
Deadlock-free software architectures for COM/DCOM applications
Inverardi P., Tivoli M. Journal of Systems and Software 65(3): 173-183, 2003. Type: Article
Feb 23 2004

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®