Computing Reviews

Automatically learning semantic features for defect prediction
Wang S., Liu T., Tan L. ICSE 2016 (Proceedings of the 38th International Conference on Software Engineering, Austin, TX, May 14-22, 2016), 297-308, 2016. Type: Proceedings
Date Reviewed: 02/25/20

Have data? Try teaching a machine. Any problem space with a large amount of data is just asking for someone to apply machine learning to it. Defect detection is one such domain. With a large number of open-source projects and their publicly available revision histories, it is unsurprising that machine learning papers on the topic have proliferated.

The authors take Java source code and extract a list of tokens for each file. Each token is then mapped to a discrete integer value, and the resulting vectors are fed into a deep belief network. The network is trained to identify semantic features in the token lists, and these features form the basis for deciding whether a file is likely to contain defects. What I found very surprising is that the authors do not encode scope delimiters into the token list.
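As a rough illustration (and not the authors' implementation), the sketch below shows how per-file token lists might be mapped to the fixed-length integer vectors such a network consumes; the token names and helper functions are hypothetical, and real token extraction would use a Java parser.

```python
# Hypothetical sketch: encode per-file token lists as integer vectors
# of the kind a deep belief network could take as input.  Token
# extraction from real Java sources would use a parser; here the token
# lists are supplied directly for illustration.

def build_vocabulary(token_lists):
    """Assign each distinct token a positive integer id (0 is reserved for padding)."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab

def encode(tokens, vocab, max_len):
    """Map tokens to their ids, then pad or truncate to a fixed length."""
    ids = [vocab.get(tok, 0) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))

if __name__ == "__main__":
    # Two toy "files" represented as lists of AST node types and method calls.
    files = [
        ["MethodDeclaration", "if", "readLine", "close"],
        ["MethodDeclaration", "for", "readLine", "append", "close"],
    ]
    vocab = build_vocabulary(files)
    vectors = [encode(f, vocab, max_len=8) for f in files]
    print(vectors)  # fixed-length integer vectors, one per file
```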

Having trained their network, the authors compare the predictive power of their approach against classification algorithms that base their decisions on traditional features of the code, such as lines of code, operand and operator counts, number of methods, a class's position in the inheritance hierarchy, and McCabe complexity. For both within-project defect prediction and (with some changes to their algorithm) cross-project defect prediction, the authors show that their approach improves on existing algorithms' defect detection capabilities. Unfortunately, the paper fails to provide any theoretical justification for why one particular token list is more likely to contain a defect than another, slightly different list of tokens.
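For context, a traditional-features baseline of the sort the authors compare against might look like the sketch below. This is not their exact experimental setup; scikit-learn is assumed to be available, and the metric values and labels are invented for the example.

```python
# Hypothetical traditional-features baseline (not the authors' setup):
# a classifier trained on hand-crafted metrics such as lines of code,
# operator/operand counts, number of methods, and McCabe complexity.
from sklearn.linear_model import LogisticRegression

# Each row: [lines_of_code, operators, operands, methods, mccabe_complexity]
X_train = [
    [120, 45, 60, 4, 7],
    [300, 150, 210, 12, 25],
    [80, 30, 40, 3, 4],
    [450, 220, 300, 20, 40],
]
y_train = [0, 1, 0, 1]  # 1 = file contained a post-release defect (invented labels)

baseline = LogisticRegression().fit(X_train, y_train)
print(baseline.predict([[200, 90, 120, 8, 15]]))  # defect-proneness of a new file
```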

Reviewer: Bernard Kuc    Review #: CR146905 (2006-0136)
