Computing Reviews, the leading online review service for computing literature.


Data mining in software engineering
Halkidi M., Spinellis D., Tsatsaronis G., Vazirgiannis M. Intelligent Data Analysis 15(3): 413-441, 2011. Type: Article

Date Reviewed: Apr 20 2012

Today, huge volumes of data about software development are available from a variety of sources, including organizational databases, open-source software project metadata, and other software engineering repositories, mailing lists, discussion forums, and newsletters. Data mining provides the capability to analyze this data and transform it into valuable information. This excellent paper deals with the mining of data produced during the software development life cycle and stored in software repositories.

The authors introduce the concepts, approaches, tasks, and techniques of data mining, and the challenges of mining software repositories. The data mining approaches described are clustering, classification, frequent pattern mining and association rules, data characterization and summarization, change and deviation detection, and text mining. The paper classifies, describes, and explains the different types of software engineering data: documentation, software configuration management data, source code, compiled code, execution traces, problem tracking and bug reports, and mailing lists.

The authors map the various data mining approaches and techniques to the software engineering tasks for which they are helpful. For software development, testing, debugging, maintenance, and reuse, the paper discusses and summarizes the appropriate mining approaches, the input data, and the data analysis results. The authors conclude by discussing the challenges in mining software engineering repositories, which in their opinion require further research.

This paper will be useful to anyone who is connected with the development, testing, debugging, maintenance, and reuse of software products, from programmers to managers. The knowledge obtained from mining software repositories and other related sources will help such an audience better understand the development process, and thereby help them refine it. It will also help them make the software life cycle processes more efficient.

Reviewer: Alexis Leon	Review #: CR140076 (1209-0945)

Data Mining (H.2.8 ... )

General (D.2.0)


