Computing Reviews, the leading online review service for computing literature.

Enriching documents with examples: a corpus mining approach
Kim J., Lee S., Hwang S., Kim S. ACM Transactions on Information Systems 31(1): 1-27, 2013. Type: Article

Date Reviewed: Aug 13 2013

For a compiler fan, it’s great to see a system that parses code into abstract syntax trees (ASTs) to find, relate, and then generate semantically relevant and illustrative code samples. This paper describes a new data mining approach, called eXoaDocs, which automatically generates these code samples and links them to application programming interface (API) descriptions, resulting in enriched, example-based programming documents. By automatically creating semantically relevant code samples, the authors’ system not only omits irrelevant code but also organizes the samples by criteria such as representativeness, frequency, conciseness, and correctness. Their browser also supports popularity ranking to help end users find the best code examples.

This extensive paper provides detailed descriptions of the algorithms for organizing code samples, contrasting clustering, ranking, and hybrid approaches. Although other successful documentation approaches rely on manually developed, high-quality examples, an automated approach is valuable when dealing with massive amounts of code. eXoaDocs is compared to other code search engines and documentation approaches. As a test, it was run on the extensive Java Development Kit (JDK) 5 source. Illustrative code documentation samples were generated for 75 percent of the code (27,000 methods). In contrast, the traditional Java documents (JavaDocs) toolset generated illustrative samples for only 2 percent of the same code.

To validate their approach, the authors conducted a user study in which numerous students were given sample problems to program. Those who had access to the eXoaDocs semantic examples had measurable productivity gains. The authors also nicely identify areas where the validity of their approach could be threatened, but it looks like this approach could play a role in future code documentation and browsing tools.
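To make the idea of AST-based example mining concrete, here is a minimal toy sketch in Python. It is not the eXoaDocs algorithm, and it works on Python snippets rather than Java. It assumes a simplified pipeline: parse each snippet with the standard `ast` module, keep the snippets that call a target API, cluster them by the set of calls they make (a rough stand-in for representativeness), then rank clusters by size (frequency) and pick the shortest member of each (conciseness). The function names `called_names` and `rank_examples` are hypothetical.

```python
import ast
from collections import defaultdict


def called_names(source):
    """Return the set of dotted call names (e.g. 'os.path.join') in source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            parts = []
            func = node.func
            # Unwind attribute chains such as os.path.join into their parts.
            while isinstance(func, ast.Attribute):
                parts.append(func.attr)
                func = func.value
            # Only keep chains rooted in a plain name; calls on the result
            # of another call (e.g. foo().bar()) are ignored in this sketch.
            if isinstance(func, ast.Name):
                parts.append(func.id)
                names.add(".".join(reversed(parts)))
    return names


def rank_examples(snippets, target):
    """Cluster snippets that call `target`; return one example per cluster."""
    clusters = defaultdict(list)
    for snippet in snippets:
        try:
            calls = called_names(snippet)
        except SyntaxError:
            continue  # skip snippets that do not parse
        if target in calls:
            clusters[frozenset(calls)].append(snippet)
    # Larger clusters first (frequency); ties go to the cluster whose
    # shortest member is shortest.
    ranked = sorted(
        clusters.values(),
        key=lambda group: (-len(group), min(len(s) for s in group)),
    )
    # Represent each cluster by its shortest snippet (conciseness).
    return [min(group, key=len) for group in ranked]
```

Calling `rank_examples(corpus, "json.loads")` on a list of snippet strings returns one representative snippet per usage pattern, with the most common pattern first. Using the call set as the cluster key is a deliberate simplification; a real system would need a richer notion of semantic similarity.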

Reviewer: Scott Moody	Review #: CR141459 (1310-0928)

Data Mining (H.2.8 ... )

Search Process (H.3.3 ... )

Other reviews under "Data Mining":	Date

Feature selection and effective classifiers Deogun J. (ed), Choubey S., Raghavan V. (ed), Sever H. (ed) Journal of the American Society for Information Science 49(5): 423-434, 1998. Type: Article	May 1 1999

Rule induction with extension matrices Wu X. (ed) Journal of the American Society for Information Science 49(5): 435-454, 1998. Type: Article	Jul 1 1998

Predictive data mining Weiss S., Indurkhya N., Morgan Kaufmann Publishers Inc., San Francisco, CA, 1998. Type: Book (9781558604032)	Feb 1 1999
