For a compiler fan, it’s great seeing a system that parses code into abstract syntax trees (ASTs) to find, relate, and then generate semantically relevant and illustrative code samples. This paper describes a new data mining approach, called eXoaDocs, which automatically generates these code samples and relates them to application programming interface (API) descriptions, producing enriched, example-based programming documents.
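For readers who want a feel for the AST step, here is a minimal sketch of finding call sites of an API method by parsing Java source into a syntax tree. It is not the authors' implementation: it uses the JDK's own Compiler Tree API (`com.sun.source`, available in a full JDK, Java 15+ for the text block), and the class `ApiCallFinder` and method `findCalls` are hypothetical names. Because it only parses and never resolves types, it matches calls by simple method name and may include unrelated methods that happen to share that name.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class ApiCallFinder {

    // In-memory source file, so the sketch needs nothing on disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Returns the source text of every call whose simple method name equals apiMethod. */
    public static List<String> findCalls(String source, String apiMethod) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("A full JDK (with javac) is required");
        }
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource("Sample", source)));
        List<String> hits = new ArrayList<>();
        // parse() builds the ASTs only; no type attribution is performed.
        for (CompilationUnitTree unit : task.parse()) {
            unit.accept(new TreeScanner<Void, Void>() {
                @Override
                public Void visitMethodInvocation(MethodInvocationTree node, Void p) {
                    String callee = node.getMethodSelect().toString(); // e.g. "m.put" or "put"
                    String name = callee.substring(callee.lastIndexOf('.') + 1);
                    if (name.equals(apiMethod)) {
                        hits.add(node.toString()); // pretty-printed call expression
                    }
                    return super.visitMethodInvocation(node, p); // keep scanning nested calls
                }
            }, null);
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        String code = """
            import java.util.*;
            class Sample {
                void demo() {
                    Map<String, Integer> m = new HashMap<>();
                    m.put("a", 1);
                    System.out.println(m.get("a"));
                }
            }
            """;
        System.out.println(findCalls(code, "put"));
    }
}
```

A real mining pipeline would go further, for example slicing away statements unrelated to the matched call, which is the kind of pruning the review credits eXoaDocs with.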
In creating semantically relevant code samples, the authors’ system not only omits irrelevant code but also organizes the samples according to criteria such as representativeness, frequency, conciseness, and correctness. Its browser also supports popularity ranking to help end users find the best code examples.
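To illustrate how several such criteria can be folded into one ordering, here is a minimal weighted-score sketch. It is an assumption-laden illustration, not the paper's scoring function: the `Sample` fields, the three weights, and the use of occurrence count, line count, and a compiles flag as stand-ins for frequency, conciseness, and correctness are all hypothetical (Java 16+ for records and `Stream.toList`).

```java
import java.util.Comparator;
import java.util.List;

public class SampleRanker {

    /** A candidate code sample with hypothetical, pre-computed features. */
    public record Sample(String code, int lineCount, int occurrences, boolean compiles) {}

    // Illustrative weights only; the paper's actual weighting is not reproduced here.
    static final double W_FREQUENCY = 0.5;
    static final double W_CONCISENESS = 0.3;
    static final double W_CORRECTNESS = 0.2;

    /** Higher is better: frequent, short, and compilable samples score highest. */
    static double score(Sample s, int maxOccurrences) {
        double frequency = (double) s.occurrences() / maxOccurrences; // normalized to [0, 1]
        double conciseness = 1.0 / Math.max(1, s.lineCount());        // shorter scores higher
        double correctness = s.compiles() ? 1.0 : 0.0;
        return W_FREQUENCY * frequency + W_CONCISENESS * conciseness + W_CORRECTNESS * correctness;
    }

    /** Returns the samples sorted from best to worst score. */
    public static List<Sample> rank(List<Sample> samples) {
        // Guard against division by zero when no sample has any recorded occurrence.
        int maxOcc = Math.max(1, samples.stream().mapToInt(Sample::occurrences).max().orElse(1));
        return samples.stream()
                .sorted(Comparator.comparingDouble((Sample s) -> score(s, maxOcc)).reversed())
                .toList();
    }
}
```

With these weights, a 3-line compilable sample outranks an equally frequent 30-line one, and both outrank a short sample that does not compile; changing the weights shifts that balance, which is the design trade-off any multi-criteria ranking has to settle.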
This extensive paper provides detailed descriptions of the authors’ algorithms for organizing code samples, contrasting clustering, ranking, and hybrid approaches. Although other successful documentation approaches rely on manually written, high-quality examples, an automated approach would be valuable when dealing with massive code bases. eXoaDocs is compared with other code search engines and documentation approaches. As a test, it was run on the extensive Java Development Kit (JDK) 5 source. Illustrative code samples were generated for 75 percent of the methods (27,000 methods). In contrast, the traditional Java documentation (JavaDocs) provided illustrative samples for only 2 percent of the same methods.
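The contrast between clustering and ranking is easier to see with a small hybrid example: cluster first to get diverse usages, then rank within each cluster to pick the best representative. The sketch below is only one possible hybrid, not the algorithm from the paper. It assumes, hypothetically, that each sample is already summarized by the set of API calls it makes and by a precomputed quality score, and it clusters by exact equality of those sets, which is far cruder than a similarity-based clustering (Java 16+).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HybridSelector {

    /** A candidate sample summarized by the API calls it makes (hypothetical feature) and a score. */
    public record Sample(String id, Set<String> apiCalls, double score) {}

    /**
     * Hybrid sketch: group samples that make exactly the same set of API calls,
     * keep the best-scoring sample of each group, and list the representatives
     * of the largest (most common usage) groups first, returning at most k.
     */
    public static List<Sample> select(List<Sample> samples, int k) {
        Map<Set<String>, List<Sample>> clusters = new LinkedHashMap<>();
        for (Sample s : samples) {
            // Set equality is content-based, so call order inside a sample does not matter.
            clusters.computeIfAbsent(s.apiCalls(), key -> new ArrayList<>()).add(s);
        }
        return clusters.values().stream()
                .sorted(Comparator.comparingInt((List<Sample> c) -> c.size()).reversed())
                .map(c -> c.stream().max(Comparator.comparingDouble(Sample::score)).orElseThrow())
                .limit(k)
                .toList();
    }
}
```

Pure ranking would return the top-k samples overall, which may all show the same usage pattern; pure clustering would show diverse patterns without saying which example in each is best. The hybrid trades a little of each.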
To validate their approach, the authors conducted a user study in which a number of students were given sample programming problems. Those who had access to the eXoaDocs semantic examples showed measurable productivity gains.
The authors also nicely identify potential threats to the validity of their approach, but it looks as though eXoaDocs could play a role in future code documentation and browsing tools.