Important documents, especially standards documents describing (software) requirements for critical industries, really ought to be precise. Unfortunately, most of these documents are written in English prose, which is woefully imprecise and easily misinterpreted. Extending documents with semantic information is thus a worthy goal.
Of course, this is by no means a new idea: the seeds of it go back to Vannevar Bush’s highly influential article [1]. Such ideas pervade the design of Extensible Markup Language (XML) and its associated technologies (in particular XLink, XSD, and Extensible Stylesheet Language (XSL)). Separately, people have long used various branches of mathematics to capture the semantics of many of the artifacts of engineering and computer science.
The paper at hand uses a particular standards document (MIL-STD-6016C) as a case study to propose a framework for the creation of hybrid documents. The authors chose a sampling technique and appear to have sampled 24 pages (three pages from each of eight topics) from an 8000-plus-page document. It is unclear whether such a small sample is representative of the document as a whole.
What is then partially developed, with only a few sparse examples, is a way to insert some structure atop the English prose. This structure does provide some semantics for the presentation of the material in the document, but it does not actually provide semantics for the meaning of the presented artifacts. It would be quite helpful for extracting items like data definitions (which could be used directly in code), but it is still entirely useless for understanding what that data is supposed to convey.
The paper’s text is long-winded and frequently repetitive, yet strangely concise when it comes time to give explicit examples that would help convey the authors’ ideas more concretely. Ironically, the paper suffers from many of the same defects that the authors identify as problems with standards documents. Their sampling of the relevant literature also seems lacking: the most obvious omission is any mention of OpenMath and/or OMDoc, which already serve as the basis for working implementations of many of the ideas presented here. Citing an obscure technical report from 1995 for the concept of “separation of concerns” instead of the work of Dijkstra is another irritant.
The goals of the work presented here are laudable, but the paper itself is not.