A text analysis package is a program or suite for such tasks as creating concordances, word frequency counts, collocation lists (that is, lists of words occurring near one another in the text), and so on. Such packages are of use to literary scholars and others interested in the properties of texts. The present paper first gives a set-theoretic definition of the tasks of text analysis, and then briefly describes one such system, CLOC.
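The three tasks named above can be illustrated with a minimal Python sketch. It is not CLOC's implementation, which the paper describes only sparsely. The function names, the naive whitespace tokenizer, and the fixed-width window used to decide which words count as occurring "near one another" are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer (an assumption): lowercase, split on whitespace,
    # and strip surrounding punctuation from each word.
    words = (w.strip('.,;:!?"\'()') for w in text.lower().split())
    return [w for w in words if w]


def word_frequencies(tokens):
    # Word frequency count: how often each distinct word occurs.
    return Counter(tokens)


def concordance(tokens, keyword, width=3):
    # Keyword-in-context concordance: each occurrence of `keyword`
    # shown with up to `width` tokens of context on either side.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def collocates(tokens, keyword, span=2):
    # Collocation list: counts of words occurring within `span` positions
    # of each occurrence of `keyword` (the window size is an assumption).
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            for j in range(max(0, i - span), min(len(tokens), i + span + 1)):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


if __name__ == "__main__":
    toks = tokenize("The cat sat on the mat. The dog sat on the cat.")
    print(word_frequencies(toks).most_common(3))
    for line in concordance(toks, "sat", width=2):
        print(line)
    print(collocates(toks, "cat", span=1))
```

Running it on the sample sentence prints the concordance lines `the cat [sat] on the` and `the dog [sat] on the`. A real package would need a far more careful notion of "word" than this tokenizer provides.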
Regrettably, most readers will find the paper either too long or too short. Those interested in a short overview of text analysis packages will get stuck in the over-rigorous set-theoretic definitions, which seem to be designed for direct translation into a PROLOG-like language; Reed can never, for example, say anything like 1≤Ci≤Cp without first mentioning that i is an integer. This is a shame, as the later sections of the paper have exactly the sort of material such a reader would want. Conversely, those interested in all the gory details will like the definitions, and then be sorely disappointed by the sparse information about the implementation.
Reed never makes clear on what grounds he claims that the package is actually “based” on elementary set theory, as opposed to merely conforming to his set-theoretic abstraction.