Computing Reviews
Creativity and universality in language
Esposti M., Altmann E., Pachet F., Springer International Publishing, New York, NY, 2016. 208 pp. Type: Book (978-3-319244-01-3)
Date Reviewed: Oct 26 2016

This volume originates in a 2014 conference held in Paris and includes 12 contributions by authors who are, for the most part, based in departments of mathematics, physics, cognitive science, artificial intelligence, and computational linguistics. The chapters represent a flourishing field that studies natural language from a mathematical point of view, treating written text in digital form as linear strings. The discovery of statistical universal patterns such as Zipf’s Law (1934) predates Chomskyan universal grammar and Greenberg’s universal patterns of phrase structure (1956). The studies in the volume focus on the universal patterns inherent in natural languages and, conversely, on the variations in language that represent creativity and style.

So far, the hierarchical semantic and phrase structural properties of natural language have eluded a full account in artificial intelligence. The contributions propose a variety of experiments on an “agnostic” digital representation of written texts, facilitated by the recent rapid growth in the number of digital corpora in many languages. They address an interesting variety of issues, both theoretical and practical. I summarize some examples to illustrate the kinds of topics investigated in these experiments in the statistical analysis of natural language.

Montemurro and Zanette study the balance of order (language constraints) and disorder (choices of lexical items, author’s style) in a corpus of 7,000 documents from 24 languages, both ancient and modern. The less predictable lexical items are shown to be related to the topic of the document. The languages nevertheless show a constant relative entropy value of 3.5 bits, corresponding to the distribution of predictable elements such as function words in sentences, versus less predictable lexical items. This result is highly interesting in that it shows that languages of many, if not all, types are constructed in the same way, though exactly how is a matter of great controversy.
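As a rough illustration of what "order versus disorder" can mean numerically, the following Python sketch compares how well a text compresses before and after its words are shuffled. This is not the estimator used in the chapter: it substitutes zlib compression as a crude stand-in for an entropy estimate, and the function names (`bits_per_word`, `ordering_information`) and the toy text are hypothetical.

```python
import random
import zlib


def bits_per_word(words):
    """Crude entropy proxy: compressed size in bits divided by word count."""
    data = " ".join(words).encode("utf-8")
    return 8 * len(zlib.compress(data, 9)) / len(words)


def ordering_information(text, seed=0):
    """Extra bits per word needed once word order is destroyed by shuffling.

    Assumption: zlib stands in for a proper entropy estimator, so the value
    is only indicative and depends strongly on text length.
    """
    words = text.split()
    shuffled = words[:]
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    return bits_per_word(shuffled) - bits_per_word(words)


# Toy input (hypothetical): a highly ordered, repetitive text.
toy_text = "the cat sat on the mat and the dog lay by the door " * 50
gain = ordering_information(toy_text)
```

A positive value means the original word order carries structure that shuffling removes; on real corpora a much longer text and a better estimator would be needed before comparing against a figure such as 3.5 bits.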

Blythe studies language change, specifically the introduction or loss of a term within a group of synonyms: in this case, the set including “couch,” which is now dominant, and “chesterfield,” which has gone out of use. The age of speakers serves as a proxy for a timeline, showing the rise or decline of a term over time. The rise of the dominant term follows a characteristic S-curve, a slow introduction followed by rapid spread, which has been observed elsewhere in cases of language change. If language speakers and attitudes were perfectly symmetrical, language change should not occur. Blythe discusses what kinds of asymmetry could be responsible for change of this kind; he concludes that only asymmetry of prestige accounts for the S-curve.
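The S-curve shape mentioned above is commonly modeled with a logistic function. The short Python sketch below only illustrates that shape; the parameter values (`midpoint=50`, `rate=0.1`) and the name `logistic_share` are assumptions chosen for illustration and are not taken from the chapter.

```python
import math


def logistic_share(t, midpoint, rate):
    """Share of speakers using the incoming term at time t (logistic S-curve)."""
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))


# Hypothetical timeline: 0 to 100 time units in steps of 10.
times = list(range(0, 101, 10))
shares = [logistic_share(t, midpoint=50, rate=0.1) for t in times]

for t, s in zip(times, shares):
    # Slow start, rapid change around the midpoint, slow saturation.
    print(f"t={t:3d}  share={s:.3f}")
```

The curve stays near zero at first, passes through one half at the midpoint, and levels off near one, which is the slow introduction followed by rapid spread described in the text.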

Labbé, Labbé, and Portet deal with the problem of fake scientific articles, computer-generated texts that have found their way into databases and journals. Their experiment contrasts natural texts with computer-generated texts produced using either a Markov chain or a context-free phrase structure grammar. The texts generated by a Markov chain have the appearance of a natural text, except that they are incomprehensible. The texts generated by a context-free phrase structure grammar differ from natural texts in having shorter sentences and a less rich vocabulary. It is still unclear whether either kind of computer-generated text can be detected infallibly, so improvements in both text generators and detectors are needed.
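To make the Markov chain idea concrete, here is a minimal Python sketch of a first-order (bigram) word-level generator. It is not the generator studied in the chapter; the toy corpus and the function names `build_bigram_model` and `generate` are hypothetical.

```python
import random
from collections import defaultdict


def build_bigram_model(words):
    """Map each word to the list of words that follow it in the training text."""
    model = defaultdict(list)
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return model


def generate(model, start, length, seed=0):
    """Random walk over the model: each word depends only on the previous one."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    output = [start]
    while len(output) < length:
        followers = model.get(output[-1])
        if not followers:  # dead end: the last word was never followed by anything
            break
        output.append(rng.choice(followers))
    return output


# Toy training text (hypothetical).
corpus = ("the model learns the structure of the text and the text "
          "shapes the model of the language").split()
model = build_bigram_model(corpus)
sample = generate(model, start="the", length=12)
print(" ".join(sample))
```

Every adjacent word pair in the output occurs somewhere in the training text, which is why such output looks locally natural while carrying no coherent meaning over longer spans.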

Author attribution is another use of mathematical quantitative approaches to language analysis. Benedetto and Degli Esposti ask whether the posthumously published poetry Diario postumo is solely the work of the poet Montale or contains additions by his associate A. Cima. They use two approaches to comparing singly authored compositions: entropic distance and n-gram distance. The results show some influence from the other author; the study is part of wider work on authorship, style, and forgery.
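The review does not give the formula behind the n-gram distance, so the following Python sketch uses one simple choice as an assumption: a relative-difference dissimilarity between character n-gram frequency profiles, averaged over all n-grams seen in either text. The names `ngram_profile` and `ngram_distance` and the sample strings are hypothetical.

```python
from collections import Counter


def ngram_profile(text, n=3):
    """Relative frequencies of the character n-grams in a text."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {gram: count / total for gram, count in grams.items()}


def ngram_distance(p, q):
    """Mean squared relative difference of n-gram frequencies, in [0, 1].

    Assumption: this particular formula is illustrative, not the chapter's.
    """
    keys = set(p) | set(q)
    if not keys:  # both texts shorter than n characters
        return 0.0
    total = 0.0
    for key in keys:
        fp, fq = p.get(key, 0.0), q.get(key, 0.0)
        total += ((fp - fq) / (fp + fq)) ** 2
    return total / len(keys)


# Hypothetical sample texts.
text_a = ngram_profile("the sea returns what the tide has taken")
text_b = ngram_profile("the tide returns what the sea has taken")
text_c = ngram_profile("zzzz qqqq xxxx")

print(ngram_distance(text_a, text_b))  # small: many shared trigrams
print(ngram_distance(text_a, text_c))  # 1.0: no trigram in common
```

In an attribution setting, a disputed text would be compared against profiles built from securely attributed work by each candidate author, and the smaller distance taken as weak evidence of authorship.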

The volume as a whole gives a richly detailed overview of the long-standing, flourishing research on mathematical and statistical approaches to the properties of written natural language. The chapters and the references provide details of this inventive investigation of the inherent and variable properties of natural language.

Reviewer:  Alice Davison Review #: CR144873 (1701-0036)
Linguistics (J.5 ... )
Natural Language Processing (I.2.7)
Sound And Music Computing (H.5.5)
