Computing Reviews
Foundations of computational linguistics : human-computer communication in natural language
Hausser R., Springer-Verlag New York, Inc., Secaucus, NJ, 2001. 578 pp. Type: Book (9783540424178)
Date Reviewed: Jun 19 2003

The slightly revised title of this second edition is something of a misnomer. If you are looking for a survey of basic concepts and popular techniques in natural language processing (NLP), other books (for example, those by Manning and Schütze [1], or Jurafsky and Martin [2]) will probably meet your expectations more fully. Nonetheless, I recommend reading Hausser's stimulating work for several reasons.

The book illustrates the author's somewhat personal view of what computational linguistics is about, a view comparatively orthogonal to current mainstream research in the field. In Hausser's view, issues of NLP are deeply interrelated with the modeling of natural communication, and are instrumental for building talking robots.

This unifying perspective may at times strike the reader as either disconcerting or fascinating. Dismissive statements, such as "[...] theoretical linguistics has vainly searched for universals supposed to characterize the innate human language faculty," are typical of Hausser's style, and occasionally spoil his otherwise cogent exposition. On the other hand, the general idea of framing NLP in the context of a theory of verbal and non-verbal communication is certainly worthy of praise. Most of the overview parts of the book benefit from this approach, and can be held up as a model of clarity and rigor. The ideas aired here are well argued and refreshing, if not entirely new, and define an exciting interdisciplinary research program. Very few foundational books in the field offer such a wealth of pointers and insights. Yet some key notions of the book fail to coalesce into a coherent picture. I shall deal in more detail with one of them below, namely, Hausser's line of argument against the non-linearity of language.

It is common wisdom that a sentence in a language is not just a linear arrangement of words trailing one another like beads on a necklace [3]. Each sentence has a highly non-linear, hidden structure that speakers are able to infer on the basis of their internalized knowledge of language. This structure is commonly represented as an upside-down tree, or, equivalently, as a sequence of possibly nested constituents, surrounded by parentheses. For instance, the sentence "The little dog found a bone" would be shown as: ((the (little dog)) (found (a bone))).

The sentence divides into two constituents that, in turn, split into two sub-constituents. Assembling the leftmost major constituent requires a preliminary level of nesting, whereby "little" and "dog" are recognized as parts of a smaller constituent. From this perspective, the grammar rules of a language are instructions for assembling nested representations of this kind. Speakers are believed to use these representations to understand, for instance, (the (little dog)) as denoting a specific member of the dog class.
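The bracketed notation above can be read mechanically into a nested data structure. The following is a minimal Python sketch, offered only as an illustration of the notation and not as anything taken from the book; the function names `parse_brackets`, `depth`, and `leaves` are hypothetical.

```python
def parse_brackets(text):
    """Parse a parenthesized constituent string into nested Python lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]  # stack[0] collects the top-level constituents
    for tok in tokens:
        if tok == "(":
            stack.append([])           # open a new constituent
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced closing parenthesis")
            done = stack.pop()         # close the current constituent
            stack[-1].append(done)
        else:
            stack[-1].append(tok)      # a plain word
    if len(stack) != 1:
        raise ValueError("unbalanced opening parenthesis")
    return stack[0]


def depth(node):
    """Nesting depth: a word is 0, a constituent is 1 + its deepest child."""
    if isinstance(node, str):
        return 0
    return 1 + max((depth(child) for child in node), default=0)


def leaves(node):
    """Read the words back off the tree in their surface order."""
    if isinstance(node, str):
        return [node]
    return [word for child in node for word in leaves(child)]


# The review's example sentence, in its bracketed form.
tree = parse_brackets("((the (little dog)) (found (a bone)))")[0]
```

Here `tree` holds two major constituents, each with a nested sub-constituent, while `leaves(tree)` recovers the flat word sequence that the hidden structure is inferred from.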

From the standpoint of computational linguistics, however, this widely shared assumption raises a few non-trivial problems. In a nutshell, some important classes of language structures are known to be impossible to assemble in linear time. How, then, can we develop an operational language-understanding robot, if the structures needed to interpret a sentence take non-linear time to build?

Hausser suggests that the steps taken by a computer parser in applying the rules of grammar (the so-called parse tree) and the constituency tree of the same sentence do not coincide. Following his suggestion, "the little dog" should split into "the little" and "dog". This reminds me of a good, older argument by Jensen [4], where she proposes decoupling binary-branching parse trees from the much deeper and computationally costlier trees generated by complex grammar rules. Hausser apparently ignores Jensen's work, and goes so far as to concede only limited linguistic validity to traditional constituency trees.
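The contrast between the two groupings can be made concrete with a small Python sketch. It is only an illustration of strictly left-to-right binary combination, under the assumption that each step attaches the next word to everything read so far; it is not Hausser's actual formalism, and the names `left_associative` and `count_steps` are hypothetical.

```python
from functools import reduce


def left_associative(words):
    """Combine words strictly left to right: ((w1 w2) w3) ..."""
    if not words:
        raise ValueError("need at least one word")
    # Each step pairs the structure built so far with the next word.
    return reduce(lambda start, nxt: (start, nxt), words)


def count_steps(tree):
    """Count the binary combination steps recorded in a nested-tuple tree."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return 1 + count_steps(left) + count_steps(right)


# Grouping produced by left-to-right combination (illustrative only).
parse = left_associative(["the", "little", "dog"])

# Traditional constituency grouping of the same three words.
constituency = ("the", ("little", "dog"))
```

Both trees cover the same words with the same number of binary steps, one fewer than the number of words, but they bracket them differently. The sketch counts combination steps only; it says nothing about the cost of deciding which grammar rule applies at each step.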

The last step is questionable and not fully justified. Hausser himself defines a highly indirect mapping between syntax and semantics, whereby binary syntactic elements are projected onto complex semantic structures, fairly reminiscent of constituency trees. This comes as no surprise. Constituency trees are known to collapse both syntactic and semantic aspects of a sentence representation. In the parsing literature, recognition of this fact has prompted the suggestion that these aspects could, and should, be factored out and represented independently. Partial syntactic parsing, such as chunking [5], finds its roots in this suggestion.
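As a rough illustration of what partial parsing buys, the toy Python sketch below groups runs of noun-phrase material into flat chunks without building a full tree. It is a deliberate simplification and not the chunking method of [5]; the tag set, the hand-assigned tags, and the function name `chunk` are all assumptions made for the example.

```python
# Assumed toy tag set: tags that may appear inside a noun-phrase chunk.
NP_TAGS = {"DET", "ADJ", "NOUN"}


def chunk(tagged):
    """Group maximal runs of noun-phrase tags into flat NP chunks."""
    chunks, current = [], []
    for word, tag in tagged:
        if tag in NP_TAGS:
            current.append(word)            # extend the open NP chunk
        else:
            if current:
                chunks.append(("NP", current))
                current = []
            chunks.append((tag, [word]))    # other words stand alone
    if current:
        chunks.append(("NP", current))      # flush a chunk left open at the end
    return chunks


# Hand-tagged input; the tags are assumptions, not tagger output.
tagged = [("the", "DET"), ("little", "ADJ"), ("dog", "NOUN"),
          ("found", "VERB"), ("a", "DET"), ("bone", "NOUN")]
chunks = chunk(tagged)

# An incomplete utterance still yields chunks; no top node is required.
fragment = chunk([("found", "VERB"), ("a", "DET")])
```

The point of the design is that the output is a flat sequence of local groupings, so a truncated or ill-formed input degrades gracefully instead of failing for lack of a complete sentence structure.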

Hausser fails to appreciate the significance of such a shift of paradigm, thus throwing into sharp relief a serious limitation of his research program. A talking robot should be able to understand elliptical, incomplete, retraced, or even ungrammatical sentences. Yet Hausser's grammar keeps looking for a top tree node, dominating a well-formed sentence structure. Moreover, the book contains no reference to issues of machine language learning or recovery strategies based on stochastic evidence of language use. In the long run, it would be impossible to maintain the author's open-world assumptions about communicative interaction while disregarding these fundamental issues. In the back of the reader's mind, the suspicion keeps lurking that the author indulges in the habit of playing with toy systems, too brittle to be used in real-world applications.

Reviewer:  Vito Pirrelli Review #: CR127815 (0310-1070)
1) Manning, C.; Schütze, H. Foundations of statistical natural language processing. MIT Press, Cambridge, MA, 1999.
2) Jurafsky, D.; Martin, J.H. Speech and language processing. Prentice Hall PTR, Upper Saddle River, NJ, 2000.
3) Pinker, S. The language instinct: how the mind creates language. William Morrow, New York, NY, 1994.
4) Jensen, K. Binary Rules and Non-Binary Trees: Breaking down the Concept of Phrase Structure. In Mathematics of language. Edited by Manaster-Ramer, John Benjamins, USA, 1987.
5) Abney, S.P. Parsing by Chunks. In Principle-based parsing: computation and psycholinguistics. Edited by Berwick, R.C.; Abney, S.P.; Tenny, C. Kluwer, Holland, 1991.
Linguistics (J.5 ...)
Formal Languages (F.4.3)
General (H.2.0)
General (H.3.0)
Natural Language Processing (I.2.7)