Computing Reviews
Multiword expressions acquisition : a generic and open framework
Ramisch C., Springer Publishing Company, Incorporated, New York, NY, 2014. 230 pp. Type: Book (978-3-319092-06-5)
Date Reviewed: Sep 3 2015

The topic of this very well-organized and well-documented book is an important feature of natural language, multiword expressions (MWEs), which pose as yet unsolved problems for computer applications to language corpus data: “still an open problem and a very challenging one” for multilingualism and machine translation. The author asks three questions: What are the various kinds of multiword expressions, why do they matter, and what happens if we ignore them? MWEs vary in composition, but all are word combinations that are more likely to occur together than random sequences of words. They include English phrasal verbs like “take on” and “take away,” compounds like the title of the book, Greek expressions like “human optical corner” (meaning human point of view), and Portuguese complex predicates like “leave at side” (meaning ignore). As these examples show, they often have idiomatic meanings that cannot be translated literally. The difficulty is that MWEs have different properties from an ordinary verb followed by a noun, or preposition followed by a noun: their components are more tightly or idiomatically connected than ordinary sequences of words, which complicates both acquisition from a corpus and machine translation. An MWE in one language may also correspond to a single word in another, causing a mismatch between sentences in machine translation and resulting in errors or awkward output.
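The idea that MWE components co-occur more often than chance can be made concrete with an association measure. The sketch below uses pointwise mutual information (PMI) over adjacent word pairs; this is an illustrative choice, since the review does not say which measures the book relies on, and the function name bigram_pmi and the toy sentence are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Score each adjacent word pair by pointwise mutual information.

    PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ). A positive score means the
    pair occurs together more often than independence would predict.
    This is an assumed, illustrative measure, not the book's method.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy usage: "take on" appears twice in nine tokens.
tokens = "take on the job and take on the task".split()
scores = bigram_pmi(tokens)
```

On a corpus this small the scores mean little, and PMI is known to overrate rare pairs, which is one reason a minimum frequency threshold is commonly applied alongside it.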

The problems have been known for a long time: MWEs are an intrinsic feature of natural language and have been called a tough nut to crack or even a pain in the neck. A large body of previous research, much of it recent, has been devoted to computational methods for identifying MWEs, and it is surveyed in a chapter on the current state of the art.

The core of the book is Ramisch’s proposal for a generally applicable tool for acquiring MWEs, one that is language independent, useful for monolingual and bilingual corpora, and customizable for different purposes and corpora. This tool, the mwetoolkit, is a framework for MWE acquisition from corpora of various types and languages. Its general architecture consists of core modules for prototypical acquisition of MWEs (p. 129), with multiple alternatives to choose from.

What makes this book particularly interesting and engaging is that in the final chapters the use of the toolkit is illustrated with three different demonstrations, using different languages, corpora, and tasks of acquisition or machine translation. These demonstrations are experiments, designed to define objective and subjective measures of relative success, especially for a given language like Greek or Brazilian Portuguese, for which few resources exist, either large enough corpora or lexicons of expressions.

The first “toy” experiment is designed to extract multiword terms (MWTs) from an English corpus of abstracts of scientific articles. The goal is to create a terminological dictionary. The corpus was preprocessed in several steps and divided into a large training set and a much smaller test set of sentences. Extraction was done with 57 morphosyntactic patterns based on part of speech sequences. Only candidates that occurred five times or more were retained, to improve precision and recall. The mwetoolkit compares favorably with other tools in current use on both measures.
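The extraction step described above, matching part of speech patterns and keeping only candidates seen at least five times, can be sketched roughly as follows. This is not the mwetoolkit's own code or interface: the three patterns, the coarse tag names, and the function extract_candidates are assumptions made for illustration, and the book's 57 patterns are not reproduced here.

```python
from collections import Counter

# Hypothetical toy patterns over coarse POS tags (assumed for illustration).
PATTERNS = [
    ("ADJ", "NOUN"),
    ("NOUN", "NOUN"),
    ("NOUN", "NOUN", "NOUN"),
]


def extract_candidates(tagged_sentences, patterns=PATTERNS, min_count=5):
    """Collect word sequences whose POS tags match a pattern.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    Returns a dict mapping each candidate (a tuple of lowercased words)
    to its frequency, keeping only candidates seen min_count times or more.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        tags = tuple(tag for _, tag in sentence)
        for pattern in patterns:
            n = len(pattern)
            # Slide a window of the pattern's length over the sentence.
            for i in range(len(sentence) - n + 1):
                if tags[i:i + n] == pattern:
                    words = tuple(w.lower() for w, _ in sentence[i:i + n])
                    counts[words] += 1
    return {cand: c for cand, c in counts.items() if c >= min_count}


# Toy usage: one sentence repeated five times, plus one rare sentence.
frequent = [("the", "DET"), ("neural", "ADJ"), ("network", "NOUN"), ("model", "NOUN")]
rare = [("a", "DET"), ("deep", "ADJ"), ("parser", "NOUN")]
candidates = extract_candidates([frequent] * 5 + [rare])
```

In this toy run the pair from the rare sentence falls below the threshold and is dropped, while the two pairs from the repeated sentence are kept.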

Three other experiments are discussed in detail, in order to show the flexibility and adaptability of the mwetoolkit. The first experiment is in lexicography, the acquisition of nominal compounds in Greek from the Greek section of the Europarl corpus. The results were validated by comparison with similar acquisition from the web, and by manual evaluation by Greek native speakers. The resulting dictionary of 815 nominal MWEs was made available on a website. The second experiment in lexicography focused on complex predicates in Brazilian Portuguese, another under-resourced language. The problem with complex predicates is their syntactic ambiguity, involving “light” verbs and nominal objects in seven distinct part of speech sequences. The candidates identified by the mwetoolkit were manually analyzed, leading to the formation of a subset of expressions for feelings, “sentiment expressions.” The final set of experiments involved statistical machine translation (MT), contrasting a phrase-based system with a hierarchical one. These experiments tested the translation of English phrasal verbs into French, using the matching English and French versions of TED talks. The task was made more difficult by the fact that the particle can be separated from the verb by an object (throw it out). Unexpectedly, the phrase-based MT gave better results than the hierarchical system, which allowed for discontinuous constituents.

The motivating idea behind this work is to explore and compare approaches to MWEs, involving various tools as well as human resources. All of the computational tools are freely available and can be accessed from websites given in the book. The book contains a vast amount of information, much of it intended to enable other researchers to investigate MWEs. Though there is no index, the chapter and section headings in the table of contents make it easy to locate specific discussions. An extensive bibliography follows each chapter. There are helpful appendices, including a list of standard part of speech tags.

Reviewer:  Alice Davison Review #: CR143743 (1511-0944)
  Featured Reviewer  
 
Natural Language Processing (I.2.7 )
 
 
Linguistics (J.5 ... )
 
Other reviews under "Natural Language Processing": Date
Current research in natural language generation
Dale R. (ed), Mellish C. (ed), Zock M., Academic Press Prof., Inc., San Diego, CA, 1990. Type: Book (9780122007354)
Nov 1 1992
Incremental interpretation
Pereira F., Pollack M. Artificial Intelligence 50(1): 37-82, 1991. Type: Article
Aug 1 1992
Natural language and computational linguistics
Beardon C., Lumsden D., Holmes G., Ellis Horwood, Upper Saddle River, NJ, 1991. Type: Book (9780136128137)
Jul 1 1992

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®