Computing Reviews
Multilingual natural language processing applications : from theory to practice
Bikel D., Zitouni I., IBM Press, Upper Saddle River, NJ, 2012. 640 pp. Type: Book (978-0-137151-44-8)
Date Reviewed: Aug 27 2012

I was quite enthusiastic about the opportunity to review this book, as I have been active in the language technology area for about 20 years, mainly consulting and developing tools for translation companies. These tools include translation editors and translation memories, as well as terminology management and terminology extraction tools. Also, the book was edited by D. M. Bikel from Google and I. Zitouni, both active in language processing, and contains contributions from 30 authors.

The chapters on various aspects of natural language processing (NLP) were assembled to teach readers the art of “building robust and accurate multilingual NLP systems” (from the cover). This also defines the scope of the book: multilingual applications. Monolingual aspects are only covered if they are required as a basis for multilingual tools. This is no real drawback, since there are many books out there for the study of monolingual systems.

The first part, “Theory,” starts with words, the smallest units of documents; proceeds to parsing; and ends with sentiment analysis (a very active area in many domains). The second part, “Practice,” opens with entity detection (for example, of dates), followed by a discussion of machine translation. Various approaches are combined to describe NLP architectures such as the Unstructured Information Management Architecture (UIMA). Interestingly, the book has no chapter on translation memory systems, although this technology is used for about 90 percent of the translation work done by translators in the real world. This may be something to add in a future edition.

Due to the breadth of topics covered, I will focus on two areas with which I have some familiarity: machine translation (MT) and sentiment analysis. The chapter on MT, written by P. Koehn, is mainly devoted to statistical MT. I was surprised and somewhat disappointed to find only a brief description of classical MT systems (syntax/semantics-based systems). Although statistical MT is the hot topic in NLP today, I still expected more information about older systems. The author first addresses a key question for any MT system: How do I evaluate the output quality? He explains different approaches, especially the bilingual evaluation understudy (BLEU) score, which is a favorite. Next, he explains the statistical approach in detail: one starts with a large amount of translated text (a parallel corpus) and extracts matching translations, or phrases, from that text to build a statistical model, which is then applied to new texts. This approach is quite successful (see, for example, Google MT) and allows domain-specific MT systems to be created in a short amount of time. A good example here is the Moses system. Readers who really want to understand the theory behind this type of MT should invest some time in this chapter.
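To make the evaluation question concrete, here is a minimal Python sketch of a sentence-level BLEU-style score. It is not taken from the book: the function names `ngrams` and `bleu` are hypothetical, and it assumes whitespace tokenization, a single reference translation, and no smoothing. It combines clipped n-gram precisions (for n = 1 to 4) through a geometric mean and applies a brevity penalty to candidates that are not longer than the reference.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference (assumption:
    whitespace tokenization, no smoothing)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, one empty precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: 1 for longer candidates, exp(1 - r/c) otherwise.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

A candidate identical to its reference scores 1.0, and a candidate sharing no words with the reference scores 0.0. Production evaluations usually work at the corpus level, with several references and a standard tokenizer, so this sketch only illustrates the idea.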

Another chapter I read in more detail was about sentiment analysis. I developed a system in the tourism area some years ago with relatively simple technologies (using dictionaries with positive and negative utterance markers), and wanted to check the authors’ recommendations for this case. C. Banea, R. Mihalcea, and J. Wiebe start with the observation that although many users speak English, an overwhelmingly large part of the community uses other languages. Thus, tools supporting other languages will have a competitive advantage. They describe the different approaches and tools involved (such as dictionaries and MT), moving from words to sentences to documents. Finally, they rank the approaches, from the best (manually annotated corpora) to the worst (using lexica for translation). They provide a summary table showing that using parallel texts is the best choice, but MT-based approaches are a very close second. Lexicon-based approaches fall at the lower end of the scale. I found this result quite interesting, as I used the fourth-best approach on my data and nevertheless had quite good results.
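The dictionary-based approach mentioned above can be sketched in a few lines of Python. This is an illustration only, neither the reviewer's system nor a method from the book: the word lists `POSITIVE` and `NEGATIVE` and the function `polarity` are hypothetical, and the sketch assumes English text and ignores negation, intensifiers, and word sense.

```python
import re

# Hypothetical marker dictionaries for a tourism domain (assumption).
POSITIVE = {"good", "great", "friendly", "clean"}
NEGATIVE = {"bad", "dirty", "noisy", "rude"}


def polarity(text):
    """Classify text by counting positive and negative marker words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Each positive marker adds 1, each negative marker subtracts 1.
    score = sum((tok in POSITIVE) - (tok in NEGATIVE) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For a multilingual setting, such a lexicon would have to be built or translated for each language, which is where the lexicon-based approaches ranked lowest by the authors come in.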

This is a really interesting and well-written book, which I can recommend to all newcomers to this area. Professionals will gain insight from it as well. I would definitely recommend this book to my students if I were teaching a computational linguistics course at my university.

Reviewer:  K. Waldhör Review #: CR140491 (1212-1215)
Natural Language Processing (I.2.7 )
 
 
Linguistics (J.5 ... )
 

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®