Computing Reviews
Anaphora resolution and discourse-new classification: a comprehensive evaluation
Kabadjov M., VDM Verlag, Saarbrücken, Germany, 2010. 284 pp. Type: Book (978-3-639244-47-2)
Date Reviewed: Jun 22 2011

It is one thing to suggest that a general anaphoric resolver and discourse-new detectors for anaphora resolution are obvious, worthwhile components of any natural language processing (NLP) system. It is an entirely different thing to prove their efficiency and effectiveness in a principled way, which requires a great deal of research and an extensive review of the previous literature on the matter. It is this combination of a massive amount of research with a novel algorithm for discourse-new detectors that makes this book a must-read for anyone seriously considering tackling anaphora from a theoretical or, especially, practical angle (such as summarization, information extraction, or question answering (QA) systems).

This book--a 2007 PhD thesis that was published in 2010--presupposes a good grasp of statistics, as well as linguistic and NLP terminology. The intended audience includes linguists, software engineers, and cognitive scientists who have a penchant for following research in the making and enjoy live statistics to back up the claims. For the dyed-in-the-wool statistician, half of the book is a feast of hardcore formulas and cut-and-dried results. For the NLP engineer, the other half is a goldmine that covers the methodology and the usefulness of the anaphora resolver and discourse-new detectors.

The focus of the thesis was to determine whether (and to what degree) discourse-new detectors improve the performance of anaphora resolution, and whether a generic anaphoric resolver improves performance on a specific task, in this case text summarization.

The well-documented research produced the following findings (page 4):

  • a combined lexical and anaphoric knowledge representation in conjunction with a latent semantic analyzer (LSA) outperforms systems with only lexical information, when it comes to text summarization;
  • discourse-new recognizers (based on machine learning algorithms) improve definite description resolution; and
  • the resulting system is available via the open-source initiative, and we can use it to replicate and enhance anaphora resolution research online.

The book consists of seven chapters. Chapters 1 and 2 introduce the subject with a good discussion of previous research. Chapter 3 introduces GuiTAR, the author’s version of a general tool for anaphora resolution. Chapter 4 discusses discourse-new classification based on a machine learning approach. This chapter is the core of the author’s scientific contribution; it discusses at length how GuiTAR is extended with a discourse-new classifier and reports experiments on hand-parsed and automatically parsed data. Chapter 5 extends the research by applying the extended GuiTAR system to three established research projects involving corpora and annotation:

(1) the GNOME corpus [1], which consists of texts from three different domains--museum labels, pharmaceutical leaflets, and tutorial dialogues--as well as discourse and semantic information annotations;
(2) the CAST project [2], which focused on developing an annotated corpus for automatic summarization; and
(3) the MUC-7 corpus [3], which includes an evaluation software framework allowing uniform performance comparisons of different systems.

Chapter 6 demonstrates that text summarization is an appropriate task for evaluating the performance of anaphora resolvers, provided they encapsulate viable discourse models. Chapter 7 is a four-page summary of the book. The four appendices include training details on the discourse-new classification, error analyses of experiments from Chapters 4 and 5, and a short discussion of the Extensible Markup Language (XML)-based anaphoric syntax.

Like any book that is based on a thesis or dissertation, this book addresses a very limited audience; only academics and serious researchers can fully appreciate it. The many abbreviations impair readability, even though there is an abbreviation list at the beginning of the book. Research related to anaphora, with regard to both theoretical and practical implications, is extremely complex and proves the old adage of language making infinite use of finite means. In light of this fact, the author has successfully provided a comprehensive evaluation of handling anaphora, as of 2007, and has set another milestone in the analysis of the infinite use of anaphora in discourse. Until NLP systems can successfully summarize newspaper articles, Twitter feeds, and ordinary discourse, we have to be content with such milestones, however significant they may be.

Reviewer: Klaus K. Obermeier
Review #: CR139174 (1112-1262)
1) Poesio, M. Discourse annotation and semantic annotation in the GNOME corpus. In Proc. of the 2004 ACL Workshop on Discourse Annotation (DiscAnnotation 2004), The Association for Computational Linguistics, 2004, 72–79.
2) Hasler, L.; Orasan, C.; Mitkov, R. Building better corpora for summarisation. In Proc. of Corpus Linguistics, University of Wolverhampton, 2003, 309–319.
3) Defense Advanced Research Projects Agency. Proc. of the 7th Message Understanding Conference (MUC-7). Morgan Kaufmann, San Francisco, CA, 1998.
Natural Language Processing (I.2.7)
 