The holy grail for computational linguistics, as the authors see it, is the semantic interpretation of text. A combination of semantic web research, particularly on ontology-based systems, with linguistic formalisms and knowledge representations can lead the way. Think of an ontology as a semantic encapsulation of a particular domain, task, or application in support of faster and better natural language processing (NLP) and understanding. Ontologies organize the entities of a domain, often in a hierarchy, to enable efficient top-down NLP.
This book assumes solid familiarity with first-order logic and Turtle, a textual syntax for the Resource Description Framework (RDF). The intended audience includes NLP and semantic web practitioners. Since the book originated from a college course taught by the authors, there are exercises for every chapter and a website with programming samples.
Chapter 1 starts out with the basics for creating ontologies and lexical and semantic representations. Chapter 2 deals with knowledge representation, including first-order logic, description logics, and the Web Ontology Language (OWL) for building ontologies. OWL pairs a formal semantics with RDF/XML-based syntax for the semantic web. Chapter 3 introduces the linguistic tools used in this book, such as lexicalized tree-adjoining grammars to deal with syntactic operations. The authors chose a form of underspecified discourse representation theory to cover semantics. This theory is based on the observation that humans understand language well without having to resolve every ambiguity in discourse immediately. The authors readily admit that linguistic formalisms are interchangeable and that the formalisms they discuss are mere conduits to facilitate linkage to ontologies.
The central role of lexica in generating grammars and in lexically encoding concepts is covered in chapters 4 and 5. The ontology lexicon introduced there, called Lemon, supports the automatic generation of domain-specific grammars. Chapter 6 details the formation of semantic representations within the Lemon system. Chapter 7 deals with the interaction between interpretation and reasoning, demonstrating how ontological restrictions can cut down on the multiple readings of sentences in NLP. In chapter 8, the authors present a theory of temporal processing that allows a domain-independent treatment of temporal relations. Chapter 9 provides an example of applying the presented ontology-based parsing methodology to a question-answering system based on RDF data. Chapter 10 consists of a summary, a discussion of the findings, and further research directions.
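The point of chapter 7, that ontological restrictions can prune competing readings, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names, the `playsFor` and `locatedIn` properties, and the candidate readings are hypothetical and not taken from the book, and the filter is a plain domain/range type check rather than the authors' reasoning method.

```python
# Hypothetical toy ontology: child class -> parent class.
SUBCLASS_OF = {
    "Player": "Person",
    "Person": "Agent",
    "Team": "Organization",
    "Organization": "Agent",
    "Stadium": "Place",
}

# Hypothetical restrictions: property -> (domain class, range class).
RESTRICTIONS = {
    "playsFor": ("Player", "Team"),
    "locatedIn": ("Stadium", "Place"),
}


def is_a(cls, ancestor):
    """Return True if cls equals ancestor or lies below it in the hierarchy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)  # None once we pass a root class
    return False


def admissible(reading):
    """A reading (property, subject class, object class) is admissible
    when both arguments satisfy the property's domain and range."""
    prop, subj_cls, obj_cls = reading
    domain, range_ = RESTRICTIONS[prop]
    return is_a(subj_cls, domain) and is_a(obj_cls, range_)


def prune(readings):
    """Keep only the readings that the ontology allows."""
    return [r for r in readings if admissible(r)]


# Three candidate readings a parser might propose for one sentence.
candidates = [
    ("playsFor", "Player", "Team"),
    ("playsFor", "Player", "Stadium"),  # range violation
    ("playsFor", "Team", "Team"),       # domain violation
]
surviving = prune(candidates)
print(surviving)
```

In this toy setting only the first reading survives, which is the sense in which an ontology can reduce ambiguity before any further interpretation takes place.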
The book serves many purposes: it is a textbook with exercises, a handbook with detailed explanations of many subject areas in linguistics and the semantic web, and a book on the methodology of using ontologies for NLP. Because it originated from a course taught by the authors, it treats theories, concepts, and disciplines at length. This treatment, albeit pedagogically motivated, interrupts the flow of the discussion about the ontology-based parsing method put forth in the book.
The book offers takeaways on multiple levels. On the practical level, it demonstrates how one can create an ontology-driven natural language processor for a snapshot of a very limited domain (for example, soccer) using a laundry list of interchangeable theories and procedures from various scientific disciplines, such as logic, linguistics, and mathematics. The result, an engineering success combining NLP and semantic web research, is a question-answering system based on an ontology-driven parsing method.
On the theoretical level, the book shows the power and limitations of ontologies at the same time: ontologies are conceptualizations of worldviews, that is, highly arbitrary data constructs controlled by their creator (page 17). What could make them scientifically more stringent is the level of robustness and scalability that turns an engineering success into a scientific revelation. However, the authors defer these important issues: “The focus of the book is certainly not on robustness or coverage. ... Our goal is to describe a principled approach to arriving at representations, which could then later be shown to scale by incorporating statistical and machine-learning-based techniques” (page 6). Consequently, the book tells only half of the story of why ontologies are preferable for parsing by offering only a limited proof of concept for language understanding of soccer reporting.
The authors express surprise as to why linguists have not embraced research from the semantic web community more readily (page xv). Here is a possible explanation for this situation. Research into the semantic web reached a breaking point some time ago. Ontologies, or organized knowledge representation schemes, have grown so large that they are no longer practical for some tasks. Research is well underway to design algorithms that extract just the right amount of semantic information from ontologies to solve domain-specific problems and challenges. Researchers have presented various algorithms that extract domain-specific knowledge from existing very large ontologies such as GALEN to increase tractability for specific needs and tasks. What this means is that ontologies may get too large to guide the parsing process, since it would first have to be made explicit which partition of the ontology is usable for parsing.
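To make the idea of extracting a usable partition concrete, here is a minimal sketch. It is a deliberately simple stand-in, not one of the published extraction algorithms: it keeps a set of seed terms together with all of their ancestors in a subclass hierarchy. The toy ontology and the function name `extract_module` are assumptions made for illustration.

```python
# Hypothetical toy ontology mixing several domains: child -> parent.
ONTOLOGY = {
    "Striker": "Player",
    "Goalkeeper": "Player",
    "Player": "Person",
    "Person": "Agent",
    "Referee": "Person",
    "Femur": "Bone",
    "Bone": "BodyPart",
    "Stadium": "Place",
}


def extract_module(subclass_of, seeds):
    """Return the subclass edges reachable upward from the seed terms.

    Simplified illustration only: it follows parent links from each seed
    and stops at a root class or at a term that was already collected.
    """
    module = {}
    for term in seeds:
        while term in subclass_of and term not in module:
            parent = subclass_of[term]
            module[term] = parent
            term = parent
    return module


# Terms a soccer-report parser might need.
module = extract_module(ONTOLOGY, ["Striker", "Referee"])
print(module)
```

Even in this toy case the anatomy and location branches are left out of the module, which hints at why choosing the seed terms, and hence the partition, becomes a design problem of its own for very large ontologies.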
Natural languages make infinite use (that is, combinations of words and phrases) of finite means (that is, grammar rules and lexica). Using ontologies may reduce the complexity by limiting the “infinite use” under certain circumstances and for very stringently defined domains. If practitioners are interested in facilitating text understanding under narrowly defined circumstances, this book undoubtedly describes an interesting engineering success. If, however, one wants to make a stronger claim as to how semantics can drive natural language understanding on a larger scale in industrial-strength applications, a more complex and scientifically motivated approach, as opposed to an engineering approach, is called for. Such an approach should result in robust text understanding on a large scale and may even incorporate quantitative analysis of NLP.