This paper describes the Constituent Object Parser (COP), a pragmatic approach to improving information retrieval results. COP creates binary dependency trees to capture the scope and dependency of the terms in each sentence of an abstract. These trees can then be compared with the natural language search statement to produce a retrieval ordered by closeness to the request. According to the authors, COP is “most likely to improve the precision of a search when the same query terms can appear in more than one scope and dependency relationship in a meaningful way in the same database or document collection . . . COP is also useful if the query terms could appear together in the same document without being really related, especially where the terms in the correct relationship mean something quite specific.” Several different matching strategies are discussed and will be tested in a future implementation.
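To make the idea of ranking by dependency relationships concrete, here is a minimal sketch. It is not the authors' implementation, and the paper's own matching strategies are not reproduced here. It assumes that a binary dependency tree can be written as a nested pair with the head term on the right, that the query is parsed into the same form, and that closeness is the fraction of the query's head-dependent pairs found in a document. The names `head`, `dependency_pairs`, `score`, and `rank` are hypothetical.

```python
from typing import Dict, List, Set, Tuple, Union

# Assumption: a tree is either a single term or a (dependent, head) pair,
# so "school library" is written ("school", "library").
Tree = Union[str, Tuple["Tree", "Tree"]]


def head(tree: Tree) -> str:
    """Return the head term of a tree (the rightmost leaf, by assumption)."""
    return tree if isinstance(tree, str) else head(tree[1])


def dependency_pairs(tree: Tree) -> Set[Tuple[str, str]]:
    """Collect (dependent head, governing head) pairs from every binary node."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    pairs = {(head(left), head(right))}
    return pairs | dependency_pairs(left) | dependency_pairs(right)


def score(query_tree: Tree, doc_trees: List[Tree]) -> float:
    """Fraction of the query's dependency pairs found in any document sentence."""
    wanted = dependency_pairs(query_tree)
    if not wanted:
        return 0.0
    found: Set[Tuple[str, str]] = set()
    for tree in doc_trees:
        found |= dependency_pairs(tree)
    return len(wanted & found) / len(wanted)


def rank(query_tree: Tree, docs: Dict[str, List[Tree]]) -> List[str]:
    """Order document names from closest to least close to the query."""
    return sorted(docs, key=lambda name: score(query_tree, docs[name]), reverse=True)


# Hypothetical data: both documents contain the terms "school" and "library",
# but only document B has them in the relationship the query asks for.
query: Tree = ("school", "library")                    # "school library"
docs: Dict[str, List[Tree]] = {
    "A": [("library", "school")],                      # "library school"
    "B": [(("public", "school"), "library")],          # "public school library"
}
ranking = rank(query, docs)  # ["B", "A"]
```

A plain term match would treat documents A and B as equally relevant. The pair-based score separates them, which is the situation the authors single out: the same query terms appearing in more than one dependency relationship.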
Because of design decisions dictated by the retrieval environment, the resulting parser is able to deal with a number of complex natural language processing issues in a simplified manner. These include ambiguous modification structures, conjunction, and intrasentence pronoun reference. One consequence of the simplifying assumptions, however, is that the parser overaccepts in two ways: it accepts syntactic structures that could be ruled out using semantics, and it does not apply the kinds of constraints commonly used in modern syntactic theory. However, as the authors point out, “whether such specificity could be achieved in a very general parser, and whether it would be worth the computational costs, are certainly problematic.”
Future work will focus on integrating the authors' syntactic filter into an existing retrieval system. The aim is to determine any difficulties naive users have with natural language queries, the effect of syntactic analysis on query formulation and browsing, whether users recognize when this analytic technique would be useful, and how syntactic analysis might be applied in related areas such as statistical information retrieval and automatic indexing.