Computing Reviews
Building an example application with the unstructured information management architecture
Ferrucci D., Lally A. IBM Systems Journal 43(3): 455-475, 2004. Type: Article
Date Reviewed: Feb 2 2005

IBM has developed a new architecture for working with data stored in “plain” text. It also handles text structured with Extensible Markup Language (XML), so the real distinction is that the data is not held in a fixed-format structured database. IBM calls this the unstructured information management architecture (UIMA); this paper describes the architecture and illustrates its use by building a sample application.

The heart of this architecture is a design intended to allow significant reuse of components. People wishing to find data in an unstructured file need to understand the framework, and then work within it to supply the components that do not already exist. For example, a common problem is detecting words within the text. If a parser for a given character set (for example, Farsi) does not already exist, the developer might modify an existing parser (for example, one for the Western alphabet) to achieve the same purpose.
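
The reuse idea in the preceding paragraph can be illustrated with a minimal sketch. It is not UIMA's actual interface, which the paper describes and this review does not reproduce; the names Annotation, Document, RegexTokenizer, ArabicScriptTokenizer, and run_pipeline are all hypothetical. The sketch assumes that a word detector can be reduced to a character-range pattern, which is a simplification (real Farsi text also needs handling for joiners and punctuation).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # label for the annotation, e.g. "token"
    begin: int  # start offset into the document text
    end: int    # end offset (exclusive)


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

    def covered(self, kind):
        """Return the text spans covered by annotations of one kind."""
        return [self.text[a.begin:a.end]
                for a in self.annotations if a.kind == kind]


class RegexTokenizer:
    """Hypothetical reusable word detector for the basic Latin alphabet.

    The pattern is the only script-specific part of the component.
    """
    pattern = re.compile(r"[A-Za-z]+")

    def process(self, doc):
        # Record each match as a "token" annotation on the shared document.
        for m in self.pattern.finditer(doc.text):
            doc.annotations.append(Annotation("token", m.start(), m.end()))


class ArabicScriptTokenizer(RegexTokenizer):
    """Adapted from the Latin tokenizer by swapping the character range.

    Assumption: the Unicode Arabic block (U+0600 to U+06FF), which includes
    the letters used for Farsi, is enough to delimit words.
    """
    pattern = re.compile(r"[\u0600-\u06FF]+")


def run_pipeline(doc, annotators):
    """Apply each annotator in order; all of them share one document."""
    for annotator in annotators:
        annotator.process(doc)
    return doc
```

Only the pattern changes between the two tokenizers; the document model and the pipeline runner are reused unchanged, which is the kind of reuse the paper argues for. For instance, run_pipeline(Document("reuse of components"), [RegexTokenizer()]) records one token annotation per word.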

Within the architecture, IBM has proposed five different developer roles, and outlined what each of those roles should do. IBM has also developed a starting framework that allows the system manager to use existing documents to train (and test) the modules.

This paper does a good job of explaining the concepts, components, and roles of the architecture. It isn’t intended as a developer’s reference, nor is it clear that the architecture is meant for use by people outside IBM. The paper does point out what is necessary to achieve high reuse of application components, so the process it describes can be applied to significantly different problems.

As I read this paper, the only criticism that came to mind is one that often applies to information technology (IT) professionals: the acronyms almost get in the way of understanding. Each acronym is explained on first use, but some of them have different meanings in other fields, so it is difficult to keep them straight. On the other hand, without acronyms the paper would be significantly longer, so there is no easy solution.

I recommend this paper to anyone working with unstructured text, or anyone who wishes to implement a process with significant reuse of application components.

Reviewer: Charles W. Bash
Review #: CR130748 (0508-0939)
Reviewer-selected categories:
Information Storage And Retrieval (H.3)
Textual Databases (H.2.4 ...)
Systems And Software (H.3.4)