Computing Reviews
Semantic clustering of XML documents
Tagarelli A., Greco S. ACM Transactions on Information Systems 28(1): 1-56, 2010. Type: Article
Date Reviewed: May 28 2010

With the advent of Extensible Markup Language (XML) and its wide adoption in applications, data extraction from semi-structured documents to facilitate data analysis has become an attractive research direction. The existence of structure in documents provides the means for designing sophisticated approaches for data management and knowledge discovery. These approaches take into consideration both content and structure semantics.

In this paper, Tagarelli and Greco propose a framework, along with algorithms, for clustering semantically related semi-structured documents based on commonalities in their structure and content. First, they apply structure analysis to the XML documents to resolve ambiguity among tag names by selecting the most appropriate sense for each one. Following this, they analyze the documents based on their content similarity, using techniques that consider both syntactic and semantic term relevance.

An important characteristic of the proposed approach is the use of a novel representation scheme for mapping XML document trees into transactions consisting of items that carry both structure and content characteristics. The authors employ a transactional clustering algorithm that quantifies similarity by taking into consideration the semantics of the data. The resulting clusters of transactions are then used to derive a classification of the XML documents for the end user. The authors demonstrate the effectiveness of the proposed approach through experiments on real-world data, comparing it against state-of-the-art algorithms for clustering XML documents.
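To make the idea of a transactional representation concrete, the following is a minimal sketch and not the authors' scheme. It assumes one transaction per document, treats each item as a (tag path, term) pair, and uses plain Jaccard overlap as a stand-in for the paper's semantics-aware similarity. The function names `tree_to_items` and `jaccard` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def tree_to_items(xml_text):
    """Flatten an XML document into a set of (tag path, term) items.

    Assumption: one transaction per document. Each item pairs structure
    (the path from the root) with content (a lowercased term).
    """
    root = ET.fromstring(xml_text)
    items = set()

    def walk(node, path):
        path = path + "/" + node.tag
        text = (node.text or "").strip().lower()
        for term in text.split():
            items.add((path, term))
        for child in node:
            walk(child, path)

    walk(root, "")
    return items


def jaccard(a, b):
    """Set overlap, used here as a simple placeholder similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical sample documents.
doc_a = "<paper><title>XML clustering</title><author>Greco</author></paper>"
doc_b = "<paper><title>XML mining</title><author>Greco</author></paper>"
doc_c = "<book><name>Cooking basics</name></book>"

items_a = tree_to_items(doc_a)
items_b = tree_to_items(doc_b)
items_c = tree_to_items(doc_c)

print(jaccard(items_a, items_b))  # shares two of four distinct items
print(jaccard(items_a, items_c))  # no shared path or term
```

Because each item couples a path with a term, two documents are similar only when the same content appears in the same structural position. A semantic approach like the one reviewed would also need to match differently named tags with the same meaning, which this sketch does not attempt.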

Overall, this is interesting work. The paper is well structured, motivated, and presented, and the experimental results look promising. For these reasons, researchers in the field will benefit from reading it.

Reviewer: Aris Gkoulalas-Divanis
Review #: CR138046 (1010-1045)
 
Textual Databases (H.2.4 ... )
Clustering (H.3.3 ... )
Markup Languages (I.7.2 ... )
Document Preparation (I.7.2)
Information Search And Retrieval (H.3.3)
 
