Computing Reviews, the leading online review service for computing literature.

Integrating Web query results: holistic schema matching
Chuang S., Chang K. CIKM 2008 (Proceedings of the 17th ACM Conference on Information and Knowledge Management, Napa Valley, CA, Oct 26-30, 2008), 33-42, 2008. Type: Proceedings

Date Reviewed: Jun 2 2009

Schema matching is one of the challenging problems in handling multiple data sources, and it becomes harder still when it must be done with only sample instances. Chuang and Chang’s work attempts to address this challenge. The authors first explain the pairwise schema matching techniques that other researchers have tried in instance-based schema matching. They claim that holistic schema matching is the same as domain schema discovery. The major contribution seems to be the extension of pairwise schema matching to what could be termed weighted multi-pair integrated matching. Chuang and Chang verify the effectiveness of their algorithm with case studies from four domains: airfares, books, cars, and CDs. In these case studies, the proposed holistic matching algorithm provides the best matching performance compared with a few other algorithms, such as cluster and chain matching. While this claim may be valid for the sample set used in the comparison, it would be very difficult to extend it into a general improvement without a much deeper analysis. First, the data sources selected in the chosen domains have relatively comparable schemas (for example, expedia.com and travelocity.com). Whether the algorithm would perform as well with diverse schemas in the same domain therefore remains an open question. Second, the authors do not address what would happen if the domains were changed and, particularly, if the number of fields increased significantly. Surprisingly, the authors claim to have carried out extensive experiments, having used about 300 to 400 records from 30 to 40 sample pages in each domain. Nevertheless, the paper attempts to address a very important challenge in schema matching.

Reviewer: Sithu D. Sudarsan	Review #: CR136895 (1010-1046)

Miscellaneous (H.2.m)

Heterogeneous Databases (H.2.5)
