When integrating multiple database schemas, either to merge them or to enable queries across their federation, the desirability of automatic schema matching is apparent. Although schema matching and integration have been studied for over 25 years [1], they remain difficult problems. Bernstein et al. describe schema matching as “AI-complete, that is, as hard as reproducing human intelligence” [2].
Nowadays, many researchers are working on semi-automatic systems that propose potential schema matches, with human experts making the final decisions. In this vein, Karasneh et al. propose a model to extend previous work on the integration of multiple heterogeneous biomedical databases’ schemas.
Unlike other contemporary research efforts, this paper’s approach is limited to integrating relational database schemas. The approach is straightforward, focusing primarily on the schema-matching phase of database integration. Although Karasneh et al. make the unsupported claim of “using as much as possible the available information during the process of matching the schemas,” the model does incorporate multilevel matching and decision making at four levels: schema, attribute name, domain, and instance. Two similarity metrics, n-gram and synonym, are used for matching. The results presented support the following hypotheses: all of the elements used for matching are significant; the approach reduces the number of attributes in the global schema; and the approach reduces the number of null values in an integrated database.
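For readers unfamiliar with n-gram matching, the following minimal sketch shows one common way to score the similarity of two attribute names. It is an illustration only, not the formula used by Karasneh et al.: the choice of character trigrams, the "#" padding, the removal of underscores, the Dice coefficient, and the function names `char_ngrams` and `ngram_similarity` are all assumptions made here.

```python
from typing import Set


def char_ngrams(name: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of a normalized, padded name.

    Normalization (lowercasing, dropping underscores) and '#' padding are
    illustrative assumptions, not details taken from the reviewed paper.
    """
    normalized = name.lower().replace("_", "")
    padded = "#" * (n - 1) + normalized + "#" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over the two names' n-gram sets, in [0.0, 1.0]."""
    grams_a = char_ngrams(a, n)
    grams_b = char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    shared = len(grams_a & grams_b)
    return 2 * shared / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    # Hypothetical attribute names from two schemas being compared.
    print(ngram_similarity("patient_id", "PatientID"))
    print(ngram_similarity("patient_name", "pat_name"))
```

In a semi-automatic system of the kind described above, a score like this would only rank candidate attribute pairs; a human expert would still confirm or reject each proposed match.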
Despite several grammatical errors, the paper is well organized and easy to read. It should be of interest to researchers who are looking for practical approaches to database schema matching and integration.