Computing Reviews
The Mannheim Search Join Engine
Lehmberg O., Ritze D., Ristoski P., Meusel R., Paulheim H., Bizer C. Journal of Web Semantics 35, Part 3: 159-166, 2015. Type: Article
Date Reviewed: Mar 14 2016

Data used by computer systems has traditionally been highly structured and organized. Relational database management systems (RDBMSs), which date back to the 1970s, have been faithfully serving their users ever since. Lately, though, the explosive growth of the Internet and the World Wide Web has brought forth at least as much data as is held in conventional DBMSs. The problem is that Internet data is far less structured than traditional datasets, and thus much more difficult to analyze, let alone integrate, in a single data environment. This paper presents the Mannheim Search Join Engine, a search engine that aims to merge the two worlds, so that web data can be retrieved easily and quickly alongside local structured data. The paper presents the architecture first, and then evaluates the engine's performance when searching large corpora of data found on the Internet. Finally, the engine is compared with other existing methods.

The system architecture is a three-step sequence: table indexing, followed by table search, followed by data consolidation. Table indexing consists of retrieving data at large, that is, from the Internet; normalizing it to find the attributes the sources have in common; and choosing a set of unique data based on these attributes and indexing it. Table search takes data from local structured tables and compares it to the indexed set. Data consolidation consists of a series of standard left outer joins between the tables previously built. A large color figure at the beginning of this section is very helpful in explaining this.
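The three steps described above can be sketched in a few lines of Python. This is only an illustration of the general index, search, and left-outer-join pattern, not the engine's actual implementation: the function names (`index_tables`, `search_tables`, `consolidate`), the in-memory dictionaries, and the toy tables are all assumptions made for the example.

```python
from collections import defaultdict


def normalize(text):
    """Hypothetical normalization: trim, lowercase, unify separators."""
    return str(text).strip().lower().replace("_", " ")


def index_tables(web_tables, key):
    """Step 1 (indexing): map each normalized key value to the web rows holding it."""
    index = defaultdict(list)
    for table_id, rows in web_tables.items():
        for row in rows:
            norm_row = {normalize(attr): val for attr, val in row.items()}
            if key in norm_row:
                index[normalize(norm_row[key])].append((table_id, norm_row))
    return index


def search_tables(index, query_rows, key):
    """Step 2 (search): look up every local row's key value in the index."""
    return {normalize(row[key]): index.get(normalize(row[key]), [])
            for row in query_rows}


def consolidate(query_rows, hits, key):
    """Step 3 (consolidation): left outer join, so unmatched local rows are kept."""
    result = []
    for row in query_rows:
        merged = dict(row)
        for _table_id, web_row in hits.get(normalize(row[key]), []):
            for attr, val in web_row.items():
                merged.setdefault(attr, val)  # local values take priority
        result.append(merged)
    return result


# Toy data, invented purely for illustration.
web_tables = {
    "t1": [{"Country": "Germany", "Capital": "Berlin"}],
    "t2": [{"country": "France", "Capital": "Paris"}],
}
local_rows = [{"country": "Germany"}, {"country": "Italy"}]

index = index_tables(web_tables, "country")
hits = search_tables(index, local_rows, "country")
joined = consolidate(local_rows, hits, "country")
```

Because the join is a left outer join, the row for Italy survives in `joined` even though no web table mentions it; it simply gains no new attributes.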

The system is then tested on large datasets published on the web in various forms. Its performance is evaluated in terms of both coverage, or how many results are found, and precision, or how close these results are to the real data. Although few details of the experimental setup are given, the results are well documented with plenty of tables. At the end of the paper, after warning that their research field is relatively new and that little previous work is available, the authors give extensive references to that work, show the strengths and weaknesses of their own approach, and point to future developments.
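As a rough illustration of the two measures, the sketch below computes them over a joined table. It follows the informal definitions given in this review; the paper's exact formulas may differ, and the function names, the `gold` reference dictionary, and the sample rows are assumptions made for the example.

```python
def coverage(rows, attribute):
    """Fraction of rows for which the join found a value for `attribute`."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(attribute) is not None)
    return filled / len(rows)


def precision(rows, gold, key, attribute):
    """Fraction of found values that agree with a reference ("gold") value."""
    checked = 0
    correct = 0
    for row in rows:
        value = row.get(attribute)
        if value is None or row[key] not in gold:
            continue  # nothing found, or nothing to compare against
        checked += 1
        if value == gold[row[key]]:
            correct += 1
    return correct / checked if checked else 0.0


# Invented sample: two of three rows were filled, one of them correctly.
sample_rows = [
    {"country": "A", "capital": "X"},
    {"country": "B", "capital": "Z"},
    {"country": "C"},
]
gold_capitals = {"A": "X", "B": "Y"}

cov = coverage(sample_rows, "capital")
prec = precision(sample_rows, gold_capitals, "country", "capital")
```

Keeping the two measures separate matters because they pull against each other: accepting looser matches fills more cells and raises coverage, but typically lowers precision.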

A system like this may seem complex at first, and the need for such an extensive data retrieval campaign a little far-fetched, but we should keep in mind that the need for data is growing every day and that, at present, the only viable alternative is to search the web manually. A system like this is thus warmly welcomed.

Reviewer:  Andrea Paramithiotti Review #: CR144232 (1605-0333)
Search Process (H.3.3 ... )
 
Other reviews under "Search Process": Date
Search improvement via automatic query reformulation
Gauch S., Smith J. ACM Transactions on Information Systems 9(3): 249-280, 1991. Type: Article
Jul 1 1993
Criteria for the selection of search strategies in best-match document-retrieval systems
McCall F., Willett P. International Journal of Man-Machine Studies 25(3): 317-326, 1986. Type: Article
Oct 1 1987
The use of adaptive mechanisms for selection of search strategies in document retrieval systems
Croft W. (ed), Thompson R.  Research and development in information retrieval (, King’s College, Cambridge,1101984. Type: Proceedings
Aug 1 1985
