Computing Reviews
Learning Object Models from Semistructured Web Documents
Ye S., Chua T. IEEE Transactions on Knowledge and Data Engineering 18(3): 334-349, 2006. Type: Article
Date Reviewed: Mar 22 2006

Web agents must sift through an unwieldy amount of human-oriented data on the Web, which is no easy task. Ye and Chua report on a method to infer object models and extract data from semistructured, tabulated Web pages. Their approach is unsupervised: it clusters the data according to a kernel-based metric that measures how frequently the data changes across a set of related Web pages. Intuitively, clusters that rarely change are likely to be menus or banners, whereas clusters that change frequently are likely to be data. This idea is not new (it lies at the heart of OntoMiner [1]), but the use of a kernel-based method seems to be a new application of the technique.
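
The change-frequency intuition can be illustrated with a minimal Python sketch. This is not the kernel-based metric of Ye and Chua, nor OntoMiner's algorithm; it is a simplified stand-in that assumes each page has already been split into named slots. The slot names, the `change_rate` and `split_slots` helpers, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def change_rate(values: List[str]) -> float:
    """Share of distinct values across pages, scaled to [0, 1] (illustrative)."""
    if len(values) < 2:
        return 0.0
    return (len(set(values)) - 1) / (len(values) - 1)


def split_slots(pages: List[Dict[str, str]], threshold: float = 0.5) -> Dict[str, str]:
    """Label each slot 'template' or 'data' by how often its text changes."""
    # Collect every slot name that appears on at least one page.
    slots = sorted(set().union(*(p.keys() for p in pages)))
    labels = {}
    for slot in slots:
        # A slot missing from a page is treated as an empty string.
        values = [p.get(slot, "") for p in pages]
        labels[slot] = "data" if change_rate(values) >= threshold else "template"
    return labels


# Hypothetical pages from the same site, already segmented into slots.
pages = [
    {"nav": "Home | Products", "title": "Camera A", "price": "$199"},
    {"nav": "Home | Products", "title": "Camera B", "price": "$249"},
    {"nav": "Home | Products", "title": "Camera C", "price": "$199"},
]
labels = split_slots(pages)
print(labels)
```

In this toy input, the navigation slot is identical on every page and is labeled as template material, while the title and price slots vary and are labeled as data. A real system would have to discover the slots themselves, which is where the clustering step comes in.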

The authors conducted experiments on several commercial Web sites, and the results indicate that their technique is effective. Unfortunately, the related work section is poor. First, the authors simply enumerate several proposals without enough theoretical or empirical comparison; for example, it is not clear whether OntoMiner’s hierarchical partitioning algorithm outperforms kernel-based methods. Second, there is no reference to Crescenzi and Mecca’s recent proposal on wrapper induction [2]. Finally, key concepts such as “token” or “shared attribute” are not explained or illustrated in the paper. These omissions make it difficult to reproduce the results and to assess them from a comparative point of view.

Reviewer: Rafael Corchuelo
Review #: CR132584 (0701-0082)

1) Davulcu, H.; Vadrevu, S.; Nagarajan, S.; Ramakrishnan, I. OntoMiner: bootstrapping and populating ontologies from domain-specific Web sites. IEEE Intelligent Systems 18, 5 (2003), 24–33.
2) Crescenzi, V.; Mecca, G. Automatic information extraction from large websites. Journal of the ACM 51, 5 (2004), 731–779.
Database Applications (H.2.8)
Indexing Methods (H.3.1 ...)
Knowledge Acquisition (I.2.6 ...)
Object Representation (E.2 ...)
XML (I.7.2 ...)
Content Analysis And Indexing (H.3.1)
Other reviews under "Database Applications": Date
Databases for genetic services: current usages and future directions
Meaney F. Journal of Medical Systems 11(2-3): 227-232, 1987. Type: Article
Sep 1 1988
Database applications using Prolog
Lucas R., Halsted Press, New York, NY, 1988. Type: Book (9789780470211663)
Aug 1 1990
Oracle’s cooperative development environment
Kline K., Butterworth-Heinemann, Newton, MA, 1995. Type: Book (9780750695008)
May 1 1996
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®