Computing Reviews, the leading online review service for computing literature.

Learning Object Models from Semistructured Web Documents
Ye S., Chua T. IEEE Transactions on Knowledge and Data Engineering 18(3): 334-349, 2006. Type: Article

Date Reviewed: Mar 22 2006

Web agents must sift through an unwieldy amount of human-oriented data on the Web, which is no easy task. Ye and Chua report on a method to infer object models and extract data from semistructured, tabulated Web pages. Their approach is unsupervised: it clusters the data according to a kernel-based metric that measures how frequently the data change across a set of related Web pages. Intuitively, clusters that rarely change are likely to be menus or banners, whereas clusters that change frequently are likely to be data. This idea is not new (it lies at the heart of OntoMiner [1]), but the use of a kernel-based method appears to be a new application of the technique. The authors conducted experiments on several commercial Web sites, and the results show that their technique is effective.

Unfortunately, the related work section is poor. First, the authors simply enumerate several proposals without an adequate theoretical or empirical comparison; for example, it is not clear whether OntoMiner’s hierarchical partitioning algorithm outperforms kernel-based methods. Second, there is no reference to Crescenzi and Mecca’s recent proposal on wrapper induction [2]. Finally, key concepts such as “token” or “shared attribute” are neither explained nor illustrated in the paper. This makes the results difficult to reproduce and to assess from a comparative point of view.
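The change-frequency intuition summarized above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Ye and Chua's algorithm: it assumes each page has already been segmented into blocks keyed by a stable path (the path keys, the `variability` and `classify` helpers, and the sample data are all hypothetical), and it replaces the paper's kernel-based clustering with a simple variability score and a fixed threshold.

```python
from typing import Dict, List, Optional

# Hypothetical input format: one dict per page, mapping a block path
# (e.g., an XPath-like string) to the text found at that position.
Page = Dict[str, str]


def variability(pages: List[Page]) -> Dict[str, float]:
    """Score each block path by how much its content varies across pages.

    0.0 means identical on every page; 1.0 means different on every page.
    A path missing from a page is treated as the value None.
    """
    if len(pages) < 2:
        raise ValueError("need at least two related pages")
    paths = set().union(*pages)  # all block paths seen on any page
    scores: Dict[str, float] = {}
    for path in paths:
        values: List[Optional[str]] = [page.get(path) for page in pages]
        distinct = len(set(values))
        # Normalize the number of distinct values to the range [0, 1].
        scores[path] = (distinct - 1) / (len(pages) - 1)
    return scores


def classify(pages: List[Page], threshold: float = 0.5) -> Dict[str, str]:
    """Label low-variability blocks as template (menus, banners) and
    high-variability blocks as data. A fixed threshold is an assumption
    standing in for the kernel-based clustering described in the paper."""
    return {
        path: "data" if score >= threshold else "template"
        for path, score in variability(pages).items()
    }


# Hypothetical sample: three product pages from the same site.
sample_pages: List[Page] = [
    {"/nav": "Home | Books", "/item/title": "Dune", "/item/price": "$9"},
    {"/nav": "Home | Books", "/item/title": "Emma", "/item/price": "$7"},
    {"/nav": "Home | Books", "/item/title": "Ulysses", "/item/price": "$9"},
]

labels = classify(sample_pages)
for path in sorted(labels):
    print(path, labels[path])
```

With this sample, `/nav` is labeled `template`, while `/item/title` and `/item/price` are labeled `data`. The price, which repeats on two of the three pages, scores exactly 0.5 and lands on the threshold, which shows how sensitive a fixed cut-off is and is one motivation for clustering the scores instead.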

Reviewer: Rafael Corchuelo	Review #: CR132584 (0701-0082)

1)	Davulcu, H.; Vadrevu, S.; Nagarajan, S.; Ramakrishnan, I. OntoMiner: bootstrapping and populating ontologies from domain-specific Web sites. IEEE Intelligent Systems 18, 5 (2003), 24–33.

2)	Crescenzi, V.; Mecca, G. Automatic information extraction from large websites. Journal of the ACM 51, 5 (2004), 731–779.

Database Applications (H.2.8)

Indexing Methods (H.3.1 ... )

Knowledge Acquisition (I.2.6 ... )

Object Representation (E.2 ... )

XML (I.7.2 ... )

Content Analysis And Indexing (H.3.1)

Other reviews under "Database Applications":	Date

Databases for genetic services: current usages and future directions Meaney F. Journal of Medical Systems 11(2-3): 227-232, 1987. Type: Article	Sep 1 1988

Database applications using Prolog Lucas R., Halsted Press, New York, NY, 1988. Type: Book (9789780470211663)	Aug 1 1990

Oracle’s cooperative development environment Kline K., Butterworth-Heinemann, Newton, MA, 1995. Type: Book (9780750695008)	May 1 1996

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®