Computing Reviews
Web data mining: exploring hyperlinks, contents, and usage data (Data-Centric Systems and Applications)
Liu B., Springer-Verlag New York, Inc., Secaucus, NJ, 2007. 532 pp. Type: Book (9783540378815)
Date Reviewed: Jan 26 2009

This is a textbook about data mining and its application to the Web. The first part of the book covers core data mining and machine learning concepts. These include association rules, which discover correlations among data items, and sequential patterns, in which the order of the items also matters. Along with these models, Liu describes key algorithms such as apriori and generalized sequential pattern (GSP), as well as their variations.
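To make the level-wise idea behind apriori concrete, here is a minimal Python sketch (an illustration, not code from the book) that finds frequent itemsets in a toy set of shopping baskets. The function name `apriori`, the absolute-count threshold `min_support`, and the basket data are assumptions chosen for the example. The sketch relies on the downward-closure property: every subset of a frequent itemset must itself be frequent, so candidates of size k+1 are built only from frequent itemsets of size k, and any candidate with an infrequent subset is pruned before counting.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset that
    occurs in at least min_support transactions (hypothetical helper)."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the data: count how many transactions contain each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in counts:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    frequent = count({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 1
    while frequent:
        prev = list(frequent)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune step: drop candidates that have any infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k))}
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(baskets, min_support=2)
# freq: {bread}: 3, {milk}: 3, {butter}: 2, {bread, milk}: 2, {bread, butter}: 2
```

Here {milk, butter} appears in only one basket, so the three-item candidate {bread, milk, butter} is pruned without ever being counted.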

The book then discusses supervised learning, in which an algorithm uses a set of labeled training data to derive a classification function that is then applied to new data. This is followed by unsupervised learning algorithms, which cluster data into groups of similar items without any prior training, and by partially supervised learning algorithms, in which a small set of labeled examples is combined with a large set of unlabeled data.
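As an example of the unsupervised case, here is a minimal Python sketch of k-means, one standard clustering algorithm. It is an illustration rather than the book's code; the function name `kmeans`, the toy points, and seeding the centroids with the first k points are simplifying assumptions (practical implementations usually seed randomly or with k-means++). The algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, stopping when no centroid moves.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster 2-D points into k groups (minimal sketch; the first k points
    serve as initial centroids, a simplifying assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (1, 0.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
```

On this toy data the three points near (1, 1) and the three near (8.5, 8.5) end up in separate clusters after a single update, and the second pass confirms convergence.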

The second part of the book relates these data mining concepts to Web mining, beginning with search. Here the author covers relevance ranking, relevance feedback, preprocessing of Web pages, inverted indexes, compression, and meta-search engines. Search is followed by a discussion of link analysis, which uses hyperlinks to evaluate and rank pages. Topics covered include prestige, citation analysis, Google’s PageRank algorithm, and the hypertext induced topic search (HITS) algorithm.
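The following Python sketch (not code from the book) illustrates how link analysis can turn hyperlinks into a ranking: it computes PageRank by power iteration on a tiny hypothetical three-page graph. The damping factor d = 0.85, the convergence tolerance, and spreading the rank of pages without out-links evenly over all pages are common conventions assumed here, not details taken from the review.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=100):
    """links maps each page to the list of pages it links to.
    Returns a dict of PageRank scores that sum to 1 (minimal sketch)."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Every page first receives the random-jump share (1 - d) / n.
        new = {p: (1 - d) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # A page splits its damped rank evenly among its out-links.
                share = d * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank over all pages (assumed convention).
                for q in pages:
                    new[q] += d * rank[p] / n
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(links)
# scores: C ≈ 0.397, A ≈ 0.388, B ≈ 0.215
```

Page C ranks highest because both other pages link to it, while A inherits much of C's prestige through C's single out-link; this is the intuition behind treating in-links as votes.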

The book then turns to Web crawlers, covering parsing, link extraction, coverage, freshness, and different types of crawlers. It concludes with chapters on extracting structured information, information integration, and opinion and usage mining.
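Link extraction, one of the crawler topics mentioned above, can be sketched with Python's standard-library `html.parser` and `urllib.parse` modules. This is an illustration rather than the book's code: the `LinkExtractor` class and the example URLs are hypothetical, and a real crawler would also fetch pages, maintain a URL frontier, and respect robots.txt.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute, de-duplicated URLs from <a href=...> tags (minimal sketch)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and strip #fragments,
                # so anchors within one page are not treated as separate pages.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute not in self.links:
                    self.links.append(absolute)


page = ('<p><a href="/about">About</a> <a href="news.html#top">News</a> '
        '<a href="https://example.org/">Ext</a> <a href="/about">Again</a></p>')
parser = LinkExtractor("https://example.com/index.html")
parser.feed(page)
print(parser.links)
# ['https://example.com/about', 'https://example.com/news.html', 'https://example.org/']
```

Normalizing URLs before adding them to the frontier matters for coverage: without it, the same page reached through a relative link, an absolute link, and a fragment link would be fetched three times.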

Liu succeeds in helping readers appreciate the key role that data mining and machine learning play in Web applications. Most readers are familiar with search, but this book highlights the much broader role that machine learning plays in areas such as data extraction and opinion mining. This matters because it gives readers a better idea of what is possible and points to related areas where these concepts can be applied. It also motivates students by lending immediacy and relevance to the concepts and algorithms described.

I like the way the concepts are introduced in a stepwise manner. For example, the author starts with the apriori algorithm and then describes the issues that motivate its various refinements. The chapters also include many examples that convey an intuitive sense of what the algorithms are modeling. I also appreciated the bibliographical notes at the end of each chapter. They give more context to how and when these algorithms were developed, which helps one appreciate the dynamism of the field.

Reviewer: W. Hu
Review #: CR136455 (0912-1133)
Featured Reviewer
Data Mining (H.2.8 ...)
Information Search And Retrieval (H.3.3)
Other reviews under "Data Mining":
Feature selection and effective classifiers
Deogun J. (ed), Choubey S., Raghavan V. (ed), Sever H. (ed) Journal of the American Society for Information Science 49(5): 423-434, 1998. Type: Article
Date Reviewed: May 1 1999
Rule induction with extension matrices
Wu X. (ed) Journal of the American Society for Information Science 49(5): 435-454, 1998. Type: Article
Date Reviewed: Jul 1 1998
Predictive data mining
Weiss S., Indurkhya N., Morgan Kaufmann Publishers Inc., San Francisco, CA, 1998. Type: Book (9781558604032)
Date Reviewed: Feb 1 1999

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®