The World Wide Web (WWW) and its surrounding technologies have become major application domains for data mining techniques. Web mining expertise can be both intellectually and economically rewarding. However, data mining textbooks typically have too much ground to cover and so rarely devote more than a single chapter--or a single section--to the specifics of this growing area; some even skip it altogether [2,3]. Hence, a well-written monograph on this topic is a welcome addition to every data miner’s library.
Liu, a well-known researcher in the field, has updated and extended his 2006 monograph on Web data mining techniques while keeping its original structure. His book provides a comprehensive, self-contained introduction to the major data mining techniques and their use in Web data mining.
Part 1 provides an overview of traditional data mining techniques. This introduction to data mining--about one-third of the book--presents association rule mining, sequential patterns, classification, clustering, and partially supervised learning techniques. This feature detracts from the book’s main focus on Web mining, but it makes the book easier to adopt in courses that do not require previous knowledge of data mining techniques. Liu skillfully combines basic information with discussions of his own research on topics such as association rule mining using multiple minimum supports, associative classifiers, and discovering holes (for clustering problems). The chapter on partially supervised learning is the most technically demanding. It provides a nice survey of the techniques proposed during the last decade for learning from both labeled and unlabeled examples.
Part 2--the remaining two-thirds of the book--delves into Web mining techniques. After an introductory chapter on information retrieval concepts and key Web search ideas, the content revolves around three main topics: Web structure mining, Web content mining, and Web usage mining.
Two chapters on Web structure mining examine social network analysis and Web crawlers. The chapter on social network analysis (titled “Link Analysis” in the first edition) describes the ubiquitous PageRank and hyperlink-induced topic search (HITS) ranking algorithms, which are discussed in every self-respecting monograph on networks or Web information retrieval. It also suggests different data mining strategies that might prove useful for community detection--the network equivalent of traditional clustering--though much better surveys exist on this topic. The second chapter in this section provides a nice survey of the implementation and ethical issues behind the construction of Web crawlers; however, it has not been updated since the previous edition.
Three chapters focus on Web content mining. The first one deals with structured data extraction--that is, wrapper induction (a supervised learning problem) and automatic data extraction (based on pattern matching techniques). The second chapter, on information integration, makes extensive use of heuristics and serves as a short introduction to integrating Web query interfaces, including descriptions of three different approaches proposed back in 2004. The final chapter on Web content mining focuses on opinion mining and sentiment analysis--that is, mining opinions that indicate positive or negative sentiments. This 70-page chapter analyzes a technically challenging field and identifies many open research problems from the perspective of the author’s own research. In fact, the author is involved in an opinion mining start-up company.
Last but not least, a long final chapter deals with Web usage mining, which involves clickstream data analysis. Recommender systems and query log mining are analyzed as representative applications of Web usage mining. Computational advertising, the cornerstone of search engine companies’ revenue, is also addressed as a financially significant example of a recommender system (that is, ads are recommended to match users’ queries).
Insightful bibliographic notes round off this excellent book. Professionals and researchers alike will find it handy as a reference. Its extensive lists of references at the end of each chapter provide hundreds of pointers for further reading. As a textbook, it is also suitable for advanced undergraduate and graduate courses on Web mining; it is highly self-contained and includes many easy-to-understand examples that will help readers grasp the key ideas behind current Web data mining techniques.