The social Web enhances the traditional Web of content with social relations that link people together. As people’s activities and interactions on the Web increase, it becomes more interesting to analyze their social relationships together with the content they provide. At the same time, the exponential growth of available data on the social Web makes automated analysis a necessity. This is where Russell’s book comes into play. Several books on the social Web and its analysis already exist, but this one is unique in that it not only provides a classic tutorial on the concepts and algorithms for analyzing social relationships and the content created by people on the Web, but also offers readers the right tools for doing such an analysis.
The book’s main approach is learning by doing. Most of the book consists of code written in Python, which is also available on the book’s Web site (https://github.com/ptwobrussell/Mining-the-Social-Web). Russell begins with some introductory examples. The main, and probably the only, prerequisite for this book is a willingness to experience hands-on analysis of data available on the social Web. He claims that prior knowledge of programming is not necessary. However, since the book starts right away with references to Python and Python development tools, complete novices will need to make a nontrivial effort to learn some programming (the book recommends some good resources).
The book contains 10 chapters and an index. It starts with introductory examples that give the reader a first experience with collecting and analyzing social data. The whole book is a hands-on tutorial that motivates the reader to think about issues related to social data analysis, which are then revisited in more detail later on. Chapters 2 to 9 give various insights into the analysis of the most popular and readily available social data sources, such as microformats (for example, the XHTML Friends Network (XFN), hRecipe, and hReview), mailboxes, Twitter, LinkedIn, Google Buzz, blogs, and Facebook. The last chapter wraps up the book with a brief discussion of the semantic Web in relation to the social Web.
The book is primarily intended for beginners in social data analysis, and it also touches on information retrieval (IR) and natural language processing (NLP). It covers a lot of ground in an attempt to “give the crucial 20 percent of the skills that you can use to do 80 percent of the work.” Programming experience (not necessarily in Python) is a big advantage. Potential readers should look at the first chapter, which contains some programming, to assess how much programming knowledge they will need.
I like the book’s many good references and its strong emphasis on using existing tools and application programming interfaces (APIs) in the presented examples. On the other hand, I do not like the many references to Wikipedia entries. Today’s Web, with its search capabilities, makes such references unnecessary, not to mention that they are questionable in themselves, since the quality of Wikipedia content is not guaranteed.
Even though such practical guides become obsolete rather soon, this book is likely to stay useful longer than most. Russell not only presents examples but also motivates the reader to think about various aspects of data analysis, to identify the right questions for analysis, and to consider the robustness and effectiveness of the analysis, together with the availability of particular social data. He also gives introductory tutorials at a conceptual level on several topics, including IR, NLP, and clustering.
Each chapter includes an introduction and a summary, which may tempt readers to pick and choose chapters of particular interest. However, I do not recommend that beginners do so. The book can be an ideal source for a hands-on acquaintance with the evolving field of social data analysis. It may fit well in a practical seminar on information systems or in a related graduate study program.