Computing Reviews
Computational biology: a practical introduction to BioData processing and analysis with Linux, MySQL, and R (2nd ed.)
Wünschiers R., Springer Publishing Company, Incorporated, New York, NY, 2013. 478 pp. Type: Book (978-3-642347-48-1)
Date Reviewed: Nov 5 2013

A more whimsical title for this book might have been “The driven data wrangler’s guide to Unix text processing tools, with bioinformatics application examples.” Expressly written for an audience with little or no experience in computers and data processing, this book aims to guide an astute pupil on the path to acquiring a practical familiarity with the Unix command line. From that base, the reader then learns how to use several powerful general-purpose utilities available from that interface to facilitate batch processing of large text-based datasets.

The tone of the book is utilitarian, centered on introducing a neophyte to computer programming, with bash, awk, and Perl as the tools of choice. The approach does not pull any punches: the reader who needs to learn something quickly and proficiently must not be afraid of a steep learning curve. For instance, topics run the gamut from using awk for analyzing the content of data files to developing a SQL query with regular expressions and predicates. The author recommends that readers do the exercises with Linux installed on a virtual machine: good advice, but it assumes a number of underlying skills possibly lacking in much of the author’s intended audience. Never fear! Chapter 3 is a crash course on operating systems and virtualization, strictly focused on providing a bare-bones understanding of the framework for running useful applications. Chapter 4 covers procuring Ubuntu Linux without repeating details about the installation and setup of the virtualization environment. In the same chapter, there are instructions for using apt-get to set up bioinformatics packages, MySQL, and R, with explanations of the Unix command line and how to use it to get things done.

Chapter 5 introduces the file system and related utility programs, and chapter 6 discusses remote connectivity, with a focus on how to access remote computers and move data between them. Chapter 7 shows how to use cat, sort, uniq, and other applications that are part of the usual Unix endowment, including grep, ed, some regular expressions, and a few text editors. Chapter 8 introduces the shell, particularly bash. It covers redirection, pipes, job control, and other functionalities.
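
To give a flavor of the kind of pipeline these chapters build toward, here is a minimal sketch that is not taken from the book. It assumes a POSIX shell with the standard sort, uniq, and head utilities; the function names count_lines and sample, and the organism names used as data, are hypothetical and chosen only for illustration.

```shell
# Count how often each input line occurs, most frequent first.
# sort groups identical lines together, uniq -c prefixes each group
# with its count, and sort -rn orders the groups by that count.
count_lines() {
    sort | uniq -c | sort -rn
}

# Hypothetical sample data: one organism name per line.
sample() {
    printf '%s\n' 'E. coli' 'S. cerevisiae' 'E. coli' 'H. sapiens' 'E. coli'
}

# Pipe the sample through the counter and keep only the top entry.
sample | count_lines | head -n 1
```

Each stage reads from standard input and writes to standard output, so the same function works unchanged on a file (count_lines < names.txt, with names.txt again a hypothetical file name) or on the output of another command.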

Chapter 9 is dedicated to the installation of the bioinformatics utilities, BLAST and ClustalW. Chapter 10 tackles shell scripting with a brief introductory tutorial and a commented list of the capabilities of bash, with simple examples. Chapter 11 introduces regular expressions using an empirical, example-driven approach. The author demonstrates what can be done, in lieu of lengthy explanations. Chapter 12 is an effective, if fundamental, tutorial for the stream editor sed.

Chapters 13 and 14 together account for approximately one-fourth of the book’s page count. They could be seen as a crash course in writing awk and Perl programs, respectively. Awk gets a few more pages than Perl, with a rundown of language elements and usage tips. That chapter concludes with the description of a dynamic programming algorithm implemented in awk and relevant to computational biology. Perl gets fewer pages, because, in the author’s words, the chapter “is not intended to be a complete guide” (page 255). It overviews enough of the language to convey the power that Perl can unleash through its often peculiar expressivity.

Chapter 15 introduces relational databases and the statistics suite R. Beginning with the basics of database administration for MySQL, chapter 16 continues with the mechanics of creating and querying tables, but does not broach the subject of database design. R gets a similar treatment in the next chapter, in approximately 20 pages. The reader should not expect to learn statistics from this extremely terse tutorial. The remaining chapters present worked examples, including the comparison of two genomes in chapter 18, homology modeling in chapter 19, sequencing in chapter 20, and an analysis of protein structure in chapter 21. Code is interspersed throughout the examples, to demonstrate the use of the previously introduced Unix arsenal.

I found the book entertainingly written and well edited, increasingly rare attributes in current books on computing. The target reader is a neophyte only in the graduate school sense: the ideal reader will possess intellectual athleticism, resourcefulness, and stamina. The text addresses literacy in textual data processing on the Unix platform in a general-purpose context. Material guidance specific to bioinformatics is limited in scope and confined mostly to the last chapters. The book is a spirited introduction to data processing on Unix, and will also be useful to driven data wranglers who are not necessarily in the biology field.


Reviewer:  A. Squassabia Review #: CR141697 (1401-0023)
Categories: Database Applications (H.2.8); Biology And Genetics (J.3 ...); Linux (D.4.0 ...); Relational Databases (H.2.4 ...); SQL (H.2.3 ...); Text Processing (I.5.4 ...)

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®