Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Computing petaflops over terabytes of data: the case of genome-wide association studies
Fabregat-Traver D., Bientinesi P. ACM Transactions on Mathematical Software40 (4):1-22,2014.Type:Article
Date Reviewed: Oct 16 2014

With its catchy but somewhat misleading title, this paper proposes a method for efficiently solving multiple instances of the same problem when the instances are correlated.

The authors propose an algorithm that exploits the correlation among multiple instances of a problem such that the complexity of solving the ensemble of instances is less than the total complexity of solving each instance independently. They look at the genome-wide association studies (GWAS) application. GWAS is applied for locating single-nucleotide polymorphisms, which are involved in the control of a trait or phenotype.

The authors consider the multi-trait GWAS problem, that is, the simultaneous analysis of t traits for m phenotypes of a population of n individuals. Multi-trait GWAS entails the solution of m × t correlated generalized least-squares (GLS) problems of size n. The authors design an algorithm for performing multi-trait GWAS on multicore architectures and compare it with the straightforward approach of solving independently m × t GLS problems.

The multi-trait GWAS algorithm is developed by improving the solution of a single instance of the GLS problem in three steps. First, the relation among successive problems is exploited so that the overall computational complexity is reduced. Second, the authors analyze the data transfers caused by the datasets not fitting in memory and partition the data to hide the overhead due to input/output operations. Third, the computation is mapped to shared-memory parallelism by decomposing it into threads and tasks that are distributed among the available cores.

The result of the first step is to reduce the number of floating-point operations from O(mtn3) with the independent problem solution approach to O(n3 + n2m + n2t + mtn). The authors claim that the complexity of their approach is O(mtn), but this is not true if m and t are smaller than n.

The second step solves the issue of latencies caused by datasets too large to fit in random-access memory (RAM). In this case, an efficient way to handle out-of-core data is to partition data into tiles so that the time it takes to move a tile between storage and memory is less than the time it takes to process one slab, thus completely overlapping data movement with computation.

In the third step, each section of the algorithm is mapped to the best-suited type of shared-memory parallelism, such as multithreaded BLAS-3 kernels, and single-threaded BLAS-1 and BLAS-2 kernels combined with OpenMP task parallelism. The size of the tiles along the m and t dimensions is also determined in this step.

The title of the paper is somewhat misleading; when the algorithm proposed by the authors is applicable to a multicore host, it cuts down the amount of computation to hundreds of teraflops for the values of m, n, and t considered.

Despite the discussed drawbacks, the techniques presented in the paper make it an interesting read for developers of scientific applications.

Reviewer:  Gabriel Mateescu Review #: CR142838 (1501-0078)
Bookmark and Share
  Editor Recommended
Featured Reviewer
 
 
Algorithm Design And Analysis (G.4 ... )
 
 
Computations On Matrices (F.2.1 ... )
 
 
Efficiency (G.4 ... )
 
 
Multiple Data Stream Architectures (Multiprocessors) (C.1.2 )
 
Would you recommend this review?
yes
no
Other reviews under "Algorithm Design And Analysis": Date
Numerical recipes
Sprott J., Cambridge University Press, New York, NY, 1991. Type: Book (9780521406895)
Dec 1 1992
An interactive calculus theorem-prover for continuity properties
Suppes P., Takahashi S. Journal of Symbolic Computation 7(6): 573-590, 1989. Type: Article
Aug 1 1990
The numerical methods programming projects book
Grandine T., Oxford University Press, Inc., New York, NY, 1990. Type: Book (9789780198533870)
Mar 1 1991
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy