Computing Reviews
Triangulating molecular surfaces over a LAN of GPU-enabled computers
Dias S., Gomes A. Parallel Computing 42 (C): 35-47, 2015. Type: Article
Date Reviewed: Nov 11 2015

As modern massively parallel graphics processing units (GPUs) become more widely available, researchers are exploring ways to use loosely coupled GPU-equipped workstations to increase application performance without the cost and complexity of traditional supercomputers. Dias and Gomes present a heterogeneous compute solution for one application: the triangulation and rendering of molecular surfaces based on a marching cubes algorithm.

Their solution combines Open MPI, OpenMP, and CUDA across a network of multicore systems with CUDA-capable devices. They use a standard master-worker distributed architecture, where each worker node has multiple GPUs. Each GPU in a worker node is controlled by its own central processing unit (CPU) core or thread. In their experiments, they use up to five worker nodes with two GPUs each, connected by gigabit and ten-gigabit networks. The algorithm’s performance is measured on 40 molecules of varying sizes. Their results show that performance scales with the number of GPUs in the network, especially for very large molecules. Network communication overhead is discussed, and it appears that scaling beyond ten GPUs/five workers may not yield additional performance improvement.
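To make the master-worker layout concrete, here is a minimal Python sketch of how a master might divide work among the GPUs. It is not the authors' code. The decomposition into contiguous grid slabs, the function name partition_slabs, and the grid depth of 256 are all assumptions made for illustration; only the five-workers-by-two-GPUs configuration comes from the paper as summarized above.

```python
def partition_slabs(nz, n_workers, gpus_per_worker):
    """Split nz grid slabs into contiguous, near-equal ranges, one per GPU.

    Hypothetical helper: returns a dict mapping (worker, gpu) to a
    (start, stop) slab range, with stop exclusive.
    """
    n_gpus = n_workers * gpus_per_worker
    base, extra = divmod(nz, n_gpus)
    plan = {}
    start = 0
    for rank in range(n_gpus):
        # The first `extra` GPUs take one additional slab each.
        size = base + (1 if rank < extra else 0)
        worker, gpu = divmod(rank, gpus_per_worker)
        plan[(worker, gpu)] = (start, start + size)
        start += size
    return plan


# Assumed grid depth of 256 slabs; five workers with two GPUs each.
plan = partition_slabs(nz=256, n_workers=5, gpus_per_worker=2)
for (worker, gpu), (lo, hi) in sorted(plan.items()):
    print(f"worker {worker}, GPU {gpu}: slabs [{lo}, {hi})")
```

In a real deployment each (worker, gpu) entry would correspond to one CPU thread driving one device, and the master would send each worker only its own ranges.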

While the performance results are generally very impressive, the data presented show some anomalies that are not addressed in the paper. The runtimes of the 40-core solution often show super-linear speedup compared to the single-CPU runtimes. Also, for the largest molecule, there is a 70 percent performance improvement on the InfiniBand network when using ten GPUs versus eight GPUs. These results are nonintuitive, and the paper leaves readers without an explanation for them.
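The two anomalies can be stated in terms of the usual scaling metrics. The short Python sketch below uses invented timings, not data from the paper, and its function names are illustrative. It assumes that "performance improvement" means a gain in throughput; under that reading, perfect linear scaling from eight to ten GPUs would give only a 25 percent gain.

```python
def speedup(t_serial, t_parallel):
    """Ratio of serial runtime to parallel runtime."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_units):
    """Speedup per processing unit; values above 1.0 are super-linear."""
    return speedup(t_serial, t_parallel) / n_units


def ideal_gain(n_before, n_after):
    """Fractional throughput gain expected from perfect linear scaling."""
    return n_after / n_before - 1.0


# Made-up timings in seconds, for illustration only.
t_single, t_40 = 100.0, 2.0
s = speedup(t_single, t_40)
e = efficiency(t_single, t_40, 40)
gain = ideal_gain(8, 10)

print(f"speedup on 40 cores: {s:.1f} (super-linear: {s > 40})")
print(f"efficiency: {e:.2f}")
print(f"ideal gain from 8 to 10 GPUs: {gain:.0%}")
```

A speedup above the unit count, or a gain above the ideal figure, usually points to a secondary effect such as cache behavior or a change in the baseline, which is the kind of explanation the paper does not provide.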

This work is encouraging in that it shows that significant performance improvement can be achieved on some applications using a loosely coupled network of commodity hardware without a major refactoring of an existing algorithm.

Reviewer: Chris Lupo
Review #: CR143928 (1601-0056)
Local and Wide-Area Networks (C.2.5)
Graphics Processors (I.3.1 ...)