Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Performance modeling for hierarchical graph partitioning in heterogeneous multi-core environment
Chan S., Ling T., Aubanel E. Parallel Computing46 (C):78-97,2015.Type:Article
Date Reviewed: Nov 12 2015

Graph partitioning is an interesting topic in parallel scientific computing that has been studied for decades. For many numerical methods, such as finite element methods (FEMs), finite difference methods, and finite volume methods, a mesh has to be distributed among message passing interface (MPI) processes. A partition must be determined such that each process has an equal workload if the system is homogeneous and communications should be minimized. Graph partitioning is ideal to mesh partitions, since each mesh can be converted to a dual graph and graph partitioning tools can be applied right away, such as ParMETIS. Usually, communications are quasi-optimal and good scalability can be expected.

This paper studies performance modeling of hierarchical graph partitioning in heterogeneous clusters, where a new model is proposed under several assumptions: communications are asynchronous and non-blocking; computations and communications overlap; and network and memory contentions are non-negligible. The model also assumes that computations are linearly proportional to mesh size. The authors provide a step-by-step procedure, including platform model and performance model calibration. The performance model is validated using four real-world meshes.

The research topic is interesting and practical. However, the value of this paper is uncertain. First, the targeted applications are FEM codes whose communications can be modeled precisely. It is well known that many open-source parallel graph partitioning tools exist, such as ParMETIS and PT-SCOTCH, which can provide excellent partitioning. Second, real FEM applications require preconditioners to work with linear solvers. This paper uses an iterative conjugate gradient method only, which is not efficient to my knowledge. In the end, the paper assumes that computations are proportional to mesh size, which is not true, especially when the number of MPI processes is large.

Reviewer:  Hui Liu Review #: CR143938 (1601-0059)
Bookmark and Share
  Featured Reviewer  
 
Performance of Systems (C.4 )
 
 
Multiple Data Stream Architectures (Multiprocessors) (C.1.2 )
 
 
Parallel Architectures (C.1.4 )
 
Would you recommend this review?
yes
no
Other reviews under "Performance of Systems": Date
A computer and communications network performance analysis primer
Stuck B., Arthurs E., Prentice-Hall, Inc., Upper Saddle River, NJ, 1985. Type: Book (9789780131639812)
Jun 1 1985
A mean value performance model for locking in databases
Tay Y., Suri R. (ed), Goodman N. Journal of the ACM 32(3): 618-651, 1985. Type: Article
Mar 1 1986
The relationship between benchmark tests and microcomputer price
Sircar S., Dave D. Communications of the ACM 29(3): 212-217, 1986. Type: Article
Nov 1 1986
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy