Computing Reviews
A survey of communication performance models for high-performance computing
Rico-Gallego J., Díaz-Martín J., Manumachu R., Lastovetsky A.  ACM Computing Surveys 51 (6): 1-36, 2019. Type: Article
Date Reviewed: Apr 16 2019

Cluster computing is a major trend in scientific high-performance computing (HPC), and its recent evolution requires revising the models and methods used to evaluate operational performance. The paper’s main contribution is a taxonomy of analytical communication performance models, organized around communication cost in clusters.

The introduction reviews new modeling methodologies aimed at prediction accuracy, mostly for Message Passing Interface (MPI) point-to-point (P2P) and collective operations. Next, under the theme of communication between network nodes, the paper traces the progression from the simple postal model to LogGP, and discusses optimized scheduling algorithms in runtime libraries, derived models that address the difficulty of accurately predicting MPI costs, channel contention and multicore node problems in the network hierarchy, and platform heterogeneity. The progress of the different communication models and their parameters--network delay, overhead, gap per message, and number of processors in the cluster--is expressed pictorially, and a topological discussion considers the network hierarchy and related message transfer policies such as inter-cluster and intra-cluster communication. Section 3.3, “Communication Contention,” covers node contention, link contention, and controller performance bottlenecks in distributed shared memory (DSM) machines. Platform heterogeneity, middleware costs, scalability, and domain generality versus specificity are also discussed.
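For readers unfamiliar with these models, the following is a minimal sketch of two standard point-to-point cost formulas from the literature: the Hockney model, T = alpha + beta * m, and the LogGP cost of a single k-byte message, T = L + 2o + (k - 1)G. The class and function names, and the parameter values used in any call, are illustrative assumptions and are not taken from the surveyed paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGPParams:
    """LogGP parameters (names follow the usual convention; values are up to the user)."""
    L: float  # network delay (latency) in seconds
    o: float  # CPU overhead per message in seconds
    g: float  # minimum gap between consecutive messages in seconds
    G: float  # gap per byte for long messages in seconds
    P: int    # number of processors in the cluster


def hockney_time(alpha: float, beta: float, m: int) -> float:
    """Hockney model: fixed latency alpha plus beta seconds per byte for m bytes."""
    if m < 0:
        raise ValueError("message size must be non-negative")
    return alpha + beta * m


def loggp_p2p_time(p: LogGPParams, k: int) -> float:
    """LogGP cost of one k-byte point-to-point message: o + (k - 1)G + L + o.

    g and P do not appear here; they matter only for sequences of
    messages and for collective operations.
    """
    if k < 1:
        raise ValueError("message size must be at least one byte")
    return p.o + (k - 1) * p.G + p.L + p.o
```

A call such as `loggp_p2p_time(LogGPParams(L=5e-6, o=1e-6, g=2e-6, G=1e-9, P=16), 1001)` evaluates the sender overhead, the per-byte gap for the remaining 1000 bytes, the network delay, and the receiver overhead in one expression.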

The paper then discusses how to bridge the gap between a model's analytical description and its experimental description in terms of empirical parameters, along with a literature review of the related measurement methods. It also introduces a framework for evaluating model-building methods and best practices for evaluating measurement methods. A section is devoted to improving the estimation accuracy of a communication performance model and discusses the factors that influence model performance. The paper ends with conclusions and directions for future research.
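As one simple illustration of obtaining empirical parameters from measurements, the sketch below fits the two Hockney parameters (alpha and beta) to measured (message size, time) pairs by ordinary least squares. This is a generic textbook technique chosen for illustration, not a method attributed to the surveyed paper; the function name `fit_hockney` is hypothetical.

```python
from typing import Sequence, Tuple


def fit_hockney(sizes: Sequence[float], times: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of times ~ alpha + beta * sizes.

    Returns (alpha, beta). Inputs are measured message sizes and the
    corresponding transfer times, in whatever units the caller uses.
    """
    n = len(sizes)
    if n < 2 or n != len(times):
        raise ValueError("need at least two (size, time) pairs of equal length")

    mean_m = sum(sizes) / n
    mean_t = sum(times) / n

    # Sum of squared deviations of the sizes; zero means all sizes are equal.
    sxx = sum((m - mean_m) ** 2 for m in sizes)
    if sxx == 0:
        raise ValueError("message sizes must not all be identical")

    # Covariance-like term between sizes and times.
    sxy = sum((m - mean_m) * (t - mean_t) for m, t in zip(sizes, times))

    beta = sxy / sxx                 # slope: time per unit of message size
    alpha = mean_t - beta * mean_m   # intercept: fixed per-message latency
    return alpha, beta
```

In practice the measured times would come from a benchmark such as a ping-pong test repeated over a range of message sizes; real measurements are noisy, so the fitted line only approximates the data.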

The paper provides a comprehensive view of the evolution, estimation, and analysis of cluster performance modeling in the HPC ecosystem.

Reviewer: Mohammad Sadegh Kayhani Pirdehi
Review #: CR146534 (1908-0318)
General (I.6.0)
 
 
General (C.2.0)
 
Other reviews under "General": Date
Theory of modeling and simulation: discrete event & iterative system computational foundations (3rd ed.)
Zeigler B., Muzy A., Kofman E.,  Academic Press, Inc., San Diego, CA, 2019. 692 pp. Type: Book (978-0-128133-70-5)
Dec 5 2019
Introduction to modeling and simulation with MATLAB and Python
Gordon S., Guilfoos B.,  Chapman&Hall/CRC, Boca Raton, FL, 2017. 210 pp. Type: Book (978-1-498773-87-4)
Oct 18 2018
Fundamentals of complex networks: models, structures and dynamics
Chen G., Wang X., Li X.,  Wiley Publishing, Hoboken, NJ, 2015. 392 pp. Type: Book (978-1-118718-11-7), Reviews: (3 of 3)
Aug 9 2016
Reproduction in whole or in part without permission is prohibited.   Copyright © 2000-2021 ThinkLoud, Inc.