Computing Reviews

Virtual machine aware communication libraries for high performance computing
Huang W., Koop M., Gao Q., Panda D. Supercomputing (Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, Reno, Nevada, Nov. 10-16, 2007) 1-12, 2007. Type: Proceedings
Date Reviewed: 06/20/08

Virtual machines (VMs) strike back! VM technology is a way of sharing a computer among multiple instances of possibly diverse operating systems (OSs) running at the same time. The technology has been around for many years, confined to mainframes for most of its lifetime. Recently, virtualization on the x86 processor architecture has become available for commodity servers and personal computers (PCs). As the products offered by VMware and its competitors have matured, and as computers have become faster and gained more random access memory (RAM), virtual machines have become one of the most exciting software technologies of the last few years. Why? Because they are a natural solution to a number of challenges that are ever present in data centers: application deployment and migration, and server consolidation and load management. Best of all, they do everything behind the scenes and require no changes to the applications.

Enter high-performance computing (HPC): once the domain of proprietary, sometimes even one-off, platforms and systems, it is now dominated by clusters. There were more than 400 such systems on the November 2007 release of the Top500 supercomputing list. Initially popular for their low cost, clusters have grown larger and faster over the years; nowadays, the largest of them are among the fastest supercomputers in the world. This scale creates new challenges in system management and administration: dealing with failing nodes, with incompatibilities between nodes, and with incompatibilities between the system libraries required by different applications. Many of these challenges can be addressed by running applications inside VMs, but this comes at the price of additional overhead and a performance penalty. HPC users are a very performance-sensitive bunch, and that's where this paper comes into the picture.

Huang et al. focus on communication bottlenecks between multiple VM instances running on the same compute node. Their inter-VM communication library (IVC) is built on the shared-memory mapping mechanisms present in Xen, a popular virtualization solution available in a number of Linux distributions. IVC uses this feature to share communication buffers between VM instances on the same node, and then implements the message exchange and flow-control protocols entirely in user space, which keeps the overhead of inter-VM communication minimal. The proof is in the benchmark results: there are point-to-point and group communication microbenchmarks, as well as a number of application-level benchmarks with various communication patterns. All of the performance benchmarks are message passing interface (MPI) programs, which is possible because the authors have ported the popular MVAPICH2 library to IVC. Their MVAPICH2-IVC implementation also supports internode communication over InfiniBand, and can switch between IVC and the network at runtime. This makes it possible to support another great feature of IVC: transparent migration of a running VM to a remote node.
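To give a flavor of the approach (this is my own sketch, not the IVC API): once two VMs share a region of memory, a message channel can be as simple as a single-producer, single-consumer ring buffer managed entirely in user space, with flow control reduced to checking for free space. The sketch below uses POSIX shared memory as a stand-in for Xen's grant-table page sharing, and all names (ivc_attach, ivc_send, ivc_recv) are hypothetical.

    /* Sketch: a user-space message channel over a shared memory region.
     * IVC maps pages between Xen domains via grant tables; here, POSIX
     * shm_open/mmap stands in for that mapping. Hypothetical names;
     * assumes one sender and one receiver per channel. */
    #include <fcntl.h>
    #include <stdatomic.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define RING_BYTES 4096u          /* capacity of the data ring */

    struct ivc_ring {                 /* lives in the shared region */
        _Atomic uint32_t head;        /* next byte the receiver reads */
        _Atomic uint32_t tail;        /* next byte the sender writes  */
        char data[RING_BYTES];
    };

    /* Map the shared region; the creating side also sizes it. */
    static struct ivc_ring *ivc_attach(const char *name, int create)
    {
        int fd = shm_open(name, create ? (O_CREAT | O_RDWR) : O_RDWR, 0600);
        if (fd < 0) return NULL;
        if (create && ftruncate(fd, sizeof(struct ivc_ring)) < 0) {
            close(fd);
            return NULL;
        }
        void *p = mmap(NULL, sizeof(struct ivc_ring),
                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);
        return p == MAP_FAILED ? NULL : p;
    }

    /* Copy len bytes into the ring; returns 0 if there is no room
     * (flow control: the sender must retry later). */
    static int ivc_send(struct ivc_ring *r, const void *buf, uint32_t len)
    {
        uint32_t head = atomic_load(&r->head);
        uint32_t tail = atomic_load(&r->tail);
        if (RING_BYTES - (tail - head) < len) return 0;   /* ring full */
        for (uint32_t i = 0; i < len; i++)
            r->data[(tail + i) % RING_BYTES] = ((const char *)buf)[i];
        atomic_store(&r->tail, tail + len);               /* publish */
        return 1;
    }

    /* Drain up to max bytes from the ring; returns the number copied. */
    static uint32_t ivc_recv(struct ivc_ring *r, void *buf, uint32_t max)
    {
        uint32_t head = atomic_load(&r->head);
        uint32_t avail = atomic_load(&r->tail) - head;
        uint32_t n = avail < max ? avail : max;
        for (uint32_t i = 0; i < n; i++)
            ((char *)buf)[i] = r->data[(head + i) % RING_BYTES];
        atomic_store(&r->head, head + n);                 /* free space */
        return n;
    }

Note that nothing in this data path crosses into the hypervisor, which is where the low overhead comes from. In MVAPICH2-IVC, roughly speaking, such a shared-memory path is used only for peers residing on the same physical node; when a VM migrates away, the library can tear down the mappings and reroute traffic over InfiniBand, which is what makes migration transparent to the MPI application.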

So, what is there not to like about this paper? After reading it, I still had a few unanswered questions about the VMM-bypass input/output (I/O) used for internode communication, including how it supports migration between nodes. As it turns out, these questions have already been addressed in other papers cited in the extensive references. That is to say, the bibliography is as good as the rest of this excellent paper.

Reviewer: Maciej Golebiewski
Review #: CR135751 (0905-0473)
