Virtualized resources add flexibility to computer architectures, and are especially useful with parallel applications using the message passing interface (MPI). One can use the virtualization strategy to test distributed memory parallel code on inexpensive shared memory computers. User-level threads seem to be good candidates for the implementation of MPI processes, since they minimize simulation overhead. Unfortunately, global variables are shared among threads, while MPI imposes a strict, private memory policy.
This paper proposes leveraging the thread-local storage (TLS) mechanism typically used by kernel threads. The proposal introduces a new TLS data structure through which each thread must perform its memory accesses indirectly, ensuring that every MPI process uses only private memory. This requires changes to the compiler (for example, support for the __thread specifier in the language), the linker (allocation and initialization of TLS data), and some libraries (memory management). The authors describe the design and implementation of this approach on top of the Converse runtime. Performance data, based on some NAS kernels and the BRAMS weather forecasting application, is presented to illustrate the practicality of this technique over other approaches such as Adaptive MPI (AMPI). A positive byproduct of this privatization technique is that process migration is easier to perform, a plus when trying to ensure load balancing via the dynamic preemptive management of processes.
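To make the privatization problem concrete, the following minimal sketch contrasts an ordinary global variable with one marked __thread. It uses POSIX kernel threads, where TLS already works, and is not the paper's user-level implementation on Converse. The names NUM_RANKS, INCREMENTS, rank_main, and run_ranks are hypothetical, chosen only for illustration, and __thread is assumed to be available as the GCC/Clang extension.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_RANKS  4     /* hypothetical number of simulated MPI ranks */
#define INCREMENTS 1000  /* hypothetical amount of work per rank */

/* Ordinary global: a single copy visible to every thread. */
static int shared_counter = 0;
/* TLS global: the compiler and linker give each thread its own copy. */
static __thread int private_counter = 0;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
/* Each rank records the final value of its own private counter here. */
static int observed_private[NUM_RANKS];

/* Body of one simulated rank: updates both kinds of global. */
static void *rank_main(void *arg) {
    int rank = *(int *)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        private_counter++;            /* touches this thread's copy only */
        pthread_mutex_lock(&lock);
        shared_counter++;             /* touches the one shared copy */
        pthread_mutex_unlock(&lock);
    }
    observed_private[rank] = private_counter;
    return NULL;
}

/* Runs all ranks to completion and returns the final shared counter. */
int run_ranks(void) {
    pthread_t threads[NUM_RANKS];
    int ids[NUM_RANKS];
    shared_counter = 0;
    for (int r = 0; r < NUM_RANKS; r++) {
        ids[r] = r;
        int rc = pthread_create(&threads[r], NULL, rank_main, &ids[r]);
        assert(rc == 0);
        (void)rc;  /* avoids an unused-variable warning under NDEBUG */
    }
    for (int r = 0; r < NUM_RANKS; r++) {
        pthread_join(threads[r], NULL);
    }
    return shared_counter;
}
```

Called from a main function and compiled with -pthread, run_ranks() accumulates every rank's updates in shared_counter, while each rank sees only its own INCREMENTS updates in private_counter. The latter is the behavior an MPI program expects from its global variables.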
This easy-to-read paper provides food for thought to software engineers interested in virtualization strategies and scientists looking to profit from this powerful productivity-enhancing tool.