Computing Reviews
Fundamentals of parallel multicore architecture
Solihin Y., Chapman & Hall/CRC, Boca Raton, FL, 2016. 494 pp. Type: Book (978-1-4822-1118-4)
Date Reviewed: Jul 12 2016

Microprocessors were born in the 1970s (the Intel 4004 was designed in 1971). Virtual memory and pipelining were the main architectural novelties of the 1980s, on-chip caches and superscalar processors were introduced in the 1990s, and multicore chips have become dominant in the 21st century.

The first non-embedded microprocessor that combined two cores on a single die was the IBM Power4 in 2001, and it marked a turning point in the architecture of modern microprocessors. Solihin clearly explains why this architecture has become commonplace during the last decade or so, partially due to the diminishing returns of the performance improvements that can be obtained from instruction-level parallelism on a single core, and partially due to power consumption issues that impose limits on the microprocessor clock frequency. An intermediate stage between single-core and multicore architectures was the inclusion of simultaneous multithreading (SMT) for the execution of several threads on a single processor core designed to exploit instruction-level parallelism. In fact, Intel introduced its SMT technology, called Hyper-Threading, in 2002, yet did not ship its first dual-core processor until 2005.

As a consequence of the architectural evolution of microprocessors, parallel programming is now required to exploit the parallelism provided by current hardware. And parallel programming provides the starting point for Solihin’s detailed discussion of multicore architectures. In the first part of his monograph, Solihin presents some key issues that appear when programming shared memory multiprocessors. Issues once confined to high-end supercomputing centers now, thanks to multicore chips, reach virtually every computing system, including personal computers (PCs), tablets, and smartphones. After the standard reference to Amdahl’s law, which bounds the performance improvement parallelism can provide because of the sequential part of every algorithm, Solihin introduces the two major parallel programming models: shared memory and message passing. He also comments on alternative models, such as partitioned global address space (PGAS), data parallelism in single-instruction multiple-data (SIMD) architectures, the ubiquitous MapReduce for cluster computing, and transactional memory. Apart from a short contributed chapter on the single-instruction multiple-thread (SIMT) programming model used by graphics processing units (GPUs), the book focuses on the shared memory programming model because “its concepts are important for understanding the architecture of multicore processors.” Two whole chapters are devoted to this programming model, with detailed explanations, numerous examples (some of them written in OpenMP), solved exercises, and homework problems.

Once the reader is acquainted with parallel programming, the core of the book, which comprises more than 250 pages, delves into the architectural design of shared memory multiprocessors. Here, three problems play key roles: cache coherence, memory consistency, and synchronization. The first chapter in this part of the book provides a thorough introduction to memory hierarchies. As Paolo Faraboschi, the leader of the architecture effort behind HP Labs’ “The Machine” project, states: “Deep memory hierarchies are all about hiding latency and saving energy by exploiting locality.” Later, separate chapters are devoted to the three aforementioned problems: two focus on cache coherence protocols, one on hardware support for synchronization (that is, locks, barriers, and transactional memory), and a fourth on memory consistency models (both sequential consistency and relaxed models). The final chapter of this part of Solihin’s monograph turns its attention toward the interconnection networks required to send messages from one processor to another with low latency.

Readers will find that Solihin’s style is somewhat verbose, almost as if the reader were attending a lecture where the professor described, state by state, all the transitions in the finite automaton of a particular cache coherence protocol. Fortunately, he is also crystal clear when describing key design problems and the solutions proposed to overcome them. Contemporary design issues are discussed, something graduate students will find valuable as a source of ideas, and some case studies are also sprinkled throughout the text. Describing actual systems helps put formal ideas in context, as in the comparison of different memory hierarchies (AMD Shanghai versus Intel Barcelona) or the description of the Alpha 21364 network architecture. Finally, solved exercises and doable homework problems are included at the end of each chapter; these are well chosen for helping students assimilate the concepts just introduced.

This book ends in a somewhat unusual way. Instead of a bunch of fluffy conclusions and self-serving platitudes, the final chapter (13) includes four short interviews with leading researchers in computer architecture. Josep Torrellas, from the University of Illinois at Urbana-Champaign (UIUC), provides his experienced perspective on parallel multicore architectures, their past evolution, and the key challenges ahead. Li-Shiuan Peh, from the Massachusetts Institute of Technology (MIT), focuses on network-on-chip design issues, such as power consumption, timing, and reliability. Youfeng Wu, from Intel, analyzes the software side of the problem: the compilation of parallel programs and the potential use of domain-specific languages (DSLs). Finally, Paolo Faraboschi, from HP Labs, shares his views on memory and storage architectures for data-centric systems and promising non-volatile memory (NVM) technologies.

Reviewer:  Fernando Berzal Review #: CR144570 (1610-0718)
Parallel Architectures (C.1.4)
Parallel Programming (D.1.3 ...)
Parallelism And Concurrency (F.1.2 ...)
Concurrent Programming (D.1.3)
Other reviews under "Parallel Architectures":
A chaotic asynchronous algorithm for computing the fixed point of a nonnegative matrix of unit spectral radius
Lubachevsky B., Mitra D. Journal of the ACM 33(1): 130-150, 1986. Type: Article
Jun 1 1986
iWarp
Gross T., O’Hallaron D., MIT Press, Cambridge, MA, 1998. Type: Book (9780262071833)
Nov 1 1998
Industrial strength parallel computing
Koniges A. Morgan Kaufmann Publishers Inc., San Francisco, CA, 2000. Type: Divisible Book
Mar 1 2000

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®