Computing Reviews
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
Lo J., Emer J., Levy H., Stamm R., Tullsen D., Eggers S. ACM Transactions on Computer Systems 15(3): 322-354, 1997. Type: Article
Date Reviewed: Aug 1 1998

One of the highest compliments that an idea can be given is for others to ask, “Why didn’t I think of that?” As the push toward faster and faster computers relies increasingly on making use of the various levels of parallelism in programs, we all feel some frustration over the lack of a single architecture that is best for all types of problems. Some algorithms split nicely into independent threads that will run efficiently on massively parallel systems. Others show a lot of instruction-level parallelism that can be exploited by wide-issue architectures. A good match between architecture and algorithm can produce great computational efficiency; a mismatch is likely to produce computational paralysis. The results reported here receive my “Why didn’t I think of that?” award for asking the question, “What if we built a CPU that could execute instructions from multiple threads at the same time?” and then going out to find the answer in quantitative style.

The authors begin by describing their simultaneous multithreading (SMT) architecture and two comparative designs. All start from a base processor similar to the MIPS R10000. The instruction fetch mechanism and register file of the SMT design are modified so that up to four instructions from two of the eight threads the processor is working on can be issued at any clock cycle. The comparative designs are single-chip multiprocessors, one with two CPUs and one with four, so the total resources for each chip are comparable to those of the SMT design. The authors then discuss the benchmarks and, finally, the results of their simulations. The results are interesting not only because they show the success of the SMT design, but because they analyze the causes of inefficiencies in the competing designs and study possible ways to avoid them.
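The issue mechanism described above can be illustrated with a toy simulation. This is a minimal sketch, not the authors' design: it assumes simple round-robin thread selection and per-thread instruction queues (names like `smt_issue_cycle` are hypothetical), but it captures the key idea that a single issue stage draws instructions from several hardware threads in one cycle.

```python
from collections import deque

THREADS = 8          # hardware thread contexts held by the processor
FETCH_THREADS = 2    # threads selected each cycle
ISSUE_WIDTH = 4      # maximum instructions issued per cycle

def smt_issue_cycle(queues, rr_start=0):
    """Issue up to ISSUE_WIDTH instructions in one cycle, drawing from
    FETCH_THREADS of the per-thread instruction queues.  Threads are
    chosen round-robin starting at rr_start (a simplifying assumption;
    real designs use smarter heuristics).  Returns the issued
    (thread_id, instruction) pairs."""
    issued = []
    # Scan thread ids round-robin and pick the first two with work pending.
    candidates = [(rr_start + i) % THREADS for i in range(THREADS)]
    chosen = [t for t in candidates if queues[t]][:FETCH_THREADS]
    # Interleave instructions from the chosen threads up to the issue width.
    while len(issued) < ISSUE_WIDTH and any(queues[t] for t in chosen):
        for t in chosen:
            if queues[t] and len(issued) < ISSUE_WIDTH:
                issued.append((t, queues[t].popleft()))
    return issued

# Usage: thread 0 has plenty of ready work, thread 3 a little, the rest
# are stalled -- the idle slots a single-threaded core would waste are
# filled from whichever threads have instructions available.
queues = {t: deque() for t in range(THREADS)}
queues[0].extend(["add", "mul", "load", "store"])
queues[3].extend(["sub"])
print(smt_issue_cycle(queues))
# -> [(0, 'add'), (3, 'sub'), (0, 'mul'), (0, 'load')]
```

Note how the cycle still issues four instructions even though no single thread could fill the slots alone; that conversion of thread-level parallelism into issue-slot utilization is the paper's central point.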

Although this research group has presented the design at several architecture conferences, and those talks appear in the conference proceedings, this paper seems to be the first discussion of the work in the general journal literature. The ideas are interesting, and I hope we will see them in a commercial product soon. The paper is worth reading, even by people who do not consider computer architecture their primary interest. I recommend it as outside reading for advanced classes in computer architecture.

Reviewer: D. M. Bowen. Review #: CR121309 (9808-0595)
Parallel Processors (C.1.2 ...)
Instruction Set Design (C.0 ...)
Process Management (D.4.1)
General (C.0)
Other reviews under "Parallel Processors": Date
Spending your free time
Gelernter D. (ed), Philbin J. BYTE 15(5): 213-ff, 1990. Type: Article
Apr 1 1992
Higher speed transputer communication using shared memory
Boianov L., Knowles A. Microprocessors & Microsystems 15(2): 67-72, 1991. Type: Article
Jun 1 1992
On stability and performance of parallel processing systems
Bambos N., Walrand J. (ed) Journal of the ACM 38(2): 429-452, 1991. Type: Article
Sep 1 1992
