Computing Reviews
Lazy instruction scheduling: keeping performance, reducing power
Mahjur A., Taghizadeh M., Jahangir A. Low power electronics and design (Proceeding of the Thirteenth International Symposium on Low Power Electronics and Design, Bangalore, India, Aug 11-13, 2008) 375-380, 2008. Type: Proceedings
Date Reviewed: Nov 20 2008

This paper addresses the problem of “useless instruction execution.” An instruction is useless if its result is never used. With reasonable compilers, this can occur only when a conditional branch lies between the instruction and its use. The paper proposes a new variation on hardware scheduling for superscalar processors that delays the execution of an instruction until its value is about to be used. If the register holding the value is overwritten before any use, the instruction is considered useless and is discarded.
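To make the mechanism concrete, here is a minimal Python sketch of the idea as described above; it is an illustration, not the authors' hardware design. It assumes a simplified trace model with hypothetical names (`LazyScheduler`, `issue`, `use`): every instruction is deferred, a reference count records whether the register mapping or any pending consumer still needs its result, a demanding read (for example, a store or branch operand) forces its producer chain to execute, and an instruction whose register is overwritten with no remaining consumers is discarded. Speculation is ignored here.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Instr:
    op: str              # mnemonic, used only for tracing
    deps: list           # producer Instr objects this instruction reads
    refs: int = 1        # register mapping + pending consumers
    done: bool = False   # True once executed


class LazyScheduler:
    """Toy model (assumption): defer every instruction until a demanding
    read needs its value; discard it once nothing can ever read it."""

    def __init__(self):
        self.latest = {}     # register -> most recent producer Instr
        self.executed = []   # mnemonics, in execution order
        self.discarded = []  # mnemonics judged useless

    def issue(self, op, dest, srcs):
        # Capture producers of the sources *before* remapping dest,
        # so "r1 = r1 + 1" depends on the old producer of r1.
        deps = [self.latest[s] for s in srcs if s in self.latest]
        for d in deps:
            d.refs += 1
        old = self.latest.get(dest)
        self.latest[dest] = Instr(op, deps)
        if old is not None:
            self._release(old)  # dest overwritten: drop the mapping ref

    def use(self, reg):
        # A demanding read of reg: its producer chain must execute now.
        p = self.latest.get(reg)
        if p is not None:
            self._execute(p)

    def _execute(self, instr):
        if instr.done:
            return
        for d in instr.deps:
            self._execute(d)
        instr.done = True
        self.executed.append(instr.op)
        for d in instr.deps:
            self._release(d)  # this consumer no longer needs d

    def _release(self, instr):
        instr.refs -= 1
        if instr.refs == 0 and not instr.done:
            # No register maps to it and no pending consumer reads it,
            # so its result can never be used: discard it.
            self.discarded.append(instr.op)
            for d in instr.deps:
                self._release(d)


s = LazyScheduler()
s.issue("add r1", "r1", ["r2", "r3"])
s.issue("mul r4", "r4", ["r1"])   # reads the add's result
s.issue("sub r1", "r1", ["r5"])   # r1 overwritten, but mul still needs add
s.issue("li r4", "r4", [])        # r4 overwritten: mul, then add, are useless
s.use("r4")
s.use("r1")
print(s.executed)   # ['li r4', 'sub r1']
print(s.discarded)  # ['mul r4', 'add r1']
```

Reference counting is simply a convenient way to express "never used" in software; the review says the authors' memory and hardware costs are small, but not how their design tracks this.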

The scheduling algorithm tracks the speculative execution of the conditional branches that follow the instruction, distinguishing between speculative and deterministic (nonspeculative) execution of the instruction. When a nonspeculative use of the instruction's result occurs, the instruction is executed.
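The rule in the last sentence can be sketched for a single deferred instruction. This is one possible reading of the review's description, not the paper's mechanism: it assumes each use is tagged with the one unresolved branch it depends on (or `None` if it is nonspeculative), ignores nested branches, and uses hypothetical names (`DeferredProducer`, `read`, `resolve`).

```python
class DeferredProducer:
    """Toy model (assumption): one deferred instruction whose result may
    be read on speculative paths, each tagged with a single branch id."""

    def __init__(self, op):
        self.op = op
        self.executed = False
        self.spec_uses = {}  # branch id -> count of speculative uses

    def read(self, branch=None):
        if branch is None:
            # Nonspeculative use: the result is certainly needed.
            self.executed = True
        else:
            # Speculative use: record it, but do not execute yet.
            self.spec_uses[branch] = self.spec_uses.get(branch, 0) + 1

    def resolve(self, branch, predicted_correctly):
        count = self.spec_uses.pop(branch, 0)
        if predicted_correctly and count > 0:
            # The uses turned out to be on the real path, so they are
            # now nonspeculative and force execution.
            self.executed = True
        # On a misprediction the wrong-path uses are squashed and never
        # force the producer to execute.


p = DeferredProducer("add r1")
p.read(branch=7)
p.resolve(7, predicted_correctly=False)
print(p.executed)  # False: only a squashed, wrong-path use existed
```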

Mahjur, Taghizadeh, and Jahangir argue that the scheduling algorithm slows execution by at most one instruction time. When a program contains useless instructions, the algorithm speeds up execution, because the execution units no longer have to process them. The authors also argue that the memory and hardware costs of the implementation are small.

This is an attractive hardware algorithm; however, it causes difficulty for optimizing compilers. Compilers insert load operations early in a program so that the data cache already holds the data when it is needed. This algorithm will either eliminate these loads as useless or delay them until the data is used; in either case, the data does not reach the cache early enough. The hardware must therefore provide some form of load or cache-manipulation instruction that is always treated as useful.
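The reviewer's concern can be illustrated with a simplified straight-line trace. In this sketch a compiler-inserted prefetch load writes a register that is never read, so a lazy scheduler drops it as useless; the `always_useful` flag stands in for the kind of always-useful instruction the reviewer calls for. The names (`Op`, `kept_ops`, `always_useful`) are hypothetical, and the model ignores branches and transitive dependences.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    name: str
    dest: Optional[str]            # register written, if any
    srcs: tuple = ()               # registers read
    always_useful: bool = False    # hypothetical hint bit (assumption)


def kept_ops(trace):
    """Return the names of the ops a lazy scheduler would keep: an op
    whose destination is overwritten before being read is dropped,
    unless it carries the always_useful hint."""
    used = set()        # indices of ops whose result has been read
    last_writer = {}    # register -> index of most recent writer
    dropped = set()
    for i, op in enumerate(trace):
        # Sources are read before the destination is written,
        # so "r1 = r1 + 1" counts as a use of the old r1.
        for s in op.srcs:
            if s in last_writer:
                used.add(last_writer[s])
        if op.dest is not None:
            prev = last_writer.get(op.dest)
            if (prev is not None and prev not in used
                    and not trace[prev].always_useful):
                dropped.add(prev)
            last_writer[op.dest] = i
    return [op.name for i, op in enumerate(trace) if i not in dropped]


trace = [Op("prefetch", "r9"), Op("add", "r1", ("r2",)),
         Op("li", "r9"), Op("load", "r3")]
print(kept_ops(trace))  # ['add', 'li', 'load']: the prefetch is lost
```

Setting `always_useful=True` on the prefetch keeps all four operations, which is the effect the reviewer asks the hardware to provide.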

Reviewer: Charles Morgan
Review #: CR136260 (1002-0160)
 
RISC/CISC, VLIW Architectures (C.1.1 ... )
 
 
Compilers (D.3.4 ... )
 
Other reviews under "RISC/CISC, VLIW Architectures": Date
A cost-effective design for MPEG-2 audio decoder with embedded RISC core
Tsai T., Wu R., Chen L. Journal of VLSI Signal Processing Systems 29(3): 255-265, 2001. Type: Article
Apr 21 2003
Cost-Conscious Strategies to Increase Performance of Numerical Programs on Aggressive VLIW Architectures
López D., Llosa J., Valero M., Ayguadé E. IEEE Transactions on Computers 50(10): 1033-1051, 2001. Type: Article
Dec 16 2002
An FPGA-based VLIW processor with custom hardware execution
Jones A., Hoare R., Kusic D., Fazekas J., Foster J. Field-programmable gate arrays (Proceedings of the 2005 ACM/SIGDA 13th International Symposium on Field-programmable Gate Arrays, Monterey, California, Feb 20-22, 2005) 107-117, 2005. Type: Proceedings
May 23 2006
