Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Limits of region-based dynamic binary parallelization
Edler von Koch T., Franke B.  VEE 2013 (Proceedings of the 9th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, Houston, TX, Mar 16-17, 2013)13-22.2013.Type:Proceedings
Date Reviewed: May 17 2013

Parallel computation is everywhere, from small portable devices to laptops, desktops, and data centers. This wealth of available parallelism challenges computer scientists and developers alike to rethink algorithms and rewrite old sequential source codes. However, we still have a long way to go. This paper offers a deep analysis on a different way to address the challenge. The authors investigate the performance limits of dynamic binary parallelization (DBP), a technique for transparently parallelizing single-thread binary executables.

The success of instruction-level parallelism on single-core architectures with multiple arithmetic logic units (ALUs) motivates a higher level of automatic parallelization on multi-core architectures (the DBP). In the paper, the authors first analyze this new proposed architecture by explaining how sections of binary code are identified for parallel execution, and then present their experiments and extensive results.

The key idea of DBP is to reduce the number of critical path instructions by overlapping execution segments of the instruction stream. The overlapping is done by speculative cores launched by a master core responsible for the correct execution of the original binary. However, this strategy may result in performance reduction if many speculative threads are invalidated, mainly due to data dependency between threads.

The experiments address various applications of a benchmark suite. The results, from an eight-core reduced instruction set computing (RISC) processor over a single-core execution, show an interesting 54 percent reduction in the number of instructions on critical paths. This is associated with a mismatched average speedup of 1.43 times. This is still far from the ideal performance gain, which suggests that the main bottleneck is bandwidth rather than instruction processing.

Reviewer:  Andre Maximo Review #: CR141227 (1308-0707)
Bookmark and Share
  Reviewer Selected
Featured Reviewer
 
 
Parallel Architectures (C.1.4 )
 
 
Run-Time Environments (D.3.4 ... )
 
Would you recommend this review?
yes
no
Other reviews under "Parallel Architectures": Date
A chaotic asynchronous algorithm for computing the fixed point of a nonnegative matrix of unit spectral radius
Lubachevsky B., Mitra D. Journal of the ACM 33(1): 130-150, 1986. Type: Article
Jun 1 1986
iWarp
Gross T., O’Hallaron D., MIT Press, Cambridge, MA, 1998. Type: Book (9780262071833)
Nov 1 1998
Industrial strength parallel computing
Koniges A. Morgan Kaufmann Publishers Inc., San Francisco, CA,2000. Type: Divisible Book
Mar 1 2000
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy