1-10 of 73
Reviews about "Parallel Processors (C.1.2...)":
Date Reviewed
Algorithm 980: sparse QR factorization on the GPU Yeralan S., Davis T., Sid-Lakhdar W., Ranka S. ACM Transactions on Mathematical Software 44(2): 1-29, 2017. Type: Article
Many large-scale scientific and engineering computational problems lead, after some kind of discretization, to the solution of huge systems of linear algebraic equations and/or linear least squares problems containing many hundreds of ...
Mar 14 2018

Coarray-based load balancing on heterogeneous and many-core architectures Cardellini V., Fanfarillo A., Filippone S. Parallel Computing 68: 45-58, 2017. Type: Article
Load balancing to gain performance and energy efficiency within a heterogeneous processing environment, which contains processing units with different specialized cores and memory specifications that are supposed to collaborate in a pa...
Dec 12 2017

FPGA-GPU communicating through PCIe Thoma Y., Dassatti A., Molla D., Petraglio E. Microprocessors & Microsystems 39(7): 565-575, 2015. Type: Article
Hardware accelerators have seen increasing use in the last decade and can provide significant performance benefits compared to traditional central processing unit (CPU) architectures, especially for massively parallel applications. Spe...
May 18 2016

Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors Liu W., Vinter B. Parallel Computing 49(C): 179-193, 2015. Type: Article
Sparse matrix-vector multiplication (SpMV) is a key operation in many scientific and graph applications. It is also challenging because of data compression and load balancing issues. Data compression is needed to efficiently store the ...
Apr 21 2016

Modular vector processor architecture targeting at data-level parallelism Rooholamin S., Ziavras S. Microprocessors & Microsystems 39(4): 237-249, 2015. Type: Article
This research presents a VHSIC hardware description language (VHDL) vector processor architecture specifically designed to address data-level parallelism by giving each vector lane its own private memory, avoiding any stalls...
Apr 8 2016

Tiled QR decomposition and its optimization on CPU and GPU computing system Kim D., Park K. ICPP 2013 (Proceedings of the 2013 42nd International Conference on Parallel Processing, Oct 1-Oct 4, 2013) 744-753, 2013. Type: Proceedings
Single-node heterogeneous computing systems comprised of multicore central processing units (CPUs) and accelerators such as graphics processing units (GPUs) are becoming the norm in high-performance computing (HPC) environments. Each c...
May 20 2014

A hybrid parallel Barnes-Hut algorithm for GPU and multicore architectures Hannak H., Hochstetter H., Blochinger W. Euro-Par 2013 (Proceedings of the 19th International Conference on Parallel Processing, Aachen, Germany, Aug 26-30, 2013) 559-570, 2013. Type: Proceedings
Modularization helps to identify data structures that efficiently work with heterogeneous models where both central processing units (CPUs) and graphics processing units (GPUs) are used. This paper describes a modularized parallelizati...
Jan 8 2014

A scalable parallelization of the gene duplication problem Wehe A., Chang W., Eulenstein O., Aluru S. Journal of Parallel and Distributed Computing 70(3): 237-244, 2010. Type: Article
Historically, the hunt for evolutionary relatedness among groups of species is known as phylogenetics. It coincides with the task of searching, combining, and inspecting presumably related genes. It is an area of interaction between ma...
Oct 27 2010

Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading Lo J., Emer J., Levy H., Stamm R., Tullsen D., Eggers S. ACM Transactions on Computer Systems 15(3): 322-354, 1997. Type: Article
One of the highest compliments that an idea can be given is for others to ask, “Why didn’t I think of that?” As the push toward faster and faster computers relies increasingly on making use of the various ...
Aug 1 1998

Multithreading with Distributed Functional Units Gunther B. IEEE Transactions on Computers 46(4): 399-411, 1997. Type: Article
The already high number of transistors that can be etched onto a silicon wafer continues to grow, enabling computer architects to lay out more electronic circuitry, and, therefore, more functional units, on one computer chip. This has ...
Jun 1 1998