Computing Reviews
BPM/BPM+: software-based dynamic memory partitioning mechanisms for mitigating DRAM bank-/channel-level interferences in multicore systems
Liu L., Cui Z., Li Y., Bao Y., Chen M., Wu C. ACM Transactions on Architecture and Code Optimization 11(1): 1-28, 2014. Type: Article
Date Reviewed: Jun 17 2014

Dynamic random access memory (DRAM) interference in shared memory systems can, as demonstrated in this paper, degrade performance. The paper proposes a software-based solution that isolates applications from one another by allocating different banks (the bank-level partitioning mechanism, BPM) and channels (BPM+) to different applications, thereby improving performance. The paper also proposes dynamically allocating the banks to applications based on their memory requirements, with the objective of attaining uniform channel utilization.
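
One common way to realize this kind of software partitioning is page coloring: the operating system hands each application only those physical page frames that map to its assigned banks. The Python sketch below illustrates the idea under assumed parameters. The address layout (4 KiB pages, a 3-bit bank index in physical address bits 13 to 15) and the names bank_of_frame and BankPartitionedAllocator are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

PAGE_SHIFT = 12   # assumed: 4 KiB pages
BANK_SHIFT = 13   # hypothetical: bank index starts at physical address bit 13
BANK_BITS = 3     # hypothetical: 8 banks


def bank_of_frame(pfn: int) -> int:
    """Return the bank a page frame maps to under the assumed address layout."""
    phys_addr = pfn << PAGE_SHIFT
    return (phys_addr >> BANK_SHIFT) & ((1 << BANK_BITS) - 1)


class BankPartitionedAllocator:
    """Toy frame allocator that serves each application only from its own banks."""

    def __init__(self, num_frames: int):
        # Group all free frames by the bank they map to.
        self.free_by_bank = defaultdict(list)
        for pfn in range(num_frames):
            self.free_by_bank[bank_of_frame(pfn)].append(pfn)
        self.banks_of_app = {}

    def assign_banks(self, app: str, banks) -> None:
        """Record which banks an application may use (the partition)."""
        self.banks_of_app[app] = list(banks)

    def alloc_frame(self, app: str) -> int:
        """Return a free frame from one of the application's banks."""
        for bank in self.banks_of_app[app]:
            if self.free_by_bank[bank]:
                return self.free_by_bank[bank].pop()
        raise MemoryError(f"no free frames left in the banks assigned to {app}")


allocator = BankPartitionedAllocator(num_frames=64)
allocator.assign_banks("A", [0, 1])
allocator.assign_banks("B", [2, 3])
frames_a = [allocator.alloc_frame("A") for _ in range(4)]
frames_b = [allocator.alloc_frame("B") for _ in range(4)]
```

In this toy setup, applications A and B never share a bank, so neither can disturb the other's row buffers. The MemoryError path also shows the fragmentation cost: an application can run out of frames in its own banks while other banks still have free memory.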

The paper attacks the important problem of interference on shared memory systems, and does an excellent job of quantifying the harmful effects of interference and highlighting the worsening of interference effects in the future. Moreover, the paper clearly quantifies the benefits of resource isolation and dynamic resource allocation to reaffirm these ideas, which have been proposed in the past. However, the paper’s software-based solution to reduce interference, determine memory requirements, and calculate dynamic allocations lacks details, is poorly evaluated, and might not be sufficiently lightweight to be useful in real systems with rapidly changing application phases or complicated memory behavior. Moreover, there is no analysis of the scalability of the proposed solution to future systems, which seem to be sharing ever-increasing resources between an increasing number of cores.

Here are some key thoughts on the paper in more detail:

  • The paper quantifies the impact of interference on fairness and throughput. The uneven and large slowdowns in applications due to bandwidth contention or row-buffer contention can degrade overall system performance. The paper demonstrates that resource isolation can reduce interference, improving both fairness and throughput. This, however, comes at the cost of fragmentation: allocating banks and channels to an application that has smaller requirements can lead to underutilization.
  • The paper demonstrates that dynamic resource allocation is required because static/fixed allocation is insufficient. The system should be able to change the resource allocation based on changing application behaviors and overall system resource availability.
  • The paper’s evaluation is not very thorough. While the authors used modern benchmarks, both single threaded and multithreaded, the authors fail to show the results for all benchmark combinations under all scenarios. The results look cherry-picked.
  • Some claims in the paper are not backed up with data. For example, dynamic allocation is evaluated in intervals of ten seconds, but the choice of interval is not justified.
  • The paper shows a graph relating channel utilization mismatch to performance improvement, and concludes that better balance in channel utilization leads to higher performance. This might be correlation rather than causation, as better channel utilization would lead to both balanced channel utilization and higher performance.
  • While resource isolation is useful in reducing interference, and dynamic allocation helps to improve resource utilization, it is essential that these two mechanisms are lightweight and accurate in order to account for changing application phases. The paper’s software approach might be too expensive and slow to adapt to phase changes.
  • The paper uses the last-level cache (LLC) miss rate to determine the application’s memory behavior, which is insufficient since a low miss rate can be attributed to either large working set size or low memory-level parallelism in the application phase. The paper does not explain how application memory requirements are derived from the LLC miss rate.
  • The authors comment that hardware solutions are complicated, but the software solution might be more expensive in terms of execution time and energy. They fail to provide any evaluation of performance and energy overheads of the mechanism.
  • The related work section of the paper fails to look at very recent work in this area (for example, TimeCube [1]).
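
The fairness and throughput discussion in the first point above can be made concrete with two widely used multiprogram metrics: weighted speedup (throughput) and maximum slowdown (unfairness). The review does not say which metrics the paper reports, so the choice here is an assumption, and the IPC values are invented purely for illustration.

```python
def weighted_speedup(ipc_alone, ipc_shared):
    """Throughput metric: sum of each application's IPC relative to running alone."""
    return sum(shared / alone for alone, shared in zip(ipc_alone, ipc_shared))


def max_slowdown(ipc_alone, ipc_shared):
    """Unfairness metric: the largest per-application slowdown (lower is fairer)."""
    return max(alone / shared for alone, shared in zip(ipc_alone, ipc_shared))


# Invented IPC values for two co-running applications (illustration only).
ipc_alone = [2.0, 1.0]    # each application running by itself
ipc_shared = [1.0, 0.8]   # both applications sharing DRAM

ws = weighted_speedup(ipc_alone, ipc_shared)
ms = max_slowdown(ipc_alone, ipc_shared)
```

With these made-up numbers, the first application is slowed down by a factor of two while the second loses only 20 percent, which is the kind of uneven slowdown that isolation aims to reduce.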

Overall, I would recommend reading this paper to get insights into the worsening ill effects of interference in shared DRAM systems, and the qualitative and quantitative benefits of resource isolation and dynamic allocation. However, I would be cautious about the paper’s software-based solution to achieve these two properties.

Reviewer:  Anshuman Gupta Review #: CR142405 (1409-0755)
1) Gupta, A.; Sampson, J.; Taylor, M. B. TimeCube: a manycore embedded processor with interference-agnostic progress tracking. In Proc. of the International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIII) IEEE, 2013, 227–236.
Performance of Systems (C.4)
Storage Management (D.4.2)

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®