The paper presents Chunks and Tasks, a novel parallel programming model built on top of C++. In this model, the programmer uses ordinary C++ code to expose parallelism in two respects: data (chunks) and work (tasks). This parallelism is then automatically mapped to the physical resources of the target parallel system; the mapping is transparent to the programmer.
The apparent advantage of the model is that the programmer is relieved of the burden of manually distributing data and work, as required, for example, in related approaches such as Cilk-NOW or DAGuE.
For the performance evaluation, matrix-matrix multiplication is used as the application example. The performance results are rather poor (60 percent of the AMD Core Math Library (ACML) peak performance); however, the authors expose the weak spots and suggest possible optimizations for future work.
In the appendix, a complete code example for calculating Fibonacci numbers in the Chunks and Tasks model is presented and compared to a sequential C++ implementation. Since this example is meant to demonstrate the ease of use of the proposed programming model, it would be desirable to also see a code comparison with one of the related approaches.
In summary, the paper presents an interesting novel parallel programming model that reduces the complexity of parallel programming by exposing parallelism in data and work without requiring the programmer to deal with further (low-level) details.