Quite often, algorithms with bad worst-case complexity perform better on average. Quicksort falls into this class. However, one of the drawbacks of Quicksort is that it is vulnerable to branch mispredictions, that is, branch misses. Modern processors have long pipelines that must be refilled after every miss, causing undesirable execution delays. The authors address this problem by what they call "partially decoupling control from data flow."
As presented in the paper, partitioning is achieved by dividing the array to be sorted into equally sized blocks. In a first pass, each block element is compared with the pivot and the comparison results are stored in a buffer. In a second pass, the corresponding elements are swapped into place, thereby avoiding conditional branches. The average number of branch mispredictions turns out to be εn log n + O(n), where ε is a small positive constant that depends on the block size. The authors also report an 80 percent increase in sorting speed over the GCC implementation of Quicksort (C++ std::sort).
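To make the two-pass idea concrete, the following is a minimal sketch, not the authors' implementation: within each block, the offsets of elements smaller than the pivot are accumulated into a buffer by adding the comparison result as an integer (so the compiler need not emit a conditional branch), and a second pass performs the swaps. The function name `block_partition` and the block size are illustrative choices; the paper's scheme processes blocks from both ends of the array and is more refined.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of branch-free block partitioning (a simplified, hypothetical
// variant of the paper's scheme). Returns the index of the first element
// that is >= pivot; elements before it are all < pivot.
constexpr std::size_t BLOCK = 64;  // illustrative block size

std::size_t block_partition(std::vector<int>& v, int pivot) {
    std::size_t store = 0;  // next free slot in the "< pivot" region
    std::size_t offsets[BLOCK];
    for (std::size_t i = 0; i < v.size(); i += BLOCK) {
        std::size_t len = std::min(BLOCK, v.size() - i);
        std::size_t num = 0;
        // Pass 1: buffer the offsets of small elements. The comparison
        // result is consumed arithmetically, avoiding a conditional branch.
        for (std::size_t j = 0; j < len; ++j) {
            offsets[num] = j;
            num += (v[i + j] < pivot);  // branchless accumulate
        }
        // Pass 2: swap the buffered elements into the front region.
        for (std::size_t k = 0; k < num; ++k)
            std::swap(v[store++], v[i + offsets[k]]);
    }
    return store;
}
```

Because the swaps happen in the same order as in a classic Lomuto-style partition, the result is the same; only the data-dependent branch in the hot loop is replaced by an unconditional store plus an increment.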
Quicksort has another disadvantage: it is sensitive to ties, that is, to inputs containing many equal keys. I have worked on several improved versions of Quicksort, including some developed by my own students, and have found that, despite the improvements, this disadvantage remains. It is therefore of interest to see to what extent the authors' version mitigates this sensitivity.
The paper is interesting and should be of use to advanced undergraduate and postgraduate students, researchers, and practitioners of computing science.