Modern systems typically use a cheaper external memory with a high storage capacity. Since access to this memory is relatively slow compared to the internal memory, applications that process huge amounts of data need to minimize communication across such a memory hierarchy.
Thus, it becomes necessary to develop algorithms that minimize the input/output (I/O) operations between the internal and external memory. Roh et al. provide such algorithms for string problems, especially for the problem of finding the maximum suffix of a string. Although the paper mentions other string problems, the algorithms presented are shown to be easily derivable from an algorithm that can solve the maximum suffix problem. Among these other problems, an interesting one is the problem of string matching with a popular Knuth-Morris-Pratt (KMP) algorithm. Whereas the KMP algorithm uses a scan and shift method, the suggested improvement uses maximum suffix decomposition during the shift step.
Roh et al. have previously published most of this work [1]. Although the two papers share a lot of material, the earlier version has some additional examples that illustrate why merely using a maximum suffix algorithm written for an internal memory model may cause inefficiencies when applied to systems that have external memory. In both versions of the paper, the theory is well presented.
Adequate background material is provided for the benefit of the readers. First, the presentation considers an external memory model with a single disk, and later, it extends the concepts to models with multiple disks.
However, for practitioners interested in implementing these techniques, this paper does not offer any experimental evidence. It would be interesting to compare the performance of the internal memory maximum suffix algorithm to the algorithms suggested here, on datasets with large strings, using an industrial system.
The algorithms suggested add significant complexity to the implementation, but, on the bright side, they do advance the existing theory.