The suffix tree is a celebrated data structure in stringology and is used in providing efficient solutions for a plethora of problems. The main problem of suffix trees is their space usage: they may even require 20 bytes per text symbol! One solution to this issue is compressed suffix trees (CSTs). This paper proposes a new CST called grammar-compressed topology (GCT). GCT achieves low space on repetitive collections and much better times. In fact, GCT can be seen as a specialist for highly repetitive collections: the experiments of this paper show that on synthetic DNA collections with 99.9 percent similarity, GCT uses slightly higher space but runs up to three orders of magnitude faster.
Very briefly, GCT is built on the CST of Sadakane [1], but uses grammar compression on the tree topology instead of just a succinct representation. In fact, the authors adopted the idea of Bille et al. [2] for arbitrary trees and used string grammar compression on the sequence of parentheses that represents the suffix tree topology. The motivations and rationale of this approach followed from the fact that a repetitive text collection produces a suffix tree with repetitive topology.
Perhaps one drawback of the paper is the lack of a worst-case theoretical analysis on the representation of GCT. However, the authors do claim that GCT retains the full functionality of succinct tree representations [3], but is likely to use much less space when the tree has frequent repeated substructures. Their claim is backed up by experiments.
Due to the rapid improvements in sequencing technology, modern challenges are now to handle repositories of 1,000 genomes with capabilities of efficiently performing complex analyses on them. These fast-growing DNA collections are usually highly repetitive. It is in this context that the use of GCT can be extremely useful. Because it exploits the repetitiveness, GCT could have other interesting applications as well, for example, in the context of representation of versioned structured software repositories.