Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Faster compressed suffix trees for repetitive collections
Navarro G., Ordóñez Pereira A. Journal of Experimental Algorithmics21 (1):1-38,2016.Type:Article
Date Reviewed: May 9 2016

The suffix tree is a celebrated data structure in stringology and is used in providing efficient solutions for a plethora of problems. The main problem of suffix trees is their space usage: they may even require 20 bytes per text symbol! One solution to this issue is compressed suffix trees (CSTs). This paper proposes a new CST called grammar-compressed topology (GCT). GCT achieves low space on repetitive collections and much better times. In fact, GCT can be seen as a specialist for highly repetitive collections: the experiments of this paper show that on synthetic DNA collections with 99.9 percent similarity, GCT uses slightly higher space but runs up to three orders of magnitude faster.

Very briefly, GCT is built on the CST of Sadakane [1], but uses grammar compression on the tree topology instead of just a succinct representation. In fact, the authors adopted the idea of Bille et al. [2] for arbitrary trees and used string grammar compression on the sequence of parentheses that represents the suffix tree topology. The motivations and rationale of this approach followed from the fact that a repetitive text collection produces a suffix tree with repetitive topology.

Perhaps one drawback of the paper is the lack of a worst-case theoretical analysis on the representation of GCT. However, the authors do claim that GCT retains the full functionality of succinct tree representations [3], but is likely to use much less space when the tree has frequent repeated substructures. Their claim is backed up by experiments.

Due to the rapid improvements in sequencing technology, modern challenges are now to handle repositories of 1,000 genomes with capabilities of efficiently performing complex analyses on them. These fast-growing DNA collections are usually highly repetitive. It is in this context that the use of GCT can be extremely useful. Because it exploits the repetitiveness, GCT could have other interesting applications as well, for example, in the context of representation of versioned structured software repositories.

Reviewer:  M. Sohel Rahman Review #: CR144392 (1607-0518)
1) Sadakane, K. Compressed suffix trees with full functionality. Theory of Computing Systems 41, 4(2007), 589–607.
2) Bille, P.; Landau, G.; Raman, R.; Sadakane, K.; Rao, S. S.; Weimann, O. Random access to grammar compressed strings. In Proc. of the 22nd Annual Symposium on Discrete Algorithms (SODA) ACM-SIAM, 2011, 373–389.
3) Navarro, G.; Sadakane, K. Fully-functional static and dynamic succinct trees. ACM Transactions on Algorithms 10, 3(2014), Article No. 16.
Bookmark and Share
 
Data Structures (E.1 )
 
 
Information Storage And Retrieval (H.3 )
 
Would you recommend this review?
yes
no
Other reviews under "Data Structures": Date
Handbook of algorithms and data structures
Gonnet G. (ed), Baeza-Yates R., Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1991. Type: Book (9780201416077)
Feb 1 1992
Data structures and algorithm analysis
Weiss M., Benjamin-Cummings Publ. Co., Inc., Redwood City, CA, 1992. Type: Book (9780805390544)
Aug 1 1992
Data structures and program design in C
Kruse R., Leung B., Tondo C., Prentice-Hall, Inc., Upper Saddle River, NJ, 1992. Type: Book (9780137256495)
Oct 1 1992
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy