ComputingReviews.com

Conjugation-based compression for Hebrew texts
Wiseman Y., Gefner I. ACM Transactions on Asian Language Information Processing 6(1): 4-es, 2007. Type: Article

Date Reviewed: 06/27/07

This paper presents a compression technique designed for the Hebrew language. It uses the well-known Burrows-Wheeler algorithm, preceded by a preprocessing step that exploits the fact that Hebrew words are derived from roots of two, three, or four letters, with morphology characterized by infixing additional letters.

These patterns, apart from a few exceptions such as the special forms of some final letters, are applied in a first step to extract roots where possible. The Burrows-Wheeler algorithm is then used to compress both resulting files. Hebrew is normally written without vowels; when vowels do appear in text, they take the form of diacritical marks. The main computational obstacle to this method is choosing the set of patterns to use. The paper employs a greedy method for this choice, but does not provide details.
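The two-step idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the review gives no details of the paper's pattern set or of its greedy selection, so the templates in `TEMPLATES`, the slot marker `C`, and the helper names are all made up for illustration. The sketch also assumes that "both files" means one stream of roots and one stream of pattern identifiers, ignores the special final-letter forms, and uses Python's standard `bz2` module as a stand-in Burrows-Wheeler-based compressor.

```python
import bz2

SLOT = "C"  # marks a root-letter position in a template (notation assumed here)

# Hypothetical templates, tried in order; index 0 is the fallback
# "no template fits, store the word unchanged".
TEMPLATES = ["", "CCC", "CוCC", "מCCC", "CCיCה", "התCCC"]


def split_word(word):
    """Return (template_id, root) for the first template that fits the word."""
    for tid, tpl in enumerate(TEMPLATES[1:], start=1):
        if len(tpl) == len(word) and all(t in (SLOT, w) for t, w in zip(tpl, word)):
            root = "".join(w for t, w in zip(tpl, word) if t == SLOT)
            return tid, root
    return 0, word  # keep the whole word in the root stream


def join_word(tid, root):
    """Rebuild a word by filling the template's slots with the root letters."""
    if tid == 0:
        return root
    letters = iter(root)
    return "".join(next(letters) if t == SLOT else t for t in TEMPLATES[tid])


def compress(text):
    """Split words into a root stream and a template-id stream, then compress each."""
    pairs = [split_word(w) for w in text.split()]
    roots = " ".join(root for _, root in pairs).encode("utf-8")
    ids = bytes(tid for tid, _ in pairs)
    return bz2.compress(roots), bz2.compress(ids)


def decompress(roots_blob, ids_blob):
    """Invert compress(); words come back separated by single spaces."""
    roots = bz2.decompress(roots_blob).decode("utf-8").split()
    ids = bz2.decompress(ids_blob)
    return " ".join(join_word(t, r) for t, r in zip(ids, roots))
```

With these toy templates, several words built on the same three-letter root all contribute the same root string to the first stream and differ only in the second, which is the kind of regularity the preprocessing step is meant to expose to the compressor. Whitespace is normalized to single spaces, so the round trip is exact only for text that already has that form.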

The paper’s results are interesting and suggest that exploiting the morphological features of a language can materially improve compression. The paper also offers background on the Hebrew language, and is part of a continuing research project on Hebrew text compression.

Reviewer: Bruce Litow

Review #: CR134476 (0806-0600)
