This paper, on compressing images of documents that contain text, proposes a new compression method based on the mixed raster content (MRC) model. The new method improves compression performance over similar methods for documents with Farsi and Arabic texts.
Sections 1 and 2 describe the MRC model of compression and existing packages that use the model. They discuss drawbacks of the existing packages in dealing with document images that contain Farsi and Arabic texts.
Section 3 begins with a block diagram of the method. The input document is segmented into background, foreground, and mask layers. Segmentation has three components: binarization, refinement, and boundary smoothing. Each is described in detail, with threshold formulas. Mask layer compression encodes library prototypes extracted from chain code signals of the input, and it exploits properties of Farsi and Arabic texts. The section ends with a description of how the foreground and background layers are compressed.
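For readers new to the MRC model, the three-layer idea can be illustrated with a minimal sketch. This is not the paper's method: it assumes a single fixed global threshold in place of the paper's threshold formulas, omits refinement and boundary smoothing, and reduces the foreground and background layers to one average gray level each. The function names `mrc_split` and `mrc_compose` and the `threshold` parameter are hypothetical.

```python
def mrc_split(image, threshold=128):
    """Split a grayscale image (rows of 0-255 ints) into MRC-style layers.

    Assumption: dark pixels (below `threshold`) are text. The mask marks
    them with 1; foreground and background are each reduced to one
    average gray level, a stand-in for real layer images.
    """
    mask = [[1 if p < threshold else 0 for p in row] for row in image]
    fg_pixels = [p for row, mrow in zip(image, mask)
                 for p, m in zip(row, mrow) if m]
    bg_pixels = [p for row, mrow in zip(image, mask)
                 for p, m in zip(row, mrow) if not m]
    # Fall back to black text on white paper if a layer is empty.
    foreground = sum(fg_pixels) // len(fg_pixels) if fg_pixels else 0
    background = sum(bg_pixels) // len(bg_pixels) if bg_pixels else 255
    return mask, foreground, background


def mrc_compose(mask, foreground, background):
    """Rebuild the image: the mask selects foreground or background per pixel."""
    return [[foreground if m else background for m in row] for row in mask]


if __name__ == "__main__":
    page = [[250, 20, 250],
            [240, 30, 240]]
    mask, fg, bg = mrc_split(page)
    print(mask)                        # binary mask layer
    print(mrc_compose(mask, fg, bg))   # lossy reconstruction
```

The point of the decomposition is that the binary mask, which carries the text shapes, can be compressed with a method suited to bilevel images, while the smoother foreground and background layers can be compressed separately.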
Section 4 discusses the results of experiments on Farsi and Arabic documents. Performance is evaluated at various stages of the compression, and the method is compared with the DjVu compression method. Section 5 presents conclusions.
An introductory background in image compression may be enough to understand the paper.