ComputingReviews.com

ELSA: a multilingual document summarization algorithm based on frequent itemsets and latent semantic analysis
Cagliero L., Garza P., Baralis E. ACM Transactions on Information Systems 37(2): 1-33, 2019. Type: Article

Date Reviewed: 01/16/20

Multi-document summarization involves the automatic generation of concise summaries of a number of textual documents, with the goal of succinctly presenting the most salient information. This allows readers to get the main idea without reading the full content. Arguably the most effective approach to multilingual document summarization is latent semantic analysis (LSA). However, while LSA works well for single-document summarization, it may fall short in a multilingual multi-document context because it does not consider “higher-order correlations between sets of document terms and the underlying concepts.”

In contrast, itemset-based summarizers are capable of discovering “the most significant correlations among multiple document terms.” Their weakness is that the relevance of an itemset is evaluated through a frequency-based analysis of the document collection (as opposed to the more desirable concept-based description), thereby making “the set of mined frequent itemsets inherently redundant”: many similar itemsets may describe the same concept.

Combining the best of both worlds, the authors propose a fusion of the above approaches to overcome their respective limitations. The paper presents the enhanced LSA (ELSA)-based summarizer, which essentially combines “the itemset ability to consider correlations among multiple terms [with] the LSA ability to synthesize textual content into meaningful concepts.” Thus, ELSA promises to advance the state of the art by exploiting frequent itemsets to capture the latent concepts and by leveraging LSA to ensure a compact set of uncorrelated concepts, reducing any redundancy the itemsets may introduce. Indeed, through carefully designed and extensive experiments, the authors demonstrate the effectiveness of ELSA. In particular, several benchmark datasets containing multiple document collections (multilingual and not) are used to evaluate ELSA against current methods. The results suggest the superiority of ELSA in terms of the ROUGE-4 R-measure. Additionally, the experiments show that ELSA is a relatively stable summarizer.
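
To make the general idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy pipeline: frequent itemsets are found by brute-force enumeration (rather than a real mining algorithm), a binary itemset-by-sentence matrix is built, and a singular value decomposition picks one sentence per leading concept. The function names, the binary weighting, the support threshold, and the selection rule are all illustrative assumptions; the review does not specify how ELSA performs these steps.

```python
from itertools import combinations

import numpy as np


def frequent_itemsets(sentences, min_support=2, max_size=2):
    """Brute-force search for term sets that occur in at least
    `min_support` sentences (hypothetical helper; toy inputs only)."""
    token_sets = [set(s.lower().split()) for s in sentences]
    vocab = sorted(set().union(*token_sets))
    itemsets = []
    for size in range(1, max_size + 1):
        for combo in combinations(vocab, size):
            support = sum(1 for t in token_sets if t.issuperset(combo))
            if support >= min_support:
                itemsets.append(combo)
    return itemsets, token_sets


def summarize(sentences, n_concepts=2, min_support=2):
    """Pick one sentence per leading latent concept (assumed rule)."""
    itemsets, token_sets = frequent_itemsets(sentences, min_support)
    if not itemsets:
        return []
    # Rows are itemsets, columns are sentences; an entry is 1.0 when
    # the sentence contains every term of the itemset.
    matrix = np.array(
        [[1.0 if t.issuperset(i) else 0.0 for t in token_sets]
         for i in itemsets]
    )
    # Each row of vt is a concept expressed over the sentences.
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    chosen = []
    for k in range(min(n_concepts, vt.shape[0])):
        # Take the not-yet-chosen sentence that loads most strongly
        # on concept k.
        for idx in np.argsort(-np.abs(vt[k])):
            if int(idx) not in chosen:
                chosen.append(int(idx))
                break
    return [sentences[i] for i in sorted(chosen)]


docs = [
    "cats chase mice",
    "cats chase birds",
    "cats chase toys",
    "stocks fell sharply",
    "stocks fell again",
    "weather is mild",
]
print(summarize(docs))
```

In this toy input, the itemsets ("cats",), ("chase",), and ("cats", "chase") all describe the same concept, which is the kind of redundancy the decomposition is meant to collapse into a single latent dimension.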

Although ELSA is presented in the context of multilingual multi-document summarization, only the preprocessing phase, which eliminates stop words and performs stemming, is language dependent. The remaining summarization process seems completely language independent. Perhaps more language-dependent components could be considered deeper in the pipeline to improve performance, especially in the context of multilingual documents.

Reviewer: M. Sohel Rahman

Review #: CR146840 (2005-0114)

Reproduction in whole or in part without permission is prohibited. Copyright 2024 ComputingReviews.com™