Computing Reviews
ELSA: a multilingual document summarization algorithm based on frequent itemsets and latent semantic analysis
Cagliero L., Garza P., Baralis E. ACM Transactions on Information Systems 37(2): 1-33, 2019. Type: Article
Date Reviewed: Jan 16 2020

Multi-document summarization involves the automatic generation of concise summaries of a number of textual documents, with the goal of succinctly presenting the most salient information. This allows readers to get the main idea without reading the full content. Arguably the most effective approach to multilingual document summarization is latent semantic analysis (LSA). However, while LSA works well in the context of single document summarization, it may fall short in a multilingual multi-document context because it does not consider “higher-order correlations between sets of document terms and the underlying concepts.”

In contrast, itemset-based summarizers are capable of discovering “the most significant correlations among multiple document terms.” However, an important issue here is that the relevance of an itemset is evaluated according to a frequency-based analysis of the document collection (as opposed to the more desirable concept-based description), thereby making “the set of mined frequent itemsets inherently redundant”: many similar itemsets may describe the same concept.
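To make the redundancy issue concrete, here is a minimal sketch of frequent itemset mining over sentences treated as term sets. It is an illustration only, not the authors' miner: the function name `frequent_itemsets`, the toy sentences, the support threshold, and the brute-force enumeration (rather than an Apriori- or FP-growth-style algorithm) are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(sentences, min_support=2, max_size=2):
    """Return {itemset: support} for term sets found in >= min_support sentences.

    Brute-force enumeration, suitable only for toy inputs (hypothetical helper).
    """
    counts = Counter()
    for terms in sentences:
        unique_terms = sorted(set(terms))
        for size in range(1, max_size + 1):
            for combo in combinations(unique_terms, size):
                counts[frozenset(combo)] += 1
    return {items: n for items, n in counts.items() if n >= min_support}


# Toy, already-preprocessed sentences (hypothetical data).
sentences = [
    ["storm", "hit", "coast"],
    ["storm", "hit", "city"],
    ["storm", "damage", "coast"],
]

itemsets = frequent_itemsets(sentences)
for items, support in sorted(itemsets.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(items), support)
```

Even on three short sentences, the mined set contains overlapping itemsets such as {hit} and {hit, storm}, which arguably describe the same underlying concept. This is the kind of redundancy that a purely frequency-based selection cannot remove by itself.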

Combining the best of both worlds, the authors propose a fusion of the two approaches to overcome their respective limitations. The paper presents the enhanced LSA (ELSA)-based summarizer, which essentially combines “the itemset ability to consider correlations among multiple terms [with] the LSA ability to synthesize textual content into meaningful concepts.” ELSA thus promises to advance the state of the art by exploiting frequent itemsets to capture the latent concepts and by leveraging LSA to ensure a compact set of uncorrelated concepts, reducing any redundancy that the itemsets may introduce. Through carefully designed, extensive experiments, the authors demonstrate the effectiveness of ELSA. In particular, several benchmark datasets containing multiple document collections (multilingual and not) are used to evaluate ELSA against current methods. The results suggest the superiority of ELSA in terms of the ROUGE-4 R-measure. Additionally, it is evident from the experiments that ELSA is a relatively stable summarizer.
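The general idea of pairing itemsets with LSA can be sketched as follows. This is a rough illustration under stated assumptions, not the algorithm from the paper: the itemsets are taken as already mined, the itemset-by-sentence matrix uses simple 0/1 containment weights, only the single dominant latent concept is approximated (by power iteration, to avoid external libraries), and the sentence with the largest weight on that concept is picked. The names `itemset_sentence_matrix` and `dominant_concept` are hypothetical.

```python
import math


def itemset_sentence_matrix(itemsets, sentences):
    """Rows are itemsets, columns are sentences; 1.0 if the sentence contains the itemset."""
    return [[1.0 if items <= sent else 0.0 for sent in sentences] for items in itemsets]


def dominant_concept(matrix, iterations=100):
    """Approximate the top right singular vector of the matrix via power iteration on A^T A."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # u = A v (one value per itemset), then w = A^T u (one value per sentence).
        u = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


# Toy sentences as term sets and itemsets assumed to be already mined (hypothetical data).
sentences = [
    {"storm", "hit", "coast"},
    {"storm", "hit", "city"},
    {"storm", "damage", "coast"},
]
itemsets = [
    frozenset({"storm"}),
    frozenset({"hit"}),
    frozenset({"coast"}),
    frozenset({"hit", "storm"}),
    frozenset({"coast", "storm"}),
]

matrix = itemset_sentence_matrix(itemsets, sentences)
scores = dominant_concept(matrix)
best = max(range(len(sentences)), key=lambda j: abs(scores[j]))
print("selected sentence index:", best)
```

In this toy setting, the sentence covering the most frequent itemsets receives the largest weight on the dominant concept. A full LSA-based summarizer would typically compute several singular vectors and select one or more sentences per concept; that step is omitted here for brevity.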

Although ELSA is presented in the context of multilingual multi-document summarization, only the preprocessing phase, which removes stop words and performs stemming, is language dependent. The rest of the summarization process appears to be completely language independent. Perhaps more language-dependent components could be considered deeper in the pipeline to improve performance, especially for multilingual documents.

Reviewer:  M. Sohel Rahman Review #: CR146840 (2005-0114)
Database Applications (H.2.8)
Applications (I.5.4)
 