Computing Reviews
ELSA: a multilingual document summarization algorithm based on frequent itemsets and latent semantic analysis
Cagliero L., Garza P., Baralis E.  ACM Transactions on Information Systems 37 (2): 1-33, 2019. Type: Article
Date Reviewed: Nov 16 2020

“You shall know a word by the company it keeps” is perhaps the most famous quotation attributed to J. R. Firth [1]. In the search for ways to automate natural language understanding (NLU), statistical natural language processing has prevailed in the field for many decades. It is founded on the frequentist or empiricist traditions of British (corpus) linguistics, led by Firth, Michael A. K. Halliday, and John Sinclair. Contemporary computational linguistics represents natural language as calculated frequencies of co-occurring terms and collocations within a metric space.

It was not long before the linguist Zellig Harris introduced the distributional hypothesis; converging with the frequentist tradition attributed to Firth and his contemporaries, it has since dominated computational linguistics. Harris held that linguistic analysis should be understood in terms of the statistical distribution of words, that is, of components in a corpus, conceived as a system of many levels in which items at each level are combined according to local constraints. This does not necessarily exclude semantics [2].

In this context, the paper is an excellent contribution to the world of statistical natural language processing (NLP), including its goal to create meaningful summaries of text documents such as those found in news coverage and analysis. The paper is very well written. It presents a new text summarization algorithm, ELSA, that combines latent semantic analysis (LSA) and frequent itemset mining in databases.

From a computational linguistics point of view, the main idea is to consider sets of terms that co-occur within a sentence instead of single terms. The sentences that contain the most significant concepts from a ranked list are then selected to form the document summary. Given that ELSA works with already written sentences within a document, it appears to be transferable to any natural language that has a sentence-based structure.
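The idea described above can be illustrated with a minimal sketch. This is not the authors' ELSA implementation: the function names, the tiny stopword list, the support threshold, and the use of power iteration in place of a full singular value decomposition are all assumptions made for illustration. The sketch mines term sets that co-occur in several sentences, builds an itemset-by-sentence matrix, approximates the dominant latent concept, and keeps the sentences that weigh most on it.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "at", "of", "in"}


def terms(sentence):
    """Return the set of lowercase, non-stopword terms in a sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}


def frequent_itemsets(sentences, min_support=2, max_size=2):
    """Map each term set of size 1..max_size to the number of sentences
    containing it, keeping only sets found in at least min_support sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(terms(sentence))
        for size in range(1, max_size + 1):
            counts.update(combinations(words, size))
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def sentence_weights(sentences, itemsets, iterations=50):
    """Approximate each sentence's weight on the dominant latent concept.

    Rows of the binary matrix are itemsets, columns are sentences. Power
    iteration on A^T A stands in for the SVD step of LSA (a simplification).
    """
    sentence_terms = [terms(s) for s in sentences]
    matrix = [[1.0 if set(itemset) <= st else 0.0 for st in sentence_terms]
              for itemset in itemsets]
    v = [1.0] * len(sentences)
    for _ in range(iterations):
        u = [sum(a * x for a, x in zip(row, v)) for row in matrix]      # u = A v
        w = [sum(row[j] * u[i] for i, row in enumerate(matrix))         # w = A^T u
             for j in range(len(sentences))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no frequent itemsets: nothing to rank
            return [0.0] * len(sentences)
        v = [x / norm for x in w]
    return v


def summarize(sentences, k=1, min_support=2):
    """Return the k highest-weighted sentences in their original order."""
    itemsets = frequent_itemsets(sentences, min_support)
    weights = sentence_weights(sentences, itemsets)
    ranked = sorted(range(len(sentences)), key=lambda j: -weights[j])
    return [sentences[j] for j in sorted(ranked[:k])]
```

Because the sketch only copies existing sentences and relies on term co-occurrence counts, nothing in it is specific to English apart from the tokenizer and the stopword list, which is consistent with the transferability noted above.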

Apart from its clear theoretical and practical merits, the paper also benefits from excellent writing; for example, it includes a discussion of the algorithm’s complexity, an aspect too often neglected nowadays. The experimental design and results are robust and refer to existing multilingual collections of documents and competitions taking place within this context.

The paper is therefore strongly recommended for researchers looking at document summarization. It is also recommended to readers who aspire to write high-quality research papers of their own.

Reviewer:  Epaminondas Kapetanios Review #: CR147108
1) Firth, J. R. The technique of semantics. Transactions of the Philological Society 34 (1935), 36–72.
2) Harris, Z. S. Distributional structure. Word 10 (1954), 146–162.
  Reviewer Selected
Editor Recommended
 
 
Database Applications (H.2.8 )
 
 
Applications (I.5.4 )
 
Other reviews under "Database Applications": Date
ELSA: a multilingual document summarization algorithm based on frequent itemsets and latent semantic analysis
Cagliero L., Garza P., Baralis E.  ACM Transactions on Information Systems 37(2): 1-33, 2019. Type: Article, Reviews: (1 of 2)
Jan 16 2020
Big data of complex networks
Dehmer M., Emmert-Streib F., Pickl S., Holzinger A.,  Chapman & Hall/CRC, Boca Raton, FL, 2016. 332 pp. Type: Book (978-1-498723-61-9), Reviews: (3 of 3)
Nov 8 2019
Cross-dependency inference in multi-layered networks: a collaborative filtering perspective
Chen C., Tong H., Xie L., Ying L., He Q.  ACM Transactions on Knowledge Discovery from Data 11(4): 1-26, 2017. Type: Article
Feb 11 2019

Reproduction in whole or in part without permission is prohibited.   Copyright © 2000-2020 ThinkLoud, Inc.