This paper describes a language-independent, automatic text summarization system, the platform for language independent summarization (PLIS). Traditional text summarization systems mainly focus on the processing of English text. PLIS, however, is able to work with multiple languages by translating non-English text to English before applying a set of features to score and rank sentences within the input text. Because it is an extractive summarization system, the top-ranked sentences are then selected to compose the final summary.
The paper explains PLIS well. It is appropriately structured and the language is simple and easily comprehensible. It is a short paper, however, and as a result omits several details. For example, while we know that PLIS makes use of three features, namely word frequency, sentence length, and sentence position, the paper neglects to explain how these are combined to obtain a final score for each sentence. It also does not explain whether PLIS requires training to find an optimal combination of these three features, or whether they are aggregated heuristically in an unsupervised fashion.
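To make the open question concrete, here is a minimal sketch of one unsupervised option: each of the three features is normalized to the range [0, 1] and the results are summed with equal weights. This is an assumption made purely for illustration, not the method used by PLIS. The function name summarize, the weights, the normalizations, and the naive sentence splitter are all hypothetical, and the translation step is left out.

```python
import re
from collections import Counter

def summarize(text, k=2, weights=(1.0, 1.0, 1.0)):
    """Return the k top-scoring sentences in their original order.

    Hypothetical scoring, NOT taken from the PLIS paper: a weighted sum of
    normalized word frequency, sentence length, and sentence position.
    """
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []

    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    freq = Counter(w for sent in tokens for w in sent)
    max_freq = max(freq.values(), default=1)
    max_len = max(len(sent) for sent in tokens) or 1
    n = len(sentences)
    w_freq, w_len, w_pos = weights

    scored = []
    for i, sent in enumerate(tokens):
        # Mean word frequency in the sentence, scaled by the top frequency.
        f_freq = sum(freq[w] for w in sent) / (len(sent) * max_freq) if sent else 0.0
        # Length relative to the longest sentence (assumed: longer is better).
        f_len = len(sent) / max_len
        # Earlier sentences score higher (assumed lead bias).
        f_pos = 1.0 - i / n
        scored.append((w_freq * f_freq + w_len * f_len + w_pos * f_pos, i))

    # Take the k best scores, breaking ties by position, then restore order.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda p: p[1])]
```

A supervised variant would instead learn the three weights from reference summaries; the paper does not say which of the two approaches PLIS follows.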
While the work on PLIS is notable and very important, I would have hoped for more extensive experimentation. PLIS is described as a multi-language summarization system. However, the experiments focused only on text in English, Spanish, and Portuguese. In the introduction, the authors explicitly note such narrow language coverage as a failing in several related papers. Given that criticism, I expected experiments over a wide gamut of European Union (EU) languages.
Further, the paper does not mention the multilingual summarization pilot held at the 2011 Text Analysis Conference (TAC), which would have been very relevant [1].
All in all, this is a good read that is easy to follow. The technique employed by PLIS shows promising results, and the paper should be of interest to researchers working on multilingual text summarization.