This detailed and well-written paper presents a study on the normalization of informal text. Normalization is the task of converting or correcting informal language into its formal equivalent; an example would be expanding the abbreviation “tmr” to “tomorrow.” Informal language is prevalent in many modern applications and poses an obstacle to language processing technologies, most of which were researched and developed on proper, formal language.
The authors examined two models: one based on sequence labeling with a conditional random field (CRF), and the other based on statistical machine translation (MT). Both were shown to outperform competitive baselines. The authors went on to show that the two models can be easily combined into a hybrid system that further improves performance.
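To make the hybrid idea concrete, here is a minimal sketch of one generic way a token-level labeler and a translation model could be combined; this is an illustration of the general strategy, not the paper's actual combination method, and the confidence threshold and mocked model outputs are my own assumptions.

```python
# Illustrative hybrid (NOT the paper's method): prefer the CRF's token-level
# output when it is confident, and back off to the MT system's output
# otherwise. Both systems are mocked with hard-coded, token-aligned outputs.

def hybrid_normalize(crf_tokens, crf_confidences, mt_tokens, threshold=0.8):
    """Pick each token from the CRF when confident, else from the MT output.

    Assumes both systems emit token-aligned sequences of equal length;
    the 0.8 threshold is an arbitrary placeholder.
    """
    return [
        crf_tok if conf >= threshold else mt_tok
        for crf_tok, conf, mt_tok in zip(crf_tokens, crf_confidences, mt_tokens)
    ]

crf_out = ["see", "you", "tmr"]       # CRF left "tmr" unchanged...
crf_conf = [0.95, 0.91, 0.40]         # ...but with low confidence
mt_out = ["see", "you", "tomorrow"]   # MT expanded it

print(hybrid_normalize(crf_out, crf_conf, mt_out))  # ['see', 'you', 'tomorrow']
```

Even a simple back-off rule like this shows why the two approaches complement each other: the labeler is precise on token-level substitutions, while the translation model can recover cases the labeler is unsure about.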
The paper is worth reading for several reasons. First, it gives a good introduction to the problem and to related work, which is informative and will be useful for new researchers in the field. Second, it explains in detail the experiments that were conducted, and many of the authors' decisions are soundly justified. It is a convincing piece of work and a good reference for sound scientific writing.
The paper piqued my interest in this area of research and made me want to test some ideas that came to mind while reading it. For example, I suspect that a pure language-model baseline would have performed better than reported. The authors did not elaborate on how their language model was derived, but considering that the work was first done in 2011, it would be interesting to see whether newer language models built on larger text corpora would yield a stronger baseline.
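To illustrate what such a pure language-model baseline might look like, here is a minimal sketch: each informal token is mapped to a set of candidate expansions, and the candidate with the highest language-model score is kept. The candidate lexicon and unigram counts below are toy placeholders of my own, not data from the paper.

```python
# Hypothetical pure-LM normalization baseline: score candidate expansions of
# each token with a smoothed unigram model and keep the best-scoring one.
import math

# Toy candidate lexicon: informal token -> possible formal expansions.
CANDIDATES = {
    "tmr": ["tomorrow", "timer"],
    "u": ["you", "u"],
    "gr8": ["great"],
}

# Toy unigram counts standing in for a model trained on a formal corpus.
UNIGRAM_COUNTS = {"tomorrow": 120, "timer": 15, "you": 900, "u": 2, "great": 300}
TOTAL = sum(UNIGRAM_COUNTS.values())

def log_prob(word: str) -> float:
    # Add-one (Laplace) smoothing over the toy vocabulary.
    return math.log((UNIGRAM_COUNTS.get(word, 0) + 1) / (TOTAL + len(UNIGRAM_COUNTS)))

def normalize(tokens):
    # Tokens without candidate expansions pass through unchanged.
    return [max(CANDIDATES.get(tok, [tok]), key=log_prob) for tok in tokens]

print(normalize(["u", "free", "tmr"]))  # ['you', 'free', 'tomorrow']
```

A real version of this baseline would use an n-gram (or neural) model over sentence context rather than isolated unigram scores, which is precisely where larger corpora and newer models could make the baseline much stronger.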
I recommend this paper to researchers interested in this area. It is well written and informative, and I believe any time spent reading it would be worthwhile.