Computing Reviews
Normalization of informal text
Pennell D., Liu Y. Computer Speech and Language 28(1): 256-277, 2014. Type: Article
Date Reviewed: May 9 2014

This detailed and well-written paper presents a study on the normalization of informal text. Normalization converts or corrects informal language into its formal equivalent; an example is the expansion of the abbreviation “tmr” to “tomorrow.” Informal language is prevalent in many modern-day applications and is an obstacle to language processing technologies, most of which were researched and developed on proper, formal language.
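
To make the task concrete, here is a minimal sketch of normalization as plain table lookup. It is only an illustration of the problem and is not the authors' method. The function name `normalize` and the table `ABBREVIATIONS` are hypothetical, and every table entry other than “tmr” is an assumption added for the example.

```python
import re

# Hypothetical lookup table. Only "tmr" -> "tomorrow" comes from the review;
# the other entries are illustrative assumptions.
ABBREVIATIONS = {
    "tmr": "tomorrow",
    "u": "you",
    "pls": "please",
}


def normalize(text, table=ABBREVIATIONS):
    """Replace each known informal token with its formal equivalent.

    Tokens that are not in the table are left unchanged.
    """
    def expand(match):
        token = match.group(0)
        # Look the token up case-insensitively; fall back to the token itself.
        return table.get(token.lower(), token)

    # Treat every run of ASCII letters as one token.
    return re.sub(r"[A-Za-z]+", expand, text)


print(normalize("see u tmr pls"))
```

A fixed table like this cannot handle unseen abbreviations or ambiguous ones, which is why the problem calls for learned models.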

The authors examined two models: one uses sequential labeling with a conditional random field (CRF), and the other is based on statistical machine translation (MT). Both models were shown to do better than competitive baselines. The authors went on to show that the two models can be combined easily into a hybrid system that further improves performance.

The paper is worth reading for several reasons. First, it gives a good introduction to the problem and related work. This is informative and will be useful for new researchers in the field. Next, it explains the experiments that were conducted in detail. Many of the decisions made by the authors are soundly justified and explained. It’s a convincing piece of work and is a good reference for sound scientific writing.

The paper piqued my interest in this area of research, and made me want to test some ideas that I came up with while going through it. For example, I thought that a pure language model baseline would have performed better than was reported. The authors did not elaborate on how their language model was derived, but considering that the work was first done in 2011, it would be interesting to see whether new language models built on larger text corpora lead to a stronger baseline.

I recommend this paper to researchers interested in this area. It is well written and informative, and I believe any time spent reading it would be worthwhile.

Reviewer:  Jun-Ping Ng Review #: CR142264 (1408-0685)
Text Analysis (I.2.7 ... )
Social Networking (H.3.4 ... )
