Computing Reviews, the leading online review service for computing literature.

Extracting person names from diverse and noisy OCR text
Packer T., Lutes J., Stewart A., Embley D., Ringger E., Seppi K., Jensen L. AND 2010 (Proceedings of the 4th Workshop on Analytics for Noisy Unstructured Text Data, Toronto, ON, Canada, Oct 26, 2010), 19-26. 2010. Type: Proceedings

Date Reviewed: Mar 31 2011

The authors of this paper provide a satisfying read about named entity recognition (NER) in noisy optical character recognition (OCR) texts. They deliver on their promise of providing answers to many questions that researchers in this area might have. Packer et al. draw many interesting conclusions about the difficult task of extracting names from noisy scanned documents: “Word order errors can play a bigger role in poor extraction performance than character recognition errors”; “The knowledge-based approaches performed better than the machine learning (ML) approaches”; and “Combining basic extraction methods can produce higher quality NER.” Regarding the conclusion about machine learning approaches, ML lovers need not despair. The authors point out two ways to overcome the deficiencies of these approaches: either apply a more realistic noise model of OCR errors to the Conference on Computational Natural Language Learning (CoNLL) training data, or use semi-supervised ML techniques to take advantage of the large number of unlabeled documents.
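To make the first remedy concrete, here is a minimal sketch of what applying a noise model to clean training text could look like. It is not the authors' method: the confusion table, the function names, and the swap-based treatment of word order errors are all illustrative assumptions.

```python
import random

# Hypothetical confusion table of visually similar characters.
# These pairs are illustrative assumptions, not taken from the paper.
OCR_CONFUSIONS = {
    "l": "1",
    "O": "0",
    "e": "c",
    "m": "rn",
    "S": "5",
}


def add_ocr_noise(text, rate, rng):
    """Replace each confusable character with probability `rate`."""
    out = []
    for ch in text:
        if ch in OCR_CONFUSIONS and rng.random() < rate:
            out.append(OCR_CONFUSIONS[ch])
        else:
            out.append(ch)
    return "".join(out)


def swap_adjacent_words(tokens, rate, rng):
    """Swap neighbouring tokens with probability `rate` to mimic word order errors."""
    tokens = list(tokens)
    i = 0
    while i < len(tokens) - 1:
        if rng.random() < rate:
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return tokens


rng = random.Random(0)  # seeded so that runs are repeatable
noisy_tokens = swap_adjacent_words("Samuel Olsen was born in Ohio".split(), 0.2, rng)
noisy_line = add_ocr_noise(" ".join(noisy_tokens), 0.3, rng)
```

In such a setup the entity labels would have to travel with their tokens when words are swapped, and realistic substitution rates would need to be estimated from actual OCR output rather than chosen by hand.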

Reviewer: João Luís G. Rosa	Review #: CR138942 (1110-1086)

Language Parsing And Understanding (I.2.7 ... )

Content Analysis And Indexing (H.3.1 )
