Computing Reviews, the leading online review service for computing literature.


Inference of regular expressions for text extraction from examples
Bartoli A., De Lorenzo A., Medvet E., Tarlao F. IEEE Transactions on Knowledge and Data Engineering 28(5): 1217-1230, 2016. Type: Article

Date Reviewed: Jul 6 2016

This paper is a thorough evaluation of using machine learning to generate regular expressions for data mining, such as extracting email addresses from web pages. It even includes a comparison with humans asked to do the same tasks. The unpublished appendix shows that the method finds quite readable and concise expressions, and the references even include xkcd. The tasks are fairly limited, however, and performance is still low for harder tasks such as phone numbers and Congressional bill numbers, even with thousands of learning examples. We are still pretty far from being able to fix a general class of mistakes in a few lines of a database and have a program that watches us and then finishes the job, as was tried back in 1983 by R. P. Nix [1]. I recommend this paper to anyone interested in the details of machine learning methods as applied to text mining.

Reviewer: Michael Lesk	Review #: CR144546 (1609-0691)

[1] Nix, R. Editing by example. PhD dissertation, Yale University, 1983. http://cpsc.yale.edu/sites/default/files/files/tr280.pdf. Accessed 06/15/2016.

Learning (I.2.6)

Data Mining (H.2.8 ... )


