Computing Reviews
Inference of regular expressions for text extraction from examples
Bartoli A., De Lorenzo A., Medvet E., Tarlao F. IEEE Transactions on Knowledge and Data Engineering 28(5): 1217-1230, 2016. Type: Article
Date Reviewed: Jul 6 2016

This paper is a thorough evaluation of using machine learning to generate regular expressions for data mining, such as extracting email addresses from web pages. It even includes a comparison with humans asked to do the same tasks. The unpublished appendix shows that the method finds quite readable and concise expressions, and the references even include xkcd.
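To make the task concrete, here is a minimal sketch of how a candidate regular expression might be judged against annotated examples. It is not the paper's method: the function name `score`, the sample text, and the two candidate patterns are all assumptions chosen for illustration, and the selection step simply picks the better of two hand-written candidates rather than learning anything.

```python
import re

def score(pattern, text, expected):
    """Return (precision, recall) of a candidate regex against annotated extractions."""
    found = set(re.findall(pattern, text))   # strings the candidate extracts
    wanted = set(expected)                   # strings a human marked for extraction
    hits = len(found & wanted)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(wanted) if wanted else 0.0
    return precision, recall

# Hypothetical annotated example: a text plus the snippets to be extracted.
text = "Contact alice@example.com or bob@example.org; ignore foo@bar and @home."
expected = ["alice@example.com", "bob@example.org"]

# Hypothetical candidates; a learning system would generate these automatically.
candidates = [r"\w+@\w+", r"\w+@\w+\.\w+"]
best = max(candidates, key=lambda p: score(p, text, expected))
```

On this toy input the second candidate extracts exactly the two annotated addresses, while the first stops at the dot and matches nothing that was annotated.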

The tasks are fairly limited and performance is still low for harder tasks such as phone numbers and Congressional bill numbers, even with thousands of learning examples. We are still pretty far from being able to fix a general class of mistakes in a few lines of a database and have a program that watches us and then finishes the job, as was tried back in 1983 by R. P. Nix [1].

I recommend this paper to anyone interested in the details of machine learning methods as applied to text mining.

Reviewer:  Michael Lesk Review #: CR144546 (1609-0691)
[1] Nix, R. Editing by example. PhD Dissertation. Yale University, 1983. http://cpsc.yale.edu/sites/default/files/files/tr280.pdf. Accessed 06/15/2016.
Learning (I.2.6 )
 
 
Data Mining (H.2.8 ... )
 