This paper is a thorough evaluation of using machine learning to generate regular expressions for data mining tasks, such as extracting email addresses from web pages. It even includes a comparison with humans asked to perform the same tasks. A look at the unpublished appendix shows that the method finds quite readable and concise expressions, and the references even include xkcd.
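To make the kind of task concrete, here is a minimal Python sketch of email extraction with a regular expression. The pattern and the names `EMAIL_PATTERN` and `extract_emails` are hypothetical, written by hand for illustration. They are not the expressions learned or reported in the paper.

```python
import re

# Hypothetical, hand-written pattern for illustration only. It is NOT the
# expression learned in the paper, and it ignores many corner cases that
# the email address standards allow.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Return every substring of `text` that matches EMAIL_PATTERN, in order."""
    return EMAIL_PATTERN.findall(text)


page = "Contact alice@example.com or bob.smith@mail.example.org for details."
print(extract_emails(page))  # ['alice@example.com', 'bob.smith@mail.example.org']
```

Writing and tuning patterns like this by hand, and deciding which corner cases to cover, is the work that learning regular expressions from examples is meant to take over.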
The tasks are fairly limited, and performance remains low on harder tasks such as phone numbers and Congressional bill numbers, even with thousands of training examples. We are still far from being able to fix a general class of mistakes in a few lines of a database and have a program that watches us and then finishes the job, as R. P. Nix attempted back in 1983 [1].
I recommend this paper to anyone interested in the details of machine learning methods as applied to text mining.