The deep web refers to the subset of web pages that are returned by search forms that allow semantic searches, such as queries for books written by a particular author or for cars within a specified budget. The ability to locate and understand search forms automatically would help search engines perform such semantic searches, which is why this research field has been so active in recent years.
This survey paper provides a sound analysis of current techniques for discovering search forms. The authors classify them into the following groups: deep web crawlers, which tackle the problem as a whole; form crawlers, which deal with locating search forms; and form classifiers, form clusterers, and form rankers, which deal with identifying good search forms. They analyze the proposals in each category and conclude that all but two of the techniques depend heavily on a person to provide the information needed to conduct the search process or to interpret the results. This suggests that the field will remain very active in the coming years.
The paper is fairly well written. The ideas are presented clearly, readers need not be experts to understand the content, the classification is convincing, and the features used to study the proposals make it easy to compare them side by side. I strongly recommend the paper to researchers who work on web integration, focused crawling, meta-queriers, web-scale information extraction, and related fields. Such researchers might also find useful information in a paper by Khare, An, and Song [1], who surveyed current form interpretation techniques.