The mere existence of honeypots is a blessing. Suddenly, attackers need to worry: “This vulnerable application on a powerful server looks too good to be true. Could it be that I am the victim of a honeypot?”
This paper describes an almost-risk-free honeypot setup. The prototypical attack sequence starts with intelligence on possible targets, often via search engines. The sites are completely unaware of this investigation. However, such searches stand out and are identifiable.
The authors had the good fortune to have access to search engine information. Queries identifying possible targets are not like most queries, so data mining is possible. (What constitutes a good query to fingerprint vulnerable software?) This phase is largely automated, which is important for a practical approach.
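The kind of query mining described here can be sketched as simple pattern matching. The patterns below are illustrative assumptions, not the fingerprints from the paper; real signatures would be mined from the search engine data the authors had access to.

```python
import re

# Hypothetical patterns of target-identifying queries (assumed, not from
# the paper): version banners, injectable URL shapes, exposed directory
# listings -- the kinds of strings that fingerprint vulnerable software.
FINGERPRINT_PATTERNS = [
    re.compile(r'"?powered by \w+ [\d.]+"?', re.IGNORECASE),  # version banner
    re.compile(r'inurl:\S+\.php\?\w+=', re.IGNORECASE),       # injectable URL
    re.compile(r'intitle:"index of" \S+', re.IGNORECASE),     # open listing
]

def is_target_identifying(query: str) -> bool:
    """Flag a search query that looks like vulnerability fingerprinting."""
    return any(p.search(query) for p in FINGERPRINT_PATTERNS)
```

A query such as `inurl:login.php?id=` would be flagged, while an ordinary query like `weather tomorrow` would not; the point is that such queries are structurally unlike normal search traffic, which is what makes the mining largely automatable.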
Inserting honeypot page hits into search engine results to match the specific queries attracts predators to the honeypot. Once an attacker identifies a possible vulnerable target, the attacker will test whether this is indeed the case.
The predator’s requests to the honeypot site will deviate from the normal traffic that hits these prepared pages. First, the pages are not widely advertised; they are constructed to show up only in target-identifying queries. Second, even if users were to find links to the pages, one can expect requests for those pages and nothing else. In that sense, the pages themselves define a whitelist of normal traffic. What is important here is that whitelisting is much easier and less error-prone than blacklisting (attack pattern detection).
A problem to address is how to make sure search engines rank honeypot pages high enough to show up in results, while keeping the false positive rate low by avoiding most innocent visitors. The pages must have inbound links, and they must contain the right keywords for crawlers to pick them up. Ideally, crawler traffic is the only nonmalicious traffic. The authors report an impressive false positive rate of, at most, one percent.
This type of honeypot has very attractive properties. One does not need to understand the attack or set up complex and vulnerable configurations, unlike with other honeypots. The concern of being compromised and thereby assisting the attacker does not exist.
Because each attack on the honeypot fails, one can expect the attacker to try its full range of attacks. This may include attacks still under development, long before they are mature.
A drawback of the method is that it is slower than the typical honeypot method. Search engines must first notice the pages and place them high enough in their ranking to attract the attackers.
Search engines are an Internet power that we can use with either good or bad intentions. It is refreshing to see that we can turn a badly intentioned use into a mighty defense: a honeypot with no vulnerabilities.