In Web 2.0, the content is no longer static, but rather dynamically user generated. In this universe, the more interaction a product page or user profile gets, the greater the potential profits an individual or company may achieve with advertisements. Hence, fraudulent behavior has come up in some means of Web 2.0 interaction. Artificial comments, evaluations, recommendations, and likes define artificial interest that may illegitimately portray the importance of online competitors. This paper looks at one specific kind of fraud: illegitimate likes in the Facebook social network.
CopyCatch is a method that identifies artificial behavior between users and pages in Facebook. It has been designed to find what the authors call lockstep behavior, which occurs when groups of users act together, generally liking the same pages at around the same time. CopyCatch was designed for the immense scales observed nowadays, the ones produced by Facebook; to cope with this data, it works in parallel, following Hadoop’s MapReduce framework. As the problem is comparable to the subspace clustering problem, it is arguable that it is NP-hard; therefore, CopyCatch does not try to find the exact solution, but only a good one, as expected in such problems.
CopyCatch uses two heuristics to achieve a good solution: the set of users is defined to maximize the sum of likes for a given set of pages, and the set of liked pages must never shrink. The algorithm is iterative, converging when these two sets present no changes.
CopyCatch is an interesting massive processing algorithm that has proved its value in the challenging environment of Facebook (cosponsor of the research). It is also a good example of what is possible when the data scales too high: parallel optimization. The paper excludes some details of the process due to confidentiality, but nothing that would prevent its reproduction.