Keila and Skillicorn present an interesting study in which they analyze 500,000 emails of Enron employees. This large corpus of emails is freely available for download at http://www.cs.cmu.edu/~enron/. Building on previous research, the authors assume that conspiracies in emails can be detected from the writers' word usage. Writers engaging in deceptive communication use fewer first-person pronouns to “dissociate themselves from their words.” In addition, they use fewer exclusive words, such as “but” and “except,” and more action verbs.
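To make the word-usage cues above concrete, here is a minimal sketch of how per-email rates for the three cue categories might be counted. The word lists, the tokenizer, and the name cue_rates are illustrative assumptions and are not taken from the paper.

```python
import re
from collections import Counter

# Illustrative word lists only: assumptions, not the lists used in the paper.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
EXCLUSIVE = {"but", "except", "without"}
ACTION_VERBS = {"go", "take", "move", "run", "carry"}


def cue_rates(text):
    """Return the fraction of tokens that fall in each cue category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty input
    counts = Counter(tokens)
    return {
        "first_person": sum(counts[w] for w in FIRST_PERSON) / total,
        "exclusive": sum(counts[w] for w in EXCLUSIVE) / total,
        "action": sum(counts[w] for w in ACTION_VERBS) / total,
    }


# Hypothetical message, used only to exercise the function.
rates = cue_rates("I will take the files, but I cannot go without my notes.")
```

Under the assumption described in the review, a message with unusually low first-person and exclusive-word rates and a high action-verb rate would be a candidate for closer inspection.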
The results presented in the paper confirm the authors’ theory at least partially. Text mining and text analysis are established areas of research, so it is a little surprising that the paper does not contain more references. Some references to Kohonen nets, also known as self-organizing maps (SOMs), would have been useful, since SOMs have been shown to achieve good results in classifying text data and other high-dimensional data. The authors use singular value decomposition to analyze the data, but they do not explain why they chose this method.
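For readers unfamiliar with the technique, the following sketch shows what a singular value decomposition of an email-by-feature matrix might look like. It assumes NumPy is available; the matrix values are made up for illustration, and the choice of mean-centering and of two retained dimensions is an assumption, not a description of the authors' procedure.

```python
import numpy as np

# Toy matrix: one row per email, one column per cue-word rate.
# The values are invented for illustration and are not Enron data.
X = np.array([
    [0.25, 0.17, 0.17],
    [0.20, 0.15, 0.10],
    [0.02, 0.01, 0.30],
    [0.22, 0.16, 0.12],
])

# Center each column so the decomposition captures variation around the mean.
Xc = X - X.mean(axis=0)

# Thin SVD: Xc = U @ diag(s) @ Vt, with singular values in descending order.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Keep the k strongest directions and project each email onto them.
k = 2
coords = U[:, :k] * s[:k]
```

Each row of coords places one email in a low-dimensional space, where emails with atypical word usage tend to lie far from the bulk of the data; this is one plausible reason for choosing the method, though the paper does not state it.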
The paper is enjoyable to read, and can certainly serve as an incentive for those experienced in text mining to analyze the Enron data set.