Several algorithms can identify so-called communities: groups of data points with more internal cohesion than the rest of the dataset. But what happens when these communities and their memberships evolve, so that neither is necessarily preserved over time?
This easily accessible paper answers the question of how to detect outliers (data points whose behavior differs from that of the community they initially belonged to) in evolving datasets. Think of stockbrokers who deviate from investment trends, or scientific authors who change their co-authorship networks.
The initial inputs to the proposed detection algorithm are P and Q, two partitions of a (constant) set of objects into a varying number of communities, corresponding to the two points in time to be compared. Obviously, a single comparison of the community memberships P and Q in terms of a correspondence matrix S is too naive to discriminate between an outlier of a community and its core members. Therefore, the authors introduce an additional "outlierness" score, A, for a given object with regard to a community of Q. Because outlierness is not a crisp concept, the total outlierness score has to be constrained by a certain threshold to obtain a convergent algorithm.
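To make the interplay of these quantities concrete, the following is a toy sketch of the general idea, not the authors' exact formulation: it alternates between fitting a correspondence matrix S from the currently trusted objects and scoring each object's outlierness A as its reconstruction error, with a fixed budget standing in for the threshold on total outlierness. All names and the least-squares fitting step are my own illustrative choices.

```python
import numpy as np

def evolutionary_outliers(P, Q, budget=1, n_iter=10):
    """Toy alternating scheme (not the paper's exact algorithm).

    P, Q: (n_objects x k1) and (n_objects x k2) membership matrices
    for the two snapshots. Alternate between:
      (1) fitting a correspondence matrix S that maps P-communities
          to Q-communities, using only the trusted (non-outlier) objects;
      (2) scoring each object's outlierness A as the reconstruction
          error ||P_i S - Q_i|| and flagging the `budget` worst objects.
    `budget` plays the role of the constraint on total outlierness.
    """
    n = P.shape[0]
    outliers = np.zeros(n, dtype=bool)
    A = np.zeros(n)
    for _ in range(n_iter):
        mask = ~outliers
        # (1) least-squares fit of S on the trusted objects
        S, *_ = np.linalg.lstsq(P[mask], Q[mask], rcond=None)
        # (2) per-object outlierness score
        A = np.linalg.norm(P @ S - Q, axis=1)
        new_outliers = np.zeros(n, dtype=bool)
        new_outliers[np.argsort(A)[-budget:]] = True
        if np.array_equal(new_outliers, outliers):
            break  # converged: the flagged set is stable
        outliers = new_outliers
    return A, outliers
```

For instance, if two communities simply swap labels between the snapshots while one object defects to the other community, the alternation settles on the defector: once it is excluded from the fit, S captures the label swap exactly and the defector's reconstruction error dominates.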
Besides a rigorous exposition of the algorithm (sufficient to actually implement it), the authors also describe some of its theoretical properties, such as convergence and running time. Using both synthetic and real datasets (for example, subsets of data from the Internet Movie Database (IMDb) and the Digital Bibliography and Library Project (DBLP)), the authors convincingly demonstrate the applicability of their approach.
I definitely recommend this paper to researchers in theoretical or applied computer science with an interest in (statistical) communities and outlier detection.