Computing Reviews
Distributed strategies for mining outliers in large data sets
Angiulli F., Basta S., Lodi S., Sartori C. IEEE Transactions on Knowledge and Data Engineering 25(7): 1520-1532, 2013. Type: Article
Date Reviewed: Oct 21 2013

Outlier detection is a well-studied data mining task whose goal is to isolate the data points that differ significantly from the majority of the points in a dataset. Solutions to this problem play an important role in many fields where the automated detection of anomalies in large datasets is critical.
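
To make the task concrete, here is a minimal brute-force sketch in Python. It assumes the distance-based definition used in this line of work: an object's outlier weight is the sum of the distances to its k nearest neighbors, and the top-n outliers are the n objects with the largest weights. The function names `knn_weight` and `top_n_outliers` are hypothetical, chosen only for illustration. Note that scoring every object this way takes on the order of N² distance computations for N objects, which is the cost that smarter strategies try to avoid on large datasets.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_weight(i: int, data: Sequence[Point], k: int) -> float:
    """Sum of the distances from data[i] to its k nearest neighbors (itself excluded)."""
    dists = (math.dist(data[i], data[j]) for j in range(len(data)) if j != i)
    return sum(heapq.nsmallest(k, dists))


def top_n_outliers(data: Sequence[Point], k: int, n: int) -> List[Tuple[int, float]]:
    """Brute force: score every object, return the n (index, weight) pairs with the largest weight."""
    weights = ((i, knn_weight(i, data, k)) for i in range(len(data)))
    return heapq.nlargest(n, weights, key=lambda iw: iw[1])


if __name__ == "__main__":
    # A tight unit square of points plus one far-away point.
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    # Index 4 (the far-away point) ranks first.
    print(top_n_outliers(points, k=2, n=1))
```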

This paper builds on previous work that introduced a sequential outlier detection algorithm called the solving set [1]. In the solving set algorithm, the solving set S is a learned model that can be understood as a compressed representation of a complete dataset D. The set S contains “a sufficient number of objects from D to allow considering only the distances” between the data point pairs in S and D “to obtain the top-n outliers.”
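
The pruning idea behind a solving set can be sketched as follows. This is a simplified Python illustration under our own assumptions, not the exact procedure of [1]: it uses the sum-of-k-nearest-neighbor-distances weight, starts from m randomly chosen candidates, and in each round adds to S the m remaining objects with the largest upper bounds. The key observation is that distances to the members of S alone can only over-estimate an object's weight, because its nearest neighbors within a subset of D are never closer than its nearest neighbors in D. Once that upper bound falls below the weight of the current nth-best candidate (the threshold), the object cannot be a top-n outlier and never needs to be compared with all of D. The names `solving_set_outliers`, `m`, and `seed` are hypothetical.

```python
import heapq
import math
import random
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def solving_set_outliers(
    data: Sequence[Point], k: int, n: int, m: int, seed: int = 0
) -> Tuple[List[Tuple[int, float]], Set[int]]:
    """Simplified sketch: return (top-n outliers as (index, weight), solving set S as indices)."""
    size = len(data)
    # knn[j]: max-heap (negated) of the k smallest distances from j to members of S so far.
    knn: List[List[float]] = [[] for _ in range(size)]

    def upper_bound(j: int) -> float:
        # Neighbors drawn only from S are never closer than true neighbors in D,
        # so this sum bounds the true weight of j from above.
        return -sum(knn[j]) if len(knn[j]) == k else math.inf

    solving: Set[int] = set()
    top: List[Tuple[float, int]] = []  # min-heap of the n largest exact weights found
    threshold = 0.0  # lower bound on the weight of the n-th outlier
    candidates = random.Random(seed).sample(range(size), min(m, size))

    while candidates:
        solving.update(candidates)
        for c in candidates:
            dists = []
            for j in range(size):
                if j == c:
                    continue
                d = math.dist(data[c], data[j])
                dists.append(d)
                # c is now in S, so it is a potential neighbor of every other object j.
                if len(knn[j]) < k:
                    heapq.heappush(knn[j], -d)
                elif d < -knn[j][0]:
                    heapq.heapreplace(knn[j], -d)
            # c was compared with all of D, so this is its exact weight.
            weight = sum(heapq.nsmallest(k, dists))
            if len(top) < n:
                heapq.heappush(top, (weight, c))
            elif weight > top[0][0]:
                heapq.heapreplace(top, (weight, c))
            if len(top) == n:
                threshold = top[0][0]
        # Next round: objects outside S whose upper bound can still reach the threshold.
        alive = [j for j in range(size) if j not in solving and upper_bound(j) >= threshold]
        candidates = heapq.nlargest(m, alive, key=upper_bound)

    ranked = sorted(((c, w) for w, c in top), key=lambda cw: cw[1], reverse=True)
    return ranked, solving
```

Every object left outside S ends with an upper bound below the final threshold, so (ties aside) the top-n outliers of D are all in S. Comparing `len(S)` with `len(data)` after a call shows how much of D the pruning spared from a full scan.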

The current paper extends the solving set to a distributed setting. After receiving from a central site the current solving set together with the current lower bound for the distance of the nth outlier, each distributed node compares the solving-set objects with its local objects and looks for local outliers.
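
A single-process simulation of this message flow might look like the sketch below. It is hedged and simplified, not the authors' algorithm: the `Node` class, the `(node index, local index)` identifiers, the per-node candidate quota `m`, and the choice to let nodes propose new candidates using the threshold from the previous round are all our assumptions, and the threshold here is the weight (sum of k nearest-neighbor distances) of the nth outlier. Each node receives the current candidates and threshold, updates the upper bounds of its local objects, returns the k smallest local distances for every candidate, and proposes local objects that may still be outliers. The central site merges the partial distance lists into exact weights and raises the threshold.

```python
import heapq
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]
ObjId = Tuple[int, int]  # hypothetical global id: (node index, local index)


class Node:
    """A distributed site holding one partition of D (hypothetical class)."""

    def __init__(self, node_id: int, points: Sequence[Point], k: int):
        self.node_id = node_id
        self.points = list(points)
        self.k = k
        # Per local object: negated k smallest distances to solving-set members seen so far.
        self.knn: List[List[float]] = [[] for _ in self.points]
        self.in_solving: Set[int] = set()

    def upper_bound(self, j: int) -> float:
        h = self.knn[j]
        return -sum(h) if len(h) == self.k else math.inf

    def process(self, candidates: Dict[ObjId, Point], threshold: float, m: int):
        """Compare the received candidates with the local objects.

        Returns the k smallest local distances of each candidate, plus up to m local
        objects whose upper bound still reaches the threshold (new local candidates).
        """
        partial: Dict[ObjId, List[float]] = {}
        for cid, cpoint in candidates.items():
            if cid[0] == self.node_id:
                self.in_solving.add(cid[1])
            dists = []
            for j, p in enumerate(self.points):
                if (self.node_id, j) == cid:
                    continue  # skip the candidate itself
                d = math.dist(cpoint, p)
                dists.append(d)
                h = self.knn[j]
                if len(h) < self.k:
                    heapq.heappush(h, -d)
                elif d < -h[0]:
                    heapq.heapreplace(h, -d)
            partial[cid] = heapq.nsmallest(self.k, dists)
        alive = [j for j in range(len(self.points))
                 if j not in self.in_solving and self.upper_bound(j) >= threshold]
        proposed = {(self.node_id, j): self.points[j]
                    for j in heapq.nlargest(m, alive, key=self.upper_bound)}
        return partial, proposed


def distributed_solving_set(partitions: Sequence[Sequence[Point]], k: int, n: int, m: int):
    """Central site: return the top-n outliers as [(ObjId, weight)], largest weight first."""
    nodes = [Node(i, part, k) for i, part in enumerate(partitions)]
    top: List[Tuple[float, ObjId]] = []  # min-heap of the n largest exact weights
    threshold = 0.0  # current lower bound on the weight of the n-th outlier
    # Round 0: each node proposes initial candidates (nothing compared yet).
    candidates: Dict[ObjId, Point] = {}
    for node in nodes:
        candidates.update(node.process({}, threshold, m)[1])
    while candidates:
        replies = [node.process(candidates, threshold, m) for node in nodes]
        for cid in candidates:
            # Exact weight: k smallest distances merged over all nodes' partial lists.
            merged = [d for partial, _ in replies for d in partial[cid]]
            weight = sum(heapq.nsmallest(k, merged))
            if len(top) < n:
                heapq.heappush(top, (weight, cid))
            elif weight > top[0][0]:
                heapq.heapreplace(top, (weight, cid))
        if len(top) == n:
            threshold = top[0][0]
        candidates = {}
        for _, proposed in replies:
            candidates.update(proposed)
    return sorted(((cid, w) for w, cid in top), key=lambda cw: cw[1], reverse=True)
```

Because every candidate is sent to every node, each node sees the whole solving set, so its local upper bounds match those of a sequential run; in this sketch the per-round traffic is the candidates going out plus up to k distances per candidate coming back from each node.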

The authors show that their distributed algorithm achieves excellent performance. Experiments further show that the solving set it computes has the same quality as one computed sequentially. The authors also introduce a “lazy” variant of the distributed algorithm that reduces the amount of data that must be transferred over the network and yields some further performance improvements.

This is a well-written and easy-to-follow paper. I recommend it to researchers and practitioners interested in outlier detection in large datasets.

Reviewer: Burkhard Englert
Review #: CR141651 (1401-0092)
[1] Angiulli, F.; Basta, S.; Pizzuti, C. Distance-based detection and prediction of outliers. IEEE Transactions on Knowledge and Data Engineering 18, 2 (2006), 145–160.
Data Mining (H.2.8 ... )
Arrays (E.1 ... )
Distributed Databases (H.2.4 ... )
Distributed Databases (C.2.4 ... )
Other reviews under "Data Mining" (with review dates):
Feature selection and effective classifiers
Deogun J. (ed), Choubey S., Raghavan V. (ed), Sever H. (ed) Journal of the American Society for Information Science 49(5): 423-434, 1998. Type: Article
May 1 1999
Rule induction with extension matrices
Wu X. (ed) Journal of the American Society for Information Science 49(5): 435-454, 1998. Type: Article
Jul 1 1998
Predictive data mining
Weiss S., Indurkhya N., Morgan Kaufmann Publishers Inc., San Francisco, CA, 1998. Type: Book (9781558604032)
Feb 1 1999

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®