Computing Reviews
Learning k for kNN classification
Zhang S., Li X., Zong M., Zhu X., Cheng D. ACM Transactions on Intelligent Systems and Technology 8(3): 1-19, 2017. Type: Article
Date Reviewed: Mar 27 2018

This paper aims to improve the performance of the k-nearest neighbors (kNN) approach and its variants in three data mining applications--classification, regression, and missing data imputation--by proposing an approach called correlation matrix kNN (CM-kNN). The paper is well organized and easy to follow; the motivation is clear, and the proposed approach has a sound mathematical derivation.

The standard kNN approach uses the same value of k for all test data points, which often leads to low prediction accuracy in classification applications. In contrast, CM-kNN interestingly uses a different value of k for each test data point: it reconstructs the test data points from the training data points in order to learn a correlation matrix between the two sets (formalized below), and applies several techniques to improve accuracy.
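To make the per-point k concrete, one way to write the reconstruction step in LaTeX is the following; the notation X for the training matrix, Y for the test matrix, and W for the correlation matrix is ours for illustration, not necessarily the paper's:

\[
\min_{W} \ \lVert X W - Y \rVert_F^2 , \qquad k_j = \bigl|\{\, i : W_{ij} \neq 0 \,\}\bigr| ,
\]

so each column of W holds the coefficients that reconstruct one test data point from the training data points, and the number of nonzero entries in column j serves as the k for the j-th test data point.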

The least-squares error is used as the loss function to minimize the reconstruction error when reconstructing each test data point from all training data points. An l1-norm regularization term is added to the loss function to enforce sparsity in the reconstruction. In addition, an l2,1-norm regularization term is added to remove the impact of noisy training data points, that is, those irrelevant to all test data points in the reconstruction; this also promotes sparsity. Moreover, a graph Laplacian-based locality preserving projection (LPP) regularization term is added to preserve the local structure of the training data points during reconstruction. Although the utility of l1-norm, l2,1-norm, and LPP regularization has been proven in previous work, their use in the context of this paper is novel. Finally, the authors minimize the resulting loss function with an iteratively reweighted least-squares method, present a corresponding algorithm, and prove its correctness; a sketch of this kind of optimization appears below.
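As a rough illustration of how such a composite objective can be minimized, here is a minimal NumPy sketch of an iteratively reweighted least-squares (IRLS) loop for a loss of the form ||XW - Y||_F^2 + rho1*||W||_1 + rho2*||W||_{2,1} + rho3*tr(W^T L W). The function name, the hyperparameter values, and the use of a plain k-NN graph Laplacian in place of the paper's LPP term are all assumptions made for illustration, not the authors' exact formulation:

import numpy as np

def cm_knn_irls(X, Y, rho1=0.1, rho2=0.1, rho3=0.1,
                graph_k=5, n_iter=50, eps=1e-8, tol=1e-6):
    # X: (d, n) training points as columns; Y: (d, m) test points as columns.
    # Minimizes ||X W - Y||_F^2 + rho1*||W||_1 + rho2*||W||_{2,1}
    #           + rho3*tr(W^T L W) by IRLS, where L is a k-NN graph
    # Laplacian over the training points (a stand-in for the LPP term).
    d, n = X.shape
    m = Y.shape[1]

    # Unnormalized Laplacian of a symmetrized k-NN graph on training points.
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # pairwise sq. dists
    A = np.zeros((n, n))
    nn = np.argsort(sq, axis=1)[:, 1:graph_k + 1]            # skip self (column 0)
    for i in range(n):
        A[i, nn[i]] = 1.0
    A = np.maximum(A, A.T)                                   # symmetrize
    L = np.diag(A.sum(axis=1)) - A

    XtX, XtY = X.T @ X, X.T @ Y
    W = np.linalg.solve(XtX + 1e-3 * np.eye(n), XtY)         # ridge warm start

    for _ in range(n_iter):
        W_old = W.copy()
        # Rowwise IRLS weights for the l2,1 term (one weight per training point).
        d21 = 1.0 / (2.0 * np.linalg.norm(W_old, axis=1) + eps)
        for j in range(m):
            # Elementwise IRLS weights for the l1 term, column by column.
            d1 = 1.0 / (2.0 * np.abs(W_old[:, j]) + eps)
            H = XtX + rho1 * np.diag(d1) + rho2 * np.diag(d21) + rho3 * L
            W[:, j] = np.linalg.solve(H, XtY[:, j])
        if np.linalg.norm(W - W_old) <= tol * max(np.linalg.norm(W_old), 1.0):
            break
    return W

Given W, the per-point neighborhood size can then be read off as the number of coefficients above a small threshold, for example k_j = np.sum(np.abs(W[:, j]) > 1e-4).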

Extensive experiments are conducted on ten datasets of different types, covering the three data mining applications mentioned above. The experiments are sufficient and the results persuasive. In particular, for the missing data imputation experiment, the missing entries are selected at random. Prediction accuracy is used as the performance measure for classification, while the correlation coefficient and root-mean-square error are used as the performance measures for regression and missing data imputation.
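For reference, the three reported measures are standard and easy to compute; a minimal sketch in NumPy (the helper names are ours, not from the paper):

import numpy as np

def accuracy(y_true, y_pred):
    # Classification: fraction of correctly predicted labels.
    return np.mean(y_true == y_pred)

def rmse(y_true, y_pred):
    # Regression / imputation: root-mean-square error.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def corr_coef(y_true, y_pred):
    # Regression / imputation: Pearson correlation coefficient.
    return np.corrcoef(y_true, y_pred)[0, 1]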

Reviewer: Kam-Yiu Lam. Review #: CR145936 (1806-0329)
Data Mining (H.2.8 ... )
Classifier Design And Evaluation (I.5.2 ... )
Real-Time And Embedded Systems (C.3 ... )
Other reviews under "Data Mining":

Feature selection and effective classifiers
Deogun J. (ed), Choubey S., Raghavan V. (ed), Sever H. (ed). Journal of the American Society for Information Science 49(5): 423-434, 1998. Type: Article
Reviewed: May 1 1999

Rule induction with extension matrices
Wu X. (ed). Journal of the American Society for Information Science 49(5): 435-454, 1998. Type: Article
Reviewed: Jul 1 1998

Predictive data mining
Weiss S., Indurkhya N. Morgan Kaufmann Publishers Inc., San Francisco, CA, 1998. Type: Book (9781558604032)
Reviewed: Feb 1 1999
