Model-based data analysis is a powerful, increasingly popular tool for inferring knowledge and abstract information from data sets. To date, the model-based learning literature has been dominated largely by Gaussian mixtures. However, the conventional use of Gaussian mixtures is not always appropriate, especially when the data partitions are not Gaussian. In such cases, as the authors reported in their previous work, “the inverted Dirichlet mixture model and generalized inverted Dirichlet mixture model [may] outperform [Gaussian mixtures] in terms of clustering accuracy.” In this paper, the authors propose a variational Bayesian framework for the infinite generalized inverted Dirichlet mixture with feature selection. They further investigate the capabilities of the framework through computational experiments on both synthetic and real data sets.
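For readers unfamiliar with the distribution family at the heart of the paper, the following is a minimal sketch of evaluating the inverted Dirichlet log-density on positive data; the parameter values here are arbitrary illustrations, not taken from the paper.

```python
import math

def inverted_dirichlet_logpdf(x, alpha):
    """Log-density of the inverted Dirichlet distribution over a positive
    vector x = (x_1, ..., x_D), with parameters alpha = (a_1, ..., a_{D+1}):
    p(x) = Gamma(sum a) / prod_j Gamma(a_j)
           * prod_{j<=D} x_j^(a_j - 1) / (1 + sum_j x_j)^(sum a)."""
    s = sum(alpha)
    log_norm = math.lgamma(s) - sum(math.lgamma(a) for a in alpha)
    # zip() pairs each x_j with a_j for j = 1..D; alpha's last entry
    # enters only through the normalizing terms above and below.
    log_kernel = sum((a - 1.0) * math.log(xj) for xj, a in zip(x, alpha))
    return log_norm + log_kernel - s * math.log1p(sum(x))

# Illustrative evaluation on a positive 2-D point (arbitrary parameters).
print(inverted_dirichlet_logpdf([0.5, 1.2], [2.0, 3.0, 4.0]))
```

In one dimension this reduces to the beta prime density, which is one way to sanity-check the formula.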
The aim of the reported variational inference framework “is to determine a distribution Q ... that approximates the true posterior distribution” of the data. Q is selected from a restricted family of distributions that can be factorized into disjoint, tractable distributions. Although the model is a full Dirichlet process, the variational distribution used in the calculation is truncated, with two variational parameters controlling the levels of truncation. Through the learning process, reported as an algorithm in ten major steps, the two parameters are optimized iteratively. To examine the performance of the proposed approach, computational experiments are carried out on both synthetic and real-world data. For the synthetic data sets, the algorithm optimally selects the parameters that recover the actual clusters. Two real-world experiments are performed: visual scene classification and digit categorization. In both, the algorithm is capable of assigning different weights to features, reflecting their significance in the clustering process. The feature selection process brings noticeable improvements in clustering accuracy in both experiments.
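The truncation step described above can be illustrated with the stick-breaking construction commonly used in variational treatments of Dirichlet process mixtures: the expected mixture weights are computed from Beta-distributed stick fractions cut off at a finite truncation level. The truncation level and the Beta parameters below are my own illustrative choices, not the paper's.

```python
import numpy as np

def stick_breaking_weights(a, b):
    """Expected mixture weights under independent Beta(a_t, b_t) sticks,
    truncated at T = len(a): pi_t = E[v_t] * prod_{s<t} (1 - E[v_s])."""
    v = a / (a + b)                        # posterior mean of each stick fraction
    # Mass remaining before stick t is the product of (1 - v_s) for s < t.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    pi = v * remaining
    pi[-1] = 1.0 - pi[:-1].sum()           # fold leftover mass into the last stick
    return pi

# Illustrative variational parameters for a truncation level of T = 4.
a = np.array([5.0, 3.0, 1.0, 1.0])
b = np.array([1.0, 2.0, 4.0, 4.0])
pi = stick_breaking_weights(a, b)
print(pi, pi.sum())                        # weights sum to 1 by construction
```

A larger truncation level costs more computation but, as in the paper's framework, lets the effective number of clusters be inferred rather than fixed in advance: components whose expected weight shrinks toward zero are effectively pruned.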
The authors claim that the proposed algorithm “can be used for any positive data and has promising applications in different areas that have [a] huge amount of data to be clustered and analyzed.” Researchers and practitioners in data science, especially those interested in model-based learning and clustering, should benefit from reading this paper. Feature selection strategies and accuracy improvements are critical in analyzing high-dimensional big data. In addition, the proposed approach provides an alternative to the popular Gaussian mixtures in model-based data analysis; however, direct performance comparisons against Gaussian mixtures are not reported in the paper.