This paper presents a visual (video) tracking method based on sparse distributed representations, described in terms of general basis functions (GBFs) extracted from image patches. The GBFs are extracted as features using an algorithm based on independent component analysis (ICA) applied to a large set of natural image patches. The authors then model the appearance of the tracked target as the probability distribution of these features. The proposed method has three steps: (1) GBF extraction, using image patches and ICA; (2) target representation, using features selected by entropy and the computation of their probability distribution; and (3) target search, which minimizes the Matusita distance between the distributions of the target model and the candidate. Because the method uses a sparse representation and local features, it is robust to partial occlusion. It is also robust to camouflage environments, pose changes, and illumination changes, as the authors demonstrate in their experiments (making it adequate for typical video surveillance applications).
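To make step (3) more concrete, the following is a minimal sketch of the Matusita distance between two discrete feature distributions, such as the target model's histogram of GBF responses and a candidate window's histogram. The histogram data, bin count, and function name are my own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def matusita_distance(p, q):
    """Matusita distance between two discrete probability distributions:
    D(p, q) = sqrt( sum_i (sqrt(p_i) - sqrt(q_i))^2 ).
    Both inputs are assumed to be non-negative and to sum to 1.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# Illustrative example: 16-bin feature histograms for the target model and
# a candidate window (random placeholder data, normalized to sum to 1).
rng = np.random.default_rng(0)
target_hist = rng.random(16)
target_hist /= target_hist.sum()
candidate_hist = rng.random(16)
candidate_hist /= candidate_hist.sum()

print(matusita_distance(target_hist, candidate_hist))
```

Note that the squared Matusita distance equals 2(1 - BC), where BC is the Bhattacharyya coefficient, so minimizing it plays a role analogous to maximizing the Bhattacharyya similarity used in mean shift tracking.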
The authors compare their approach to the mean shift tracker, the L1 tracker, and the BH tracker, and report better results. I would have liked to see them adopt a standard benchmark database. They might also have compared their method with other visual tracking approaches, such as those based on SIFT or SURF features, or even with the improved methods described by Yang et al. [1], whose research is referenced in the paper. I also wish the authors had discussed and evaluated the robustness and invariance of the proposed approach under significant background changes and general scene translations, rotations, and scaling.