Transforming images from one image space to another is a fundamental problem in computer vision. Since natural imaging processes are too sophisticated to be modeled precisely, conventional methods have instead focused on learning a statistical mapping for the transformation. This paper follows the same track, but offers a speed improvement through two novel algorithms.
The authors thoroughly discuss two techniques in support of the contributions: coupled dictionary learning, and partitioning in the high-dimensional sparse feature space. I find the former more innovative, and experts in the field should find it worth reading.
The key observation is that sparse coding the source and target images independently yields little efficiency gain on its own. However, if the 0-1 (support) patterns of the two codings are made similar, a significant improvement follows. In this implementation, the source and target images are sparse coded jointly, with a penalty term added to enforce agreement between the non-zero entry positions of the resulting coefficient vectors. The dimensionality of the coefficients is thus implicitly reduced to the shared non-zero entries. The authors relate this technique to a nonlinear feature selection process, and they report that it is also able to reduce noise and recover missing data.
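To make the shared-support idea concrete, here is a minimal sketch of joint sparse coding via a greedy matching-pursuit variant. This is my own illustration, not the paper's algorithm: the paper enforces support agreement through a penalty term during dictionary learning, whereas the sketch below enforces it exactly by selecting one common set of atoms for both modalities. The names `joint_omp`, `Dx`, and `Dy` are hypothetical.

```python
import numpy as np

def joint_omp(x, y, Dx, Dy, k):
    """Greedily pick k atoms by combined correlation with both residuals,
    so the source code cx and target code cy share the same support
    (identical non-zero entry positions)."""
    support = []
    rx, ry = x.copy(), y.copy()
    for _ in range(k):
        # combined correlation of each atom with the two current residuals
        score = np.abs(Dx.T @ rx) + np.abs(Dy.T @ ry)
        score[support] = -np.inf  # do not reselect chosen atoms
        support.append(int(np.argmax(score)))
        # least-squares refit on the shared support, separately per modality
        ax, *_ = np.linalg.lstsq(Dx[:, support], x, rcond=None)
        ay, *_ = np.linalg.lstsq(Dy[:, support], y, rcond=None)
        rx = x - Dx[:, support] @ ax
        ry = y - Dy[:, support] @ ay
    cx = np.zeros(Dx.shape[1])
    cy = np.zeros(Dy.shape[1])
    cx[support], cy[support] = ax, ay
    return cx, cy
```

Because both codes live on the same k atom positions, any downstream regression from `cx` to `cy` only needs to handle k dimensions rather than the full dictionary size, which is the source of the claimed efficiency.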
The second technique, space partitioning, reinforces this idea by grouping similar feature pairs so that each partition cell contains enough within-class samples for reliable regression. Examples of intrinsic image estimation and image super-resolution are used to test the efficacy of the framework.
The results convince me that the increase in reconstruction error incurred by coupled dictionary learning (relative to traditional dictionary learning) does not degrade the quality of the final output. Given the efficiency gain, this work deserves more attention and testing on other heavy-duty vision applications.