Object recognition is a key feature of most computer vision applications that perform object tracking in images. Tracking becomes troublesome when the images contain scene reflections, which lead to mixed images. This paper handles mixed images acquired by mobile cameras.
The authors define the tracking scheme in six steps: (1) the prediction stage of the RGB color-cue-based estimation draws samples from the importance (prior) density; (2) the correction stage of the RGB color-cue-based estimation updates the importance weights; (3) the prediction stage of the motion-cue-based estimation draws samples from the importance (prior) density; (4) the correction stage of the motion-cue-based estimation updates the importance weights; (5) the weight of each particle is optimized using maximum likelihood; and (6) the target state is estimated.
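The six steps above follow the familiar predict-correct loop of a particle filter with two appearance cues. The following is a minimal sketch of that loop, not the authors' implementation: the likelihood functions, the random-walk motion model, the product fusion standing in for the maximum-likelihood weight optimization, and the synthetic target are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def color_likelihood(particles, target):
    # Hypothetical stand-in for an RGB color-histogram likelihood.
    return np.exp(-0.5 * np.sum((particles - target) ** 2, axis=1))

def motion_likelihood(particles, target):
    # Hypothetical stand-in for a motion-cue likelihood.
    return np.exp(-np.sum(np.abs(particles - target), axis=1))

def track_step(particles, weights, target, noise=0.5):
    # Steps (1)/(3) -- prediction: draw samples from the importance
    # (prior) density, here a random-walk model. The paper draws a
    # separate set per cue; this sketch collapses them into one set.
    pred = particles + rng.normal(0.0, noise, particles.shape)
    # Step (2) -- correction with the RGB color cue.
    w_color = weights * color_likelihood(pred, target)
    # Step (4) -- correction with the motion cue.
    w_motion = weights * motion_likelihood(pred, target)
    # Step (5) -- fuse the cue weights; the paper optimizes per-particle
    # weights by maximum likelihood, approximated here by a product.
    fused = w_color * w_motion
    fused /= fused.sum()
    # Step (6) -- estimate the target state as the weighted particle mean.
    state = fused @ pred
    # Resample to counter weight degeneracy, then reset to uniform weights.
    idx = rng.choice(len(pred), size=len(pred), p=fused)
    return pred[idx], np.full(len(pred), 1.0 / len(pred)), state

particles = rng.normal(0.0, 2.0, (500, 2))   # 500 particles in (x, y)
weights = np.full(500, 1.0 / 500)
target = np.array([1.0, -1.0])               # hypothetical true position
for _ in range(10):
    particles, weights, state = track_step(particles, weights, target)
print(state)
```

With both cues agreeing on the target location, the particle cloud contracts toward it within a few iterations; in the real scheme, disagreement between the color and motion weights is what the maximum-likelihood fusion resolves.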
The authors perform the experiments in two parts. In the first part, they use eight videos: seven videos with reflections recorded by NCU VC-Lab and one video without reflections from Dataset 5 of PETS 2001. They compare the proposed method with the fast L1 tracker and robust tracking methods. In the second part, the authors compare the proposed method with the L1 tracker using accelerated proximal gradient, locally orderless tracking, and tracking via multi-task sparse learning. Here they use five videos with reflections and visual tracker benchmark videos without reflections, covering eight attributes: illumination variation, scale variation, occlusion, deformation, motion blur, in-plane rotation, out-of-plane rotation, and background clutter.
The method combines co-inference and maximum likelihood to fuse RGB color and motion cues, thus improving tracking accuracy; this makes the paper interesting and useful for those developing object-tracking schemes. In the future, the authors intend to extend the single-object tracking scheme to a multi-target tracking scheme using images from a moving camera.