This paper presents an algorithm for estimating the pose and motion of objects of interest. The method focuses on extracting motion regions associated with the moving parts of articulated objects, such as the limbs of a human body.
The silhouette of interest and its pose are first recognized using Hu moments matched with the Mahalanobis distance metric. A timed motion history image is then built from a conventional image sequence containing the silhouette's motion. A simple frame-averaging technique separates the object of interest from the background, and successive silhouettes are superimposed so that moving regions are encoded as gray values, the lightest corresponding to the most recent motion. Gradients are computed from this image and thresholded to isolate the moving regions. Segmentation is performed so as to guarantee that each detected motion region corresponds to a moving part of the object.
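The timed-MHI update and gradient thresholding described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the `duration` window, and the gradient-magnitude bounds are assumptions chosen for clarity.

```python
import numpy as np

def update_tmhi(mhi, silhouette, timestamp, duration):
    """Timed motion history image update (illustrative sketch).

    Pixels covered by the current silhouette are stamped with the current
    timestamp; pixels whose last motion is older than `duration` seconds
    are cleared. Recent motion thus carries the highest (lightest) values.
    """
    mhi = mhi.copy()
    mhi[silhouette > 0] = timestamp
    mhi[mhi < timestamp - duration] = 0.0
    return mhi

def motion_gradient(mhi, min_mag=0.05, max_mag=0.5):
    """Gradient orientation of the tMHI, with magnitude thresholding.

    Gradients that are too small (flat regions) or too large (silhouette
    boundaries, where old and new timestamps abut) are rejected; the
    `min_mag`/`max_mag` bounds are hypothetical tuning values.
    """
    gy, gx = np.gradient(mhi)          # per-axis finite differences
    mag = np.hypot(gx, gy)
    valid = (mag >= min_mag) & (mag <= max_mag)
    orient = np.degrees(np.arctan2(gy, gx)) % 360.0
    return orient, valid

# Usage: two successive silhouettes leave a gray-value trail in the tMHI.
mhi = np.zeros((5, 5))
sil_a = np.zeros((5, 5)); sil_a[2, 1] = 1
sil_b = np.zeros((5, 5)); sil_b[2, 2] = 1
mhi = update_tmhi(mhi, sil_a, timestamp=1.0, duration=0.5)
mhi = update_tmhi(mhi, sil_b, timestamp=1.2, duration=0.5)
# mhi[2, 2] (most recent) is now brighter than mhi[2, 1]
```

A valid-gradient mask computed this way points along the direction of motion, which is what allows a whole motion sequence to be read from one image.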
Motion recognition is tested on a human subject whose gestures control a music synthesizer; the detected body motions are successfully translated into the corresponding beats and sounds. The technique has an advantage over optical flow in that an entire motion sequence can be processed from a single timed motion history image, enabling true real-time operation. Its limitations include the chosen method of background removal and the inability to recover 3D structure from the motion output.