Local descriptor construction, a requirement in many computer vision problems, has drawn research interest for decades, especially in situations where local image structures can be detected repeatedly and invariantly under a range of image transformations. Specifically, researchers seek feature descriptors that are both discriminative and compact for object detection and matching. So far, the scale-invariant feature transform (SIFT) and its descendant, speeded up robust features (SURF), have been widely used, but both require expensive computations.
In this paper, the authors propose local difference binary (LDB) descriptors, which are based on average intensity and the first-order gradients dx and dy. By dividing the image patch into grids of different sizes, the authors construct LDB descriptors at multiple scales. Next, they employ a variant of AdaBoost to select the most effective subset of bits from the descriptor. Finally, these bit descriptors are used for patch matching, evaluated on multiple public datasets. The experimental evaluation shows that LDB descriptors are not only discriminative for matching, but also inexpensive to compute. It is notable that these descriptors can be computed quickly, in approximately 0.1 milliseconds (ms), even on a mobile device.
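To make the construction concrete, here is a minimal sketch of an LDB-style descriptor for a single grid size: divide the patch into an n-by-n grid, compute each cell's average intensity and mean first-order gradients dx and dy, and emit one bit per feature for every pair of cells. This is an illustrative simplification, not the authors' implementation; the function name, grid partitioning, and gradient approximation are my own assumptions.

```python
import numpy as np

def ldb_bits(patch, n=3):
    """Sketch of an LDB-style binary descriptor for one grid size.

    For each cell of an n-by-n grid over the patch, compute the average
    intensity and the mean first-order gradients dx and dy; then, for
    every pair of cells, emit one bit per feature by comparing the two
    cells' values. Illustrative only, not the paper's implementation.
    """
    patch = patch.astype(np.float32)
    h, w = patch.shape
    # Simple finite-difference gradients (padded to keep the patch shape).
    dx = np.diff(patch, axis=1, append=patch[:, -1:])
    dy = np.diff(patch, axis=0, append=patch[-1:, :])

    # One (intensity, dx, dy) triple per grid cell.
    ys = np.linspace(0, h, n + 1, dtype=int)
    xs = np.linspace(0, w, n + 1, dtype=int)
    feats = []
    for i in range(n):
        for j in range(n):
            sl = (slice(ys[i], ys[i + 1]), slice(xs[j], xs[j + 1]))
            feats.append((patch[sl].mean(), dx[sl].mean(), dy[sl].mean()))

    # Three comparison bits (intensity, dx, dy) per unordered cell pair.
    bits = []
    for a in range(len(feats)):
        for b in range(a + 1, len(feats)):
            for k in range(3):
                bits.append(1 if feats[a][k] > feats[b][k] else 0)
    return np.array(bits, dtype=np.uint8)
```

With n = 3 there are 9 cells, hence 36 cell pairs and 108 bits; repeating this over several grid sizes and concatenating the bits yields the multi-scale descriptor the paper then prunes with its AdaBoost-based bit selection.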
Several higher-level thoughts:
- The proposed method decides the grid size heuristically and then compresses the descriptor by selecting a subset of its bits. How about employing a learning approach that decides the grid size from quickly computed statistics of the patch? This would adapt the grid size to the data and might save the effort of selecting a subset of bits afterward.
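One way the suggestion above might be prototyped is with a cheap per-patch statistic, for example picking the grid size whose cell means vary the most. This is a hypothetical heuristic of my own for illustration, not something proposed in the paper.

```python
import numpy as np

def choose_grid_size(patch, candidates=(2, 3, 4, 5)):
    """Hypothetical heuristic: pick the grid size whose cell-mean
    intensities have the highest variance, on the intuition that this
    grid best separates the patch's structure. Illustrative only."""
    patch = patch.astype(np.float32)
    h, w = patch.shape
    best_n, best_score = candidates[0], -np.inf
    for n in candidates:
        ys = np.linspace(0, h, n + 1, dtype=int)
        xs = np.linspace(0, w, n + 1, dtype=int)
        means = [patch[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                 for i in range(n) for j in range(n)]
        score = float(np.var(means))  # spread of cell averages
        if score > best_score:
            best_n, best_score = n, score
    return best_n
```

A statistic like this costs only a handful of mean computations per patch, so it would not undermine the speed advantage that motivates LDB in the first place.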
- The LDB descriptors seem to work well on patch-based natural scene datasets, but how do they fare on other kinds of data, such as faces or handwritten digits?
- In the matching evaluation, some readers might be interested in seeing the top-one performance in addition to the reported top-ten performance, to get a better sense of how well the method works.