Text-independent writer identification (writer ID) has been studied for many years, and an increasing number of techniques have been proposed to handle various scripts and writing conditions. In general, there are two types of feature extraction: histogram-based statistical features, which capture local slope information along the contours of handwritten blobs, and grapheme-based codebook features, which capture local shape similarity between handwritten primitives.
This paper proposes using speeded-up robust features (SURF) to describe scale and local orientation within each word region. After a number of pre-processing steps, including text-line segmentation, binarization, word segmentation, and splitting of overlapping words, the authors apply the EM algorithm to cluster the SURF descriptors into N categories, which then serve as a codebook for the subsequent feature extraction. The feature extraction consists of two parts: SURF descriptor signatures and histogram features. A SURF signature is essentially a normalized vector whose elements are the Euclidean distances between a descriptor and each codebook component. The histogram features count the SURF descriptors falling into each of the X octaves and Y sublevels per octave used by SURF, giving X * Y scale bins. However, the authors do not specify the dimensionality of the histogram features.
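To make my reading of the two feature types concrete, here is a minimal sketch of how I understand them. This is my own interpretation, not the authors' code: the normalization scheme of the signature and the flat (octave, sublevel) indexing of the scale histogram are assumptions on my part.

```python
import numpy as np

def surf_signature(descriptor, codebook):
    """Reviewer's sketch of a SURF signature: Euclidean distance from one
    descriptor to each codebook component, normalized. (L1 normalization
    is assumed; the paper does not state which norm is used.)"""
    dists = np.linalg.norm(codebook - descriptor, axis=1)
    return dists / dists.sum()

def scale_histogram(scales, n_octaves, n_sublevels):
    """Reviewer's sketch of the histogram features: count descriptors per
    (octave, sublevel) bin, yielding an X * Y dimensional vector."""
    hist = np.zeros(n_octaves * n_sublevels)
    for octave, sublevel in scales:
        hist[octave * n_sublevels + sublevel] += 1
    return hist
```

Under this reading, the histogram features would be X * Y dimensional, which is perhaps what the authors intended but did not state.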
The experimental evaluation is extensive and impressive. The authors evaluate their approach on eight public datasets: five English, one Chinese, and two mixed-script ones (the ICDAR 2011 and ICFHR 2012 competition datasets). Using both hard and soft measures, they compare their approach with others in the literature and report substantial absolute gains. Although these gains appear substantial, statistical significance tests on the claimed gains would strengthen the scientific case. Nevertheless, I think researchers working on writer ID will be interested in the proposed feature extraction approach.
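As one concrete option for the significance testing I am suggesting, a paired bootstrap over per-query correctness would suffice; the sketch below is purely illustrative and not something reported in the paper.

```python
import numpy as np

def bootstrap_gain_ci(correct_a, correct_b, n_boot=10000, seed=0):
    """Illustrative paired bootstrap: 95% CI on the accuracy gain of
    system A over system B, given per-query 0/1 correctness vectors.
    (Reviewer's suggestion only; not from the paper under review.)"""
    rng = np.random.default_rng(seed)
    a = np.asarray(correct_a, dtype=float)
    b = np.asarray(correct_b, dtype=float)
    n = len(a)
    gains = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)  # resample queries with replacement
        gains[i] = a[idx].mean() - b[idx].mean()
    lo, hi = np.percentile(gains, [2.5, 97.5])
    return lo, hi
```

If the resulting confidence interval excludes zero, the reported gain is unlikely to be a sampling artifact.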
From a different perspective, I am somewhat concerned about the sequence of pre-processing steps, each of which modifies the original image without preserving the information needed for image recovery or integrity checks. For example, binarization alters handwritten strokes, whose details may be critical for extracting the local features used in writer ID. It would be interesting to see how the propagation of word-segmentation errors affects the subsequent feature extraction.