Park et al. combine nonnegative matrix factorization (NMF) with fuzzy relations to improve document clustering. They note that clustering is useful for document organization, automatic summarization, topic extraction, and information filtering or retrieval. They also note that their algorithm “can extract important cluster label terms [...] using semantic features” via NMF, and “can remove the dissimilar documents [in clusters] using [a] fuzzy relation between semantic features and document terms.”
After showing how NMF and fuzzy relations work, the authors describe their algorithm, which first preprocesses the document set by removing stop words and stemming the remaining terms. They then use NMF to factor the document-term matrix and generate cluster label terms. Finally, they perform document clustering using a fuzzy relation.
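For readers who want a concrete picture of the first two stages, the following is a minimal sketch, not the authors' implementation. It assumes a toy stop-word list, a toy stemmer that only strips a trailing "s", a matrix stored as terms by documents, and the standard multiplicative-update rule for NMF. The names `preprocess`, `term_document_matrix`, and `nmf` are hypothetical. The fuzzy-relation step is not reproduced; each document is simply assigned to its strongest semantic feature as a stand-in.

```python
import numpy as np

# Toy stop-word list (an assumption; the paper's list is not given in this review).
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "for", "is", "with"}

def preprocess(text):
    """Lowercase, drop stop words, and apply a toy stemmer (strip a trailing 's')."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

def term_document_matrix(docs):
    """Return (vocab, A), where A[i, j] counts occurrences of term i in document j."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for t in toks:
            A[index[t], j] += 1.0
    return vocab, A

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Approximate nonnegative A (terms x docs) as W (terms x k) @ H (k x docs)."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k)) + eps
    H = rng.random((k, A.shape[1])) + eps
    for _ in range(iters):
        # Multiplicative updates keep every entry of W and H nonnegative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

docs = [
    "matrix factorization of the term matrix",
    "nonnegative matrix factorization for terms",
    "fuzzy relations and fuzzy sets",
    "a fuzzy relation in set theory",
]
vocab, A = term_document_matrix(docs)
W, H = nmf(A, k=2)

# Label each semantic feature (a column of W) with its three highest-weighted terms.
label_terms = [
    [vocab[i] for i in np.argsort(W[:, f])[::-1][:3]]
    for f in range(W.shape[1])
]

# Stand-in for the fuzzy-relation step: assign each document to its strongest feature.
clusters = H.argmax(axis=0)
```

The columns of `W` play the role of semantic features over terms, and the columns of `H` give each document's weight on those features. A faithful version would replace the final `argmax` with the fuzzy relation between semantic features and document terms that the paper defines.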
The authors use a standard test database of documents to measure the performance of their algorithm against other clustering mechanisms. The results show that their algorithm performs better than the alternatives tested. I wish, however, that they had discussed the normalized mutual information metric they employ in more detail.
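As background on that metric, here is a minimal sketch of one common form of normalized mutual information: the mutual information between the true labels and the predicted clusters, divided by the geometric mean of their entropies. The choice of normalization is an assumption, since several variants exist and this review does not record which one the paper uses. The function name is hypothetical.

```python
import math
from collections import Counter

def normalized_mutual_information(true_labels, predicted):
    """NMI = I(T; P) / sqrt(H(T) * H(P)), one of several common normalizations."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, predicted))
    count_true = Counter(true_labels)
    count_pred = Counter(predicted)

    # Mutual information computed from the contingency counts.
    mi = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    # Entropy of each labeling.
    h_true = -sum((c / n) * math.log(c / n) for c in count_true.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in count_pred.values())

    # Degenerate case: a labeling with a single cluster has zero entropy.
    if h_true == 0 or h_pred == 0:
        return 1.0 if h_true == h_pred else 0.0
    return mi / math.sqrt(h_true * h_pred)

# Same partition with the cluster names swapped.
same_partition = normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
# Predicted clusters that are statistically independent of the true labels.
unrelated = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Under this definition the score does not depend on how clusters are named, which is why it suits comparisons between a clustering and a reference labeling: it ranges from 0 for independent labelings to 1 for identical partitions.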