This paper addresses three problems in document clustering: determining the number of clusters, structuring the collection description matrix into a form suitable for statistical analysis, and overcoming the sparseness of that matrix. To determine the number of clusters, the authors employ support vector clustering (SVC) together with the silhouette measure. To overcome sparseness and make the data more suitable for statistical analysis, they combine singular value decomposition (SVD) and principal component analysis (PCA).
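For readers unfamiliar with the silhouette measure, the following minimal sketch shows how a mean silhouette score can be used to choose between candidate cluster counts. It is not the authors' implementation: the toy points, the two candidate partitions, and the names `silhouette_score`, `candidates`, and `best_k` are assumptions made for illustration, and the clustering step itself (SVC in the paper) is omitted, so the partitions are simply given.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Mean distance from point i to the members of a cluster, excluding i.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (assumed): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical candidate partitions, keyed by number of clusters.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}

scores = {k: silhouette_score(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)  # the partition with the highest mean silhouette
```

On this toy data the two-cluster partition scores higher, because splitting the first group leaves points that are as close to a neighboring cluster as to their own.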
The authors perform experiments on two document collections: a set of 159 news articles and a set of 98 patent documents. The first set of experiments is meant to show the efficacy of the approach. The patent experiments aim to measure how well the method predicts research and development trends.
The results of the experiments are inconclusive. In both cases, the experimental collections are too small. In the trend analysis, the authors hypothesize, and attempt to show, that a research field with a small number of patents can be expected to produce a greater number of patents in later years. However, they provide only one observation to support this claim. The paper would have been better had they provided several observations, with more data covering a wider time window.