Computing Reviews, the leading online review service for computing literature.


Document clustering method using dimension reduction and support vector clustering to overcome sparseness
Jun S., Park S., Jang D. Expert Systems with Applications: An International Journal 41(7): 3204-3212, 2014. Type: Article

Date Reviewed: Sep 19 2014

In this paper, the authors aim to address three problems in document clustering: determining the number of clusters, structuring the collection description matrix into a form suitable for statistical analysis, and overcoming the sparseness of that matrix. To determine the number of clusters, they employ support vector clustering (SVC) and the Silhouette measure. To overcome sparseness and make the data more suitable for statistical analysis, they combine singular value decomposition (SVD) and principal component analysis (PCA). The authors perform experiments on two document collections: a set of 159 news articles and a set of 98 patent documents. The first set of experiments aims to show the efficacy of their approach; the patent experiments aim to measure how well the method predicts research and development trends. The results of the experiments are inconclusive; in both cases, the experimental collections are too small. In the trend analysis, the authors hypothesize and show that a research field with a small number of patents can be expected to have a greater number of patents in later years. However, they provide only one observation to support this claim. The paper would have been stronger had the authors provided several observations, based on more data covering a wider time window.
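The review describes the pipeline only at a high level. The following Python sketch is therefore an illustration under stated assumptions, not the authors' implementation. It assumes that the "SVD + PCA" combination means truncating the SVD of the sparse document-term matrix and then running PCA on the resulting dense coordinates. To compare candidate cluster counts, it uses the standard Silhouette definition s(i) = (b(i) - a(i)) / max(a(i), b(i)). Because SVC is considerably more involved, plain k-means is swapped in as the clustering step, purely to show how Silhouette selects the number of clusters. All function names (`reduce_dimensions`, `kmeans`, `silhouette`, `choose_k`) and the toy corpus are hypothetical, and the code requires NumPy.

```python
import numpy as np


def reduce_dimensions(dtm, svd_rank, n_components):
    """Assumed reading of "SVD + PCA": truncated SVD, then PCA on the result.

    Requires svd_rank <= min(dtm.shape) and n_components <= svd_rank.
    """
    # Stage 1: keep the top `svd_rank` singular directions, turning sparse
    # term counts into dense latent coordinates (one row per document).
    u, s, _ = np.linalg.svd(dtm, full_matrices=False)
    latent = u[:, :svd_rank] * s[:svd_rank]
    # Stage 2: PCA is the SVD of the mean-centred matrix; U * S gives the scores.
    centred = latent - latent.mean(axis=0)
    u2, s2, _ = np.linalg.svd(centred, full_matrices=False)
    return u2[:, :n_components] * s2[:n_components]


def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means, a stand-in for SVC (NOT the paper's method)."""
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # An empty cluster keeps its previous centre.
        new_centres = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels


def silhouette(x, labels):
    """Mean Silhouette value over all points, in [-1, 1]."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treat as worst score
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    scores = []
    for i in range(len(x)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            scores.append(0.0)  # singleton cluster scores 0 by convention
            continue
        a = d[i, same].mean()  # mean distance within own cluster
        b = min(d[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return float(np.mean(scores))


def choose_k(x, k_values):
    """Pick the candidate cluster count with the highest mean Silhouette."""
    scores = {k: silhouette(x, kmeans(x, k)) for k in k_values}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical toy corpus: 20 documents, 10 terms, two disjoint vocabularies.
    dtm = np.zeros((20, 10))
    dtm[:10, :5] = rng.poisson(3.0, size=(10, 5))
    dtm[10:, 5:] = rng.poisson(3.0, size=(10, 5))
    reduced = reduce_dimensions(dtm, svd_rank=4, n_components=2)
    best_k, scores = choose_k(reduced, range(2, 6))
    print(best_k, scores)
```

With SVC, the candidates would presumably be SVC parameter settings (for example, kernel widths, which control how many clusters SVC finds) rather than values of k. The Silhouette comparison would work the same way.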

Reviewer: F. Can	Review #: CR142734 (1412-1096)

Document And Text Processing (I.7 )

Clustering (H.3.3 ... )

Database Applications (H.2.8 )

Systems (H.2.4 )


Other reviews under "Document And Text Processing":	Date

Text retrieval from early printed books Marinai S. International Journal on Document Analysis and Recognition 14(2): 117-129, 2011. Type: Article	Sep 29 2011

Handbook of document image processing and recognition Doermann D., Tombre K., Springer Publishing Company, Incorporated, New York, NY, 2014. 1055, Type: Book (978-0-857298-58-4)	Oct 15 2014

Path-based methods on categorical structures for conceptual representation of Wikipedia articles Kucharczyk Ł., Szymański J. Journal of Intelligent Information Systems 48(2): 309-327, 2017. Type: Article	Nov 3 2017

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®