Computing Reviews
Document clustering method using dimension reduction and support vector clustering to overcome sparseness
Jun S., Park S., Jang D. Expert Systems with Applications: An International Journal 41(7): 3204-3212, 2014. Type: Article
Date Reviewed: Sep 19 2014

In this paper, the authors aim to address three problems associated with document clustering: determining the number of clusters, structuring the collection description matrix into a form suitable for statistical analysis, and overcoming the collection description matrix sparseness problem. For determining the number of clusters, they employ support vector clustering (SVC) and a measure called Silhouette. To overcome sparseness and make data more suitable for statistical analysis, they combine singular value decomposition (SVD) and principal component analysis (PCA).
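The pipeline described above (reduce a sparse document-term matrix with SVD and PCA, then use the Silhouette measure to settle on a number of clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: scikit-learn has no support vector clustering, so KMeans is swapped in as a stand-in clusterer, and the TF-IDF weighting, the function name choose_k_by_silhouette, and all parameter values are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score


def choose_k_by_silhouette(docs, k_candidates=(2, 3, 4), n_svd=5, n_pca=2, seed=0):
    """Hypothetical helper: pick a cluster count by mean silhouette score."""
    # Sparse document-term matrix (TF-IDF weighting is an assumption).
    X = TfidfVectorizer().fit_transform(docs)

    # SVD turns the sparse matrix into a dense, low-rank representation.
    n_svd = max(1, min(n_svd, X.shape[0] - 1, X.shape[1] - 1))
    Z = TruncatedSVD(n_components=n_svd, random_state=seed).fit_transform(X)

    # PCA then keeps the directions of largest variance in the SVD scores.
    n_pca = max(1, min(n_pca, Z.shape[1]))
    Y = PCA(n_components=n_pca, random_state=seed).fit_transform(Z)

    scores = {}
    for k in k_candidates:
        if k >= len(docs):
            continue  # silhouette is defined only for 2 <= k <= n_samples - 1
        # KMeans is a stand-in here; the paper uses support vector clustering.
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Y)
        if len(set(labels)) < 2:
            continue  # degenerate clustering, no score available
        scores[k] = silhouette_score(Y, labels)

    # The candidate with the highest mean silhouette is taken as the estimate.
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

Calling choose_k_by_silhouette on a list of document strings returns the selected cluster count and the score for each candidate. On collections as small as those in the paper, the scores for neighboring values of k can be close, so the selected value should be read with caution.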

The authors perform experiments using two document collections: a set of 159 news articles and a set of 98 patent documents. In the first set of experiments, the goal is to show the efficacy of their approach. In the patent data tests, their aim is to measure how well the method predicts research and development trends.

The results of the experiments are inconclusive. In both cases, the experimental collections are too small. In the trend analysis, the authors hypothesize and show that a research field with a small number of patents can be expected to have a greater number of patents in later years. However, they provide only one observation to support this claim. The paper would have been better if they had provided several observations, with more data covering a wider time window.

Reviewer:  F. Can Review #: CR142734 (1412-1096)
 
Document And Text Processing (I.7)
Clustering (H.3.3 ...)
Database Applications (H.2.8)
Systems (H.2.4)
Other reviews under "Document And Text Processing": Date
Text retrieval from early printed books
Marinai S. International Journal on Document Analysis and Recognition 14(2): 117-129, 2011. Type: Article
Sep 29 2011
Handbook of document image processing and recognition
Doermann D., Tombre K., Springer Publishing Company, Incorporated, New York, NY, 2014.  1055, Type: Book (978-0-857298-58-4)
Oct 15 2014
Path-based methods on categorical structures for conceptual representation of Wikipedia articles
Kucharczyk Ł., Szymański J. Journal of Intelligent Information Systems 48(2): 309-327, 2017. Type: Article
Nov 3 2017

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®