The current trend on the Web is for people to share various types of personal data (such as blogs, tweets, pictures, and videos) with others. This large body of personal content opens new opportunities for researchers. Effortless access to this data, for various purposes and under different classification schemes, is an appealing research avenue with practical implications.
The authors studied text-based video content classification. For this purpose, they exploited user-generated text data (for example, comments, descriptions, and titles) as surrogates for the videos themselves. Their work is inspired by authorship attribution studies, and they employed various text-based (lexical, syntactic, and content-specific) features. Their experiments consider different combinations of these feature sets in order to find the combination that yields the highest classification accuracy. For classification, they experimented with three different techniques: C4.5, a decision-tree-building algorithm; naive Bayes, a probabilistic classifier; and the support vector machine (SVM), a statistical machine learning method.
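For readers unfamiliar with the probabilistic option, the following is a minimal sketch of a multinomial naive Bayes classifier with add-one (Laplace) smoothing applied to bag-of-words features drawn from video text such as titles or comments. It is illustrative only: it covers just one of the three classifiers and only simple word-count (content-specific) features, the class name `NaiveBayesText` and the "music"/"sports" categories are hypothetical, and it is not the authors' implementation or feature set.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; a real system would also handle punctuation.
    return text.lower().split()


class NaiveBayesText:
    """Hypothetical multinomial naive Bayes over word counts, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # number of training videos per category
        self.word_counts = defaultdict(Counter)  # per-category word frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, doc, label):
        # log P(label) + sum of log P(word | label), up to a shared constant.
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for tok in tokenize(doc):
            score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, doc):
        # Pick the category with the highest (unnormalized) log posterior.
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))
```

A toy usage, with made-up training text: after fitting on `["great guitar solo live concert", "new album song lyrics", "amazing goal in the final match", "referee penalty football match"]` labeled `music`, `music`, `sports`, `sports`, the query `"live concert song"` is assigned to `music`. Combining several feature sets, as the authors did, would mean extending each video's representation beyond these word counts.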
As a test bed, the authors used data obtained from YouTube. They achieved the most accurate results with the SVM technique when all of the feature sets were combined. The authors' findings are supported by statistical tests, and they provide additional experiments on video classification for community detection and social network analysis. In summary, the paper provides a broad literature review, flows nicely, and presents interesting results.