“Information from quarterly reports or breaking news stories can dramatically affect the share price of a security.” Previous attempts to use machine learning techniques to exploit such information to predict price movements have relied on using a pre-identified set of keywords.
Here, Schumaker and Chen experiment with using other linguistic elements for prediction, specifically bags of words, noun phrases, and named entities. Their system first extracts these elements from news articles, and then uses four models for prediction: a regression model and “three models [that] use supervised learning of support vector machines (SVM) regression.” The first of these SVM models uses only the terms extracted from the article; the second uses both “terms and the stock price at the time the article was released”; and the third uses the “terms and a regressed estimate of the [future] stock price.” In all cases, “future” meant 20 minutes later. Each of the four models was run with each of the three textual representations, giving 12 different prediction systems. The experiments were performed using data for the period from October 26 to November 28, 2005.
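To make the experimental design concrete, the following is a minimal sketch of how the inputs to the three SVM variants differ and how four models and three representations combine into 12 systems. All names here are illustrative labels rather than identifiers from the paper, and the binary term encoding is an assumption made for simplicity. No learner is trained; only the feature construction is shown.

```python
from itertools import product

# Illustrative labels only; none of these names come from the paper.
MODELS = ["regression", "terms_only", "terms_and_price", "terms_and_regressed"]
REPRESENTATIONS = ["bag_of_words", "noun_phrases", "named_entities"]


def term_vector(article_terms, vocabulary):
    """Binary vector marking which vocabulary terms occur in the article.

    Assumption: presence/absence encoding, chosen here for simplicity.
    """
    present = set(article_terms)
    return [1.0 if term in present else 0.0 for term in vocabulary]


def build_features(variant, article_terms, vocabulary, price_now, regressed_price):
    """Assemble the input vector for one of the three SVM variants."""
    features = term_vector(article_terms, vocabulary)
    if variant == "terms_only":
        pass  # terms extracted from the article, nothing else
    elif variant == "terms_and_price":
        features.append(price_now)  # price when the article was released
    elif variant == "terms_and_regressed":
        features.append(regressed_price)  # regressed estimate of the future price
    else:
        raise ValueError(f"not an SVM variant: {variant}")
    return features


# Every model paired with every representation: 4 x 3 = 12 systems.
systems = list(product(MODELS, REPRESENTATIONS))

example = build_features(
    "terms_and_price",
    article_terms=["merger", "profit"],
    vocabulary=["loss", "merger", "profit"],
    price_now=25.0,
    regressed_price=25.3,
)
```

The only difference between the three SVM variants in this sketch is the single numeric feature appended to the term vector, which is the distinction the review draws between them.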
The authors found that the second model (the one using terms and the current stock price) performed best in all cases. Noun phrases performed best in predicting the direction of the price movement, whereas named entities gave better results when closeness of the predicted price to the actual price was sought.
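The two evaluation criteria mentioned here, direction and closeness, can be illustrated with a short sketch. It assumes that direction is scored as the fraction of cases where the predicted and actual prices move the same way relative to the price at release, and that closeness is scored as mean squared error; both definitions and all function names are assumptions for illustration, not taken from the paper.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def directional_accuracy(start_prices, predicted, actual):
    """Fraction of cases where predicted and actual moves share a direction."""
    hits = sum(
        1
        for s, p, a in zip(start_prices, predicted, actual)
        if sign(p - s) == sign(a - s)
    )
    return hits / len(start_prices)


def mean_squared_error(predicted, actual):
    """Average squared gap between predicted and actual prices (lower is closer)."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


# Made-up numbers, purely for illustration.
start = [10.0, 10.0, 10.0]
pred = [10.5, 9.8, 10.2]
real = [10.2, 10.1, 10.4]

accuracy = directional_accuracy(start, pred, real)
closeness = mean_squared_error(pred, real)
```

A representation can score well on one measure and poorly on the other: a forecast may get the direction right while missing the size of the move, which is consistent with noun phrases and named entities each leading on a different criterion.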
Schumaker and Chen performed additional experiments, employing a representation that used noun phrases tagged as proper nouns, a hybrid of noun phrases and named entities. This representation had the best performance. It seems worth exploring the degree to which this insight applies to other systems that analyze text.