While the title of this paper suggests the use of traditional expert systems technology, the research reported here concerns how to make predictions when a team of experts is available and their predictions are known.
In particular, the authors define an algorithm to solve the following prediction problem: We want to predict whether an event will occur, and we may use the predictions of a finite set of experts; each expert can see a sequence of measurements, and gives his or her opinion on the probability of the event as a real number between 0 and 1. The problem is to build an algorithm that uses the experts’ advice to give the best possible estimate. This requires minimization of the loss, where loss is defined as the absolute value of the difference between the predicted number and the real outcome.
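The loss defined above can be illustrated with a short sketch. The function names `absolute_loss` and `total_loss` are hypothetical, chosen here for illustration, and outcomes are assumed to be encoded as 0 (the event did not occur) or 1 (it did).

```python
from typing import Sequence


def absolute_loss(prediction: float, outcome: int) -> float:
    """Absolute difference between a prediction in [0, 1] and a 0/1 outcome."""
    return abs(prediction - outcome)


def total_loss(predictions: Sequence[float], outcomes: Sequence[int]) -> float:
    """Sum of the per-trial absolute losses over a sequence of trials."""
    return sum(absolute_loss(p, y) for p, y in zip(predictions, outcomes))
```

For example, predictions of 0.9, 0.2, and 0.6 against outcomes 1, 0, and 1 incur losses of roughly 0.1, 0.2, and 0.4, for a total of about 0.7.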
The authors define a family of algorithms, some of them taken from the literature, and study their performance. In particular, they discuss the optimal algorithms and prove lower and upper bounds on their performance. Since the optimal algorithms have complexity exponential in the number of experts, the authors provide other, more efficient algorithms to generate the prediction, and prove lower and upper bounds for these as well.
The algorithm P is based on computing and updating a nonnegative weight for each expert that reflects the expert’s prediction ability. A parameter controls when to drop poorly performing experts. The choice of the parameter uses a priori knowledge. An iterative variation P* can make predictions without a priori knowledge of the best expert’s loss. In the last part of the paper, the authors discuss an application of the algorithm in pattern recognition, where the problem is to predict the label (0 or 1) of examples taken at random from an arbitrary distribution. The authors compare their solution with other, similar results. The comparison indicates that the new approach is better.
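The weight-based idea behind algorithm P can be sketched as follows. This is a simplified illustration and not the authors' exact algorithm: it assumes the forecast is the plain weighted average of the experts' predictions and that each weight is multiplied by `beta ** loss` after every trial. The name `weighted_forecast` and the parameter `beta` are hypothetical, and in this simplification poorly performing experts are never dropped outright; their weights only shrink toward zero.

```python
from typing import List, Sequence, Tuple


def weighted_forecast(
    expert_preds: Sequence[Sequence[float]],
    outcomes: Sequence[int],
    beta: float = 0.5,
) -> Tuple[List[float], List[float]]:
    """Run a simplified weighted-average forecaster over a sequence of trials.

    expert_preds[t][i] is expert i's prediction in [0, 1] at trial t.
    outcomes[t] is the 0/1 outcome revealed after the forecast at trial t.
    beta in (0, 1) is an assumed update parameter: smaller values penalize
    inaccurate experts more strongly.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")

    n_experts = len(expert_preds[0])
    weights = [1.0] * n_experts  # every expert starts with the same weight
    forecasts: List[float] = []

    for preds, y in zip(expert_preds, outcomes):
        # Forecast: weighted average of the experts' current predictions.
        total = sum(weights)
        forecasts.append(sum(w * x for w, x in zip(weights, preds)) / total)

        # Update: shrink each weight according to that expert's absolute loss.
        weights = [w * beta ** abs(x - y) for w, x in zip(weights, preds)]

    return forecasts, weights
```

With two experts, one always predicting 1 and the other always predicting 0, and an event that occurs in every trial, the forecast starts at 0.5 and moves toward 1 as the second expert's weight decays.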
The proofs of the main theorems are difficult and require a few lemmas. This paper is important for everyone working in the field. It covers almost every aspect of prediction under a loss function that measures the number of mistakes, and the approach can use any kind of expert output.