Suppose a delivery truck visits the same customers every day, but traffic conditions vary and are not known in advance. What is the best route for the truck to take? This is one (briefly mentioned) example of the kind of problem addressed by this paper.
Specifically, the paper assumes that a player plays repeatedly against an opponent. In each round the player incurs a cost, to be minimized, that depends on the player's decision and on a vector of weights chosen by the opponent; the opponent's choice is not known to the player in advance. The cost function is assumed to be linear in the opponent's weight vector. Given an algorithm that approximates the best decision when the full history of the opponent's choices is known, the paper shows how the player can adapt that algorithm to find a decision with the same approximation guarantee when that history is not known, observing only the cost of each round of play. If the weight vector chosen by the opponent is revealed after each round, a nearly best decision can be found even faster. Thus, this paper significantly extends the set of offline optimization algorithms that can be converted to online optimization algorithms.
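To make the repeated-play setting concrete, here is a minimal sketch of the easier, full-information case (the opponent's weight vector is revealed after each round), using a follow-the-perturbed-leader-style rule that repeatedly calls an offline oracle on the perturbed cumulative weights. This is illustrative only and is not necessarily the paper's construction; the decision set, weights, and the exact-argmin oracle are all assumptions made for the example.

```python
import random

# Illustrative sketch (not the paper's construction): online linear cost
# minimization with full-information feedback, via a follow-the-perturbed-
# leader-style rule. Decisions and opponent weights are vectors, and the
# cost of a round is the inner product dot(decision, weights).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def offline_oracle(decisions, cum_weights):
    # Stand-in for the assumed offline algorithm: best decision against a
    # known weight history, here an exact argmin over a finite decision set.
    return min(decisions, key=lambda d: dot(d, cum_weights))

def play(decisions, weight_rounds, epsilon=1.0, seed=0):
    rng = random.Random(seed)
    n = len(decisions[0])
    cum = [0.0] * n          # running sum of observed weight vectors
    total_cost = 0.0
    for w in weight_rounds:
        # Perturb the cumulative weights, then call the offline oracle.
        noise = [rng.uniform(0, 1 / epsilon) for _ in range(n)]
        d = offline_oracle(decisions, [c + z for c, z in zip(cum, noise)])
        total_cost += dot(d, w)                   # pay this round's cost
        cum = [c + wi for c, wi in zip(cum, w)]   # full-information update
    return total_cost

# Toy instance: two candidate "routes" over three road segments, where the
# opponent's weights are the segment travel times in each round.
decisions = [(1, 0, 1), (0, 1, 1)]
rounds = [(3, 1, 1), (1, 4, 1), (2, 2, 1)] * 10
online = play(decisions, rounds)
best_fixed = min(sum(dot(d, w) for w in rounds) for d in decisions)
print(online, best_fixed)  # the online cost tracks the best fixed decision
```

In the harder setting the paper emphasizes, only the scalar cost of each round is observed, so `cum` cannot be updated directly and must instead be estimated from the costs; that is the extension the full-information sketch above does not show.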
The results of this paper will be of interest to anyone studying game theory or optimization theory. The methods used in the proofs may also interest those who study machine learning, since the player's improved performance over repeated play of a game can be viewed as a form of learning.