The core problem of information filtering (IF) is the same as the core problem of information retrieval (IR): identifying documents relevant to a user’s request. However, IF is distinguished from IR by the application of a standing request to a document stream over time. The work described in this paper attacks this core problem by decomposing it into two more manageable ones: identifying the class to which a document belongs and identifying classes relevant to the user request. It is assumed that both the characteristics of the document stream and the user’s interests (as indicated by relevance judgments) may change over time.
Both a formal model and a specific implementation are presented. A heuristic clustering algorithm is used to determine the classes; each incoming document is assigned to a single class. User profiles may be user-specified or learned by the system from the user’s relevance judgments; they are adjusted automatically or manually as user interests change. Extensive experimental results explore the system parameter space. Most of the experiments rely on simulation, although the authors show some correspondence between the simulated results and those obtained with a small number of actual users.
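The two-step decomposition can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's clustering heuristic and profile-learning rule are not reproduced here, so the use of cosine similarity against fixed class centroids, the exponential moving average update, and all names (`assign_class`, `Profile`, `prior`, `rate`) are assumptions made only for illustration.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_class(doc_vec, centroids):
    """Step 1: assign the document to the single most similar class.

    Assumption: classes are represented by centroid vectors and compared
    by cosine similarity; the paper's actual heuristic may differ.
    """
    return max(centroids, key=lambda c: cosine(doc_vec, centroids[c]))


class Profile:
    """Step 2: per-class relevance estimates learned from judgments.

    Assumption: an exponential moving average, chosen here because it
    lets the estimate drift as the user's interests change.
    """

    def __init__(self, prior=0.5, rate=0.3):
        self.rate = rate
        self.relevance = defaultdict(lambda: prior)

    def predict(self, class_id):
        return self.relevance[class_id]

    def update(self, class_id, judged_relevant):
        target = 1.0 if judged_relevant else 0.0
        old = self.relevance[class_id]
        self.relevance[class_id] = old + self.rate * (target - old)


# Hypothetical data: two classes and one incoming document.
centroids = {
    "sports": {"game": 1.0, "team": 1.0},
    "finance": {"stock": 1.0, "market": 1.0},
}
doc = {"team": 2.0, "score": 1.0}

profile = Profile()
doc_class = assign_class(doc, centroids)   # which class is the document in?
profile.update(doc_class, True)            # user judged the document relevant
score = profile.predict(doc_class)         # estimate for future documents
```

In this sketch the filter would deliver a new document when `profile.predict` for its class exceeds some threshold; the threshold, like the other parameters, is a hypothetical stand-in for the parameter space the experiments explore.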
The paper is clear and well organized; it should interest researchers and developers in this area. However, it does not compare the system’s performance to that of other information filtering systems.