Information retrieval encompasses a wide range of research efforts, including strongly business-oriented topics such as knowledge management in companies. Across all of them, the fundamental measures of search quality are recall and precision. Recall is the fraction of all existing relevant items that were retrieved, whereas precision is the fraction of the retrieved items that are relevant.
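The two measures can be illustrated with a minimal sketch. The function name `precision_recall` and the document identifiers are hypothetical and chosen only for illustration; the sketch assumes that the retrieved and relevant items are given as lists of identifiers.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    Both arguments are iterables of document identifiers
    (hypothetical names, used only for illustration).
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    # Relevant documents that were actually retrieved.
    hits = len(retrieved_set & relevant_set)
    # Guard against division by zero for empty result or relevance sets.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Made-up example: 4 documents retrieved, 5 relevant documents exist.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7", "d8", "d9"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

In this made-up example two of the four retrieved documents are relevant, so precision is 2/4, and two of the five existing relevant documents were found, so recall is 2/5.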
The National Institute of Standards and Technology (NIST) organized a number of experiments as part of the Text Retrieval Conference (TREC) that allow researchers to compare different algorithms and concepts with regard to recall and precision. Participants in TREC-6 have to process 50 queries, which they do not know in advance, against a text corpus of 2 GB. Each competing algorithm returns the 1000 documents it deems relevant, and every algorithm can be run more than once. Only the top 100 documents of each run (this cutoff is the pool depth) are judged, and recall and precision are determined on that basis.
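The role of the pool depth can be sketched as follows. This is only an illustration under the assumption that the pool is formed as the union of the top-ranked documents of every run; the function `build_pool`, the run names, and the document identifiers are hypothetical.

```python
def build_pool(runs, depth=100):
    """Collect the top `depth` documents of every run into one pool.

    `runs` maps a run name to its ranked list of document identifiers,
    best first. Assumption: the pool is the union over all runs.
    """
    pool = set()
    for ranking in runs.values():
        # Only the first `depth` documents of each run enter the pool.
        pool.update(ranking[:depth])
    return pool


# Made-up rankings from two runs; a tiny depth keeps the example readable.
runs = {
    "run_a": ["d1", "d2", "d3", "d4"],
    "run_b": ["d3", "d5", "d1", "d6"],
}
pool = build_pool(runs, depth=2)
print(sorted(pool))
```

With a depth of 2, only the first two documents of each run are pooled, so documents ranked lower in every run are never judged. Increasing the depth enlarges the pool, and with it the judging effort.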
Even if systems retrieve different documents, it seems likely that almost all relevant documents can be retrieved with a pool depth of 100. Keenan et al. discuss whether a different pool depth would change the outcome for a single run, as opposed to the question of how many relevant documents can be found across several runs. Their premise is that a good run, which retrieves better results, should be continued longer than a bad run. The authors came to two conclusions: a pool depth of 100 is appropriate, because by that point systems will probably have found all the relevant documents they will ever find; and good systems can be recognized by looking at shorter runs, in which they already outperform weak systems.