This paper presents a linear hashing schema for files that grow and shrink dynamically. It is written in a concise and clear style and describes a nice new algorithm.
The author first reminds us of existing linear hashing algorithms. In linear hashing schemes, pages are split in a predetermined sequence. To avoid uneven distributions of the records in the file, one can use Linear Hashing with Partial Expansions (LHPE) schemes. The cost of such expansions strongly depends on the cost of searching all the records addressed to a home page, which should then be minimized.
The author suggests an open-addressing method to handle overflow and describes a possible predetermined sequence that decreases the length of probe sequences. The pages being numbered, the overflow records of a full page are directed to the next page of higher number that is not full. To avoid clustering of full pages, one does not split pages in increasing order but five groups are created, which are split in turn. These groups are formed of buckets evenly distributed over the whole file (i.e., with numbering increasing by five). Moreover, as records tend to cluster on pages with higher numbers, the buckets in the groups are split in decreasing order. Retrieval, insection, and deletion are done as in linear probling. A specific problem appears when one splits a bucket. One locates all the records of the associated probe sequence and eventually moves them to the new bucket. This may leave holes in the probe sequences of the next buckets. Thus, they are copied in some “record pool,” reorganized, and eventually some overflow records are moved closer to their home bucket.
Finally, performances obtained by simulations are discussed. They are typical of both linear hashing and linear probing schemes. In particular, performances are cyclic and deteriorate rapidly when the load factor is greater than some threshold, say 0.80-0.85. It appears that the introduction of five groups and the backwards split order makes this algorithm better than the previous ones. And the operatons are greatly speeded up by using more buffer space.