The fast addition of integers is important in many computing applications, such as selection, sorting, graph problems, and image processing, to name a few. As such, there is a large body of work on serial and parallel algorithms for addition, both bit-wise and word-wise. This paper presents some new material on implementations of fast parallel addition.
The model the authors use is a reconfigurable mesh (RM), an array of processors connected in a two-dimensional grid, with a dynamically reconfigurable bus system. Adjacent processors are connected, and a processor can be dynamically connected or disconnected locally during program execution. The paper also describes a very large scale integration (VLSI) reconfigurable circuit (VLSI RC) with a configurable switch.
The introduction contains a table that summarizes and compares the sizes of RMs and the computing times for algorithms to add n binary values or n d-bit integers. These values for known algorithms are listed and compared with those for the new algorithms presented in the paper.
The second section defines RM formally, explains VLSI RC, and describes the formats used in the paper for representing integers. The third section explains the basic algorithms derived in the paper, first for one-dimensional RM. The complexity results are stated as lemmas. Then integer summing of n d-bit integers using n lookahead carry generators is described. This approach performs this computation in constant time using 2 n × 2 d n RM. The next subsection computes the sum of n binary values in O ( log n &slash; &sqrt;m ) time given an m × n RM, using a method called prefix remainder computation.
The fourth section describes the sum of n binary values on a &sqrt;m n × &sqrt;n RM and then uses this to compute the sum of n d-bit integers on a &sqrt;n m × d &sqrt;n RM. The authors describe a technique called snake-like embedding. The complexity is shown to be O ( log* n - log* m ) , where 1 ≤ m ≤ log n . This is followed by an algorithm to add n d-bit integers using the two-stage summing method. It is shown to have the same complexity as above.
Section 5 describes an algorithm for calculating the sum of n d-bit integers, given one for each processor, in O ( log d + log* n ) time on a &sqrt;n × &sqrt;n RM of the word model.
Section 6 describes VLSI RC for summing integers. It shows the implementation and the results obtained in terms of complexity measures.
The paper will be of interest to designers and to algorithm specialists. It is theoretical but is well written and amply illustrated. I recommend it.