Image processing algorithms are used to test a variety of microprocessors on their ability to perform real-time tasks within a video cycle. The microprocessors evaluated include a conventional general-purpose M68020, a four-connected network processor (the transputer T800), and a number of special-purpose digital signal processors (DSPs): the TMS320C25, TMS320C30, M56000, and NEC’s μPD77230. The M68020 and transputer are included for comparison with the DSPs. The image processing (IP) tasks considered comprise the simplest class of point operations (histogramming and thresholding), single-pass 3×3 neighborhood operations (Laplacian and Sobel edge detection), an iterative 3×3 morphological operation (thinning), and a domain transform (Hough). Flow diagrams for the IP tasks are given along with timing results measured or estimated for these tasks. The benchmark setup is presented in a clear and concise way.
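For readers unfamiliar with the task classes, the following is a minimal sketch of two point operations and one 3×3 neighborhood operation. It is not taken from the paper's flow diagrams: the function names, the 4-neighbor Laplacian kernel, and the choice to leave border pixels at zero are all assumptions made for illustration.

```python
def histogram(image, levels=256):
    """Point operation: count how often each gray level occurs."""
    counts = [0] * levels
    for row in image:
        for pixel in row:
            counts[pixel] += 1
    return counts


def threshold(image, t):
    """Point operation: map each pixel to 1 if it is >= t, else 0."""
    return [[1 if pixel >= t else 0 for pixel in row] for row in image]


def laplacian(image):
    """3x3 neighborhood operation using the 4-neighbor kernel
    (assumed variant): sum of the four neighbors minus 4x the center.
    Border pixels are left at 0 (an assumption of this sketch)."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            out[y][x] = (image[y - 1][x] + image[y + 1][x]
                         + image[y][x - 1] + image[y][x + 1]
                         - 4 * image[y][x])
    return out


# Hypothetical 3x3 test image with a single bright center pixel.
sample = [[0, 0, 0],
          [0, 10, 0],
          [0, 0, 0]]
```

The point operations touch each pixel once, whereas the neighborhood operation needs several loads and additions per pixel, which is why the two classes differ so much in cost.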
The authors argue that, except for the number of elementary instructions available and the cycle time, the special-purpose DSPs do not differ in throughput. All DSPs have instructions like integer ADD, SUB, MULT, LOAD, and STORE. Some also have DIV and floating-point versions of the integer instructions.
The authors conclude that for 256×256 images, only the simplest IP tasks can be performed by a DSP within a video cycle. For Laplacian, Sobel, and threshold operations to finish, one has to lower the resolution to 128×128. Both DSPs and transputers can be connected to divide the IP task; in this case the transputer network presents a flexible but expensive solution, reflecting the disappointing performance of a single transputer.
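The effect of resolution can be made concrete with a rough per-pixel instruction budget. The sketch below is illustrative only: the 40 ms video cycle (25 frames per second) and the 100 ns instruction cycle are assumed values, not figures taken from the paper.

```python
def cycles_per_pixel(width, height, frame_period_s, instruction_cycle_s):
    """Instruction cycles available per pixel within one video cycle."""
    return frame_period_s / (width * height * instruction_cycle_s)


FRAME_PERIOD_S = 0.040        # assumed: one frame at 25 frames per second
INSTRUCTION_CYCLE_S = 100e-9  # hypothetical DSP instruction cycle time

budget_256 = cycles_per_pixel(256, 256, FRAME_PERIOD_S, INSTRUCTION_CYCLE_S)
budget_128 = cycles_per_pixel(128, 128, FRAME_PERIOD_S, INSTRUCTION_CYCLE_S)

print(f"256x256: {budget_256:.1f} cycles per pixel")
print(f"128x128: {budget_128:.1f} cycles per pixel")
```

Under these assumptions a 256×256 image leaves only about six instruction cycles per pixel, enough for a point operation but hardly for a 3×3 neighborhood. Halving the resolution in each dimension quadruples the budget.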
Although this paper’s title starts with benchmarking, I find this an overstatement, since most of the findings are based on paper calculations. Only two processors (one DSP and one transputer) actually ran the IP tasks. For readers with a basic knowledge of image processing, the authors present enough detail to check the results for the other processors if needed. Although the references seem somewhat biased toward research papers written by the authors themselves, they do mention relevant multiprocessor implementations of DSP and transputer systems.
It would have been helpful if Table 2 had presented relative performance figures, obtained by dividing each execution time by the fastest time in its row. Doing this calculation gives overall relative figures (lower is better) of 2.4 for the TMS320C25, 1.04 for the TMS320C30, 1.45 for the M56000, 3.6 for the NEC μPD77230, 11 for the T800, and 8.1 for the M68020. If one also corrects for differences in cycle time, the M56000’s architecture is a clear winner and the T800 a clear loser.
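The normalization described here is straightforward to sketch. The processor names, execution times, and cycle times below are hypothetical placeholders, not values from Table 2; the function names are likewise assumptions made for illustration.

```python
def relative_to_fastest(row):
    """Divide every execution time in a table row by the fastest one."""
    fastest = min(row.values())
    return {name: t / fastest for name, t in row.items()}


def cycle_counts(row_ms, cycle_ns):
    """Correct for cycle time: convert execution time (ms) into the
    number of instruction cycles each processor needed."""
    return {name: t * 1e6 / cycle_ns[name] for name, t in row_ms.items()}


# Hypothetical row: execution time in ms for one task on three processors.
row_ms = {"A": 50.0, "B": 20.0, "C": 80.0}
# Hypothetical instruction cycle times in ns for the same processors.
cycle_ns = {"A": 100.0, "B": 50.0, "C": 200.0}

by_time = relative_to_fastest(row_ms)
by_cycles = relative_to_fastest(cycle_counts(row_ms, cycle_ns))

print(by_time)
print(by_cycles)
```

In this made-up row, processor C is four times slower than B in wall-clock time but needs the same number of cycles, which shows how correcting for cycle time separates architectural efficiency from raw clock speed.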