This paper reports one of the many results from the Reconfigurable Computing Cluster (RCC) project at the University of North Carolina at Charlotte, and it refers extensively to other results from the same lab. The authors investigate the feasibility of building very large parallel systems using field-programmable gate arrays (FPGAs) as building blocks. In particular, they investigate the scalability of one specific input/output (I/O)-bound application, BLASTn. They test their BLASTn implementation on a cluster, using configurations ranging from one node to 32 nodes (each node contains 16 FPGA cores). Nodes are physically interconnected with bidirectional high-speed serial lines and, for the experiments described in the paper, were configured as a torus.
The idea of the project is to replicate the same FPGA programming on every core, thus minimizing development cost. The initial performance results look good, with one FPGA node performing several times better than the equivalent implementation on general-purpose processors. The main goal of the experiment, however, is scalability. Tests were repeated with one node, four nodes, and so on, up to 32 nodes. The authors suggest memory contention as the reason for the poor scalability obtained: at most 30 percent for some queries at the largest configuration (32 nodes compared to one node).
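To make the scalability figure concrete, the following is a minimal sketch of how speedup and parallel efficiency are commonly computed from run times. It assumes that the 30 percent figure can be read as parallel efficiency at 32 nodes, which is one plausible interpretation and not something the paper is quoted as stating. The function names and the timings are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def speedup(t_one_node: float, t_n_nodes: float) -> float:
    """Speedup of an n-node run relative to the one-node run."""
    return t_one_node / t_n_nodes


def parallel_efficiency(t_one_node: float, t_n_nodes: float, n_nodes: int) -> float:
    """Fraction of ideal (linear) speedup actually achieved on n nodes."""
    return speedup(t_one_node, t_n_nodes) / n_nodes


if __name__ == "__main__":
    # Hypothetical run times in seconds, NOT taken from the paper.
    t_1 = 960.0   # one node
    t_32 = 100.0  # 32 nodes

    s = speedup(t_1, t_32)
    e = parallel_efficiency(t_1, t_32, 32)
    print(f"speedup on 32 nodes: {s:.1f}x (ideal would be 32x)")
    print(f"parallel efficiency: {e:.0%}")
```

Under this reading, an efficiency near 30 percent means that 32 nodes deliver roughly the throughput that about ten perfectly scaling nodes would, which is consistent with a shared resource such as memory becoming the bottleneck.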
In short, these experiments represent a first attempt at a complete solution that uses previous results from the RCC as building blocks.