Developed for the US Census Bureau, the Parallel Automated Coding Expert (PACE) uses an empirical learning model called memory-based reasoning (MBR). It is intended to replace the Automated Industry and Occupation Coding System (AIOCS), the expert system developed for the 1990 census. Compared with AIOCS, PACE was easier to develop (four person-months versus 192), achieves higher accuracy (60 percent versus 47 percent), fits parallel computer hardware and programming models, and reduces clerical workloads by 60 percent.
Using 132,247 preclassified returns as its training database, PACE improves on the expert system by 54 percent for occupation codes and by 10 percent for industry codes. The authors acknowledge that reducing free-form responses to canonical forms does not bring the system's accuracy fully up to human levels. Despite refinements to the MBR metrics, increases in database size, and trade-offs in setting referral thresholds to reach acceptable confidence levels, low-frequency token types still seriously limit system accuracy.
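The review does not spell out PACE's matching procedure, but the referral trade-off can be illustrated with a minimal Python sketch built on assumptions of its own. It scores stored examples by Jaccard token overlap (an assumed stand-in for PACE's MBR metric), takes a similarity-weighted vote among the k nearest neighbors, and refers a return to a human coder when the winning share of the vote falls below a threshold. The records, category labels, and the names `classify` and `coverage_and_accuracy` are hypothetical.

```python
from collections import Counter

# Hypothetical toy memory of (tokens, category) pairs; not PACE data.
MEMORY = [
    (["registered", "nurse", "hospital"], "nurse"),
    (["nurse", "aide", "clinic"], "nurse"),
    (["math", "teacher", "high", "school"], "teacher"),
    (["teacher", "elementary", "school"], "teacher"),
    (["school", "bus", "driver"], "driver"),
]

# Hypothetical held-out returns paired with their correct categories.
TEST_SET = [
    (["registered", "nurse"], "nurse"),
    (["bus", "driver"], "driver"),
    (["school"], "driver"),
]


def classify(query_tokens, memory, k=3, threshold=0.6):
    """Return (category, confidence), or (None, confidence) if referred."""
    query = set(query_tokens)
    scored = []
    for tokens, category in memory:
        stored = set(tokens)
        union = query | stored
        # Assumed similarity: Jaccard overlap of the two token sets.
        sim = len(query & stored) / len(union) if union else 0.0
        scored.append((sim, category))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep the k most similar examples that share at least one token.
    neighbors = [(sim, cat) for sim, cat in scored[:k] if sim > 0]
    votes = Counter()
    for sim, category in neighbors:
        votes[category] += sim  # similarity-weighted vote
    total = sum(votes.values())
    if total == 0:
        return None, 0.0  # nothing similar in memory: refer
    category, weight = votes.most_common(1)[0]
    confidence = weight / total
    if confidence < threshold:
        return None, confidence  # not confident enough: refer to a clerk
    return category, confidence


def coverage_and_accuracy(test_set, memory, threshold):
    """Fraction of returns coded automatically, and accuracy on those."""
    coded = correct = 0
    for tokens, true_category in test_set:
        category, _ = classify(tokens, memory, threshold=threshold)
        if category is not None:
            coded += 1
            if category == true_category:
                correct += 1
    coverage = coded / len(test_set)
    accuracy = correct / coded if coded else 0.0
    return coverage, accuracy
```

With these toy records, raising the threshold from 0.6 to 0.7 refers the ambiguous "school" response: coverage falls from 1.0 to about 0.67, while accuracy on the automatically coded returns rises from about 0.67 to 1.0.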
The project uses an 8192-processor Connection Machine CM-2 with floating-point enhancement and a Sun 4/280 front end. A parallel weighting algorithm was developed to compute category and cross-category weights, by segment and type, for more than 4.5 million feature tokens. Unlike standard expert systems, MBR “knows when it knows” (p. 62). PACE, with MBR, clearly provides a process of progressive classification whose results and efficiency surpass those of AIOCS. Yet achieving both high accuracy and complete coverage of census returns remains elusive.
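The review does not give PACE's weighting formula. As an assumption for illustration only, the Python sketch below scores each token by how strongly it predicts a single category, using sqrt(Σ_c P(c | token)²); it is not necessarily the metric PACE uses, and it ignores the segment, type, and cross-category distinctions. Each token's weight depends only on that token's counts, which is the kind of independence a data-parallel machine could exploit, although the sketch simply loops. The records and the name `token_weights` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training records of (tokens, category); not PACE data.
RECORDS = [
    (["registered", "nurse", "hospital"], "nurse"),
    (["nurse", "aide", "clinic"], "nurse"),
    (["teacher", "elementary", "school"], "teacher"),
    (["school", "bus", "driver"], "driver"),
]


def token_weights(records):
    """Map each token to (weight, number of records containing it)."""
    counts_by_token = defaultdict(Counter)
    for tokens, category in records:
        for token in set(tokens):
            counts_by_token[token][category] += 1
    weights = {}
    for token, counts in counts_by_token.items():
        n = sum(counts.values())
        # Assumed weight: sqrt(sum over categories of P(c | token)^2).
        # It is 1.0 when the token appears with only one category and
        # falls toward 1/sqrt(C) when spread evenly over C categories.
        weight = math.sqrt(sum((c / n) ** 2 for c in counts.values()))
        weights[token] = (weight, n)
    return weights


weights = token_weights(RECORDS)
```

Here "nurse" (seen twice, always with one category) gets weight 1.0, and "school" (split between two categories) gets about 0.71. "hospital" also gets 1.0, but from a single record, which shows how low-frequency tokens can look more reliable than the evidence supports.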