Random testing really works. On Mac OS X, seven percent of 135 command-line utilities and 73 percent of 30 graphical user interface (GUI)-based applications were found to crash or hang under random testing using the freely available tools fuzz, ptyjig, and fuzz-aqua. In command-line utility testing, 24 files of random characters were generated by permuting fuzz tool options: files either did or did not include null characters, either did or did not include nonprintable characters, and were 1,000, 10,000, or 100,000 characters in size. In GUI-based application testing, typically 100,000 random user-input events were generated using the fuzz-aqua tool. Options with this tool included setting the delay between events and blocking input events that might, for example, log out the current user. Among the root causes of failure were, sadly, some familiar culprits: failure to check return values, null-pointer dereferences, and array buffer overflows.
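To make the option permutation concrete, here is a minimal Python sketch of how such a corpus of random files might be built. It is not the fuzz tool itself. The function names, the definition of "printable" as ASCII 0x20 to 0x7E, the seeding scheme, and the choice of two files per option combination (which would account for 24 files from 12 combinations) are all assumptions made for illustration.

```python
import itertools
import random

SIZES = (1_000, 10_000, 100_000)   # file sizes named in the review
FILES_PER_COMBINATION = 2          # assumption: 12 combinations x 2 = 24 files


def alphabet(include_null: bool, include_nonprintable: bool) -> bytes:
    """Return the pool of byte values a file may draw from."""
    printable = set(range(0x20, 0x7F))          # assumption: printable ASCII only
    pool = set(printable)
    if include_nonprintable:
        pool |= set(range(0x01, 0x100)) - printable
    if include_null:
        pool.add(0x00)
    return bytes(sorted(pool))


def random_file(size: int, include_null: bool,
                include_nonprintable: bool, seed: int) -> bytes:
    """Generate one file's contents; the same seed always gives the same bytes."""
    rng = random.Random(seed)
    pool = alphabet(include_null, include_nonprintable)
    return bytes(rng.choices(pool, k=size))


def build_corpus(base_seed: int = 0) -> dict:
    """Map (include_null, include_nonprintable, size, replica) to file contents."""
    corpus = {}
    grid = itertools.product((False, True), (False, True), SIZES,
                             range(FILES_PER_COMBINATION))
    for index, (nulls, nonprint, size, replica) in enumerate(grid):
        # Hypothetical seeding scheme: one distinct seed per file.
        seed = base_seed * 1_000 + index
        corpus[(nulls, nonprint, size, replica)] = random_file(
            size, nulls, nonprint, seed)
    return corpus
```

Deriving every file from an explicit seed is a design choice of this sketch: it lets the same corpus be regenerated later, which is one way a failing input could be reproduced.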
Some minor details are unclear in this paper. Was the randomness employed fully controllable? Could the same tests be repeatedly generated and executed with the same results for both command-line utilities and GUI-based applications? The utilities zsh and indent crashed, yet no root cause analysis appears to have been undertaken for them, and no explanation is given for the omission.
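The repeatability question could be checked mechanically. The following Python sketch, which is an illustration and not the harness used in the paper, feeds a fixed input to a command on standard input, classifies the outcome as ok, crash, or hang, and reports whether repeated runs agree. The function names, the timeout value, and the convention that termination by a signal counts as a crash (POSIX behavior of `subprocess`) are assumptions.

```python
import subprocess


def classify_run(command, data: bytes, timeout_s: float = 5.0) -> str:
    """Run `command` with `data` on stdin and classify the outcome."""
    try:
        done = subprocess.run(command, input=data,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hang"      # assumption: no exit within the timeout is a hang
    if done.returncode < 0:
        return "crash"     # on POSIX, a negative code means killed by a signal
    return "ok"


def is_repeatable(command, data: bytes, runs: int = 3,
                  timeout_s: float = 5.0) -> bool:
    """Return True if every run of the same input gives the same outcome."""
    outcomes = {classify_run(command, data, timeout_s) for _ in range(runs)}
    return len(outcomes) == 1
```

For example, `is_repeatable(["indent"], data)` would report whether a particular input file consistently produces the same outcome for a given utility, assuming that utility reads from standard input.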
There is no disputing that random testing found serious defects, that GUI reliability was found to be getting worse, and that random testing should be part of any arsenal deployed when software reliability is of paramount importance. As such, this paper is strongly recommended to the software engineering community.