Web applications interact with an uncontrolled and arbitrarily hostile environment. A common attack in such environments involves submitting data that is supposed to be harmless, such as a customer name, but that actually contains commands the Web application or its supporting system will execute. Such attacks include SQL injection, cross-site scripting, and command injection.
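To make the attack concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not taken from the paper; the customers table and the function names find_customer_unsafe and find_customer_safe are hypothetical, chosen only for illustration.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory database with two customers.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_customer_unsafe(conn, name):
    # Vulnerable: the name is pasted into the SQL text, so any quote
    # characters it contains are parsed as SQL rather than as data.
    query = "SELECT name FROM customers WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_customer_safe(conn, name):
    # Parameterized: the driver passes the name strictly as a value.
    return conn.execute(
        "SELECT name FROM customers WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()
hostile = "x' OR '1'='1"  # looks like a name, acts like a condition
unsafe_rows = find_customer_unsafe(conn, hostile)  # matches every row
safe_rows = find_customer_safe(conn, hostile)      # matches no row
```

With the hostile input, the unsafe query's WHERE clause becomes always true and returns every customer, while the parameterized query looks for a customer literally named `x' OR '1'='1` and finds none.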
Static analysis can determine a Web application’s vulnerability to tainted-data attacks by examining data flow through the application and checking that possibly tainted environmental data is sanitized before it is used. The dynamic languages used to implement Web applications complicate this analysis by obscuring the type, data access, and control information it needs.
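The idea of tracking tainted data to its point of use can be sketched with a toy analysis over a straight-line program. This is an illustration only, not the paper's algorithm: the statement encoding (source, assign, sanitize, sink) and the function find_tainted_sinks are assumptions made for the example, and real programs add branches, loops, and function calls.

```python
def find_tainted_sinks(program):
    """Walk statements in order and report sinks that receive tainted data.

    Hypothetical statement forms:
      ("source", var)        var receives environmental (tainted) data
      ("assign", dst, src)   dst takes the value, and taint status, of src
      ("sanitize", var)      var is cleaned in place
      ("sink", var)          var is used in a sensitive operation
    """
    tainted = set()
    findings = []
    for index, stmt in enumerate(program):
        op = stmt[0]
        if op == "source":
            tainted.add(stmt[1])
        elif op == "assign":
            _, dst, src = stmt
            if src in tainted:
                tainted.add(dst)
            else:
                tainted.discard(dst)  # overwritten with clean data
        elif op == "sanitize":
            tainted.discard(stmt[1])
        elif op == "sink":
            if stmt[1] in tainted:
                findings.append((index, stmt[1]))
    return findings


program = [
    ("source", "name"),              # e.g. a request parameter
    ("assign", "greeting", "name"),
    ("sink", "greeting"),            # used before sanitization
    ("sanitize", "greeting"),
    ("sink", "greeting"),            # used after sanitization
]
findings = find_tainted_sinks(program)
```

Because the walk respects statement order, only the first use of greeting is reported; the second occurs after sanitization and is considered safe.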
This paper shows how flow-sensitive, interprocedural, and context-sensitive data flow analyses combine to provide effective analysis of applications written in the dynamic language PHP. The analysis hinges on alias analysis, which determines the various names by which the same data can be referenced. The authors implemented their analyses in Pixy, a tool that analyzes PHP Web applications. In experiments on seven open-source PHP programs, Pixy found hundreds of new vulnerabilities at high speed, and with variable but usually low false-positive rates.
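Why alias analysis matters can be shown with a small sketch. It is a deliberate simplification and not Pixy's algorithm: it handles only definite aliases in straight-line code, and the statement encoding and the analyze function are hypothetical. The "alias" statement models a PHP reference assignment such as `$shown = &$input`.

```python
def analyze(program, track_aliases):
    """Toy taint analysis, with or without alias tracking.

    Hypothetical statement forms:
      ("alias", a, b)    a becomes another name for b's storage
      ("source", var)    var receives tainted data
      ("sanitize", var)  var is cleaned in place
      ("sink", var)      var is used in a sensitive operation
    """
    groups = {}      # variable -> set of names sharing its storage
    tainted = set()
    findings = []

    def names(var):
        # All names affected by a write to var.
        if track_aliases:
            return groups.get(var, {var})
        return {var}

    for index, stmt in enumerate(program):
        op = stmt[0]
        if op == "alias":
            _, a, b = stmt
            # a now sees b's current value, so it takes b's taint status.
            if b in tainted:
                tainted.add(a)
            else:
                tainted.discard(a)
            if track_aliases:
                groups.get(a, {a}).discard(a)   # a leaves its old group
                merged = groups.get(b, {b})
                merged.add(a)
                for var in merged:
                    groups[var] = merged
        elif op == "source":
            tainted.update(names(stmt[1]))
        elif op == "sanitize":
            tainted.difference_update(names(stmt[1]))
        elif op == "sink":
            if stmt[1] in tainted:
                findings.append((index, stmt[1]))
    return findings


program = [
    ("alias", "shown", "input"),  # $shown = &$input
    ("source", "input"),          # $input receives request data
    ("sink", "shown"),            # echo $shown
]
without_aliases = analyze(program, track_aliases=False)
with_aliases = analyze(program, track_aliases=True)
```

Without alias tracking, the analysis never sees a write to shown and misses the vulnerability; with it, tainting input also taints shown, and the sink is reported.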
This work fits naturally within the static analysis literature, although readers need no prior background in static analysis to follow it. The emphasis on alias analysis and the PHP orientation give this work its novelty and interest. The paper is easy to read, requiring only general knowledge of language analysis, and it is complete, including pseudocode versions of the alias analysis algorithms in an appendix.