Computing Reviews
Bootstrapping privacy compliance in big data systems
Sen S., Guha S., Datta A., Rajamani S., Tsai J., Wing J. SP 2014 (Proceedings of the 2014 IEEE Symposium on Security and Privacy, May 18-21, 2014), 327-342, 2014. Type: Proceedings
Date Reviewed: Oct 27 2014

In this paper, researchers from Carnegie Mellon and Microsoft Research present their “experience building and operating a system to automate privacy policy compliance checking in Bing.” The claim is impressive: “The system, bootstrapped by a small team, checks compliance daily of millions of lines of ever-changing source code written by several thousand developers.” Only later do we discover that it is limited to MapReduce applications. This is in fact for a good, and fundamental, reason: it bypasses the usual information-flow analysis problem, wherein everything rapidly comes to depend on everything else. Indeed, it is not clear how this methodology could be adapted to more general programming.

As this paper points out, current compliance methods for companies processing personal data do not scale well. The authors cite Google Street View as an example of a violation, but do not show how their methodology could have caught that case. The major components are Legalease, a language for writing policy statements; a “self-bootstrapping data inventory mapper” called Grok; and a policy checker that verifies Grok’s output against a policy written in Legalease. The programs seem pleasantly short: Grok is just over 7,000 lines and the policy checker is 652 lines.
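
To give a flavor of the approach: a Legalease policy is a nesting of ALLOW and DENY clauses over attributes of the data and its use. The following is my paraphrase of the style of example in the paper, not a verbatim quotation, and the attribute names should be read as illustrative:

    DENY DataType IPAddress
    EXCEPT
      ALLOW DataType IPAddress:Truncated
      ALLOW UseForPurpose AbuseDetect

The exceptions override the enclosing DENY, so this forbids any use of IP addresses unless the address has been truncated or the use is for abuse detection; the policy checker’s job is then to confirm that every data flow in Grok’s inventory is covered by some ALLOW path through such a policy.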

The authors admit that Grok is based on heuristics, and some, such as “jobs for purpose AbuseDetect ... run (only) by the AbuseTeam and vice versa,” seem somewhat dubious. It looks as if we have role-based access control by the back door, with roles being “populated from the organizational directory service.” Legalease also has its limitations: it does not do temporal logic, so the privacy policy statement “we delete the information collected through the Bing Bar Experience Program at 18 months” merely translates into a clause on “:Expired,” with the expiry mechanism itself outside this project’s scope. One big plus of the project is that it allows “what if” analysis of policy changes, which is impracticable otherwise.
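
To make the “what if” point concrete, here is a minimal sketch in Python, entirely my own construction rather than anything from the paper (the Clause structure and attribute names are illustrative), of how a checker might evaluate a DENY-with-exceptions policy against labeled data flows and then rerun the same check under a proposed policy change:

    from dataclasses import dataclass, field

    @dataclass
    class Clause:
        """An ALLOW or DENY clause over attribute requirements, with nested exceptions."""
        decision: str                      # "ALLOW" or "DENY"
        attrs: dict                        # e.g., {"DataType": "IPAddress"}
        exceptions: list = field(default_factory=list)

    def matches(clause, flow):
        # A flow matches a clause if it carries all of the clause's attribute values.
        return all(flow.get(k) == v for k, v in clause.attrs.items())

    def permitted(clause, flow):
        # A matching exception overrides its enclosing clause.
        for exc in clause.exceptions:
            if matches(exc, flow):
                return permitted(exc, flow)
        if matches(clause, flow):
            return clause.decision == "ALLOW"
        return True  # the clause does not apply to this flow

    # A DENY-with-exception policy in the spirit of the snippet above.
    policy = Clause("DENY", {"DataType": "IPAddress"}, exceptions=[
        Clause("ALLOW", {"DataType": "IPAddress", "UseForPurpose": "AbuseDetect"}),
    ])

    # Attribute labels of the kind a Grok-style inventory might attach to data flows.
    flows = [
        {"DataType": "IPAddress", "UseForPurpose": "AbuseDetect"},  # permitted
        {"DataType": "IPAddress", "UseForPurpose": "Advertising"},  # violation
    ]

    for f in flows:
        print(f, "->", "OK" if permitted(policy, f) else "VIOLATION")

    # "What if" analysis: drop the AbuseDetect exception and rerun the same check.
    stricter = Clause("DENY", {"DataType": "IPAddress"})
    print(sum(not permitted(stricter, f) for f in flows),
          "violation(s) under the stricter policy")

The paper’s semantics is richer than flat equality (attributes are organized so that, for example, IPAddress:Truncated refines IPAddress), but the shape of the check, and of a what-if rerun, is as above.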

This approach to privacy reminds me of Dr. Johnson: it is “like a dog walking on his hind legs. It is not done well; but you are surprised to find it done at all.” It is a sad commentary on the state of the average programmer’s concern for privacy that the authors can rely on “good coding practices enforced rigorously in engineering teams through code reviews” as part of their inference process, but feel that privacy can only be verified through a totally separate mechanism.

Reviewer: J. H. Davenport
Review #: CR142859 (1501-0103)
Public Policy Issues (K.4.1)
Types Of Systems (H.4.2)
Models And Principles (H.1)
Other reviews under "Public Policy Issues":

The United States vs. Craig Neidorf
Denning D. Communications of the ACM 34(3): 22-43, 1991. Type: Article. Reviewed: Aug 1 1991

Targeting the computer: government support and international competition
Flamm K., The Brookings Institution, Washington, DC, 1987. Type: Book (9780815728528). Reviewed: Mar 1 1988

Datawars: the politics of modeling in federal policymaking
Kraemer K. (ed), Dickhoven S., Tierney S., King J., Columbia University Press, New York, NY, 1987. Type: Book (9780231062046). Reviewed: Mar 1 1988
