Center for Computational Thinking, Carnegie Mellon

PROBE on Improving Privacy via History Independence

Organized by Guy E. Blelloch and Daniel Golovin

The recent privacy MindSwap focused on maintaining privacy when accessing databases, and supporting the implementation of complex privacy policies. The work presented at the MindSwap focused primarily on the essential problem of controlling the release of sensitive information that is stored within the system by design. However, another important privacy problem regards the release of sensitive information that is stored within the system unintentionally. Such information may be maintained in various forms such as documents, code releases, unflushed buffers, data structures, logs, and so on. For example, file formats such as Microsoft DOC and Adobe PDF often store clues to their editing history. Storing such unwanted information can lead to significant violations of privacy and security, and there are many stories in the media concerning government-classified or proprietary information being improperly revealed as a consequence¹.

Such concerns raise deep questions about precisely what information a given system, file, or communication actually reveals. Computation is all about the transmission and manipulation of information, and computational thinking requires the careful consideration of what information is available and how it can be operated on. Despite this, and perhaps surprisingly, the information content of current systems is not well understood. Recent theoretical results [1, 2, 3, 4] on strongly history independent data structures provide a way to address this deficiency; in particular, they suggest a principled way to build efficient computer systems and file formats that provably store exactly the information specified in their designs. Such systems and file formats completely eliminate the possibility that unintentionally stored information is leaked.

Potential applications include

• Filesystems that have the property that deleting a file provably leaves no trace whatsoever that it ever existed, and reveals nothing about what order files where created or last modified or last accessed.
• Databases storing sensitive information that provably reveal nothing about the order of record insertions or about records that have been deleted, nor retain any evidence of previous queries.
• Voting Machines that provably store only the set of participating voters, and nothing about the order in which they voted.
• Web Browsers that support secure browsing sessions after which the browser reverts to the precise state it was in before the session began.
• Web Browsers that support secure browsing sessions after which the browser reverts to the precise state it was in before the session began.
• Advanced File Formats that maintain search indices or other data structures embedded in the file itself to speed up certain operations on the file, yet are strongly history independent and thus provably store only the information specified by their creator via a well specified interface. Already many document, spreadsheet, and presentation file formats are based on the extensible markup language (e.g., OpenXML in DOCX) and incorporate trees. This trend towards more complex file formats is likely to further exacerbate the problem of sensitive information leaks due to failed or incomplete attempts at redaction, unless we apply a principled solution to the problem.
¹See [3] for examples.

All of these applications protect the privacy of their users by making it impossible for a (computationally unbounded) adversary to extract any information that the system is not designed to store, such as evidence of deleted files, database records, browsing history, or document editing history.

References

[1] Guy E. Blelloch and Daniel Golovin. Strongly history-independent hashing with applications. In 48th Annual IEEE Symposium on Foundations of Computer Science, pages 272–282. IEEE, October 2007.
[2] Guy E. Blelloch, Daniel Golovin, and Virginia Vassilevska. Uniquely represented data structures for computational geometry. In SWAT ’08: Proceedings of the 11th Scandinavian Workshop on Algorithm Theory, pages 17–28, Gothenburg, Sweden, July 2008. Springer.
[3] Daniel Golovin. Uniquely Represented Data Structures with Applications to Privacy. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, August 2008.