Yitzhak Mandelbaum
AT&T Labs - Research

Formatted Data Considered Harmful


* * * * * PLEASE NOTE:  NOT USUAL DAY OR CONFERENCE ROOM * * * * *

Abstract:

Massive amounts of useful data are stored, processed and exchanged in formatted representations, requiring parsers and printers to translate the data to and from application data structures. Users often choose to write parsers and printers for such formats by hand. Yet writing parsers and printers by hand is a notoriously difficult and error-prone process. In security-critical applications, the results can be disastrous, allowing attackers to execute arbitrary code on vulnerable machines or manipulate vulnerable applications to attack others (e.g., with cross-site scripting). In other applications, the results can still include lost productivity and fragile software.
 
In this talk, I will discuss Yakker and PADS/ML, two projects aimed at addressing the many challenges of formatted data. Yakker examines Request for Comments (RFC) documents that specify network-protocol message formats in ABNF notation, and automatically generates parsing libraries, printing libraries and even simple firewalls directly from the RFCs. I will discuss the challenges of the extraction process, the parsing technology required to use the extracted ABNF grammars effectively, and a novel metalanguage that we designed to help application developers with parsing and printing messages.

PADS/ML targets non-standard, or ad hoc, data formats. It includes a declarative, type-based, scannerless language that permits users to describe the physical layout of their data and its semantic properties; a compiler that converts descriptions into a collection of useful data-processing tools, including parsing and printing routines; and a powerful generic programming framework that allows third-party developers to write complex format-independent tools in vanilla OCaml. I will explain these three components of the PADS/ML project and provide a case study of its successful application to the problem of parsing Cisco router configuration files.
  
 
Host:  Bob Harper
Appointments:  April Foster <aprilf@cs.cmu.edu>

Thursday, February 7, 2008
3:30 - 5:00 p.m.
Wean Hall 5409

Principles of Programming Seminars