Yitzhak Mandelbaum
AT&T Labs - Research
Formatted Data Considered Harmful
* * * * * PLEASE NOTE: NOT USUAL DAY OR CONFERENCE ROOM * * * * *
Abstract:
Massive amounts of useful data are stored, processed and exchanged in formatted representations, requiring parsers and printers to translate the data to and from application data structures. Users often choose to write parsers and printers for such formats by hand. Yet writing parsers and printers by hand is a notoriously difficult and error-prone process. In security-critical applications, the results can be disastrous, allowing attackers to execute arbitrary code on vulnerable machines or to manipulate vulnerable applications into attacking others (e.g., with cross-site scripting). In other applications, the results can still include lost productivity and fragile software.
In this talk, I will discuss Yakker and PADS/ML, two projects aimed at addressing the many challenges of formatted data. Yakker examines Request for Comments (RFC) documents that specify network-protocol message formats in ABNF notation and automatically generates parsing libraries, printing libraries and even simple firewalls directly from the RFCs. I will discuss the challenges of the extraction process, the parsing technology required to use the extracted ABNF grammars effectively, and a novel metalanguage that we designed to help application developers parse and print messages.
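As a rough illustration of what a parser derived from an RFC grammar has to do, the OCaml sketch below hand-translates one rule, `HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT` from RFC 2616 (section 3.1), into parser combinators. It is not Yakker's generated code or API; the combinator names (`lit`, `digit`, `many1`, `seq`) and the helpers `http_version` and `parse_all` are hypothetical, chosen only to mirror the ABNF operators one-to-one.

```ocaml
(* Hypothetical combinator type: a parser reads [input] from [pos] and
   returns [Some (value, next_pos)] on success or [None] on failure. *)
type 'a parser = string -> int -> ('a * int) option

(* Quoted string literal. ABNF quoted strings are case-insensitive
   (RFC 5234), so the comparison lowercases both sides. *)
let lit (s : string) : string parser = fun input pos ->
  let n = String.length s in
  if pos + n <= String.length input
     && String.lowercase_ascii (String.sub input pos n)
        = String.lowercase_ascii s
  then Some (String.sub input pos n, pos + n)
  else None

(* DIGIT = %x30-39 *)
let digit : char parser = fun input pos ->
  if pos < String.length input then
    match input.[pos] with
    | '0' .. '9' as c -> Some (c, pos + 1)
    | _ -> None
  else None

(* 1*element: one or more greedy repetitions. Assumes [p] always consumes
   input on success, otherwise the loop would not terminate. *)
let many1 (p : 'a parser) : 'a list parser = fun input pos ->
  let rec loop acc pos =
    match p input pos with
    | Some (x, pos') -> loop (x :: acc) pos'
    | None -> (List.rev acc, pos)
  in
  match loop [] pos with
  | ([], _) -> None
  | (xs, pos') -> Some (xs, pos')

(* Concatenation: run [p], then [q] from where [p] stopped. *)
let seq (p : 'a parser) (q : 'b parser) : ('a * 'b) parser = fun input pos ->
  match p input pos with
  | None -> None
  | Some (a, pos') ->
    (match q input pos' with
     | None -> None
     | Some (b, pos'') -> Some ((a, b), pos''))

(* Convert a list of decimal digit characters to an int (no overflow check). *)
let int_of_digits (ds : char list) : int =
  List.fold_left (fun acc c -> (acc * 10) + (Char.code c - Char.code '0')) 0 ds

(* HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT, returning (major, minor). *)
let http_version : (int * int) parser = fun input pos ->
  match
    seq (lit "HTTP")
      (seq (lit "/") (seq (many1 digit) (seq (lit ".") (many1 digit))))
      input pos
  with
  | Some ((_, (_, (major, (_, minor)))), pos') ->
    Some ((int_of_digits major, int_of_digits minor), pos')
  | None -> None

(* Succeed only if the rule consumes the entire input. *)
let parse_all (p : 'a parser) (input : string) : 'a option =
  match p input 0 with
  | Some (v, pos) when pos = String.length input -> Some v
  | _ -> None
```

With these definitions, `parse_all http_version "HTTP/1.1"` yields `Some (1, 1)`, while `"HTTP/1."` and `"HTTP/1.1x"` are rejected. Details such as case-insensitive literals and the requirement to consume the whole input are exactly the kind of thing a hand-written parser can silently get wrong, which is part of the motivation for deriving parsers from the RFC grammar itself.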
PADS/ML targets non-standard, or ad hoc, data formats. It includes a declarative, type-based, scannerless language that permits users to describe the physical layout of their data and its semantic properties; a compiler that converts descriptions into a collection of useful data-processing tools, including parsing and printing routines; and a powerful generic programming framework that allows third-party developers to write complex format-independent tools in vanilla OCaml. I will explain these three components of the PADS/ML project and present a case study of its successful application to the problem of parsing Cisco router configuration files.
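The idea of deriving a parser, a printer and format-independent tools from one type-based description can be sketched in miniature with an OCaml GADT. This is not PADS/ML's description syntax, generated code or generic programming API; the type `desc`, the functions `parse`, `print` and `round_trips`, and the `<id>|<name>` record format are all hypothetical.

```ocaml
(* Hypothetical format descriptions, indexed by the OCaml type they denote. *)
type _ desc =
  | Int : int desc                              (* non-negative decimal integer *)
  | Str_until : char -> string desc             (* text up to a terminator char *)
  | Lit : string -> unit desc                   (* fixed literal, e.g. a separator *)
  | Pair : 'a desc * 'b desc -> ('a * 'b) desc  (* one format followed by another *)

(* Generic parser: interprets any description against [s] starting at [pos]. *)
let rec parse : type a. a desc -> string -> int -> (a * int) option =
 fun d s pos ->
  let len = String.length s in
  match d with
  | Int ->
    let stop = ref pos in
    while !stop < len && s.[!stop] >= '0' && s.[!stop] <= '9' do incr stop done;
    if !stop = pos then None
    else (
      match int_of_string_opt (String.sub s pos (!stop - pos)) with
      | Some n -> Some (n, !stop)
      | None -> None)
  | Str_until c ->
    (* Stops before the terminator (or at end of input); does not consume it. *)
    let stop = ref pos in
    while !stop < len && s.[!stop] <> c do incr stop done;
    Some (String.sub s pos (!stop - pos), !stop)
  | Lit l ->
    let n = String.length l in
    if pos + n <= len && String.sub s pos n = l then Some ((), pos + n) else None
  | Pair (da, db) ->
    (match parse da s pos with
     | None -> None
     | Some (x, pos') ->
       (match parse db s pos' with
        | None -> None
        | Some (y, pos'') -> Some ((x, y), pos'')))

(* Generic printer derived from the same description. *)
let rec print : type a. a desc -> a -> string =
 fun d v ->
  match d with
  | Int -> string_of_int v
  | Str_until _ -> v
  | Lit l -> l
  | Pair (da, db) -> let (x, y) = v in print da x ^ print db y

(* A format-independent tool: for any description, check that printing a
   value and parsing it back consumes all output and returns the same value. *)
let round_trips (d : 'a desc) (v : 'a) : bool =
  let s = print d v in
  match parse d s 0 with
  | Some (v', pos) -> pos = String.length s && v' = v
  | None -> false

(* Hypothetical record format "<id>|<name>", name running to end of line. *)
let record : (int * (unit * string)) desc =
  Pair (Int, Pair (Lit "|", Str_until '\n'))
```

Here `parse record "42|alice" 0` returns `Some ((42, ((), "alice")), 8)`, and `round_trips record (7, ((), "a\nb"))` is `false` because the embedded newline ends the name field early. The design point is that `round_trips` is written once against `desc` and works for any format described this way, which is the flavor of format-independent tool the abstract attributes to PADS/ML's generic programming framework.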
Thursday, February 7, 2008
3:30 - 5:00 p.m.
Wean Hall 5409
Principles of Programming Seminars