David Walker
Princeton University

PADS/ML: A Functional Data Description Language


Massive amounts of useful data are stored and processed in ad hoc formats for which common tools like parsers and pretty printers do not exist.  Traditional data management systems provide rich infrastructure for processing well-behaved data, but are of little use when dealing with data in ad hoc formats.  To address the challenges of ad hoc data, we have designed PADS/ML, a declarative data description language for the ML family of languages.  PADS/ML is based on the ML type structure and features polymorphic, dependent, and recursive datatypes for describing ad hoc data.
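To make "dependent" concrete, here is a minimal sketch in plain OCaml. It does not reproduce PADS/ML syntax. It assumes a hypothetical record format such as "3:a b c", where the value of the leading count field determines how many space-separated items follow. The names entry, span, read_items, and parse_entry are illustrative only.

```ocaml
(* Hypothetical sketch of a dependent description, written as a
   hand-rolled OCaml parser (not PADS/ML syntax). Input such as
   "3:a b c" is a count, a colon, then exactly [count] items. *)

type entry = { count : int; items : string list }

(* Consume the longest run of characters satisfying [ok] from [pos];
   return the consumed substring and the position just past it. *)
let span ok s pos =
  let n = String.length s in
  let rec go i = if i < n && ok s.[i] then go (i + 1) else i in
  let stop = go pos in
  (String.sub s pos (stop - pos), stop)

let is_digit c = c >= '0' && c <= '9'

let parse_entry s =
  let len = String.length s in
  let (digits, p) = span is_digit s 0 in
  if digits = "" || p >= len || s.[p] <> ':' then None
  else
    match int_of_string_opt digits with
    | None -> None
    | Some count ->
      (* The dependency: [count], a value parsed earlier in the record,
         decides how many items the rest of the record must contain. *)
      let rec read_items k pos acc =
        if k = 0 then Some (List.rev acc, pos)
        else
          (* every item after the first must be preceded by one space *)
          let start =
            if acc = [] then Some pos
            else if pos < len && s.[pos] = ' ' then Some (pos + 1)
            else None
          in
          match start with
          | None -> None
          | Some p0 ->
            let (w, p1) = span (fun c -> c <> ' ') s p0 in
            if w = "" then None else read_items (k - 1) p1 (w :: acc)
      in
      (match read_items count (p + 1) [] with
       | Some (items, stop) when stop = len -> Some { count; items }
       | _ -> None)
```

With this sketch, "3:a b c" parses to a record with three items, while "2:a b c" is rejected because input remains after the two items the count allows.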

In this talk, we will describe the design, implementation, and semantics of PADS/ML data descriptions.  The design exploits the elegance of ML's datatypes and the power of its module system.  The implementation is written in O'Caml.  Our compilation strategy uses a "types as modules" paradigm that pushes up against the limits of ML's advanced module system and poses practical challenges for module system designers.  The semantics is based on an extension of our previous work on the Data Description Calculus [POPL 06] with type-parameterized types, and we have proven the resulting system type-correct: generated parsers return data of the expected type.
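One way to picture a "types as modules" strategy is the OCaml sketch below. It assumes that each description compiles to a module pairing a representation type with a parser, and that a type-parameterized description compiles to a functor. The signature DESC and the modules Pint, Plit, Plist, Comma, and IntList are hypothetical names chosen for illustration. They are not the generated code or the library API of PADS/ML.

```ocaml
(* Hypothetical sketch: every description is a module with a
   representation type [rep] and a parser producing a [rep]. *)
module type DESC = sig
  type rep
  (* Parse starting at [pos]; return the value and the next position. *)
  val parse : string -> int -> (rep * int) option
end

(* Base description: an unsigned decimal integer. *)
module Pint : DESC with type rep = int = struct
  type rep = int
  let parse s pos =
    let n = String.length s in
    let rec go i acc =
      if i < n && s.[i] >= '0' && s.[i] <= '9' then
        go (i + 1) (acc * 10 + Char.code s.[i] - Char.code '0')
      else (i, acc)
    in
    let (stop, v) = go pos 0 in
    if stop = pos then None else Some (v, stop)
end

(* Base description: a single literal character, fixed by the argument. *)
module Plit (C : sig val c : char end) : DESC with type rep = unit = struct
  type rep = unit
  let parse s pos =
    if pos < String.length s && s.[pos] = C.c then Some ((), pos + 1) else None
end

(* A type-parameterized description becomes a functor: a list of
   [Elt] values separated by [Sep]. Parsing stops at the first point
   where no further separator-and-element pair matches; the list may
   be empty. *)
module Plist (Elt : DESC) (Sep : DESC) : DESC with type rep = Elt.rep list =
struct
  type rep = Elt.rep list
  let parse s pos =
    match Elt.parse s pos with
    | None -> Some ([], pos)
    | Some (x, p) ->
      let rec more acc p =
        match Sep.parse s p with
        | None -> (List.rev acc, p)
        | Some (_, p') ->
          (match Elt.parse s p' with
           | Some (y, p'') -> more (y :: acc) p''
           | None -> (List.rev acc, p))
      in
      Some (more [x] p)
end

module Comma = Plit (struct let c = ',' end)
module IntList = Plist (Pint) (Comma)

let () =
  match IntList.parse "10,20,3" 0 with
  | Some (xs, stop) ->
    Printf.printf "%s (stopped at %d)\n"
      (String.concat ";" (List.map string_of_int xs)) stop
  | None -> print_endline "parse error"
```

Applying Plist to Pint and Comma yields a module whose rep is int list, so the parser's result type is determined by the description itself. In this sketch that agreement is enforced by OCaml's type checker through the signature constraints, which gives a small-scale feel for the kind of property the type-correctness result states for generated parsers.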

This is joint work with Yitzhak Mandelbaum, Kathleen Fisher, Mary Fernandez and Artem Gleyzer.
 - - - - - - - -
Host:  Karl Crary
 - - - - - - - -

Tuesday, May 30, 2006
3:30 p.m.
Wean Hall 8220

Principles of Programming Seminars