BabelBuster by Jonathan M. Baccash (jbaccash@princeton.edu)

Submitted to Computer Science department, Princeton University
in partial satisfaction of thesis requirement.  Spring, 2001.

Adviser: Andrew W. Appel.

BabelBuster translates preprocessed C source code into English.
It can also translate the English back into C.

Enclosed are two folders, ckit and code.

The ckit folder contains ckit version 1.0, slightly modified
to fix a bug parsing large integer literals.  ckit is a C front
end written in SML that translates C source code (after
preprocessing) into abstract syntax represented as a set of
SML datatypes.  It is also capable of pretty-printing the
abstract syntax tree in C.  I grabbed it from the web.  See:
http://cm.bell-labs.com/cm/cs/what/smlnj/doc/ckit/index.html.

The code folder contains SML code and test-cases that I wrote.
To use it, go to the code directory and type "build".  Then
the programs c2e and e2c translate C source to English and
the English back to C source, respectively.

Alternatively, you can start up an SML session and type
"CM.make();" at the prompt.  When CM has finished making the
program, the main structures introduced into the environment
are BabelBuster and Test.  BabelBuster contains functions to
translate files, printing the result to standard output, a
specified output stream, or a specified file.

The Test structure contains some of the functions that I used
to test whether or not BabelBuster can correctly translate a
given C file to English and back.  To do this, I pretty-print
the C file, then translate it from C to English and back.  The
test is a success only if the result is the same as the
pretty-printed C file.  Test.regress runs the first N
regression tests.  There are currently 16 regression tests,
located in the directory code/test.  Furthermore, there is a
function, ckitTests, which can be invoked to run tests on ckit's
test C source files.  Three of these tests fail because the
English translator converts old-style function arguments to
normal function arguments, while the pretty-printer leaves
them as they are.  The resulting C, however, is still correct.

Known limitations:
- The English is usually pretty good, but could obviously be better.
- Does not translate preprocessor macros.
- Some trivial information is lost in translation.
  - Unary plus is removed.
  - Preincrements and predecrements whose results are unused
    are sometimes turned into postincrements and postdecrements.
  - Nested struct/union definitions are converted to top-level
    definitions.
  - Anonymous structs/unions/enums are given names.
  - Does not preprocess the C file.  This should be done separately.
  - Code comments are removed.
  - Hex and octal literals are replaced with decimal integers.
  - Other trivial lost information.
- To use c2e or e2c, the current directory must be babel-buster/code.
