Next: Relational
Analysis Up: References
Previous: Daniel
Jackson
The type of a function tells you how it treats its arguments. If an argument is an integer, for example, one can assume that it is not dereferenced as a pointer. A more expressive type system allows more information to be conveyed; a function whose argument can be any type whatsoever (that is, a polymorphic function) cannot be performing any operation on the argument itself.
These ideas are well known to programmers in languages such as ML, and have motivated the design of programming languages. We have had a simple insight that allows type theory, and its associated algorithms, to be exploited in a different context: reverse engineering of large C programs. The declared type of a function in a C program will rarely tell you as much as it might; a function might declare an argument to be a pointer, for example, even if it is never dereferenced. But by inferring, in a richer type system than C's, the most general type of a function, one can obtain detailed information about the use of data structures.
Our Lackwit tool can show where data structures are read and written and how values flow from one to another; it can find candidates for data abstraction and identify representation exposures; and it can show dependences between functions due to shared data, correctly distinguishing incidental use of shared library functions from true architectural connection. The analysis is completely modular and can accommodate missing or incomplete code, handles aliasing and higher order functions smoothly, and seems to scale remarkably well. The prototype tool has been applied successfully to Reid Simmon's Morphin, a robotics application, and is now being used to analyze the source code of the Linux operating system.