Practical Program Understanding With Type Inference

Authors: Robert O'Callahan, Daniel Jackson

Technical Report CMU-CS-96-130.


Abstract

Many questions that arise in the reverse engineering or restructuring of a program can be answered by determining, statically, where the structure of the program requires sets of variables to share a common representation. With this information we can find abstract data types, detect abstraction violations, identify unused variables, functions, and fields of data structures, detect simple errors in operations on abstract data types (such as failure to close after open), and locate sites of possible references to a value.

We have a method for computing representation sharing by using types to encode representations. We use polymorphic type inference to compute new types for all variables, eliminating cases of incidental type sharing where the variables might have different representations. The method is fully automatic and smoothly integrates pointer aliasing and higher-order functions. Because it is fully modular and computationally inexpensive, it should scale to very large systems.
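
As a rough illustration of what "representation sharing" means, the sketch below groups variables into classes that must share a representation because values flow between them. It is not the authors' tool: it is a deliberately simplified, monomorphic unification over a union-find structure, whereas the report uses polymorphic type inference. The variable names and the `flows` list are hypothetical.

```python
class UnionFind:
    """Disjoint sets of variable names, with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen names as their own singleton class.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def representation_classes(variables, flows):
    """Group variables that are forced to share a representation.

    `flows` is a list of (source, destination) pairs, one for each
    assignment or argument passing; each pair unifies the two variables.
    """
    uf = UnionFind()
    for v in variables:
        uf.find(v)
    for src, dst in flows:
        uf.union(src, dst)
    groups = {}
    for v in variables:
        groups.setdefault(uf.find(v), set()).add(v)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical program: all five variables are declared `int` in C,
# but only some of them ever exchange values.
variables = ["fd", "log_fd", "count", "i", "unused"]
flows = [("fd", "log_fd"), ("i", "count")]
classes = representation_classes(variables, flows)
print(classes)
```

Here the five variables share a declared type only incidentally. The sketch separates them into three classes, and a variable left alone in its class, such as `unused`, is a candidate for closer inspection.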

We show how we used a prototype tool to analyze Morphin, a 17,000-line robot control program written in C, answering a user's questions about program structure, detecting abstraction violations, and finding unused data structures and memory leaks.