My work focuses on fully automatic analyses of specifications and code. The approach taken by my group aims to be at once principled and pragmatic. The models underlying our analyses are simple, well-understood mathematical structures (such as relational algebra and the typed lambda calculus), which allows us to build powerful and predictable analysis engines. On the other hand, our tools and notations are motivated by practical case studies. Languages and analyses interest us only to the extent that they solve important industrial problems.
Many large developments suffer enormous costs due to errors made early in requirements or design that are only discovered much later. By analyzing specifications automatically, many of these errors can be detected while they are still cheap to fix.
We have developed a tool called Nitpick that checks properties of software specifications and simulates their execution. Hand-in-hand, we have developed a specification language called NP, whose design attempts to balance expressive power against the constraints of tractability. NP is roughly a subset of the Z specification language.
Many software artifacts can be specified in NP and checked by
Nitpick. We have successfully analyzed a number of small but realistic
examples, including the style mechanism of Microsoft Word, a mobile
host routing protocol, and the directory/aliasing structure of the
Macintosh Finder. Using a variety of reduction mechanisms, Nitpick can
check huge numbers of cases; in one example, we checked the
transitions of an operation for the preservation of an invariant.
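Checking that an operation preserves an invariant amounts to examining every transition from a state satisfying the invariant and confirming that its successor satisfies it too. The following is only an illustrative sketch of that idea over a small finite scope, not Nitpick itself; the state space, operation, and invariant are hypothetical examples chosen for brevity.

```python
# Illustrative sketch (not Nitpick): exhaustively checking that an operation
# preserves an invariant over a small finite scope. The state space (pairs of
# small integers), the operation, and the invariant are all hypothetical.
from itertools import product

SCOPE = range(4)  # a small finite scope, in the spirit of scope-bounded checking

def invariant(state):
    """Hypothetical invariant: the two components never differ by more than 1."""
    x, y = state
    return abs(x - y) <= 1

def operation(state):
    """Hypothetical operation, modeled as a relation: yields each possible
    successor state. Here it simply swaps the two components."""
    x, y = state
    yield (y, x)

violations = []
for state in product(SCOPE, repeat=2):
    if not invariant(state):
        continue  # only transitions from invariant-satisfying states matter
    for succ in operation(state):
        if not invariant(succ):
            violations.append((state, succ))

print(len(violations))  # 0: this operation preserves the invariant in this scope
```

The real tool, of course, works on relational specifications rather than enumerated Python tuples, and uses reduction mechanisms to avoid visiting every case explicitly; the sketch only shows what "checking transitions for preservation of an invariant" means.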
The fundamental premise of Nitpick/NP is that greater power of analysis can be gained by reducing the expressive power of the specification language and the comprehensiveness of the specification itself. Instead of following the traditional route of writing a full formal specification in a very expressive language such as Z, we construct a partial specification in a restricted language, NP, focusing on the particular aspects of behavior most likely to be the source of serious and costly errors. As a result, the cost of specification is dramatically reduced, and the analysis can be performed completely automatically. Measured purely in terms of the depth and breadth of the specification, our approach provides both less coverage and less power than a conventional approach. But like a surgical laser compared to a light bulb, the concentration of the approach compensates for its loss of power and coverage, yielding greater benefit at much lower cost.

We plan in the near future to apply Nitpick to larger case studies, in particular telephone call processing and railway switching. The NP language also seems well suited to expressing properties of architectural structure, and we plan to exploit this soon.
The crippling costs of maintenance are due largely to the inscrutability of large software systems. As soon as a system reaches more than a few thousand lines, it becomes difficult to extract even simple properties, such as how values pass from one global structure to another, or what aliasing patterns are in effect at various points in the code. And yet for almost all maintenance purposes, such properties must be known to the maintainer. For example, if the maintainer wants to eliminate the platform dependence of a system, it might be necessary to find all places in the code that access some handle to an external resource (a file descriptor, for example). Simple lexical tools such as UNIX grep, and even syntactic tools such as call-graph generators and cross-reference analyzers, cannot extract this kind of information.
We have built a tool called LACKWIT that answers such questions using an efficient and uniform analysis. Essentially, LACKWIT identifies groups of values having the same representation, and tracks the propagation of these values through code and global data. This information is vital for many applications, such as the reverse-engineering of abstract data types, and can be exploited for many other purposes, such as identifying undesirable couplings between subsystems and tracing the repercussions of a local modification.
LACKWIT differs from conventional program analyzers by using type inference as its underlying algorithm. This makes it highly efficient (the cost of a complete analysis is barely more than linear in the size of the program) while elegantly handling 'tricky' programming language features such as pointer aliasing and first-class functions. This kind of tool fills a gap by achieving the performance of a lexically-based analysis while still incorporating a large amount of semantic information. We have implemented an analyzer for C and are evaluating its results on industrial applications.
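The core idea behind a type-inference-based analysis is unification: each assignment or parameter binding forces the representations of its two sides into the same equivalence class, and a union-find structure makes the whole analysis nearly linear. The sketch below illustrates only that underlying idea, not LACKWIT's actual algorithm; the variable names and bindings are hypothetical.

```python
# Illustrative sketch (not LACKWIT's algorithm): grouping program values by
# representation via unification with union-find, the core mechanism behind
# type-inference-based analysis. Variable names below are hypothetical.
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

uf = UnionFind()
# Each assignment unifies the representations of its two sides, so values
# that flow together end up in one equivalence class.
for lhs, rhs in [("fd", "open_result"), ("handle", "fd"), ("count", "n")]:
    uf.union(lhs, rhs)

# 'handle' and 'open_result' share a representation; 'count' does not.
print(uf.find("handle") == uf.find("open_result"))  # True
print(uf.find("handle") == uf.find("count"))        # False
```

Because each unification is a near-constant-time union-find operation, the total cost grows barely faster than the number of assignments in the program, which is what makes whole-program analysis of this kind practical.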