Research Overview

David Walker

The goal of my research is to improve the process of building reliable, efficient, and secure software systems. Using programming language theory and my experience with the design and implementation of practical language technology, I hope to make valuable contributions to this field.

Much of my past and current research focuses on the development of safe, strongly-typed programming languages and on the use of typing information during compilation. In strongly-typed source languages, like Java, types provide a form of checkable documentation that specifies invariants about the shape of data and the behavior of code. Within a compiler, typing information plays a very similar role: compiler writers can use types to help reason about program transformations, guide program optimizations, and detect and isolate internal compiler errors. In addition, as in the Java virtual machine language, types can be used as the foundation for verifying critical security properties of untrusted mobile code.

Beginning in 1997, colleagues at Cornell and I decided to explore the very limits of type-directed compilation. One of the major results of our study has been the theoretical analysis and practical implementation of a typed assembly language (TAL). Like an ordinary assembly language, TAL contains conventional assembly instructions and data, but unlike an ordinary assembly language, it can also contain typing annotations. We use a sophisticated type checker to check these annotations and to ensure that the code obeys a variety of safety properties. This project represents a significant advance in the state of the art because it demonstrates, for the first time, how to take advantage of types in all phases of language implementation, from the initial source program to final target code.
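The following sketch illustrates the idea in miniature; it is a hypothetical toy, not the actual TAL system. Each instruction in a block is checked against a map from registers to types, so an ill-typed operation is rejected before the code ever runs. The instruction set and type representation here are invented for illustration.

```python
# Toy illustration of TAL-style checking (not the real TAL implementation):
# a checker walks a straight-line block of instructions, tracking the type
# of each register, and rejects any instruction whose operands are ill-typed.

def check_block(instructions, reg_types):
    """Verify a block of (op, dst, src) triples; reg_types maps register -> type."""
    for op, dst, src in instructions:
        if op == "mov":                       # dst inherits src's type
            if src not in reg_types:
                raise TypeError(f"{src} is uninitialized")
            reg_types[dst] = reg_types[src]
        elif op == "add":                     # both operands must be integers
            if reg_types.get(dst) != "int" or reg_types.get(src) != "int":
                raise TypeError(f"add requires int operands: {dst}, {src}")
        elif op == "load":                    # src must hold a pointer
            t = reg_types.get(src)
            if not (isinstance(t, tuple) and t[0] == "ptr"):
                raise TypeError(f"load requires a pointer in {src}")
            reg_types[dst] = t[1]             # dst gets the pointed-to type
        else:
            raise TypeError(f"unknown instruction {op}")
    return reg_types

# A block that checks: r2 := r1, then r2 := r2 + r1, starting with r1 : int.
final = check_block([("mov", "r2", "r1"), ("add", "r2", "r1")],
                    {"r1": "int"})
```

In the real system the annotations are far richer (polymorphism, existential types, code-pointer types), but the basic workflow is the same: the checker consumes annotated machine code and either accepts it as safe or pinpoints the offending instruction.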

Among other theoretical results, we have proven that it is possible to compile high-level type-safe languages to an abstract typed assembly language and that a significant subset of the features that we have implemented are sound. On the practical side, the implementation itself verifies code for the Intel Pentium 32-bit architecture. We purposely chose a widespread, commercial machine architecture to show that the concept of a typed assembly language is not only theoretically feasible, but also quite practical and applicable to even the most complex real-world instruction sets.

Our group has experimented with compiling several different source languages into TAL, from dynamically-typed functional languages such as Scheme to statically-typed imperative languages including a safe variant of C. Because TAL acts as the target for compilers of these varied source languages, we have had to define new, general-purpose type structure to ensure code safety. For example, we have discovered a novel typing discipline for the run-time stack and we have proven it sound. The typing discipline is flexible enough to allow us to implement both functional and imperative-style languages efficiently. However, perhaps even more importantly, the concept is general enough that an independent researcher (Robert O'Callahan at Carnegie Mellon University) was able to use our techniques to show how Java bytecode verification could be improved. Hence, our careful analysis of low-level programming language structure led to a robust solution with broad practical impact.
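As a loose illustration of what a stack-typing discipline verifies, consider the hypothetical checker below (again a toy, not our actual formulation): the checker tracks the type of every slot on the run-time stack, so each push and pop can be validated just as register moves are.

```python
# Toy illustration of stack typing: stack_types lists the types of the
# stack's slots, top first.  Pushes extend the list; pops must match the
# type the code claims to expect, otherwise checking fails.

def check_stack_ops(ops, stack_types):
    """ops is a list of ("push", type) or ("pop", expected_type) pairs."""
    for op, ty in ops:
        if op == "push":
            stack_types = [ty] + stack_types          # new slot on top
        elif op == "pop":
            if not stack_types or stack_types[0] != ty:
                top = stack_types[0] if stack_types else "empty"
                raise TypeError(f"expected {ty} on top, found {top}")
            stack_types = stack_types[1:]             # slot consumed
    return stack_types

# Push an int, push a pointer, pop the pointer: one int slot remains.
remaining = check_stack_ops(
    [("push", "int"), ("push", "ptr"), ("pop", "ptr")], [])
```

The real discipline must also handle stack pointers into the middle of the stack, polymorphism over unknown stack tails, and calling conventions, which is where the technical novelty lies.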

More recently, I have been studying problems associated with programming in an environment with limited memory resources. In an embedded system or networking application, programmers may need to aggressively reuse memory, but type-safe languages rarely allow explicit memory deallocation or recycling. Rather than giving up the benefits of type safety when resources are limited, I have developed two different typing disciplines for verifying that memory can be safely deallocated and subsequently reused.

I worked on the first project with Karl Crary and Greg Morrisett. It is based on the idea that objects can be allocated into one of many "zones" or "regions" of memory. The type system keeps track of which regions are live and prevents access to regions that have been deallocated. Our result improved the community's understanding of region-based memory management by showing how to formulate a type safety proof using simpler, more conventional techniques than had previously been possible. Moreover, our type system is considerably more powerful than other type systems for region-based memory management. Using our techniques, it is possible to verify unstructured, low-level languages like TAL.
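A dynamic analogue of the property our type system checks statically can be sketched as follows; the class and method names here are invented for illustration. Objects live in named regions, a whole region is deallocated at once, and any access into a freed region is rejected, which is exactly the error the type system rules out at compile time.

```python
# Toy model of region-based memory management: objects are allocated into
# named regions, free_region deallocates every object in a region at once,
# and reads through a dangling handle are detected and rejected.

class RegionHeap:
    def __init__(self):
        self.live = {}                      # region name -> list of objects

    def new_region(self, r):
        self.live[r] = []

    def alloc(self, r, value):
        if r not in self.live:
            raise RuntimeError(f"region {r} is not live")
        self.live[r].append(value)
        return (r, len(self.live[r]) - 1)   # handle: (region, index)

    def read(self, handle):
        r, i = handle
        if r not in self.live:
            raise RuntimeError(f"dangling access into freed region {r}")
        return self.live[r][i]

    def free_region(self, r):
        del self.live[r]                    # all objects in r reclaimed at once

# Typical lifetime: create a region, allocate into it, use it, free it.
heap = RegionHeap()
heap.new_region("r1")
handle = heap.alloc("r1", 42)
value = heap.read(handle)
heap.free_region("r1")
```

The point of the type system is that these liveness checks need not be performed at run time: well-typed programs provably never access a deallocated region, so the checks can be erased.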

Regions make it possible to deallocate large collections of objects all at once, but they do not provide a mechanism for deallocating and reusing individual objects. The key to the latter problem is the ability to control aliasing, so together with Greg Morrisett and Fred Smith I developed new type structure that makes it possible to represent and reason about pointers in our typed assembly language. Although there are many strategies for reasoning about aliasing, ours is unique with respect to the ease with which it interoperates with advanced higher-order type systems.
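The intuition can be sketched with a hypothetical checker over unique ("linear") pointers; the operation names are invented for illustration. A capability set records which pointers may currently be used, and freeing a pointer removes its capability, so a later use or a double free is rejected. This is the kind of reasoning that makes individual deallocation and reuse safe.

```python
# Toy illustration of alias control: each allocated cell has exactly one
# capability.  Using a pointer requires its capability; freeing it consumes
# the capability, so use-after-free and double-free are both caught.

def check_pointer_ops(ops):
    """ops is a list of ("malloc" | "use" | "free", pointer_name) pairs."""
    caps = set()
    for op, p in ops:
        if op == "malloc":
            caps.add(p)               # p becomes the unique pointer to its cell
        elif op == "use":
            if p not in caps:
                raise TypeError(f"use of {p} without capability")
        elif op == "free":
            if p not in caps:
                raise TypeError(f"double free or unknown pointer {p}")
            caps.remove(p)            # capability consumed; memory reusable
    return caps

# A well-behaved lifetime leaves no outstanding capabilities.
leftover = check_pointer_ops([("malloc", "p"), ("use", "p"), ("free", "p")])
```

In the actual type structure these capabilities appear as static typing facts about the store, so no run-time bookkeeping is needed, and the discipline composes with higher-order features such as closures.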

Since coming to Carnegie Mellon University, I have continued to explore topics in memory management. Currently, I am investigating techniques for giving source-language programmers both a high degree of control over memory management and guarantees of safety. More specifically, I have been working with Kevin Watkins to develop a theoretical model of memory that will allow us to incorporate a number of different memory management paradigms, ranging from reference counting to explicit region-based memory management, in the same language. I also continue to collaborate with Greg Morrisett. We are planning to implement some of the ideas developed in my formal framework in one of Greg's projects at Cornell, a type-safe variant of C called Cyclone. However, much research remains to be done. Development of a practical source-level language will require new techniques in program analysis and type inference, as well as careful consideration of syntax, language design, and the organization of the run-time system. I am interested in exploring all of these issues as well as experimenting with the resulting language on real systems applications.

Memory is just one of many resources that must be protected in an environment that may contain faulty or malicious processes. In order to secure a host computer against untrusted code, it is necessary to control other resources including, for example, access to files and use of hardware devices such as the network or CPU. In general, I am interested both in research that can help us understand how to define and reason about security policies and in research that provides sound mechanisms for enforcing these policies. Recently, I contributed a solution to the latter half of the problem by showing how to compile and verify code that is not only type-safe in the standard sense, but also secure with respect to a security policy. More specifically, I demonstrated that it was possible to use automaton-theoretic specifications of security policies to instrument programs with static typing information and the occasional run-time check. This combined static and dynamic approach to security made it possible to compile any program into a well-defined target calculus and yet verify that the resulting code adhered to the given policy. The techniques I discovered are extremely general; they can be used to enforce any safety property, for a broad definition of safety, including access control policies, resource bounds policies and policies that depend upon the history of the computation.
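To make the automaton-theoretic idea concrete, here is a hypothetical two-state policy, "no network send after reading a private file", expressed as a security automaton; the class, state names, and actions are invented for illustration. Each monitored operation either steps the automaton or halts execution, which corresponds to the run-time check that instrumentation inserts when a violation cannot be ruled out statically.

```python
# Toy security automaton for the history-dependent policy
# "no network send after reading a private file".
# States: "clean" (nothing private read yet) and "tainted" (a private
# file has been read).  A "send" in the tainted state violates the policy.

class SecurityAutomaton:
    def __init__(self):
        self.state = "clean"

    def step(self, action):
        """Advance on one program action; halt on a policy violation."""
        if action == "read_private":
            self.state = "tainted"
        elif action == "send" and self.state == "tainted":
            raise PermissionError("policy violation: send after private read")
        return self.state

monitor = SecurityAutomaton()
monitor.step("send")            # permitted: nothing private read yet
monitor.step("read_private")    # automaton moves to "tainted"
```

In the combined static/dynamic approach, most `step` calls are discharged by the type checker at compile time, and only the checks that genuinely depend on run-time history survive into the compiled code.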

I also continue to be interested in security issues at Carnegie Mellon. I have recently written a grant proposal with Frank Pfenning to engage in a joint research project with NRL that will study how to use formal logics to specify important security protocols. At the same time, we are investigating how programming language technology can help give greater confidence that an implementation of one of these specifications is correct.

In future research projects, I will continue to investigate ways to improve the design and implementation of modern programming languages. I believe that progress in this field relies upon a healthy collaboration between foundational theory and pragmatic experience with implementation and applications. Therefore, I plan to strike a balance between these research directions as I develop the next generation of safe and reliable programming language technology.