Date: Tue, 10 Dec 1996 23:02:43 GMT Server: NCSA/1.4.2 Content-type: text/html Last-modified: Fri, 21 Jun 1996 18:28:08 GMT Content-length: 12909 Research & Teaching Summary

Research Summary

Jeffrey Dean

Over the past four years I have pursued research in programming language implementation, programming language design, and software engineering. This research is driven by the belief that programmer productivity can be significantly improved through the use of high-level programming languages. These languages aid the rapid development of robust software by incorporating advanced language features such as garbage collection, flexible type systems, closures, inheritance, and message passing. However, programmers have been reluctant to adopt these languages because their implementations have imposed severe performance penalties. To combat this, my research has developed techniques that permit high-level languages, and in particular object-oriented languages, to be implemented more efficiently, enlarging the set of problems to which these languages can realistically be applied.

Object-oriented languages promote the development of flexible software, by decoupling the clients of abstractions from their implementations. This decoupling happens through the use of dynamic dispatches (i.e. virtual function calls in C++ or message sends in Smalltalk), which delay the selection of the appropriate implementation of a behavior until run-time. However, while implementation details are a burden to the programmer, they are essential to the compiler to generate efficient code. The absence of implementation knowledge imposes both direct and indirect costs on a language's implementation. The direct cost is simply due to the overhead of determining which implementation of a behavior should be invoked at run-time. Often much more significant, however, is the indirect cost, the lost opportunity to apply traditional compiler optimizations such as inlining, due to this lack of implementation information. These costs are exacerbated by frequent use of dynamic dispatching and by the decomposition of programs into many small routines.

The key, then, to improving the performance of object-oriented programs is to enable the compiler to discover the implementation details of abstractions. Some languages, such as C++, allow programmers to expose implementation information to the compiler by foregoing the flexibility of dynamic dispatching at some call sites. The problem with this approach is that programmers must decide a priori where they want flexibility in their program. These decisions are not always obvious in advance, and performing these hand optimization decisions detracts programmers' attention from the real task at hand: writing software.

A better approach, and the motivation for my research, is to obtain high performance without compromising the flexibility of the programming model. This can be done by applying compiler analyses and transformations to discover the implementations that are invoked by dynamic dispatches. If the compiler can prove that only a single implementation can be invoked from a dynamic dispatch site, then it can be converted to a direct procedure call, which is then amenable to other optimizations, such as inlining. Unfortunately, the traditional separate compilation model hinders these analyses, because the compiler must assume that implementations might exist in other modules, making it impossible in most cases to know the complete set of implementations that might be invoked by a particular dynamic dispatch.

To support more efficient implementations, my research has explored the use of whole program analysis and optimization to improve the information available to the compiler. This opens up many opportunities for optimization to the compiler. For example, because the complete inheritance hierarchy of the program is known, the compiler can optimize away all dynamic dispatches for which only a single implementation of the behavior exists in the program. The remaining dynamic dispatches are places where programmers really might use the flexibility, because multiple implementations of a particular behavior are defined in the program. To attack these cases, I have explored selective specialization and profile-guided optimization of dynamic dispatches. I have developed an algorithm that detects when it is beneficial to compile multiple, specialized versions of a single source routine, each applicable to a different set of inputs. Often, this technique has the effect of moving dynamic dispatches from program hot spots to less frequently executed portions of the program. A final technique applies profile feedback from the program to optimize the remaining dynamic dispatches for their common cases. An issue with these techniques that perform optimizations are based on the current structure of the program is that they introduce complex dependencies from the generated code to various aspects of the program's source code. In order to be practical for use in an interactive development environment, the ability to do incremental compilation must be preserved. To accomplish this, I have developed techniques for managing these intermodule dependencies and selectively invalidating only those pieces of compiled code that must be regenerated after a programming change.

To experimentally validate these ideas and to ensure that I remain focused on practical techniques, I have co-designed and implemented Vortex, an optimizing compiler for object-oriented languages. Vortex is relatively language independent, currently compiling programs written in Cecil, C++, and Modula-3, and it implements the optimization techniques developed as part of my dissertation. Since Vortex is written entirely in Cecil, a language with many of the high-level features that I espouse, it also serves as a demonstration that such languages can be practical, given appropriate implementation techniques: its performance has been sufficient to put it to daily use in an interactive development environment.

The techniques I have developed, embodied by their implementation in Vortex, have been remarkably successful: the optimizations found in Vortex improve the performance of large Cecil programs by a factor of four over an optimizing baseline without these techniques, and experiments are currently underway to assess their effectiveness on C++ and Modula-3 programs. Despite the success of these techniques, many potential areas of research remain. In the future, I envision three areas in which I will focus my research efforts.

First, I will continue to investigate implementation issues related to object-oriented languages. Many avenues still remain unexplored in this area. One possibility is to develop techniques that rearrange the layout of objects and cluster frequently used objects together in order to improve locality. This sort of transformation is likely to become more important as program performance becomes increasingly dependent on effective utilization of the memory system. Another interesting area is in exploring more powerful forms of interprocedural analyses. A serious challenge here is in developing interprocedural algorithms that are both incremental and scale to realistically-sized programs.

Second, I would like to explore the synergistic interaction between language design and language implementation. The development of new implementation techniques can make potential language features much less expensive, spurring language designers to explore new areas. Similarly, the introduction of new language features can provide challenges to language implementors. For example, starting with the viewpoint that the compiler will have access to the entire program allows many compromises made in existing language designs to be avoided. To explore these issues, I am interested in designing a new systems programming language that includes features such as closures, multi-methods, and garbage collection, yet has excellent performance and the mechanisms needed for low-level control of data representations.

Finally, as the gap between the source language and implementation language widens, it becomes more difficult for programmers to identify and understand performance bottlenecks. With lower-level languages, simple profiling tools coupled with an understanding of the implementation costs of source language constructs were sufficient to identify code that needed performance tuning. With high-level languages and implementations that perform sophisticated optimizations, it is no longer obvious which source constructs are expensive and which pieces of code require tuning. I would like to explore means of providing more informative feedback to the programmer to more precisely identify performance bottlenecks.

All of these directions have the common goal of improving programmer productivity through the use of high-level languages, by making these languages faster, more expressive, and therefore applicable to a wider range of problems.

Teaching Summary

Jeffrey Dean

One of the primary reasons that I have chosen an academic career is the ability to have a profound influence on students. By educating them with the knowledge and skills that they need, they can go on to understand and solve the problems faced by computer scientists today and beyond. In the process, I hope to impart to them the enthusiasm I have for my field.

As a graduate student at the University of Washington, I've been a teaching assistant (TA) both for the introductory undergraduate computer science courses (three quarters) and for the graduate courses in compilers and programming languages (one quarter each). Furthermore, at the end of my first year in graduate school, I was the instructor for the second course in the introductory computer science sequence (Data Abstraction in Ada, although the course has since converted to using C++). This provided me with valuable experience in developing a course curriculum, exams, and projects, and in giving course lectures. Through my experience as both a TA and an instructor, I have discovered that teaching is challenging, but it is also rewarding to see students learn and prosper when given the proper stimulation.

I feel that my teaching skills would best be utilized in courses such as compilers, programming languages, and software engineering. My philosophy is that to truly engage the students in these types of courses, it is essential that they get hands-on experience in building systems through course projects. At this level, subtleties that have been glossed over in the text and in course lecture become apparent, enhancing the student's understanding. Often these projects can be group projects, which also teach students to effectively communicate their ideas, an essential skill regardless of their career path.

I am also a firm believer that involving advanced undergraduates in research projects is an important part of their education. As an undergraduate, my own involvement in research provided me with valuable experience and was a major factor in my decision to go to graduate school. As a graduate student, I have advised and worked closely with several undergraduates doing projects related to my research. I would hope to continue this involvement in undergraduate research as a faculty member.

I have consistently received very good and excellent ratings on my teaching evaluations, and these ratings have shown improvement in every quarter that I have taught. Here are some excerpts from my evaluations as an instructor:

"Splendid, absolutely splendid. I really enjoyed this course even though it required a lot of work. Jeff Dean was an excellent instructor."

"Your knowledge, enthusiasm, organization, preparedness for lectures, encouragement, and the way you answered questions were especially good."

"Jeez. Every aspect of the course was good. Jeff is a very clear communicator. Ideas about pointers, linked lists, etc. are pretty hard to grasp, but Jeff gave very good explanations and examples"

"Thorough and clear teaching, very good teaching manner (i.e. easy to follow, not threatening or intimidating, but not boring either)"

As well as some excerpts from my evaluations as a teaching assistant:

"Of the dozens of TAs with whom I have interacted (I've TAed as well as been TAed to), Jeff is perhaps the best."

"You frequently did a better job of explaining the material than the professor!"

"You are an excellent TA - the best I have ever had. Thank you!"

"Your patience and your desire for us to learn is probably your greatest asset."


jdean@cs.washington.edu