People of Programming Languages

An interview project in conjunction with POPL 2018  

David Walker

Interview with David Walker

David Walker is a Professor in Princeton’s Computer Science Department. He studies type systems and various kinds of domain-specific programming languages (DSLs). David won POPL’s Most Influential Paper Award in 2008, along with Greg Morrisett, Karl Crary, and Neal Glew, for their 1998 paper "From System F to Typed Assembly Language." In this interview, we discuss that work on typed assembly language, as well as his work on various DSLs.


JY: Tell us about yourself.

DW: I grew up in Toronto, Canada where I got a great initial education, thanks in large part to the influence of my parents Elizabeth and Michael. I did an undergraduate degree at Queen's University in Canada and I wound up going to Cornell for my PhD.

I had very little work-life balance for quite a while during my grad school and assistant professor days, but more recently, I’ve been working on changing that. One new hobby is singing: I joined the Philadelphia Gay Men's Chorus about a year ago. It is a great stress reliever and allows me to use a different part of my brain than I do in my day job.

JY: I want to talk about the paper for which you and Greg won the Most Influential POPL Paper Award, the 1998 typed assembly paper. Could you tell us how that paper came about?

DW: Greg Morrisett had just come to Cornell, and I was a young PhD student. For his thesis, Greg had been working on compiling ML into type-safe intermediate languages. The idea was that by creating a type checker for your intermediate language, you could help improve the reliability of a compiler by running the type checker every time you did a transformation. This would help you debug your programs, find errors, and ensure certain kinds of safety guarantees. Greg had pushed types part way down through the compiler stack, and we just decided, "let's push it down one level more, and develop a slightly lower-level typed intermediate language, maybe one where we could allocate and initialize data structures safely." We started working on it, and then we pushed it down a little, pushed it down a little more, pushed it down a little more, and at a certain point we said, "Well, why don't we type check assembly language? Let's go all the way to the bottom." We surprised ourselves with the way it turned out.
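To give a flavor of what a typed assembly language checks, here is a small OCaml sketch of a checker for an invented, five-instruction assembly language. It is only an illustration of the general idea, not the calculus from the paper: a register-file typing is threaded through each basic block, every code label declares the register types it requires on entry, and allocation produces a pointer that may then be stored through safely.

```ocaml
(* A toy typed-assembly checker, in the spirit of (but much simpler
   than) the TAL of the POPL 1998 paper. The instruction set, the
   types, and all names here are invented for illustration. *)

type ty = Int | Box   (* Box: a pointer to an allocated heap cell *)

type operand = Reg of string | Imm of int

type instr =
  | Mov of string * operand            (* rd := operand *)
  | Add of string * string * operand   (* rd := rs + operand *)
  | Malloc of string                   (* rd := pointer to a fresh cell *)
  | Store of string * string           (* [rd] := rs; rd must hold a Box *)
  | Jmp of string                      (* jump to a labeled block *)

module Env = Map.Make (String)

let operand_ty env = function
  | Imm _ -> Int
  | Reg r ->
    (match Env.find_opt r env with
     | Some t -> t
     | None -> failwith ("register " ^ r ^ " is uninitialized"))

(* Check one basic block by threading a register-file typing [env]
   through the instructions; [labels] maps each code label to the
   register types it requires on entry. *)
let rec check labels env = function
  | [] -> ()
  | Mov (rd, o) :: rest ->
    check labels (Env.add rd (operand_ty env o) env) rest
  | Add (rd, rs, o) :: rest ->
    (match Env.find_opt rs env, operand_ty env o with
     | Some Int, Int -> check labels (Env.add rd Int env) rest
     | _ -> failwith "add expects integer operands")
  | Malloc rd :: rest ->
    check labels (Env.add rd Box env) rest
  | Store (rd, rs) :: rest ->
    (match Env.find_opt rd env, Env.find_opt rs env with
     | Some Box, Some Int -> check labels env rest
     | _ -> failwith "store expects a pointer and an integer")
  | Jmp l :: _ ->
    (* The current register typing must satisfy the target's precondition. *)
    let expected =
      match List.assoc_opt l labels with
      | Some tys -> tys
      | None -> failwith ("unknown label " ^ l)
    in
    List.iter
      (fun (r, t) ->
         if Env.find_opt r env <> Some t
         then failwith ("register " ^ r ^ " has the wrong type at " ^ l))
      expected

let () =
  (* "loop" requires r2 to hold a pointer on entry. *)
  let labels = [ ("loop", [ ("r2", Box) ]) ] in
  check labels Env.empty
    [ Malloc "r2"; Mov ("r1", Imm 0); Store ("r2", "r1"); Jmp "loop" ];
  print_endline "block is well typed"
```

The real TAL is far richer: code types are polymorphic, allocation and initialization are tracked step by step, and the type system scales to a realistic instruction set. But even this toy checker rejects blocks that jump to `loop` without a pointer in `r2`, which is the essence of how type checking rules out unsafe assembly code.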

JY: The fact that typed assembly can work is really not obvious at all. When I taught a junior-level class last spring, the students were completely shocked by it. They kept saying, "But this error can still happen." Then we said, "No, it can't happen because the type system prevents it."

DW: My recollection is that the paper, when we submitted it, got very weak reviews. It got among the weakest reviews of any paper that I've had accepted at POPL.

JY: I can imagine. It's not intuitive at all.

DW: I remember Karl Crary gave the talk at POPL, and he got a lot of pushback from people asking exactly those kinds of questions: "What about this? What about this? What about this?" It was pretty controversial at the time.

JY: How has the reception changed and what caused the reception of the paper to change?

DW: One thing that caused it to change was that we built an implementation. That certainly helped a lot, because people could see a concrete demonstration that it would actually work. Moreover, once we showed the basic idea had promise, a lot of other researchers were drawn into the space. There were a lot of ways that you could make our initial type systems more flexible and general. You can more or less bring all of the past work on type theory and logical frameworks to bear on the problem. So the work generated a wide, technically challenging problem space, and academic researchers like that kind of thing, especially when you can explain why the end results might be broadly useful to society. As a result, we gained a lot of academic momentum. The research isn't a direct building block for some of the things that people are doing now in terms of proving the full correctness of compilers, but I think it might have been influential in that it pointed out that it is possible to use formal methods to prove the correctness of really, really ugly code—assembly language, maybe the ugliest code out there. That might have given others the confidence to reach further: "Oh, they can do it for type safety. Maybe we can do it for stronger properties than that, like full compiler correctness."

Another broad impact of the work was perhaps that typed assembly language, and the many other related projects that were happening at the same time, helped the community as a whole get a very firm grip on a really powerful proof technique (syntactic proofs of type safety). These days, doing type-safety proofs is pretty boring. It's necessary, but not really a result you can hang a paper on any more. And I think that is a wonderful indication of the progress the community's made over the last 20 years or so.
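For readers who have not seen the technique: a syntactic type-safety proof, in the style popularized by Wright and Felleisen, reduces safety to two lemmas about the small-step operational semantics, stated roughly as follows.

```latex
% Progress: a well-typed program is never stuck.
\text{If } \vdash e : \tau, \text{ then either } e \text{ is a value or } e \to e' \text{ for some } e'.

% Preservation: evaluation does not change a program's type.
\text{If } \vdash e : \tau \text{ and } e \to e', \text{ then } \vdash e' : \tau.
```

Together, the two lemmas imply that a well-typed program can only ever step to well-typed programs, and hence never reaches a stuck state; typed assembly language pushes exactly this guarantee down to machine-level code.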

JY: I also wanted to ask you about the breadth of your work. Something that I admire about your work is how you take these really principled ideas and apply them to all sorts of domains. I've been wondering: what has been the unifying theme behind these forays into different domains?

DW: I'm not sure there is much of a unifying theme behind the choice of problem or the choice of domain, to be honest. But there are some technical tools that get reused. One of the things I learned in grad school, and had a great deal of practice with, was how to write down little languages together with their static and operational semantics. Usually, these were little idealized assembly languages. All that practice writing and rewriting definitions of this ilk turned out to be really useful. It turns out there is quite a bit of art and taste and experience that goes into constructing elegant little languages. Part of that involves crafting the language in such a way that it highlights the feature you are most interested in, minimizes everything else, and makes the property you are after relatively easy to prove.
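As a concrete example of the kind of artifact he is describing, here is a deliberately tiny language of integers and booleans, written in OCaml, with a static semantics (typeof) and a small-step operational semantics (step). The language is a standard textbook example, not one from David's papers; it is just small enough to show how the two semantics fit together, and its typeof and step functions are exactly the ingredients of the safety lemmas above.

```ocaml
(* A minimal "little language": syntax, a type system, and a
   small-step operational semantics. A standard textbook example,
   included only to illustrate the style of definition. *)

type ty = TInt | TBool

type exp =
  | Int of int
  | Bool of bool
  | Add of exp * exp
  | If of exp * exp * exp

(* Static semantics: a type checker. *)
let rec typeof = function
  | Int _ -> TInt
  | Bool _ -> TBool
  | Add (e1, e2) ->
    if typeof e1 = TInt && typeof e2 = TInt then TInt
    else failwith "add expects integers"
  | If (e1, e2, e3) ->
    let t2 = typeof e2 in
    if typeof e1 = TBool && t2 = typeof e3 then t2
    else failwith "ill-typed conditional"

let is_value = function Int _ | Bool _ -> true | _ -> false

(* Dynamic semantics: one small-step transition. *)
let rec step = function
  | Add (Int m, Int n) -> Int (m + n)
  | Add (e1, e2) when not (is_value e1) -> Add (step e1, e2)
  | Add (e1, e2) -> Add (e1, step e2)
  | If (Bool true, e2, _) -> e2
  | If (Bool false, _, e3) -> e3
  | If (e1, e2, e3) -> If (step e1, e2, e3)
  | _ -> failwith "no step: a value or an ill-typed stuck term"

let () =
  let e = If (Bool true, Add (Int 1, Int 2), Int 0) in
  assert (typeof e = TInt);                 (* the checker accepts e *)
  let rec eval e = if is_value e then e else eval (step e) in
  match eval e with
  | Int n -> Printf.printf "evaluates to %d\n" n   (* prints 3 *)
  | _ -> assert false
```

Part of the "art and taste" he mentions is visible even here: the language has just enough constructs to make the conditional's typing rule and the stuck states interesting, and nothing more.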

If I were to give some advice to students in programming languages, it would be to make sure you take the time to learn and to perfect some deep and highly reusable technical skill. Doing that typically takes lots of time and repetition—the time you need to master a deep technical topic is much harder to find once you leave grad school, so you had best do it early. Our community has lots of those skills—abstract interpretation, type theory, logical relations, decision procedures, etc. Mastering one that can be used again and again to produce results of interest is a great way to make a career.

One of the reasons I think I've bounced around from one domain to another is that I've never been particularly confident that I can identify the most important problems myself. Instead, what's happened is that I've reached out to other people who seem to know what the important problems are, and I've taken some of my technical skills, and I've worked with them, trying to apply programming language tools to solving those problems. I'm almost never the person that actually comes up with the problem in the first place.

For example, the work on PADS, a language for describing "ad hoc" data formats and the data structures used to program with them, was really driven by Kathleen Fisher and Bob Gruber. Kathleen was working at AT&T at the time, and she discovered that AT&T essentially had a massive big data programming problem (we didn't call it "big data" back in those days). Anyway, she saw AT&T having to process millions of phone call detail records looking for fraud, and she did several things in that space. She developed Hancock, a language that would help analysts write scripts to process big data feeds, and then implement them efficiently. Then she realized that the thing missing from Hancock was really the ability to describe the formats of those feeds and to generate the data structures needed by the Hancock processing engine. So Kathleen identified the problem. We got together to work on the semantics of the language and on synthesizing specifications from example data because of a bit of a fluke: we were both on the same train riding back from a National Science Foundation proposal review meeting, and she started to tell me a bit about it. A fantastic PhD student at the time, Yitzhak Mandelbaum, got interested, and we set to work.
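To make the PADS idea concrete, here is a hedged OCaml sketch of its core move: write a declarative description of a record format, then derive a parser from that description mechanically. The three-field call-record format and every name below are invented for illustration; PADS itself is much richer, handling errors in the data and generating the typed in-memory structures a processing engine needs.

```ocaml
(* A sketch of the PADS idea, not PADS syntax: a format described
   as data, and a parser derived from the description. *)

type format =
  | FInt                       (* a decimal integer *)
  | FLit of string             (* a literal separator, e.g. "|" *)
  | FSeq of format list        (* fields in sequence *)

(* An invented record format: caller|callee|duration *)
let call_record =
  FSeq [ FInt; FLit "|"; FInt; FLit "|"; FInt ]

(* Derive a parser from a format description: returns the parsed
   integers together with the unconsumed rest of the input. *)
let rec parse fmt s =
  match fmt with
  | FInt ->
    let n = String.length s in
    let i = ref 0 in
    while !i < n && s.[!i] >= '0' && s.[!i] <= '9' do incr i done;
    if !i = 0 then failwith "expected an integer"
    else ([ int_of_string (String.sub s 0 !i) ],
          String.sub s !i (n - !i))
  | FLit lit ->
    let l = String.length lit in
    if String.length s >= l && String.sub s 0 l = lit
    then ([], String.sub s l (String.length s - l))
    else failwith ("expected literal " ^ lit)
  | FSeq fs ->
    (* Run each field's parser in turn, threading the remaining input. *)
    List.fold_left
      (fun (acc, rest) f ->
         let vs, rest' = parse f rest in
         (acc @ vs, rest'))
      ([], s) fs

let () =
  let fields, _ = parse call_record "555|7214|42" in
  List.iter (fun v -> Printf.printf "%d\n" v) fields   (* 555 7214 42 *)
```

The payoff is the one Kathleen identified: once the format is a first-class description, parsers and data structures can be generated from it rather than written by hand for every new feed.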

JY: What would you say you see as the unifying values of POPL?

DW: Technical precision. If you aren't able to explain the technical guts of how your system works with precision, then it's not really going to be very popular at POPL. And I think that is a good thing, because without precise definitions, it is difficult to make progress as a community. Without precision, it is difficult to tell how ideas relate to one another, or to build on the ideas without having to redo everything. On the other hand, there is a balance. For instance, I don't believe that every property in a POPL paper needs to be backed up by a formal proof in Coq. Doing so would slow down the rate of research in our community. And many of the properties we prove aren't all that important—it's often the broad idea embodied by the paper that is the important thing.

JY: What are your thoughts on the directions the field has taken in the last few years?

DW: I think there are a huge number of things to be excited about in programming languages these days. There have been a lot of interesting applications of program synthesis in data processing, spreadsheets, and superoptimization, and they are pushing the creation of new algorithms in the space. I've personally been working at the intersection of programming languages and networking. I think that programming languages and formal methods researchers can have a huge impact on the reliability of modern data center and backbone networks, which is massively important to an increasingly networked society. If I have a disappointment about the field, it is that we have largely ceded programming systems for big data and data centers to other communities. I think that is a bit of a shame. I wish we saw more work on designing, implementing, and programming with parallel collections, big graphs, or streams in POPL or PLDI—these topics seem to live mostly in systems conferences. I'm concerned that the same thing is happening, or will happen, with machine learning. It would be wonderful to come to POPL to find out more about TensorFlow++ and related systems. Nevertheless, I think we've got it pretty good. There are certainly a few papers at POPL 2018 that I can't wait to find out about.