Research
My research has focused on large-scale systems, including an
application-specific framework for geographic environmental modeling
systems (GEMS), network-attached secure disks (NASD), and Active
Disks, storage devices with the capability to run application-specific
code. The fundamental issues spanning these projects were how to 1)
partition computation among heterogeneous nodes in the system, 2)
scale the system to a large number of nodes, and 3) operate on large
volumes of data efficiently. The core problem that ties these projects
together is the specification, and more importantly the analysis,
required as part of the "functional model" of large systems. What
mechanisms can we provide to describe the behavior of the components
of a system that allow enough application-specific "customization"
while introducing enough constraints to provide some hope of
understanding overall system behavior?
Active Disks, intelligent storage, parallel programming, databases
In my thesis work, I propose Active Disks, individual disk drives that
provide an environment for executing application-level code. This
recognizes that at some point a "general-purpose" interface is no
longer sufficient and the most efficient way to manage a system
interface is to allow flexibility on both sides. A modern disk drive
is simply a small computer. A current high-end disk drive contains a
Motorola 68020 processor at 40 MHz, 4 MB of RAM, and a 40 MB/s
network, along with some spinning magnetic material for permanent
storage. Trends in the design of drive control chips promise
processors at upwards of 200 MHz [Siemens, product announcement] and
16 MB of RAM [Seagate, workshop presentation]. In a large SMP database
server with 50 disks, this means the drives will have an aggregate
computing power of 5x to 10x what is available in the host
processors. And, even more importantly, the I/O backplane of such a
system cannot deliver anywhere near the aggregate bandwidth of fifty
20 MB/s disks [Riedel98, Usenix NT]. This means that
application-specific processing that takes advantage of this
parallel computing power and reduces the amount of data traffic on the
network can significantly improve performance and scalability
[Riedel97, technical report]. My thesis contributes to this area in
three major ways: by demonstrating a set of important data-intensive
applications that benefit from such an architecture [Riedel98, VLDB];
providing a model for evaluating the performance of applications
running on Active Disks compared to a traditional system; and describing the
characteristics and limitations of applications that are candidates
for Active Disks. My work focused on specific applications from
databases, data mining, and multimedia. I have implemented these in a
prototype system, showing 2x performance gains for a small configuration,
with scalability to 10x and 20x for more typical database servers.
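To make the arithmetic behind these figures concrete, the sketch below works through the 50-disk example using the drive parameters cited above; the host configuration (four 500 MHz processors, a 200 MB/s I/O backplane) and the filter selectivity are illustrative assumptions, not measurements from my prototype.

```python
# Back-of-the-envelope comparison for a 50-disk database server.
# Drive parameters are those cited above; the host parameters and the
# selectivity figure are illustrative assumptions, not measured values.

num_disks = 50
drive_mhz = 200        # projected on-drive processor speed (MHz)
drive_bw_mbs = 20      # sequential bandwidth per disk (MB/s)

host_cpus = 4          # assumed host configuration
host_mhz = 500
backplane_mbs = 200    # assumed host I/O backplane bandwidth (MB/s)

aggregate_drive_mhz = num_disks * drive_mhz    # 10,000 MHz of drive compute
aggregate_host_mhz = host_cpus * host_mhz      #  2,000 MHz at the host
aggregate_disk_mbs = num_disks * drive_bw_mbs  #  1,000 MB/s off the media

print("compute ratio (drives / host): %.1fx"
      % (aggregate_drive_mhz / aggregate_host_mhz))
print("disk bandwidth %d MB/s vs. backplane %d MB/s"
      % (aggregate_disk_mbs, backplane_mbs))

# If an application filters records at the drives, only the selected
# fraction of the data must cross the backplane to the host.
selectivity = 0.01     # hypothetical highly selective scan
print("traffic after filtering at drives: %.0f MB/s"
      % (aggregate_disk_mbs * selectivity))
```

Under these assumptions the drives hold a 5x compute advantage, and together they can read five times more data per second than the backplane can carry to the host.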
Storage systems, networking, file systems, distributed systems
In the Network-Attached Secure Disks (NASD) project, we suggest that
individual disk drives be made first-class citizens on the
general-purpose network, rather than being "stuck" behind file servers
on a special-purpose peripheral "network". Clients are able to
transfer data directly from drives, and the full network bandwidth is
available for these transfers [Gibson98, ASPLOS]. File servers become
simply "file managers" and are responsible only for policy decisions
and metadata management [Gibson97, SIGMETRICS]. We also propose
modifying the interface to storage to support autonomous devices by
providing storage "objects" that drives can manage with additional
semantics [Gibson97, technical report]. This improves the flexibility
available to "intelligence" at the devices and off-loads the server.
As part of this work, we also studied workloads in distributed file
systems [Riedel96, MSS&T] and in scientific applications, and worked with
the Scalable I/O Initiative [Corbett96, Supercomputing]. Our work on
network-attached disks has contributed to the current industry push
for "network storage" and our proposal for "object-oriented disks" is
the basis of a recent submission to the SCSI standards committee.
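To make the interface change concrete, the following is a minimal sketch of the kind of object interface such a drive might export in place of a fixed-block interface; the class and method names are hypothetical illustrations, not the actual NASD specification.

```python
# Minimal sketch of an object-style drive interface (names and signatures
# are hypothetical illustrations, not the actual NASD specification).
# Clients name objects and byte ranges; the drive manages layout itself.

class ObjectDrive:
    def __init__(self):
        self._objects = {}   # object id -> bytearray (stands in for media)
        self._next_id = 0

    def create(self):
        """Allocate a new storage object and return its identifier."""
        oid = self._next_id
        self._next_id += 1
        self._objects[oid] = bytearray()
        return oid

    def write(self, oid, offset, data):
        """Write bytes at a logical offset; block placement is the drive's job."""
        obj = self._objects[oid]
        if len(obj) < offset + len(data):
            obj.extend(b"\0" * (offset + len(data) - len(obj)))
        obj[offset:offset + len(data)] = data

    def read(self, oid, offset, length):
        """Read by (object, offset, length) rather than by physical block."""
        return bytes(self._objects[oid][offset:offset + length])
```

Because requests name objects and byte ranges rather than physical blocks, the drive can handle allocation, layout, and prefetching on its own, which is the off-loading and device-level "intelligence" described above.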
Application-specific frameworks, object-oriented systems
While working on my Master's degree, I was supported as part of a
Grand Challenge grant in Environmental Modeling. The goal of the group
that I led was to develop an application-specific framework for
geographic environmental modeling systems (GEMS) that could later be
applied to similar application areas [Riedel94, ECOOP]. We chose to
pursue an object-oriented design methodology and the final project
combined aspects of user interface, visualization, parallel computing,
and data management. This experience provided me with my first insight
into the process of building a large computer system [Bruegge95, IEEE]
as we dealt with issues of user requirements (often ill-specified or
unknown), performance, and the interaction among heterogeneous systems
(for display, computation, and storage).
Challenges & Future Work
The biggest challenge to the wide applicability of Active Disks is the
way code is written and distributed across hosts and disks. The use of
standard interfaces such as SCSI or NASD provides flexibility for
optimization above and beneath the interface, but allows information
to flow across the interface only in very specific ways. On the other
hand, providing general programmability at the device allows infinite
customization of the interface, but also allows infinite ability to
make bad partitioning decisions. Object-oriented design breaks a
system into the object model, the dynamic model, and the functional
model. When I last taught object-oriented design (admittedly over
three years ago), the methodologies were very good at the object model,
and this part was very powerful: class hierarchies, inheritance, and
specialization help to decompose a problem and tie it to "real world"
objects. The dynamic model was mainly used for user interface design.
The functional model was often neglected, and this is exactly where
the core behavior of the system was to be specified. How does a
particular set of classes interact with one another? What is the
communication between decomposed modules? What are the performance
requirements? All of these things were part of the functional model as
I understood it. This was where we had the least knowledge (in
terms of techniques we could teach or learn from), and where the
least effort on a project was expended. It is also where the
hardest part of the system design lies. How can we describe the
behavior of a system in a way that allows us to control, or at least
analyze, performance in general, as well as the "ilities":
scalability, reliability, manageability?
My future research work will focus on this aspect of systems
development, working from the context of active storage systems. A
server that manages a large number of Active Disks ties together the
problems of distributed computing, programming languages, parallel
computing, and fault tolerance (disks must continue to preserve the
integrity of user data). Starting from storage systems provides a key
point of leverage as current interfaces to storage are relatively
limited and well-defined, and the question becomes how much or how
fast to pull things "across" these interfaces. As more and more data
becomes computerized and widely available, storage will also become
increasingly important: in large-scale databases, in data mining, in
Web systems, and in multimedia. Critical issues to address include:
How can we aid programmers in re-partitioning their code for Active
Disks? Are there general "design patterns" that can be codified? Are
there general performance models that can be applied before a system
is built (rather than requiring profiling or extensive "tuning" after
the fact)? Can we define a set of measurable core properties that a
distributed component must have in order to perform well as part of a
whole (communication/compute ratio limited to X, or variance in
latency limited to Y)? Or particular properties that it should not
have (single point of failure, memory requirement of Z)? Can this be
done across all the important "performance" characteristics, including
not only basic "throughput" but also reliability and manageability?
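As a trivial illustration of the kind of pre-deployment check these questions point toward, the sketch below tests a component's measured properties against stated limits; the particular properties, numbers, and thresholds are placeholders, not a worked-out methodology.

```python
# Toy illustration of admitting a component only if its measured
# properties stay within stated limits; the properties and thresholds
# here are placeholders, not a worked-out methodology.

from dataclasses import dataclass

@dataclass
class ComponentProfile:
    comm_bytes_per_instr: float   # communication/compute ratio
    latency_variance_ms: float    # variance in observed response latency
    memory_mb: float              # resident memory requirement

def admissible(p, max_comm_ratio, max_latency_var, max_memory_mb):
    """Return True only if the component stays within every stated limit."""
    return (p.comm_bytes_per_instr <= max_comm_ratio
            and p.latency_variance_ms <= max_latency_var
            and p.memory_mb <= max_memory_mb)

# Hypothetical profile of a scan-and-filter function destined for a drive.
scan = ComponentProfile(comm_bytes_per_instr=0.05,
                        latency_variance_ms=2.0,
                        memory_mb=4.0)
print(admissible(scan, max_comm_ratio=0.1,
                 max_latency_var=5.0,
                 max_memory_mb=16.0))   # True under these placeholder limits
```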
Teaching
In addition to this research work, I have been a teaching assistant
four times for three different faculty members. Three of these were
project-based courses in Software Engineering and Advanced Software
Engineering that teach the basics of software engineering techniques
and object-oriented design and then "set the students loose" on a
large development project that they complete as a team. The project
brings in a "client" from outside the university who has a particular
software problem they want solved; when I taught the course, the projects included a
mobile system for emergency management first responders, a wireless
healthcare system, and an aircraft reservation system. This class
exposes students to working in teams and dealing with the realities of
having a real client, real system constraints, and real deadlines. Many
students have contacted us after taking this course and told us that
it was the single most valuable experience in preparing them for job
interviews and for their eventual software development positions. I
also taught a senior-level undergraduate and first-year graduate
course in systems architecture that started from the premise "you've
learned all about how processors work, now let's talk about the rest
of the system." We taught the details of the memory hierarchy, basic
performance evaluation, performance tuning, and touched on storage
systems and supercomputer architectures. I believe the most successful
parts of these courses were the "hands-on" approach and the direct
relationships we drew between the course work and the issues the
students would face in their jobs, and I hope to continue that approach
in my own teaching. Two of the best-received topics in the systems
architecture class were the storage lecture that I gave and the lecture
on fault tolerance given by another guest speaker. Storage and I/O
issues are becoming more and more important in real systems and as a
focus of industry attention (as we reach the limits of what can be gained from
faster CPUs, databases continue to grow, and the "connectedness" of
data continues to increase). I believe these topics need to form a
larger and more important part of both computer science and computer
engineering curricula in the future, and this is something I will work
to promote.