Research
Current Research
I am currently a sixth-year PhD student at Carnegie Mellon University, expecting to graduate in May 2009. My advisor is Jim Herbsleb. Other members of my committee are Brad Myers, Gail Murphy, and Andre van der Hoek.
My research aims to help software developers manage and preserve knowledge, for their own use and for the use of other stakeholders in the project, especially in globally distributed teams where face-to-face knowledge transfer is limited.
We began this line of work by studying design meetings at the OOPSLA DesignFest event. We learned that the diagrams created in such meetings are difficult to interpret after the fact without additional contextual details, and that developers associate knowledge such as assumptions and decisions with specific entities but then forget it when the same entity appears in another context. In our OOPSLA’07 paper, we suggested that attention should be focused on preserving this knowledge, rather than on attempting to find a perfect notation for design or to automatically recognize the frequently improvised notations.
My current work is the eMoose project, a prosthetic memory tool for supporting software developers in Java and Eclipse that is inspired by human memory, and particularly by the models of semantic and episodic memory. Humans recall experiences from their past as episodes, reminiscent of stories or videos, which contain objective details about experienced events, along with subjective thoughts about and impressions of these details. These subjective details may include thoughts about specific events or artifacts, and memory of the episode is recalled when these events, concepts, or artifacts are encountered in other contexts.
Accordingly, eMoose maintains a knowledge space that consists of “objective” and “subjective” knowledge. The former is generated automatically by analyzing a stream of low-level telemetry that is provided by our monitoring infrastructure. The latter consists of knowledge about activities and artifacts that is provided by the developers, either externally through a series of popups or voice recordings, or internally as tagged comments within the source code or tagged clauses in the artifact’s documentation.
The main feature of eMoose is that it “pushes” the subjective knowledge associated with artifacts such as methods to the code that references them, such as their callers. This increases the chances that a developer writing or examining code that is a client of the annotated artifact will become aware of the important information. For instance, we systematically annotated the API documentation of several core libraries and identified “directives” such as instructions to do or not do certain things. While API authors may invest a lot in correct and detailed specifications, client authors rely on too many services to be able to read the documentation of each one thoroughly. As a result, certain directives are missed. With eMoose, awareness that directives exist on a certain target, along with convenient means to access them, may help avoid errors. This is detailed in our CSCW note and in a full paper submission (currently under review). eMoose also “pushes” to-do comments, making clients aware that there are incomplete clauses in the code that they depend on.
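The "push" idea can be illustrated with a minimal Python sketch. This is not eMoose's implementation (eMoose works on Java code inside Eclipse); the `Artifact` class, the `push_to_callers` function, and all method names in the sample data are hypothetical and exist only to show annotations travelling from a target to its callers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Artifact:
    """A code entity (e.g. a method) with its subjective annotations."""
    name: str
    directives: List[str] = field(default_factory=list)
    todos: List[str] = field(default_factory=list)


def push_to_callers(artifacts, call_sites):
    """Map each caller to the annotations found on the targets it invokes.

    artifacts:  dict of target name -> Artifact
    call_sites: list of (caller name, target name) pairs
    """
    pushed = {}
    for caller, target in call_sites:
        artifact = artifacts.get(target)
        if artifact is None:
            continue  # unknown target: nothing to push
        notes = [("directive", target, text) for text in artifact.directives]
        notes += [("todo", target, text) for text in artifact.todos]
        if notes:
            pushed.setdefault(caller, []).extend(notes)
    return pushed


# Hypothetical sample data, invented for illustration only.
artifacts = {
    "Channel.send": Artifact("Channel.send",
                             directives=["Call open() before send()."]),
    "Cache.evict": Artifact("Cache.evict",
                            todos=["Handle concurrent eviction."]),
    "Util.trim": Artifact("Util.trim"),  # no annotations
}
call_sites = [
    ("Client.run", "Channel.send"),
    ("Client.run", "Util.trim"),
    ("Worker.step", "Cache.evict"),
]

pushed = push_to_callers(artifacts, call_sites)
for caller, notes in pushed.items():
    for kind, target, text in notes:
        print(f"{caller}: {kind} on {target}: {text}")
```

In this sketch a caller only receives notes for targets that actually carry directives or to-do comments, so calls to unannotated targets add no noise.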
A more exploratory feature of eMoose involves the “episodic view”, generated by combining the timestamped objective events with the subjective activity details provided by users. Unlike more formal “tasks” or “bugs”, these activity details can be short, informal indications of users' intents and goals or of reminders and issues that must be addressed. The subjective activities partition the objective data and may help interpret and search it.
In the short run, this may aid orientation since it makes it easier to see what the recent activities, intentions, and visited locations were. In the long run, the episodic view may offer significant decision traceability. For example, to better understand the intentions behind a piece of code, we may want to see all the situations where it was written or subsequently edited, what the intentions were at the time, what was edited, and most importantly, what code and documents were examined, since example code or external details could have influenced the decision.
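As a rough illustration of how subjective activities might partition the objective event stream, consider the following Python sketch. It is an assumption-laden toy, not eMoose's data model: the tuple layouts, the `build_episodes` and `episodes_editing` functions, and the sample data are all hypothetical. Each event is assigned to the most recent activity note that started at or before it.

```python
from bisect import bisect_right


def build_episodes(activities, events):
    """Group timestamped events under the activity that was current.

    activities: list of (start_time, note) entered by the developer
    events:     list of (time, kind, artifact) recorded automatically
    Events that precede the first activity are left out of the result.
    """
    activities = sorted(activities)
    starts = [start for start, _ in activities]
    episodes = [{"start": start, "note": note, "events": []}
                for start, note in activities]
    for time, kind, artifact in sorted(events):
        index = bisect_right(starts, time) - 1  # latest activity started <= time
        if index >= 0:
            episodes[index]["events"].append((time, kind, artifact))
    return episodes


def episodes_editing(episodes, artifact):
    """Return the episodes in which the given artifact was edited."""
    return [episode for episode in episodes
            if any(kind == "edit" and name == artifact
                   for _, kind, name in episode["events"])]


# Hypothetical sample data, invented for illustration only.
activities = [(10, "fix parser bug"), (50, "add logging")]
events = [
    (5, "view", "Main.main"),
    (12, "view", "Lexer.next"),
    (20, "edit", "Parser.parse"),
    (55, "edit", "Logger.log"),
]

episodes = build_episodes(activities, events)
for episode in episodes_editing(episodes, "Parser.parse"):
    print(episode["note"], episode["events"])
```

Looking up the episodes that edited `Parser.parse` returns the stated intent together with everything viewed during that episode, which is the kind of traceability question described above.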
As part of my dissertation, I will also explore analogues of these ideas for software design.
An early version of eMoose is available for download and use.
Previous Research
Managing Interruptions in Software Development
In the summer of 2004 I did a research internship at the IBM Research Laboratory in Cambridge, MA. I developed an Eclipse-based framework for conducting research on interruption management in software development. This was integrated into the Jazz collaboration environment.
A paper on this work, titled "Eclipse as a Platform for Research on Interruption Management in Software Development", was presented at the Eclipse Technology Exchange workshop at OOPSLA '04. A longer version is available but is unpublished.
Investigating classes with formal concept analysis
My master's thesis involved applying formal concept analysis to the binary relation of accesses between methods and fields in individual Java classes and inferring information from the resulting concept lattice. The lattice presents the interface of a class in a more organized manner than an alphabetical listing of features, essentially serving as a heuristic for feature categorization. We can use the lattice to reason about the class and discover problems in its structure and implementation, as well as to select an effective order for code inspection. There are also obvious applications for reverse engineers. My advisor was Yossi Gil. A paper was published at WCRE’03.
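For readers unfamiliar with formal concept analysis, the following Python sketch computes the formal concepts of a small method-to-field access relation. It is a textbook-style illustration, not the tooling used in the thesis; the `formal_concepts` function and the sample class are hypothetical. A concept pairs a maximal set of methods (the extent) with the full set of fields that all of them access (the intent).

```python
def formal_concepts(accesses):
    """Return all formal concepts of a method -> accessed-fields relation.

    accesses: dict mapping a method name to the set of fields it accesses.
    Each result is (extent, intent): a frozenset of methods and the
    frozenset of fields shared by all of them.
    """
    all_fields = frozenset().union(*accesses.values())
    # Intents are exactly the intersections of the methods' field sets,
    # with the full field set acting as the empty intersection.
    intents = {all_fields}
    for fields in accesses.values():
        current = frozenset(fields)
        intents |= {current & intent for intent in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(method for method, fields in accesses.items()
                           if intent <= fields)
        concepts.append((extent, intent))
    # Order from the most specific concept (fewest methods) upward.
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


# Hypothetical class, invented for illustration only.
accesses = {
    "push": {"items", "count"},
    "pop": {"items", "count"},
    "size": {"count"},
    "getName": {"name"},
    "setName": {"name"},
}

concepts = formal_concepts(accesses)
for extent, intent in concepts:
    print(sorted(extent), sorted(intent))
```

In this toy class the lattice separates the methods that work on `name` from those that work on `items` and `count`, which hints at two unrelated responsibilities; this is the sort of feature categorization the lattice is meant to surface.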
A language for patterns over source code
An exploratory research project carried out while I was working for IBM Research in Haifa. We investigated the difficulties involved in defining a language for specifying patterns in source code, one that supports the look-and-feel of individual languages while also supporting patterns that span multiple programming languages. Our conclusion was that an entire family of languages is necessary to provide these features, but that this investment is worthwhile due to the many potential uses.
Other research interests
Following are some questions, problems and topics I hope to research some day:
Collaboration in software teams:
- Improving knowledge transfer within the team.
- How to preserve undocumented information when a developer leaves the team?
Documentation
- How to capture knowledge and design intent that doesn't quite fit into the documentation?
- What tools can help a developer document and annotate code sections without breaking the flow of "coding"?
A "redundant array of inexpensive programmers"
Can new techniques and tools allow us to make new progress on this? Can we create a system where novice developers can gain experience doing menial tasks and advance according to their abilities, while sandboxing their work from damaging other parts of the program? What economic models are appropriate here?
Open Source Development
One of the advantages of open source is that everyone can see everybody else's code. What drawbacks does this have? Do people shy away due to fear of peer criticism? Can we still develop exploratory projects in an open-source mode despite this problem? How does the fact that everything is archived forever affect developers?