Research Ideas
Not all of these problems are necessarily worthwhile to study, or easy to
solve. I have included each problem because I think it's interesting and
because I think there's an easy first step.
Tools
- A system for collecting and manipulating program traces for debugging,
testing, and understanding.
- A SAX-like API built into gcc to provide a convenient, generic
C/C++ front end.
- A tool to merge C/C++ files together into one big file for "whole-module"
compilation.
- Adding new object-oriented optimizations to g++:
- Optimizing classes introduced in anonymous namespaces
- A simple, flexible, very scalable version control system based on a
dynamic graph of relationships between repositories.
- Data structure compression: automatic runtime techniques for compressing
an application's data.
- Binary-level techniques for finding identical/similar code blocks and
merging them/reporting them to the user.
- Analyzing per-process filesystem interactions to find potentially
exploitable race conditions.
- Speeding up builds (make): processing a recursive makefile system with
one make process; bulk stat'ing of files using an inode monitor daemon;
incremental compilation and linking.
- Integrating make with local branch management (see above) for efficiently
building and managing multiple local versions.
- Speeding up gcc by compiling multiple files at once and forking at the
first substantial source divergence
- Partial static analysis: how bad is it if you assume no knowledge outside
your compilation unit?
Structures
- A language/library for layout, providing automatic incrementality,
asynchrony, laziness, extensibility, constraint handling, and profile-driven
optimization.
- A new programming model based on events, transactions, and objects,
designed to facilitate parallelism, fault tolerance, and distribution while
actually being easier to use for building large concurrent applications.
- A high level language for writing simulations, that people will actually
use.
- Statistical analysis of large open-source software projects (from source
repositories, bug databases, newsgroups, etc), with machine learning to
predict bugs, design problems, schedule, personnel issues, etc. (Sourceforge)
- Performance analysis and optimization of COM-based applications;
declaring, checking and using "out of band" implementation promises. It
would be especially interesting to be able to vary the exposure of
implementation details at compile time, without modifying any sources,
in order to easily tune the tradeoff between performance and binary
compatibility.
- Composable caching: given some resource to be partitioned among
a number of different cache modules, what should the interface to the
cache modules be, and what algorithms should be used, in order to ensure
optimal trade-offs among the modules? Large applications (e.g. Mozilla)
cache the results of lots of different kinds of computations. How much
potential performance is being lost, and where?
- Heterogenous garbage collection; given that we have a set of "memory
management domains" each with its own memory management/garbage collection
strategy, what kind of interface can we put on each of the domains to achieve
global garbage collection? COM-style reference counting works except for
cycles, but how can we do tracing?
- In general, .NET tries to shovel everything into one virtual machine.
Can we do better by designing interfaces to allow multiple independent
virtual machines to interoperate?
- Forget structural subtyping and release-to-release binary compatibility.
They're hard and constraining. How can we design glue into our object
frameworks to provide upward/downward compatibility?
Web
- Bi-incremental XSLT-like tool: incremental in the styled document and the
styling script.
- Office Killer: Reinventing the "traditional" productivity apps using
universal Web documents; making it easy to use HTML/XML/CSS/SVG/Javascript
to write memos, theses, presentations, spreadsheets and charts.
- The Web the way you really want it: a repository of site-specific
user style sheets, easily accessed (a la What's Related) and updated. Lawsuit
city.
Systems
- An on-the-fly disk layout optimizer that learns from disk access patterns
and serializes data on the disk. Can exploit large, solitary disks by
duplicating blocks into strips of frequently-sequentially-accessed data.
- Using profile data to drive global code placement.
- A scheme for allowing JIT'ing architectures (e.g. Crusoe, MAJC) to
natively compile higher-level JITted (Java) or interpreted (Perl) code
directly.
- Operating system support for a database of extensible user secrets
(a "universal agent").
- A scheme for single-chip multiprocessors that uses one or more CPUs as
coprocessors to speed the execution of single-threaded code running on one
CPU, e.g. by performing prefetches or speculative operations.
- Hardware support for partial evaluation by hacking the trace cache.
- TCP Cop: Monitoring TCP streams to detect misbehaving hosts (especially
important for congestion control).
User
- A system of dumb objects in a shared virtual space with basic "physics",
programmable with behavioural "rules" governing the interaction of objects
and assigning semantics to them, the way humans organise games.
- A truly easy to use system that organises all user-visible data as a
"soup" accessible through links and searches, i.e. no file names. All data
is in universal Web formats. There are no applications, just a collection of
tools (commands) that each operate on one or more formats.
- A tool to explain to the user what is currently going on inside their
computer system, in detail --- e.g. allocation of system resources to
applications and within applications.
- Extending XMLTerm with lightweight GUI wrappers to the command line.
Social
- Cooperative multiplayer games that build relationships by providing
synergy to teams cooperating against an environmental adversary.
- How do we teach programming as a life skill? Is Javascript/DOM a
good pedagogical environment?
Robert O'Callahan
Last modified: Wed Feb 21 01:38:22 EST 2001