Project Suggestions -- Optimizing Compilers Sp'05
- Exploiting local memories
Local arrays that are not aliased could be implemented
in ASH (application-specific hardware) as local memory banks,
improving latency and reducing competition for bandwidth to
the main memory. Implement compiler analysis to recognize such cases
and update the Pegasus simulator to evaluate its impact.
- Dave's reg alloc suggestions
David Koes has suggested some interesting theory papers on
register allocation; apply and evaluate these ideas.
Mikkel Thorup, "All structured programs have small tree
width and good register allocation",
Proves that structured (goto-free) C programs have control flow
graphs with treewidth less than 6 making them 6-complex. Provides an
approximation algorithm for coloring the intersection graphs of the CFG
(register allocation) with optimality bounds within a factor floor(k/2) + 1
and running time O(wn+w^{2.5}n) where w is the most variables which
conflict with each other at one point.;
Sampath Kannan and Todd Proebsting, "Register allocation in structured programs";
Hans Bodlaender and Jens Gustedt and Jan Arne Telle,
"Linear-time register allocation for a fixed number of registers",
A theory paper. It is possible to determine if it is possible to register
allocate without spilling if a program is regular in linear time. (But exponential
in the number of registers). A linear time algorithm exists even if you allow
for reordering of instructions (in order to improve register allocation, not
scheduling). If there are register classes, the complexity is linear without
rescheduling but polynomial proportional to the number of registers (W-hard)
if there is rescheduling. The way they go from local
to global is to note that there are at most
2^k local solutions (at least at the boundary). Since this is "constant"
they solve for all of them then combine them to get the global solution.
- Profiling + opts
Implement profiling in Pegasus and a logical set of optimizations
utilizing this information; evaluate performance benefits and
how robust performance is in the face of changing execution
characteristics (i.e. different dataset than that used during
profiling).
- Smart hyperblock formation
Hyperblock formation in Pegasus currently simply forms the largest
hyperblocks possible. This project would make it more intelligent,
likely incorporating profiling information (which currently has not
been implemented), and evaluate its impact on ASH and/or C6X targets.
- Dynamic pointer disambiguation in SW
Back in 1989, a pure software approach to dynamic pointer
disambiguation was described by Nicolau "Run-Time Disambiguation:
Coping with Statically Unpredictable Dependencies".
It tried to utilize unused slots in a VLIW schedule to
add code for the necessary checking and branching to fixup code.
Could this be useful targeting the C6X?
- Instruction Selection (including automatic utilization of pre/post increments)
Like many DSP architectures, the C6X has pre/post increment
versions of memory instructions. This project would add analysis
and transformations to our compiler to automatically
generate these addressing modes and evaluate the improvement.
- Semi-dynamic optimization: loop version selection
Vector compilers would often make two versions of a loop,
one vectorized and the other not. If the loop count was
too low to offset the vector startup overhead, it would
branch to the non-vectorized version. There is a similar
situation in reconfigurable computing, but the reconfiguration
overhead for an accelerated loop is typically much more
significant -- would a similar approach make sense? What
about "while" loops where the iteration count is not known
at entry? Could recent executions be used to predict
future iteration counts?
- Predicate Demotion
The Pegasus IR is aggressively speculative in that all 'safe'
operations in a hyperblock are executed even when their output
will end up being discarded. This is called "full predicate
promotion" in some literature. When targeting the C6X,
this unfortunately greatly increases register pressure
in large hyperblocks where several control paths have been combined.
But we are not yet utilizing the C6X's full predication support;
if we start predicating all operations, then operations from
alternative paths can in fact use the same set of registers
since only one will be enabled any given hyperblock execution.
- Simultaneous scheduling and register allocation
...for cluster architectures such as the C6X.
- Adapt Aiken's BANSHEE pointer analysis
Prof. Aiken's group
has developed a very fast and scalable points-to analysis for C;
this project would find a way to incorporate it into the
SUIF framework for use by existing and new Pegasus optimizations
and evaluate its benefit for hardware and/or C6X targets.
See http://banshee.sourceforge.net/.
- Software Pipelining
Create a cyclic scheduler that can overlap the execution of successive
iterations of a loop, being aware of cluster constraints
and register pressure.
- Implement if-conversion and target a non-predicated architecture (even
the c6x without predicates)