Optimizing Compilers project by Laura Hiatt and Kevin Bierhoff (Spring 2006).
The goal of our project is to investigate what we will call “incremental path profiling”. Path profiling is an important part of compiler optimization; however, its runtime overhead is quite high. Our project’s aim is to lessen this overhead while still providing useful information about the hot paths of the program being compiled. We do this by incrementally increasing the number of paths we profile, slowly adding instrumentation along high-use paths. We hope to show that such partial instrumentation of paths incurs less overhead than full path profiling while still providing enough information about which paths are hot to be useful in optimizations.
Full project proposal
Results poster
Path profiling counts how often each path through a function is taken at runtime. Optimizing compilers can use these profiles to compile a function such that its “hot path”, the most frequently taken path through the function, executes particularly fast (Ammons & Larus, 1998).
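To make this concrete, here is a minimal sketch of the path-numbering scheme from Ball & Larus (1996) on a toy control-flow graph. The graph, node names, and executed paths are invented for illustration; the real algorithm operates on the compiler’s intermediate representation.

```python
from collections import defaultdict

# Toy CFG (invented for illustration): ENTRY -> A -> (B | C) -> D -> EXIT.
cfg = {
    "ENTRY": ["A"],
    "A": ["B", "C"],   # the only branch; two paths through the function
    "B": ["D"],
    "C": ["D"],
    "D": ["EXIT"],
    "EXIT": [],
}

def number_paths(cfg):
    """Ball-Larus numbering: assign each edge an increment so that the
    increments along any entry-to-exit path sum to a unique path number
    in [0, number of paths)."""
    num_paths = {}   # node -> number of paths from this node to the exit
    increment = {}   # (src, dst) -> increment placed on that edge

    def visit(node):
        if node not in num_paths:
            if not cfg[node]:            # exit node: exactly one (empty) path
                num_paths[node] = 1
            else:
                total = 0
                for succ in cfg[node]:
                    increment[(node, succ)] = total
                    total += visit(succ)
                num_paths[node] = total
        return num_paths[node]

    visit("ENTRY")
    return increment

# At runtime the instrumentation sums the increments along the executed
# path and bumps one counter per path number at the function exit.
increment = number_paths(cfg)
counters = defaultdict(int)
for path in (["ENTRY", "A", "B", "D", "EXIT"],
             ["ENTRY", "A", "C", "D", "EXIT"],
             ["ENTRY", "A", "C", "D", "EXIT"]):
    counters[sum(increment[e] for e in zip(path, path[1:]))] += 1
print(dict(counters))   # {0: 1, 1: 2}: the path through C is hot
```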
While somewhat impractical for traditional ahead-of-time compilers, path profiling and subsequent recompilation to accommodate hot paths is feasible for modern Just-in-Time (JIT) compilers. Unfortunately, path profiling is expensive. For example, the original “efficient” algorithm proposed by Ball & Larus (1996) is reported by its authors to have 30% overhead. For a JIT compiler, 30% is a very high cost because it can be directly perceived by the end users of the software. One remedy is to work with edge profiles instead. Edge profiles are cheaper to acquire, but they also often fail to identify the hot path correctly (Ball & Larus, 1996, p. 10). Path profiles, in contrast, are “complete counts”: after execution, the compiler knows exactly how often each path was taken.
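The following toy calculation (all numbers invented) illustrates how an edge profile can pick the wrong hot path: with two consecutive branches, the path through the two individually hottest edges need not be the most frequently executed path.

```python
from collections import defaultdict

# Four paths through two consecutive branches (b|c followed by e|f),
# with invented execution counts.
path_counts = {("b", "e"): 30, ("b", "f"): 30, ("c", "e"): 40, ("c", "f"): 0}

# The edge profile is just the per-edge sum over all paths.
edge_counts = defaultdict(int)
for (first, second), n in path_counts.items():
    edge_counts[first] += n
    edge_counts[second] += n

# Chaining the hottest edges yields (b, e), since b=60 > c=40 and e=70 > f=30 ...
greedy = (max("bc", key=edge_counts.get), max("ef", key=edge_counts.get))
# ... but (b, e) ran only 30 times, while the true hot path is (c, e).
true_hot = max(path_counts, key=path_counts.get)
print(greedy, path_counts[greedy])      # ('b', 'e') 30
print(true_hot, path_counts[true_hot])  # ('c', 'e') 40
```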
In the incremental approach we find only the hot path and ignore all other paths: we instrument functions incrementally, just enough to identify the hot path. Following Ball & Larus (1996), we proceed iteratively: each iteration instruments the branch at the end of the hot-path prefix found so far, runs the program, and extends the prefix with the most frequently taken branch target, until the path is complete.
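The sketch below shows one plausible reading of this loop; `profile_branch` stands in for the instrument-run-count step performed by the real implementation inside the L3 compiler and is hypothetical, and the sketch assumes an acyclic CFG.

```python
def find_hot_path(cfg, entry, exit_node, profile_branch):
    """Grow the hot path one branch per iteration.

    `profile_branch(prefix, successors)` is assumed to instrument only
    the branch at the end of `prefix`, run the program (possibly several
    times), and return the successor taken most often. The loop therefore
    needs a number of iterations linear in the length of the hot path.
    """
    path = [entry]
    while path[-1] != exit_node:
        successors = cfg[path[-1]]
        if len(successors) == 1:
            path.append(successors[0])   # straight-line code: nothing to profile
        else:
            path.append(profile_branch(path, successors))
    return path

# Example with the toy CFG from above, pretending the branch at A is
# observed to go to C most of the time:
toy_cfg = {"ENTRY": ["A"], "A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["EXIT"]}
print(find_hot_path(toy_cfg, "ENTRY", "EXIT",
                    lambda prefix, succs: "C"))   # ENTRY A C D EXIT
```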
Our implementation always finds the “true” hot path, and it usually runs with less overhead than full path profiling. Ignoring file I/O, a single run of a partially instrumented program is usually cheaper than a single run of a fully instrumented one (see the results poster for the measured runtimes). The number of iterations needed to determine the full hot path is linear in the length of the hot path. File I/O accounts for almost 50% of the profiling overhead and can be ignored in a JIT compiler, which can keep the counters in memory instead. It is unknown how many runs of the program one iteration needs to observe before it can reliably identify the branch by which to extend the path; this number should be smaller than the number of runs full path profiling needs to find the complete hot path. We leave this question to future work.
The implementation is based on the L3 compiler used in class. It hooks into the canonicalizer to insert instrumentation based on an incrementally extended path graph. The instrumentation uses an additional runtime library to write to a trace file that is analyzed in the next compiler run.
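As a rough picture of the runtime side (written in Python rather than the runtime library’s actual language, with the file name and format invented for illustration): the instrumented code bumps in-memory counters, a trace file is dumped at program exit, and the next compiler run reads it back to decide where to extend the path.

```python
import atexit
from collections import defaultdict

TRACE_FILE = "hotpath.trace"        # hypothetical name and format
_counters = defaultdict(int)

def bump(branch_id, target):
    """Called by the inserted instrumentation at each profiled branch."""
    _counters[(branch_id, target)] += 1

@atexit.register
def _dump_trace():
    # This file write is the ~50% I/O overhead that a JIT compiler could
    # avoid by keeping the counters in memory.
    with open(TRACE_FILE, "w") as f:
        for (branch_id, target), n in sorted(_counters.items()):
            f.write(f"{branch_id} {target} {n}\n")

def read_trace(path=TRACE_FILE):
    """Used by the next compiler run to pick the branch target with which
    to extend the hot path."""
    counts = {}
    with open(path) as f:
        for line in f:
            branch_id, target, n = line.split()
            counts[(branch_id, target)] = int(n)
    return counts
```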
The experimental results are based on the timing (big) test cases from the L3 test suite. The spreadsheet can be read in the following way.
Section (a) shows the results for l3_test*.hot, i.e., for full path profiling. Sections (b), (c), etc. show the results for hot_inc, i.e., the results after each successive iteration of incremental profiling.
We think this is a promising result that can be further investigated in a number of ways.