The Effects of Simple May-Alias Analysis in L3
David Abraham (dabraham)
Abstract

Aliasing occurs when a storage location can be accessed in two or more ways. Without a formal model of alias information, many compiler optimizations are overly conservative. For example, in Common Subexpression Elimination (CSE), any assignment to a memory location kills the availability of all other memory expressions. Alias information can also be used to improve the performance of Dead-Code Elimination (DCE) and Instruction Scheduling (IS). In this project, we investigate the improvements that can be gained by implementing an alias analysis.
1. Introduction

In this project, we study the effects of a simple may-alias analysis in the L3 compiler. Alias analysis can potentially increase the effectiveness of many compiler optimization techniques. In this report, we examine how it affects the performance of Constant Propagation (ConstP) and Common Sub-Expression Elimination (CSE). We have implemented these optimizations to take advantage of alias information. We have also implemented Constant Folding, since this is required to assess the benefits of ConstP, and Copy Propagation (CopyP) to clean up the unnecessary copies produced by CSE.
Aliasing occurs when a storage location can be accessed in two or more ways. Without a formal model of alias information, many compiler optimizations are overly conservative. For example, in Common Subexpression Elimination (CSE), any assignment to a memory location kills the availability of all other memory expressions. Alias analysis seeks to model the set of memory locations with which each variable may be associated.
Formally, we say that *p and *q are aliases if they point to the same storage location. However, tracking the locations a pointer refers to can be difficult. As such, we also define a weaker notion of aliasing. Pointers *p and *q may-alias each other if they may point to the same location.
Alias analysis is an ongoing area of research in optimizing compilers. A number of different approaches have been considered; here we outline some of the major axes along which they differ.
Flow Insensitive versus Flow Sensitive
A flow-insensitive analysis is designed to be correct for all possible execution paths of the program. For instance, if *p can be aliased by *q on one entry path to a line, then we simply record that at that line *p and *q may-alias. A flow-sensitive analysis is more detailed: it records on which control paths the two pointers may alias and on which they do not. This more detailed information could potentially be used by flow-sensitive optimization techniques.
Intra-procedural versus Inter-procedural
In an intra-procedural alias analysis, if a pointer is assigned the result of a function call, we do not analyze the memory usage of the called procedure; rather, we simply say that the pointer may alias all other memory references. In an inter-procedural analysis (e.g., [2]), the function call is analyzed in terms of its parameters, the globals it accesses, and the other functions it calls. This leads to more precise may-alias information.
Context Insensitive versus Context Sensitive
In a context-sensitive alias analysis (e.g., [3]), functions are analyzed at every call site with the specific input parameters for that call. A context-insensitive algorithm analyzes each function only once. Context-sensitive algorithms can produce more accurate alias information.
For more information on the different types of alias analysis algorithms, we direct the reader to the survey in [4].
For this project, we have implemented a simple may-alias analysis. Our analysis is flow-insensitive, intra-procedural, and context-insensitive. We outline the algorithm in Section 2. In Section 3 we describe each of the optimizations that we have implemented, and in particular how they use the may-alias information. In Section 4 we describe the experiments we have conducted to understand the effects of our may-alias analysis, and in Section 5 we discuss the results.
2. The May-Alias Algorithm
In L3, the function alloc(num, type) is the only provision for memory allocation. This function allocates an array of num * sizeof(type) words. We track each static call to alloc with a unique base label, as well as a record of the array's length. (Whenever an array length is determined dynamically, we conservatively record the length as infinity.)
For each statement in the IR, we record the set of memory locations to which each temp may point. A memory location is specified by an array base label (see above), together with an offset. So abstractly, we maintain a set of triples (p, b, o), where p is a temp, b is a base label and o is an offset. We can determine if two temps p and q may alias each other by testing if p and q are contained in triples with a common base and offset.
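As a concrete illustration, the may-alias test over this triple representation can be sketched as follows. This is a minimal Python sketch: the tuple encoding and the names (`may_alias`, `facts`) are ours, not the compiler's actual data structures.

```python
# Sketch: points-to facts as triples (temp, base, offset).
# Two temps may alias when some (base, offset) pair appears in
# triples for both of them.

def may_alias(triples, p, q):
    """Return True if temps p and q may point to the same location."""
    locs_p = {(b, o) for (t, b, o) in triples if t == p}
    locs_q = {(b, o) for (t, b, o) in triples if t == q}
    return bool(locs_p & locs_q)

# Example facts: p and q share the location (alloc1, 0); r does not.
facts = {
    ("p", "alloc1", 0),
    ("q", "alloc1", 0),
    ("r", "alloc2", 4),
}
```

Here `may_alias(facts, "p", "q")` holds, since both temps may point at offset 0 of the same allocation, while `may_alias(facts, "p", "r")` does not.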
We construct an IN and OUT set of triples for each statement.
The transfer function has several cases, one for each type of statement. We give some of the main cases below: (Note that All(p) refers to all possible triples (over all bases and offsets) with p as a first component.)
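To illustrate what such a case analysis can look like, the following Python sketch implements gen/kill behavior consistent with the description above: kill All(p) for the target p, then generate triples depending on the statement form. The statement encodings (`alloc`, `copy`, `addc`, `call`) are hypothetical stand-ins for the IR, not the project's actual code; a `call` result is conservatively assumed to point anywhere, matching the intra-procedural treatment.

```python
def all_triples(p, bases):
    # All(p): every possible triple with p as first component,
    # approximated with a wildcard "any" offset per known base.
    return {(p, b, "any") for b in bases}

def transfer(in_set, stmt, bases):
    """OUT = (IN - kill) U gen, with one case per statement form."""
    kind, p = stmt[0], stmt[1]
    # Any assignment to p kills All(p): drop p's old triples.
    out = {t for t in in_set if t[0] != p}
    if kind == "alloc":                    # p = alloc(n, ty)
        base = stmt[2]                     # fresh base label, offset 0
        out.add((p, base, 0))
    elif kind == "copy":                   # p = q
        q = stmt[2]
        out |= {(p, b, o) for (t, b, o) in in_set if t == q}
    elif kind == "addc":                   # p = q + c (pointer arithmetic)
        q, c = stmt[2], stmt[3]
        out |= {(p, b, "any" if o == "any" else o + c)
                for (t, b, o) in in_set if t == q}
    elif kind == "call":                   # p = f(...): intra-procedural,
        out |= all_triples(p, bases)       # so p may alias anything
    return out
```

Chaining these transfer functions over the statements of a block yields the IN and OUT sets described above.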
3. Optimizations Studied
Constant Folding and Constant Propagation

Constant folding evaluates binary operations on constants at compile time: if x = BINOP(c1, c2), where c1 and c2 are constants, the compiler evaluates BINOP(c1, c2) to the appropriate constant c3 and replaces the statement with x = c3. Alias analysis has little impact on constant folding. It can, however, help constant propagation. Constant propagation deals with assignments of the form x = c, where c is a constant: it finds the statements that use x and replaces the use of x with c.
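On straight-line code, folding and propagation can be sketched together in a few lines of Python. The statement encoding below is hypothetical; the real optimizations run over the L3 IR with a dataflow analysis, which this sketch does not model.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_and_propagate(stmts):
    """One forward pass over straight-line code: track temps bound to
    constants, substitute them into uses, and fold BINOPs whose
    operands have become constants."""
    consts, out = {}, []
    for s in stmts:
        if s[0] == "move":                     # x = c  or  x = y
            _, x, src = s
            src = consts.get(src, src)         # propagate a known constant
            if isinstance(src, int):
                consts[x] = src
            else:
                consts.pop(x, None)
            out.append(("move", x, src))
        elif s[0] == "binop":                  # x = a OP b
            _, x, op, a, b = s
            a, b = consts.get(a, a), consts.get(b, b)
            if isinstance(a, int) and isinstance(b, int):
                consts[x] = OPS[op](a, b)      # constant folding
                out.append(("move", x, consts[x]))
            else:
                consts.pop(x, None)
                out.append(("binop", x, op, a, b))
        else:
            out.append(s)                      # leave other statements alone
    return out
```

For example, after a = 2 and b = 3, the statement c = a + b folds to the single move c = 5.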
Common Sub-expression Elimination
Common sub-expression elimination can decrease the running time of a program by reducing the number of times an expression is evaluated. Instead of recomputing a recurring expression, one may store the result of its first occurrence in a temporary variable and replace the following instances of the expression with that variable when it is safe to do so. To identify such safe instances, one may use a standard available-expressions dataflow analysis, which gathers, at each point in the program, the set of expressions that are available there, i.e., that have not changed since they were last evaluated. If an expression is both used and available at a point in the program, then one may use the stored value instead of evaluating the expression again.
Without alias analysis, we are forced to employ a conservative version of available expressions, as described in [1], where an assignment to a memory location, i.e., a statement of the form MEM[x] = ..., kills all expressions involving memory accesses (i.e., they are no longer available after the assignment). Given such an assignment, alias analysis enables us to kill only those expressions involving memory accesses to locations that may alias x, thereby letting us identify a larger set of available expressions.
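The alias-aware kill can be sketched as follows. The encoding is hypothetical: available expressions are tuples, and `may_alias` is a predicate supplied by the analysis.

```python
def kill_after_store(available, store_addr, may_alias):
    """After MEM[store_addr] = ..., keep a memory expression available
    only if its address provably cannot alias store_addr; without
    alias information, every MEM expression would be killed."""
    survivors = set()
    for expr in available:
        if expr[0] == "mem":                  # expr = ("mem", addr_temp)
            if not may_alias(expr[1], store_addr):
                survivors.add(expr)           # disjoint location: keep it
        else:
            survivors.add(expr)               # non-memory exprs unaffected
    return survivors
```

The more precise the may-alias predicate, the more memory expressions survive a store, and the more opportunities CSE has.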
Note that we convert the intermediate representation to a three-address form to facilitate common sub-expression elimination. Our three-address form only allows the right-hand side of a MOVE to be a function call on temporary variables or constants, a binary operation on temporary variables or constants, or a memory operation on a temporary variable.
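As an illustration of the flattening, the following Python sketch rewrites a nested expression tree into a sequence of MOVEs whose right-hand sides satisfy this restriction. The expression encoding and temp-naming scheme are hypothetical, not the compiler's own.

```python
from itertools import count

def flatten(expr, code, fresh):
    """Flatten a nested expression into three-address MOVEs.
    expr is a temp/constant, ("binop", op, l, r), or ("mem", addr);
    returns the temp (or constant) holding the expression's value."""
    if not isinstance(expr, tuple):
        return expr                        # already a temp or constant
    t = f"t{next(fresh)}"                  # fresh temp for this node
    if expr[0] == "binop":
        _, op, l, r = expr
        code.append(("move", t, ("binop", op,
                                 flatten(l, code, fresh),
                                 flatten(r, code, fresh))))
    elif expr[0] == "mem":
        code.append(("move", t, ("mem", flatten(expr[1], code, fresh))))
    return t
```

For example, flattening MEM[a + 4 * i] yields one MOVE for 4 * i, one for the addition, and one for the memory load.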
A copy statement has the form x := y, where x and y are temporary variables.
Copy statements are also generated by optimizations such as Common Sub-Expression Elimination. As we will discuss in the results section, we found that CSE tended to make our compiled code run slower, and we conjectured that the generation of many copy statements might be part of the problem. Copy statements also introduce many temporary variables, which makes it difficult for optimizations such as constant propagation and CSE to identify memory expressions that are valid candidates. More specifically, with many copy statements it becomes unlikely to see one memory expression, e.g., MEM[x], more than once: it is likely there exists a copy statement y = x, after which all subsequent memory operations at address x are performed in the form MEM[y]. As such, we have implemented standard copy propagation.
Copy propagation, given some copy statement s: x := y, replaces later uses of x with y. We can propagate s to some use of x if the following (standard) conditions hold: s is the only definition of x that reaches the use, and y is not redefined on any path from s to that use.
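On straight-line code, these conditions reduce to: after x := y, rewrite uses of x to y until either x or y is redefined. A minimal Python sketch, with a hypothetical statement encoding:

```python
def propagate_copies(stmts):
    """Straight-line copy propagation: after a copy x := y, rewrite
    uses of x to y until either x or y is redefined."""
    copies, out = {}, []                       # copies maps x -> y
    for op, dst, *srcs in stmts:
        srcs = [copies.get(s, s) for s in srcs]          # rewrite uses
        # Any assignment to dst invalidates copies that mention dst.
        copies = {x: y for x, y in copies.items() if dst not in (x, y)}
        if op == "copy":                       # dst := srcs[0]
            copies[dst] = srcs[0]
        out.append((op, dst, *srcs))
    return out
```

Running this over code of the shape produced by CSE removes most of the intermediate copies, which is exactly the cleanup role CopyP plays in our pipeline.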
4. Experimental Setup
We use the L3 directory "big" as our test base. All the optimizations implemented were also tested on the "correct" directory and produced the desired output for each file.
In order to identify the sources of performance gains (and hits), we compare versions of the compiler with different subsets of the optimizations enabled, with and without alias analysis.
5. Results

Impact of Alias Analysis

The graph below shows how much adding alias analysis improves our optimizations. On the x-axis are the "big" test cases, 1 through 10. On the y-axis is the percentage improvement over the reference compiler with all optimizations and alias analysis turned on, minus the percentage improvement with all optimizations but without alias information. Alias analysis increases performance by large percentages in many cases. In the few cases where it decreases performance, the difference is not significant.
Results for Different Combinations of Optimizations

In this section, we look at the performance of different combinations of our optimizations on each individual test case. The graphs in this section show a bar for each combination of optimizations on each test case; the y-axis shows the percentage improvement of the optimizations compared to the un-optimized reference compiler.

Below we show the results for "big" test cases 1 through 5. On test cases 1 through 4 there is no significant difference between the reference compiler and any of the versions of our optimized compiler. However, on test 5 we see a large improvement from the optimizations. Without alias analysis there is a drop in performance when all the optimizations are turned on; once alias analysis is added we see a big improvement. Our optimizations, in particular CSE with alias analysis, lead to a large improvement on this test case due to its significant amount of pointer arithmetic and dereferencing.

Below we show the results for "big" test cases 6 through 10. On cases 6, 7, and 8, all optimizations with alias analysis perform slightly better than the reference compiler, while without alias analysis the optimizations perform slightly worse. On test 8, Copy Propagation and alias analysis greatly improve CSE; however, using CSE on this test case decreased overall performance, and we see the best results when only Constant Folding and Constant Propagation are used. On case 10, CSE performs very poorly, and adding alias analysis improves its performance very significantly. The other optimizations also hurt performance, but not significantly.
Results in Terms of Line Numbers

In this section, we look at how adding our optimizations decreased the number of lines in the IR code. All our optimizations use the 3-address form, so we discuss the results in terms of whether the optimizations lead to fewer lines of code than the un-optimized 3-address form. The graph below shows the number of lines in the IR code for different optimization settings for test cases 1 through 5 and 6 through 10. In each case, our code with all optimizations, with alias analysis on or off, decreased the number of lines compared to the un-optimized 3-address code. The increased number of lines in some of the bars is due to CSE without copy propagation; when copy propagation is added, the number of lines decreases.
Results in Terms of Number of Temporaries

The results we obtain when we compare the number of temporaries used in the IR in three-address form, with and without our optimizations, are similar to the results for the number of lines. CSE increases the number of temporaries, but when we use copy propagation this number drops. Alias analysis further reduces the number of temporaries.
6. Conclusions

Our experiments show that using alias analysis improves the effectiveness of most of our optimizations. In particular, when run together with all optimizations, it can often lead to an improvement over the reference compiler (using 3-address form).
We found that it is essential to use copy propagation in order to see an effect from alias analysis. This is because CSE and the fat-pointer operations create new temporaries. For example, suppose we knew that MEM[x] = 4; CSE (or the fat-pointer operations code) could have introduced a copy y = x, after which all subsequent statements would use MEM[y] instead of MEM[x]. This happens in practically all test cases, and hence without copy propagation alias analysis does not benefit the optimizations at all.
Three-address transformation and CSE increase both the number of temporaries used and the number of lines of code. This often causes the register allocator to do worse and hence slows the code down. Alias analysis with copy propagation makes CSE more effective and always decreases both the number of temporaries and the number of lines. Unfortunately, the register allocator does not always perform better with fewer temporaries, since the interference graph may still be complex. Therefore we do not always see the performance increase over the reference compiler that we expect.
7. Extensions

We considered adding alias analysis to dead-code elimination and to copy propagation. We expect the effect of alias analysis in those two optimizations to be minor, mainly due to the conservativeness of our approach.
8. References

[1] A. Appel, Modern Compiler Implementation in ML. Cambridge University Press, 1999.

[2] B. Cheng and W. Hwu, "Modular interprocedural pointer analysis using access paths: design, implementation and evaluation," in Proceedings of PLDI 2000, Vancouver, British Columbia, Canada, 2000.

[3] W. Landi and B. Ryder, "A safe approximate algorithm for interprocedural pointer aliasing," in Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pp. 235-248, June 1992.

[4] J. Wu, Survey of Alias Analysis. http://www.cs.princeton.edu/~jqwu/Memory/survey.html.

[5] S. Muchnick, Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers, 1997.