download it
read the readme
see a presentation
For Task 1, we tracked information flow for preforming constant propagation, dead code elimination, and common subexpression elimination. This did not, however, attempt to determine if conditions required for a statement to be executed were contradictory or not. If the values required of the test expressions of the CJUMP statements alone the path in a control flow graph from the entry point of a function to a statement are contradictory, then that statement will never actually be executed. For example, consider the program:
x=input();
y=3;
if(x==1)
if(x==2)
y=2;
return(y);
It may be reduced to
x=input(); return(3);However, simple constant propagation and folding will not be able to achieve this optimization since it fails to spot that for y=2; to be executed requires a contradictory set of predicates to be satisfied. What are needed are the conditions placed on the line y=2; being executed.
Such conditions may be found from a program dependence graph (PDG) and are known as path conditions. I would like to work towards discovering these path conditions given the IR of L3 code.
Static Single-Assignment form (SSA) is an intermediate representation in which every virtual register is only the assigned value of at most one assignment statement. Phi nodes are used to represent points at which the the value that should be assigned to a virtual register is statically unknown but must be one of a fixed set of values. For example, the IR
CJUMP(EQ, Q, P, L1, L2) L1: R=3 JUMP L3 L2: R=2 JUMP L3 L3: ...becomes
CJUMP(EQ, Q, P, L1, L2) L1: R1=3 JUMP L3 L2: R2=2 JUMP L3 L3: R3=Phi(R1, R2) ...in SSA where R is replaced by R3 in ... .
Work by Cytron, Ferrante, Rosen, Wegman, and Zadeck provides an efficient method of translating code into SSA form (cfrwz91).
PDGs consist of two types of edges: flow edges and control edges. A flow edge from a node (statement) m to a node n means that data flows from m to n by assignment at m and use at n. A control edge from m to n means that the execution of n depends on the result of the conditional statement (CJUMP) m. Control edges are labeled with the value that the control predicate of the conditional statement must have for that edge to be taken.
D. J. Kuck with others did seminal work on PDGs (kmc72, kkplw81). J. Ferrante, K. J. Ottenstein, and J. D. Warren also did the early work (fow84, fow87). These works used them for program optimization, particularly parallelization. However, PDGs have found uses in program slicing, chopping, and information-flow analysis.
Given a statement n (node of the CFG), a backwards slice will provide all the statements that affects the values at n. A forward slice will provide those statements that n affects. A chop from m to n are those statements that are in the intersection of the forward slice of m and the backwards slice of n (jr94). That is, it provides those statements that allows m to affect n. Many uses for slices and chops have been found and much work has gone into efficiently producing them (for an overview, see, e.g., tip95).
One way to produce a chop uses a PDG. Give a PDG, a chop from m to n is just all the nodes that lay on a path from m to n.
An execution condition for a statement n is a propositional
logic formula that must be true for n to execute. The
execution conditions for a node may be found from a chop between the
entry node and it. Let Chop(entry, n) be a chop from the entry node to
n. Let p ∈ Chop(m,n) mean that p is a path
in the chop from m to n. Let l, b → m ∈ p mean that the path
p includes a control edge from l to m with the label
b. That is, for this edge to be taken, the control predicate
at node l must have the value b. The execution conditions of n is
EC(n) = ∃ p ∈ Chop(entry, n), ∀ l, b → m ∈ p, l=b
Note that this really is a propositional logic formula: the ∃ and ∀ could be replaced by disjunction and conjunction since the chop consists of a finite set of paths each of finite length (cycles need only be rounded once).
A path condition from m to n is a propositional logic formula that must be true for m to affect n. Let l ∈ p mean that l is a node in p.
Then, the path condition from m to n is
PC(m, n) = ∃ p ∈ Chop(m, n), ∀ l ∈ p, EC(l)
Again, the ∃ and ∀ may be replaced with disjunction and conjunction making this a true propositional logic formula.
Execution conditions and path conditions may be refined by including information about the Phi nodes of the program. For example if the statement
R3 = Phi(R1, R2)is present, then one may infer that either R3 equals R1 or R3 equals R2.
Gregor Snelting and others have been developing path conditions (rs02, hks06). They have produced implementations for both C and Java using PDG for producing the chops needed for the above method.
I did not make full use of the knowledge known about the Phi nodes. Based on which paths are taken in the chop, one can limit the choices provided by the Phi nodes.
The production of accurate PDGs in the presence of aliasing is very difficult. I just assume that any access to the heap may affect any other. I also do not deal with interprocedural analysis. Although believe my PDGs to be sound for intraprocedural analysis, many false dependences show up do to these limitation.
The construction of a path conditions for a statement (node in the PDG) involves many graph transversals that were not implemented in a very efficient manner. This and other factors leads to a very slow implementation.
Since identifying a path condition as contradictory is not only undecidable but also hard in practice (often requiring powerful theorem provers and symbolic execution), Only the most obvious simplification steps are made. For this reason, my implementation is actually incapable of performing the simplification given in the example above.
You may download the implementation here. For more information on using the implementation see the readme.
A short presentation on this material may be found here.
This project was produced for 15-745 Optimizing Compilers taught by Professor Peter Lee in the Computer Science Department, School of Computer Science, Carnegie Mellon University.
Without the standard libaries of L3, you will not be able to compile much. Furthermore, you need a linker to actually create code. You can find these things along with other L3 compilers and a test suite in the code bundle given to the class.
The output is post-processed by dot. See http://www.graphviz.org/ for more information about dot.
hmc72
D. Kuck, Y. Muraoka and S. Chen, On the number of operations simultaneously executable in fortran-like programs and their resulting speedup, IEEE Transactions on Computers, Vol. C-21, No. 12, dec 1972, pp. 1293--1310.
kkplw81
David J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure and M. Wolfe, Dependence graphs and compiler optimizations. In POPL '81: Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp. 207--218, ACM Press, 1981. http://doi.acm.org/10.1145/567532.567555
fow84
Jeanne Ferrante, Karl J. Ottenstein and Joe D. Warren, The program Dependence Graph and its Use in Optimization. In Proceedings of the 6th Colloquium on International Symposium on Programming, pp. 125--132, Springer-Verlag, 1984.
fow87
Jeanne Ferrante, Karl J. Ottenstein and Joe D. Warren, The program dependence graph and its use in optimization, ACM Trans. Program. Lang. Syst., Vol. 9, No. 3, 1987, pp. 319--349. http://doi.acm.org/10.1145/24039.24041
cfrwz91
Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman and F. Kenneth Zadeck, Efficiently computing static single assignment form and the control dependence graph, ACM Trans. Program. Lang. Syst., Vol. 13, No. 4, 1991, pp. 451--490. http://doi.acm.org/10.1145/115372.115320
jr94
Daniel Jackson and Eugene J. Rollins, A new model of program dependences for reverse engineering. In SIGSOFT '94: Proceedings of the 2nd ACM SIGSOFT symposium on Foundations of software engineering, pp. 2--10, ACM Press, 1994. http://doi.acm.org/10.1145/193173.195281
tip95
Frank Tip, A Survey of Program Slicing Techniques, Journal of Programming Languages, Vol. 3, No. 3, 1995, pp. 121--189. http://www.research.ibm.com/people/t/tip/papers/jpl1995.pdf
rs02
Torsten Robschink and Gregor Snelting, Efficient path conditions in dependence graphs. In ICSE '02: Proceedings of the 24th International Conference on Software Engineering, pp. 478--488, ACM Press, 2002. http://doi.acm.org/10.1145/581339.581398
hks06
Christian Hammer, Jens Krinke and Gregor Snelting, Information Flow Control for Java Based on Path Conditions in Dependence Graphs. In Proceedings of IEEE International Symposium on Secure Software Engineering, 2006. http://www.infosun.fmi.uni-passau.de/st/papers/issse06/