PREP
    print Steensgaard paper
    print this document

why is pointer analysis important?
    example program:
        z = 1
        x = &z
        *x = 2
        print z
    flow function for *x = y
        [z -> sigma(y) | must-point-to(x,z)]
        [w -> sigma(y) join sigma(w) | may-but-not-must-point-to(x,w)] sigma
    strong update vs. weak update
    may vs. must point to

Two kinds of pointer analysis
    alias analysis: set of pairs of variables (x,y) where x,y may or must point to the same location
    points-to analysis: x -> y iff x may/must point to y

today: Andersen and Steensgaard
    flow-insensitive (DEFINE)
    context-insensitive (DEFINE)
    inclusion-based vs. unification-based

later:
    0 aggregate modeling (field-sensitive)
    1 heap modeling (types, allocation sites, shape analyses)
    1 flow sensitivity (shape analysis)
    2 higher-order flow analysis (in FP)
    2 context-sensitivity
    3 call graph construction (in OO)
    4 representations (BDDs, declarative)

Pointer Statements
    p = &x      x in pt(p)
    p = q       q -> p
    *p = q      q -> *p
    x = *q      *q -> x

Andersen Constraints
    p = &x :    x in pt(p)
    p = q :     q -> p
    *p = x :    x -> *p
    x = *q :    *q -> x

Andersen Rules
    q -> p && x in pt(q) => x in pt(p)
    p -> *q && y in pt(q) && x in pt(p) => x in pt(y)
    *q -> p && x in pt(q) && y in pt(x) => y in pt(p)

    apply Andersen to example above

Formulating Andersen for abstract locations
    same as above but use new statements instead of &x
    and annotate each new statement with an abstract location

Andersen's efficiency
    deriving constraints: O(n)
    solution size / space use: O(n^2)
    how many times could a constraint fire?
        O(n) constraints
        O(n) variables in each points-to set
        2 points-to sets in 2 of the rules
        => O(n*n*n)
    Theorem: this is all you need to consider (David McAllester, SAS'99)

Steensgaard's analysis (simplified - no function pointers)
    x = y       join(pt(var(x)), pt(var(y)))
    x = &y      join(pt(var(x)), var(y))
    x = *y      join(pt(var(x)), pt(pt(var(y))))
    *x = y      join(pt(pt(var(x))), pt(var(y)))

    join(e1, e2)
        if (e1 == e2) return
        e1next = pt(e1)
        e2next = pt(e2)
        unify(e1, e2)
        join(e1next, e2next)

    example 1 - see Steensgaard paper
    example 2 - Rayside
        x = &a
        x = &b
        p = &x
        p = &y
    compare Steensgaard to Andersen on above

    efficiency example - Rayside
        q = &x
        q = &y
        p = q
        q = &z      // Andersen does extra work here
                    // not Steensgaard - already unified pt(p) and pt(q)

Efficiency analysis
    each statement processed once - O(n)
    unify takes near linear time - O(n * a(n)), a = inverse Ackermann function
    short-circuit test on join will fail at most O(n) times
        (once for each variable created in program)
    Thus O(n * a(n)) overall and uses O(n) space
    scaled to a million lines of code in 1996

Going to Java
    how to handle fields - treat all the same or separate?

REFERENCES
    http://groups.csail.mit.edu/pag/6.883/lectures/points-to.pdf
    http://www.cs.rutgers.edu/~ryder/OOAnalRefacDagstuhl.pdf
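Sketch for the "compare Steensgaard to Andersen" exercise on example 2 (Rayside). This is a minimal illustration under stated assumptions, not the formulation from either paper. The tuple encoding of statements, the names `andersen`, `Steensgaard`, `points_to`, `EXAMPLE2`, and the `("tmp", k)` placeholder nodes are all made up for this sketch. Andersen is solved by naive iteration to a fixpoint rather than a worklist, and `join` only recurses when both sides already have a target, so it never creates fresh nodes on its own.

```python
from collections import defaultdict

# Hypothetical statement encoding (not from the notes):
#   ("addr", p, x)  p = &x      ("copy", p, q)  p = q
#   ("store", p, q) *p = q      ("load", p, q)  p = *q

def andersen(stmts):
    """Inclusion-based analysis: apply the rules until nothing changes."""
    pt = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for kind, a, b in stmts:
            if kind == "addr":
                updates = [(a, {b})]
            elif kind == "copy":
                updates = [(a, pt[b])]
            elif kind == "store":                  # *a = b
                updates = [(y, pt[b]) for y in pt[a]]
            else:                                  # a = *b
                updates = [(a, pt[x]) for x in pt[b]]
            for dst, src in updates:
                if not src <= pt[dst]:
                    pt[dst] |= src
                    changed = True
    return {v: s for v, s in pt.items() if s}

class Steensgaard:
    """Unification-based analysis over a union-find forest."""
    def __init__(self):
        self.parent = {}   # union-find parent links
        self.target = {}   # class representative -> node it points to
        self.fresh = 0

    def find(self, n):
        self.parent.setdefault(n, n)
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]  # path halving
            n = self.parent[n]
        return n

    def pt(self, n):
        r = self.find(n)
        if r not in self.target:                   # create a placeholder pointee
            self.fresh += 1
            self.target[r] = ("tmp", self.fresh)
        return self.find(self.target[r])

    def join(self, e1, e2):
        e1, e2 = self.find(e1), self.find(e2)
        if e1 == e2:                               # short-circuit test
            return
        n1 = self.target.pop(e1, None)
        self.parent[e1] = e2                       # unify e1 into e2
        if n1 is None:
            return
        if e2 in self.target:
            self.join(n1, self.target[e2])         # unify the pointees too
        else:
            self.target[e2] = n1

    def run(self, stmts):
        for kind, a, b in stmts:                   # each statement processed once
            if kind == "addr":
                self.join(self.pt(a), b)
            elif kind == "copy":
                self.join(self.pt(a), self.pt(b))
            elif kind == "store":
                self.join(self.pt(self.pt(a)), self.pt(b))
            else:
                self.join(self.pt(a), self.pt(self.pt(b)))
        return self

    def points_to(self, v, variables):
        r = self.find(v)
        if r not in self.target:
            return set()
        t = self.find(self.target[r])
        return {u for u in variables if self.find(u) == t}

# example 2 - Rayside
EXAMPLE2 = [("addr", "x", "a"), ("addr", "x", "b"),
            ("addr", "p", "x"), ("addr", "p", "y")]
VARS = ["a", "b", "x", "y", "p"]

if __name__ == "__main__":
    print("Andersen:", andersen(EXAMPLE2))
    s = Steensgaard().run(EXAMPLE2)
    for v in VARS:
        print("Steensgaard pt(%s) =" % v, sorted(s.points_to(v, VARS)))
```

On example 2 both analyses agree on pt(x) and pt(p), but because p = &x and p = &y force x and y into one equivalence class, the unification-based version also reports that y may point to a and b, which the inclusion-based version does not.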