Pointer (and Alias) Analysis:
Goal is to determine when two expressions can point to the same memory
location.
Memory aliasing
The problem:
foo() {
int a,k;
extern int *q;
...
... maybe &a or &k
...
k = a+6;
f(a,&k);
*q = 13;
k = a+6 <---- is this necc or not?
...
}
What could cause the statement to be required?
- call could cause k to be modified
- ptr assignment could cause k or a to be modified.
Aliases can arise from:
- pass by reference parameters
foo(ref i, ref j) { k=i; l=j; if (k==l) ... }
foo(int* i, int* j) { ... }
- address of operator
p = &q; q = 5; ...; *p = 5;
- derefencing pointers
*p
- array subscripting
i=foo(); j=bar(); k = a[i]; l=a[j]; if (k==l) ...
- non-local variables
let var i =
function foo(j) = { i=j }
in
foo(i)
end
- assignment
a = new X; b = new X; c = b; ...; c = a;
Two kinds of alias information:
must-alias information and may-alias information.
p = &q
a / \ b
p=&x ...
c \ / d
...
| e
at a p must alias with adr of q
at b p must alias with adr of q
at c p must alias with adr of x
at d p must alias with adr of q
at e p MAY alias with adr of x or q
Two kinds of analysis:
flow-insensitive: independent of control-flow. i.e., property holds
for entire block/function/program
flow-sensitive: dependent on control-flow, i.e., property holds at
program point p
Flow-insensitive:
useful in languages with strong typing.
use type information -> gathering of alias information must happen
when types are still present.
----------------
Kinds of alias information you might collect:
Points-to information (may or must):
- at program point p, determine the set of pairs (v, x) which indicate
v can point to x.
Alias Pairs (must or may)
- at program point p, determine the set of pairs (exp1, exp2) where
exp1 and exp2 must/may point to the same memory location.
Shape Analysis
- at progam point p, determine an abstract description of a pointer
structure (e.g., list, tree, etc.)
--------------------------------
Two types of pointer analysis problems:
- pointers to heap addresses (e.g., from malloc)
heap analysis
- pointers to stack objects (e.g., &foo where foo is local)
stack analysis
stack analysis has the benefit that locals have a compile time name
To solve this for heap analysis it is often the case that memory
malloced at a line number is given the name "created at line x"
--------------------------------
How might it be used:
- improve all previous analysis (CSE, PRE, CP, etc.)
- eliminate redundant loads/stores
x = *p;
...
y = *p; do we need this?
- parallelization
can we parallelize across function calls?
- saftey analysis
----------------------------------------------------------------
Structure of a PA:
Gathering alias information
- find all the abstract memory locations
Propogating alias information
- manipulate them
----------------------------------------------------------------
Lets do a flow-based may-alias intra-procedural stack analysis that
understands pointer arithmetic.
Sets of tuples for each program point:
(t, d, k)
t = variable
d = is an alias class (i.e., an abstract location set)
k = offset into an alias class
At each program point P there is a set of tuples such that if tuple
(t,d,k) is in the set, then, variable t may point to alias class d at
offset k. Or, t-k points to d.
At the start of the function A = {(FP, frame, 0)} where FP is the
frame pointer register and frame is the base of the stack frame for
the local variables.
We will generate these tuples using a data flow analysis based on
transfer functions:
A is the current state of alias information at program point p.
\sigma_t = set of all tuples that are type compatible with t
Assume compiler knows about type of new/malloc, i.e.,
a = (struct bar*)malloc(sizeof(struct bar)) creates the tuple
(a, fresh-bar-space, 0) where fresh-bar-space is type
compatible with bar and is a new location set.
DF equations:
in[entry] = {(FP, frame, 0)}
in[n] = Union over all preds, p: out[p]
out[n] = trans_n(in[n])
Where trans_n is defined for each stmt type:
t <- b (A - \sigma_t) \union { (t,d,k)|(b,d,k) \in A }
all possible location sets reached by b are now reached by t
t <- b + k (A - \sigma_t) \union { (t,d,i)|(b,d,i-k) \in A }
this is for k constant.
all possible location sets reached by b are now reached by t with an
offset of k
t <- b op c (A - \sigma_t) \union { (t,d,i)|(b,d,k) \in A OR
(c,d,j) \in A}
t <- M[b] A \union \sigma_t
d: t <- new a (A - \sigma_t) \union {(t,d,0)}
t <- f(...) A \union \sigma_t
struct intlist {
int val;
struct intlist* next;
};
struct int foo() {
struct intlist* p = 0;
struct intlist* q = 0;
int a = 1;
q = new intlist(0, NULL);
p = new intlist(0, q);
q->val = 5;
a = p->val;
return a;
}
1: MOVE t2 <- 0 p
2: MOVE t3 <- 0 q
3: MOVE t4 <- 1 a
4: CALL t5 <- malloc(8) (t5,4,0) q <- new
5: PLUS u1 <- t5, 0 (u1,4,0)
6: STORE MEM[u1] <- 0 q->val = 0
7: PLUS u2 <- t5, 4 (u2,4,4)
8: STORE MEM[u2] <- 0 q->next = 0
9: MOVE t3 <- t5 (t3,4,0) q
10: CALL t6 <- malloc(8) (t6,10,0) p <- new
11: PLUS u3 <- t6, 0 (u3,10,0)
12: STORE MEM[u3] <- 0 p->val = 0
13: PLUS u4 <- t6, 4 (u4,10,4)
14: STORE MEM[u4] <- t3 p->next = q
15: MOVE t2 <- t6 (t2,10,0) p
16: PLUS u5 <- t3, 0 (u5,4,0)
17: STORE MEM[u5] <- 5 q->val = 5
18: PLUS u6 <- t2, 0 (u6,4,0) p
19: LOAD t4 <- MEM[u6] a
20: RET t4
struct intlist* foo() {
struct intlist* p = 0;
struct intlist* q = 0;
int a = 1;
p = new intlist(0, NULL);
q = new intlist(6, p);
if (a == 0) p = q;
else p->val = 4;
return p;
}
1: CALL t2 <- malloc(8) (t2,1,0) p
2: PLUS u1 <- t2, 0 (u1,1,0)
3: STR MEM[u1] <- 0 p->val = 0
4: PLUS u2 <- t2, 4 (u2,1,4)
5: STR MEM[u2] <- 0 p->next = 0
6: MOVE t3 <- t2 (t3,1,0) x=p
7: CALL t4 <- malloc(8) (t4,7,0) q
8: PLUS u3 <- t4, 0 (u3,7,0)
9: STR MEM[u3] <- 6 q->val = 6
10: PLUS u4 <- t4, 0 (u4,7,0)
11: STR MEM[u4] <- t3 q->next = p
12: MOVE t5 <- t4 (t5,7,0) q
13: MOVE t6 <- 1 a
14: CJUMP (EQ, t6, 0, ifT0,ifF1 a == 0 ?
15: LABEL ifT0
16: MOVE t3 <- t5 (t3,7,0) x=q
17: LABEL ifF1
in: (t3,7,0)&(t3,1,0)
18: PLUS u5 <- t3, 0 (u5,7,0),(u5,1,0)
19: STR MEM[u5] <- 4
How do we use this information:
Example: available expressions:
gen kill
t <- b op c b op c - kill[s] exprs containing t
t <- M[b] M[b] - kill[s] exprs containing t
M[a] <- b {} exprs of form: M[x]
f(...) {} exprs of form: M[x]
However, if we know M[a] and M[b] are different, then line 3 is too
general:
M[a] <- b {} {M[x]|a may-alias x at s}
Example from above: Constant propagation (treat each (d,k) as a variable)
Another way is to treat each location set as a var, so for instance
above we could figure out that a has value 0.
This brings up order of optimizations. First we do constant prop, then
alias, then constant prop?
----------------
interprocedure analysis
shape analysis
new time interprocedural analysis.
What do we do with:
1:p = malloc
2:q = malloc
3:p->next = q;
4:p->next->val = 4;
t1 <- p (t1, 1, 0)
t2 <- t1 + 4 (t2, 1, 4)
M[t2] <- q p->next = q
t3 <- p (t3, 1, 0)
t4 <- t3 + 4 (t4, 1, 4)
t5 <- M[t4] (t5, 1, 0)(t5, 2, 0) current rule says t5 pts to anything
t6 <- t4 + 0 (t5, 1, 0)(t5, 2, 0)
M[t6] <- 0 ?
----------------
Shape analysis:
determine if a recursive data structure has the form of a tree, dag,
cycle.
tree: unique path from p to any two nodes accessible from p
dag: multiple paths from p, but no nodes can reach themselves
cycle: otherwise, cycle
what can we get from this?
assume p is a tree, then p->a and p->b are definitely not aliased.
How can we determine shape of a data structure?
dataflow analysis which manipulate direction, interference matrices.
Using D and I we cna determine shape of each structure at each program
point.
To be continued ...