Control Flow, 1/24/01
----------------------------------------------------------------

Basic Blocks
Basic Block Optimizations
Control Flow Graphs
Dominators
Loops

Basic Blocks
------------
A basic block is a sequence of instructions with one entry point and
one exit point.

Advantages of converting into basic blocks?

Creating basic blocks is easy:
- Scan instructions for LEADERS
  A leader is:
  - the first instruction
  - a label
  - an instruction that follows a branch
- Each leader starts a basic block which ends at the instruction
  before the next leader

There are cases when the definition of a leader is different.
For example, in the above definition calls do NOT end a BB.  (Thought 
setjmp and longjmp seem to be cases for this usually they aren't.)
Delay slots may cause the BB to be reformulated.

Optimizations on Basic Blocks: 

(History: Most optimizations we cover in this course are global
optimizations.  They often started out as local optimizations on basic
blocks)

Local CSE (13.1.1) is an example:

Scan instructions maintaining a table of 5-tuples:
<position, operand1, operator, operand2, var>

At each instruction, c <- a OP b 
- we see if there is a 5-tuple of the form:
	<?,a,op,b,?> or <?,b,op,a,?>
  If there is:  We have found a CSE.
- We see if there is a tuple of the form
	 <?,c,op,?,?> or <?,?,op,c,?>
  If there is remove the tuple

If we find a CSE: and var is in tuple, 
   then rewrite c <- a op b with c <- var
   otherwise, create new temp and put in var
	      rewrite instruction at pos to be:  ... <- var
	      insert before pos, var <- a op b
	      rewrite current instruction with c <- var

This can be quite fast if we hash on a,op,b (and make hash function
commutative in a and b)

Control Flow Graphs

Once we have converted sequence of instructions into basic blocks we
can create a control flow graph.  A CFG is a directed graph where

- Each basic block becomes a node in the CFG.  
- Usually the CFG is augmented with two distinguished nodes: Entry and
  Exit.
- There is an edge between two nodes, B1 and B2, B2 can follow B1 in
  some execution of the program, i.e.,
  - B2 is the target of a branch/jump at the end of B1
  - B2 follows B1 and B1 does NOT end in an unconditional jump
- There is an edge from Entry to the block which contains the first
  instr
- There is an edge from any block which can exit the procedure to Exit

If there is an edge from B1 to B2, then 
- B1 is a predecessor of B2
- B2 is a successor of B1

If a node has multiple predecessors, then that node is called a JOIN
node.

Before we look at how to use the CFG let me just mention that there
are other kinds of basic blocks: Hyperblocks, Extended basic blocks.

Hyperblocks (used in Impact) can have multiple entries and exits.
Often used for trace scheduling.

Extended basic blocks have 1 entry and multiple exits.  More formally,
an EBB is a maximal sequence of instructions starting with a leader
and containing no join nodes except possibly for the leader.

Now we have a CFG. So what?
- Useful for data flow analysis, Loop transformations, etc. etc.

Some Examples:
- Unreachable Code Elimination
  Removes all basic blocks that can NEVER be reached in ANY execution
  of the program

  sweep graph marking nodes as you go (depth first, breadth first)
  remove all unmarked nodes

  This is really not so much an optimization as it is a graph
  conditioner.

- Straightening
  Enlarges basic blocks when 
  - B1->B2 
  - B1 is B2's only pred
  - B2 is B1's only succ

    B1
     |
    \|/	->  B1/2
     B2	    / \
    /  \

Have to be careful of how you fuse the two blocks

     B1:   ...
	   goto B2
     B3:   ...
	   goto B7
     B2:   ...
           if (cond) goto B5
     B4:   ...


     B1:   ...
	   goto B2 <- zap
zap->B2:   ...
           if (cond) goto B5
	   [need to insert a goto here: goto B4]
     B3:   ...
	   goto B7
     B4:   ...


more = false
repeat
	Foreach basic block, b,
		if (|(succ(b)| == 1) && (|pred(succ(b))| == 1) 
		   dest = succ(b)
		   more = true
		   // fuse
		   if (b.lastInstruction == ubr) eliminate it
		   if (dest.firstInstruction == label) eliminate it
		   append dest instruction to b
		   if (b.lastInstruction != ubr)
		      make new label, l
		      fall_thru = block dest can fall_thru to
		      append "goto l" to b
		      prepend "l:" to fall_thru
		   succ(b) = succ(dest)
while more

Loops
-----
There is more leverage in optimizing loops than anything else. So we
need an easy way to find loops.  Furthermore, we need to be able to
find inner loops.  To find loops we use dominators.

First some definitions:
A loop in a cfg is a set of nodes, L such that
- one node is distinguished as the header, h
- For all nodes in L, there is a path to h
- There is a path from h to all nodes in L
- the predecessors of every node in L are in L except for h which can
  have predecessors outside of L

--------------------Ex. A -----------------

(Note: this excludes some things you might think are loops)

E.g.,
       1->2, 1->3
       2->3
       3->2

Finding loops: Use dominitors

Dominators: A node d dominates a node n if every path from Entry to n
must go through d.  Every node dominates itself.

-------------------------Ex. B ------------------

Example: 1 dominates 1,2,3,...
	 2 dominates 2,3
	 4 dominates 4,5,6,...
	 12 dominates 12

We say that a node, i, IMMEDIATELY DOMINATES another node, n (i =
idom(n)) if
- idom(n) != n
- idom(n) dominates n
- idom(n) doesn't dominate a dominator of n

Thm: Every node (except entry) has exactly 1 immediate dominator
Proof: Use lemma we are about to prove

Lemma: If d dominates n AND e dominates n then either 
       d dom e or e dom d
Proof: Assume !(d dom e) AND !(e dom d)
       Then there exists a path: Entry->e not through d
       Thus, EVERY path from e->n must go through d (otherwise d
				         wouldn't dominate n) 
       But the same must be true of d->n
       then there must be an infinite loop d->e->d->...
       QED

Proof of above thm:
- For every node, n, except Entry there exists a dominator, d, of n such
  that d != n.
- Assume d = idom(n) and e = idom(n) and e != d
  By lemma, (wlog) e dom d, but then e != idom(n)

We can construct a DOMINATOR TREE by connecting idom(n) to n.  It is a
tree by thm.

How do we find loops:

We call an edge (n,h) where h dom n a BACK EDGE.

Every back edge (n,h) defines a loop (NATURAL LOOP) in the CFG with
header h.

The natural loop defined by (n,h) is the set of nodes x such that
- h dom x
- there exists a path x->n that does NOT contain h

Eg. (10,5)      {5,8,9,10}
    (9,8)	{8,9}
    (3,2)	{2,3}
    (4,2)	{2,4}

Notice that a loop can contain another loop.  I.e., (10,5) contains
(9,8). [We do NOT say that (3,2) or (4,2) contain each other].

Constructing the loop:

Find all nodes in loop defined by n->d.
loop <- { d }
stack.push(n)
while stack not empty
      m <- stack.pop()
      foreach predecessor of m, p
	      if (NOT p in loop)
	      	      stack.push(p)
		      loop = loop UNION {p}
		      

We can find INNER LOOPS as follows:

if there are two loops A and B with headers a and b resp and a != b
and a dom b, then B is NESTED inside A. Or, B is an inner loop and A is
the outer loop.

To find the loop nests:

- Compute dominators
- Construct the dominator tree
- Find all the natural loops (and headers)
- merge all loops with a common header
- Construct a tree such that if h1 dom h2, then h1 is above h2 in the tree.

---------------------- Ex. C ------------------

For convience we can say that the Entry node is a "loop", then we have
a single root.  depth in tree indicates how deeply the loop is nested.

OK, finally, lets compute dominators (simple way, a better way will be
done later when we do SSA)

remember definition of dominators:

A node d dominates a node n if every path from Entry to n
must go through d.  Every node dominates itself.

If a node, n, dominates all pred(m), then n dom m.
Using this we can define a set of equations to determine the
dominators of n.

Lets say that D[n] are all the nodes that dominate n.  Then,

D[Entry] = { Entry }
D[n] = {n} UNION ( INTERSECTION D[p] )		n != Entry
		   p in pred(n)

The idea is to iterate over the graph until no changes are made to D
for any n.  We know it terminates since the computed D[n] is always a
subset of the original one.

So, this becomes:

D[entry] = {entry}
for all n, n != Entry: D[n] = {all nodes}
repeat
      changed = false
      foreach node n (n != Entry)
	      D'[n] = {n} UNION ( INTERSECTION D[p] )
	             		   p in pred(n)
              if (D'[n] != D[n]) 
		 changed = true
		 D[n] = D'[n]
while changed

      
1     {1}
2     {2,3,4,5,6,7,8,9,10,11,12}
3     {2,3,4,5,6,7,8,9,10,11,12}
4     {2,3,4,5,6,7,8,9,10,11,12}
5     {2,3,4,5,6,7,8,9,10,11,12}
6     {2,3,4,5,6,7,8,9,10,11,12}
7     {2,3,4,5,6,7,8,9,10,11,12}
8     {2,3,4,5,6,7,8,9,10,11,12}
9     {2,3,4,5,6,7,8,9,10,11,12}
10    {2,3,4,5,6,7,8,9,10,11,12}
11    {2,3,4,5,6,7,8,9,10,11,12}
12    {2,3,4,5,6,7,8,9,10,11,12}

1     {1}
2     {2,1}
3     {2,3}
4     {2,4}
5     {4,5}
6     {4,6}
7     {4,7}
8     {5,8}
9     {9,8}
10    {9,10}
11    {7,11}
12    {12}

Very fast looking.  We will discuss why this is fast and how to make
sure it happens later. Also, discuss ways to implement this
efficiently later too.

Other dominator relationships: 

POST-DOMINATION: p pdom n iff every path from n to Exit goes through p