\section{Class [[CfgNode]]}

The class [[CfgNode]] is the data structure used for the individual
nodes in the CFG.

\subsection{Node Creation}

New CFG nodes are created with respect to a particular CFG and are
immediately included in the graph's roster of nodes, even though they
may have no connection to other nodes.

<<[[CfgNode]] creation>>=
CfgNode* new_empty_node(Cfg*);
CfgNode* insert_empty_node(Cfg*, CfgNode *tail, CfgNode *head);
CfgNode* insert_empty_node(Cfg*, CfgNode *tail, int succ_pos);
CfgNode* clone_node(Cfg*, CfgNode*);
@

The function [[new_empty_node(cfg)]] returns a new empty and isolated
node of [[cfg]].  The function [[insert_empty_node(cfg, tail, head)]]
inserts a new node along edge ([[tail]], [[head]]), while
[[insert_empty_node(cfg, tail, succ_pos)]] inserts a new node between
[[tail]] and successor
number [[succ_pos]].  Note that the sequence of successors of a node
is an ordered multiset: there may be several edges from node [[tail]]
to some node [[head]].  The second form of [[insert_empty_node]]
allows you to say exactly which edge to split with a new node.
Finally, [[clone_node(cfg, node)]] replicates [[node]] in [[cfg]],
leaving it disconnected.


\subsection{Node Contents}

The main content of a CFG node is a list of instructions.  You can scan and
edit this list using the same function calls that you'd use for the
contents of an [[InstrList]]:

<<[[CfgNode]] bodily functions>>=
int instrs_size(CfgNode*);
InstrHandle instrs_start(CfgNode*);
InstrHandle instrs_last(CfgNode*);
InstrHandle instrs_end(CfgNode*);
InstrHandle prepend(CfgNode*, Instr*);
InstrHandle append(CfgNode*, Instr*);
void replace(CfgNode*, InstrHandle, Instr*);
InstrHandle insert_before(CfgNode*, InstrHandle, Instr*);
InstrHandle insert_after(CfgNode*, InstrHandle, Instr*);
Instr* remove(CfgNode*, InstrHandle);
@

where

\begin{quote}
[[instrs_size(node)]] returns the number of elements in [[node]]. \\
[[instrs_begin(node)]] returns a handle on the first element of [[node]]. \\
[[instrs_end(node)]] returns the sentinel handle for [[node]]. \\
[[prepend(node, instr)]] inserts [[instr]] at the beginning of [[node]]. \\
[[append(node, instr)]] inserts [[instr]] at the end of [[node]]. \\
[[replace(node, handle, instr)]] replaces the element at [[handle]] in
		[[node]] by [[instr]]. \\
[[insert_before(node, handle, instr)]] inserts [[instr]] before [[handle]] in
		[[node]]. \\
[[insert_after(node, handle, instr)]] inserts [[instr]] after [[handle]] in
		[[node]]. \\
[[remove(node, handle)]] removes and returns the instruction at
		[[handle]] in [[node]].
\end{quote}

As with [[InstrList]], the [[instr_]] functions have nicknames because the
are so frequently used.

<<[[CfgNode]] function nicknames>>=
inline int
size(CfgNode *node)
{
    return instrs_size(node);
}

inline InstrHandle
start(CfgNode *node)
{
    return instrs_start(node);
}

inline InstrHandle
last(CfgNode *node)
{
    return instrs_last(node);
}

inline InstrHandle
end(CfgNode *node)
{
    return instrs_end(node);
}
@



\subsection{Graph Node Identification}

These OPI functions help identify a given CFG node, by giving a
code-label symbol representing the start of the code within it or by
printing an ASCII representation of it for debugging.

<<[[CfgNode]] identification>>=
Cfg* get_parent(CfgNode*);
int get_number(CfgNode*);
LabelSym* get_label(CfgNode*);
void fprint(FILE*, CfgNode*, bool show_code = false);
@

Function [[get_parent(node)]] returns the CFG that owns [[node]].  Function
[[get_number(node)]] returns a non-negative integer identifier that is less
than [[nodes_size(get_parent(node))]].  The function [[get_label(node)]]
returns the label of [[node]]'s leading label instruction, inserting a new
label instruction if none was present.  The [[fprint]] function is used to
support printing of the overall CFG.  It prints one line containing the
node number and the nature of its control-termination instruction, if any,
and then its lists of successor and predecessor numbers.  If the boolean
argument [[show_code]] is true, then it prints the node's instructions as
well.


\subsection{Control-Flow Relations}

When a CFG is created from a list of instructions, its edges reflect the
control paths implied by the control-flow instructions of that list.  We
call these {\em normal} edges.  The CFG library also supports the creation
of two other kinds of edges: {\em exceptional} and {\em impossible}.

The assumption usually made when the CFG creator encounters a procedure
call instruction is that it returns exactly once for each time the call is
executed.  In many languages, it is possible to write procedures that may
not return in the normal way, or that may return more than once for each
call.  To reflect such exceptional control paths, we provide a special kind
of successor relation, used to describe flow of control from a call site
directly to the exit node, or from the entry node to the point immediately
after a call.  We call such flow {\em exceptional}, since it is not the
usual thing, and since it is often associated with the raising of an
exception.  We provide functions for creating and recognizing exceptional
edges in a CFG.

Though unusual, exceptional edges are at least possible control paths.
On the other hand, there are important analysis techniques that insert
completely {\em impossible} edges in the CFG \cite{bibmorgan}.  They may
require that every node lies on a path to the exit node, for example.
To handle an infinite loop, they insert an impossible edge from one of
the loop nodes to the exit.  We also provide functions to create and
recognize impossible edges.

When the CFG is constructed, impossible edges are included to be sure
that every node is reachable from the entry, and every node has a path
to the exit.  However, no exceptional edges appear in the CFG when it is
first constructed.  To add them, you call [[set_exceptional_succ]]
explicitly, as described below.


\subsection{Finding Node Neighbors}

Each [[CfgNode]] keeps a list of predecessors and a list of successors
reachable by any kind of edge.  You access these lists through the
following functions.

<<[[CfgNode]] control-neighbor functions>>=
int succs_size(CfgNode*);
CfgNodeHandle succs_start(CfgNode*);
CfgNodeHandle succs_end(CfgNode*);

int preds_size(CfgNode*);
CfgNodeHandle preds_start(CfgNode*);
CfgNodeHandle preds_end(CfgNode*);

CfgNode *fall_succ(CfgNode*);
CfgNode *taken_succ(CfgNode*);
@

\begin{quote}
[[succs_size(node)]] and [[preds_size(node)]] return the number of
	[[nodes]]'s successor and predecessor nodes, respectively. \\
[[succs_start(node)]] and [[preds_start(node)]] return a handle on
	[[nodes]]'s first successor and predecessor, respectively. \\
[[succs_end(node)]] and [[preds_end(node)]] return the sentinel handle for
	[[nodes]]'s successor and predecessor sequences, respectively. \\
[[fall_succ(node)]] and [[taken_succ(node)]] return [[node]]'s first and
	second successors, respectively.
\end{quote}

The sequence of predecessors represents an unordered set,
while the sequence of successors
corresponds to the target(s) of the node's terminating instruction.
The predecessor list is unordered because we could not imagine a useful
ordering to implement; it is a set (and not a multiset) because having a
set seems a simpler abstraction.  The successors sequence is ordered;
positions 0 and 1 in the list have special meaning depending on the kind
of control instruction that terminates the node.  The same target can
appear multiple times in the successors list, so that different cases of
a multiway branch can jump to the same label.  The number of successors
of a [[CfgNode]] depends on the control instruction that ends the node:

\begin{itemize}
\item   Unconditional jumps and fall-through nodes (nodes that have
	no terminating control instruction) have just one successor,
	which is always successor number 0.
\item   Call instructions also have just one successor, the node to
	which the call will return.
\item   Conditional branches have two successors.  The fall-through
	path is always successor number 0, while the taken path is
	successor number 1.
\item   Multiway branches have as many branches as there are slots
	in the branch dispatch table.  Changing the $j$th successor
	(counting from 0) of a multiway branch modifies the $j$th
	dispatch table entry.  (There is no ``default'' target label.)
\end{itemize}

All exceptional and impossible successors occur after the normal
successors.

The functions returning a [[CfgNodeHandle]] permit
individual nodes to be visited in the usual manner.
The [[fall_succ]] and [[taken_succ]] functions are convenient ways to
access successors 0 and 1, respectively.


\paragraph{Making new neighbors.}

You set the normal successors of a node using one of the following.

<<[[CfgNode]] control-neighbor functions>>=
CfgNode* get_pred(CfgNode*, int pos);
CfgNode* get_succ(CfgNode*, int pos);
void set_succ(CfgNode*, int pos, CfgNode *succ);
void set_fall_succ(CfgNode*, CfgNode *succ);
void set_taken_succ(CfgNode*, CfgNode *succ);
@

The function [[get_pred]] returns the [[n]]-th predecessor of the node
to which it's applied (recall that the predecessors are not ordered).
[[get_succ]] returns the node corresponding to the [[n]]-th successor
edge, and different values of [[n]] may yield the same successor node.

Calling [[set_succ]] makes [[succ]] the new [[n]]-th
successor of [[node]].  It disconnects the former [[n]]-th successor (if
any), and updates the predecessor lists of the old and new successors.  If
necessary, it also modifies [[node]]'s CTI, which should be consistent with
the existing successor sequence, to reflect the change.  For example, if
[[node]] ends with a conditional branch, then {\small
\begin{verbatim}
    set_succ(node, 1, succ);
\end{verbatim}}
unlinks any previous ``taken'' successor, sets it to [[succ]], and
replaces the target symbol of [[node]]'s branch instruction with the
first label of node [[succ]].

Functions [[set_fall_succ]] and [[set_taken_succ]] are provided for
mnemonic value; they are equivalent to calling [[set_succ]] with second
argument [[0]] and [[1]], respectively.

Note that making node $e$ a successor of the entry node has the effect
of making $e$ an entry point of the procedure.

\paragraph{Creating exceptional and impossible edges.}

For exceptional and impossible edges, there are special ways of setting a
successor node.

<<[[CfgNode]] control-neighbor functions>>=
void set_exceptional_succ(CfgNode*, int n, CfgNode *succ);
void set_impossible_succ(CfgNode*, int n, CfgNode *succ);
@

If necessary, each of the above functions extends the successor sequence
of the node to accommodate at least [[n]]+1 elements.  An edge created
by [[set_exceptional_succ]] ([[set_impossible_succ]]) is marked as
exceptional (impossible).  Detecting whether a particular numbered
successor represents a normal, exceptional, or impossible edge is
handled by the following predicates.

<<[[CfgNode]] control-neighbor functions>>=
bool is_normal_succ(CfgNode*, CfgNode *succ);
bool is_normal_succ(CfgNode*, int pos);
bool is_exceptional_succ(CfgNode*, CfgNode *succ);
bool is_exceptional_succ(CfgNode*, int pos);
bool is_possible_succ(CfgNode*, CfgNode *succ);
bool is_possible_succ(CfgNode*, int pos);
bool is_impossible_succ(CfgNode*, CfgNode *succ);
bool is_impossible_succ(CfgNode*, int pos);
bool is_abnormal_succ(CfgNode*, CfgNode *succ);
bool is_abnormal_succ(CfgNode*, int pos);
@

The versions of the above predicates that take an integer [[pos]] as their
second argument test whether the [[pos]]-th successor edge is of the
indicated kind.  A ``possible'' successor is one that is either normal or
exceptional, but not impossible.  An ``abnormal'' one is either exceptional
or impossible.


\paragraph{Testing the control-transfer instruction.}

It's often necessary to test whether a node is terminated by a CTI, and
if so, what kind.

<<[[CfgNode]] control-instruction functions>>=
bool ends_in_cti(CfgNode*);	// has CTI
bool ends_in_ubr(CfgNode*);	// has CTI satisfying is_ubr()
bool ends_in_cbr(CfgNode*);	// has CTI satisfying is_cbr()
bool ends_in_mbr(CfgNode*);	// has CTI satisfying is_mbr()
bool ends_in_call(CfgNode*);	// has CTI satisfying is_call()
bool ends_in_return(CfgNode*);	// has CTI satisfying is_return()
@

\paragraph{Adjusting the control-transfer instruction.}

To this point, we have not described any way of changing the number of
successors of a node.  The [[set_succ]] function diverts an existing
edge, and it updates the CTI accordingly.  But sometimes, there's no
substitute for altering the control instruction yourself, and then
asking the library to adjust edges accordingly.  Here are the relevant
functions.

<<[[CfgNode]] control-instruction functions>>=
Instr* get_cti(CfgNode*);
InstrHandle get_cti_handle(CfgNode*);

void invert_branch(CfgNode*);

void reflect_cti(CfgNode*, Instr *cti, CfgNode *implicit_succ = NULL);
@

If the node has a control-transfer instruction, then it is returned by
[[get_cti]].  Function [[get_cti_handle]] is similar, but returns a handle
on the CTI's position in the instruction list, or a sentinel handle is
there is no CTI.

Function [[invert_branch]] changes the polarity of a branch by changing
its opcode to the opposite opcode (e.g. changing [[beq]] to [[bne]]).
If the branch has a [[branch_edge_weights]] annotation, giving the edge
frequencies of the node's two out-edges, then [[invert_branch]] swaps
these as well.

Note that there is no corresponding [[set_cti]] function.  To change the
CTI, you must change the instruction list directly and then call
[[reflect_cti]].  This function records the new CTI, which must already
have been put into the node and changes the node's flow successors to
reflect the new control instruction.  Obviously, if that instruction can
fall through, the function needs to know what to use as its implicit
successor.  In that case, pass it a second argument giving the implicit
successor node.

[[reflect_cti]] preserves the exceptional and impossible successors of the
node that it operates on, but it completely replaces the normal
successors..  If its second argument is [[NULL]], rather than a CTI
instruction, it clears the node's CTI association and makes
[[implicit_succ]] (its third argument) the node's only normal successor.



\subsection{Layout relations}
\label{sec-node-layout}

The CFG library supports basic block placement and layout by allowing
the user to specify an ordering (partial or total) of the nodes in a
CFG and its associated procedure.  The ordering is specified by a
doubly-linked list that runs through the [[CfgNode]]s of the graph.
Null pointers in this linked list indicate ``don't care'' orderings,
while connected components (this document calls them ``clumps'') of
the list will be laid out in order.  This gives you complete
flexibility between specifying no order at all (all layout links null)
to a total order on nodes.

\paragraph{Checking and changing layout constraints.}

To read and modify layout information, the library provides the
following functions.

<<[[CfgNode]] layout functions>>=
CfgNode *get_layout_pred(CfgNode*);
CfgNode *get_layout_succ(CfgNode*);

void clear_layout_succ(CfgNode*);
bool set_layout_succ(CfgNode*, CfgNode *succ);
bool set_layout_succ(CfgNode*, LabelSym *succ_label);
@

A nodes's layout predecessor is returned by [[get_layout_pred]]; its
layout successor, by [[get_layout_succ]].

All of the functions that change a layout constraint between two nodes are
invoked on the first of the two in layout order.  To remove a layout
constraint, use [[clear_layout_succ]].  To establish a layout
constraint, use [[set_layout_succ]], supplying the new layout successor
as its argument.  (For convenience, you can supply a label of the
successor node instead.)

The [[set_layout_succ]] function is careful about branch polarity and
explicit jumps, inverting branches and inserting unconditional jumps
when necessary.  When it needs to invert a branch, this function calls
[[invert_branch]] and then returns [[true]].


\paragraph{Effects of layout constraints on code.}

When a [[Cfg]] is constructed, no instructions are added or removed,
whether or not layout constraints are specified.  Thus a procedure can
be transformed into CFG form and back to instruction-list form with no
changes, whether or not the layout-retention option is selected.

When no layout links have been specified, the library treats the
CFG as a general graph.  You are free to transform it without having
to ensure that there is a linearization of its current instructions
that is in valid executable order.

On the other hand, the library insists that nodes with layout
successors be valid standard basic blocks.  This makes
[[set_layout_succ]] picky about the conditions under which it allows a
link to be set.

\begin{itemize}
\item   It is always legal to set the layout successor of a fall-through
	node or a node ending in an unconditional branch.  If the
	layout successor is also the control-flow successor, then an
	unneeded unconditional branch instruction will be removed.  If
	not, a goto will be created to be the terminating control
	instruction in the node (i.e. it will be returned by
	[[get_cti]]).  These constraints ensure that any necessary
	explicit goto's become visible for timing analysis and
	instruction scheduling.
\item   It is legal to set the layout successor of a conditional
	branch node only if the layout successor is either the
	fall-through successor or the taken successor of the node.
	The CFG library sets the branch polarity so that the layout
	successor becomes the fall-through successor of the node.  If
	you actually want the layout successor of a conditional
	branch node not to be a control-flow successor, then you are
	responsible for creating a new unconditional branch node below
	the conditional branch.  This can be done easily using the
	[[split_critical_edge]] function.
\item	It is legal to set the layout successor of a node ending
	in a call instruction only if the layout successor is also
	the control flow successor.
\item	It is always legal to set the layout successor of a multiway
	branch node.
\item	It is never legal to set a layout successor if adding that
	link to the layout list would create a layout cycle.  Moreover,
	it is never legal to change the fall-through successor of a node
	after its layout successor has been set.  (The library will
	allow you to set the fall-through successor to the same
	value, though).

\end{itemize}

\paragraph{Bounding a node's computation.}

It is sometimes useful to find the first ``real'' instruction in a node,
i.e., the first one other than a label or some other kind of marker.
It's occasionally useful to identify the last non-control-transfer
instruction in a node.  The following functions identify such ``boundary''
instructions within a node.

<<[[CfgNode]] boundary functions>>=
Instr *first_non_label(CfgNode*);
Instr *first_active_instr(CfgNode*);
Instr *last_non_cti(CfgNode*);
@

Function [[first_non_label]] returns the first instruction in a node
that isn't a label.  [[first_active_instr]] is similar, but skips
pseudo-ops and null instructions in addition to labels.  Function
[[last_non_cti]] returns the last instruction before the terminating
control instruction in the node.  Each of these boundary-instruction
functions returns the null pointer if there is no qualifying instruction.



\subsection{Header file for module [[node.h]]}

The [[CfgNode]] class and support data structures are defined in the
module [[node]], which has the following header file:

<<node.h>>=
/* file "cfg/node.h" -- Control Flow Graph Nodes */

<<Machine-SUIF copyright>>

#ifndef CFG_NODE_H
#define CFG_NODE_H

#include <machine/copyright.h>

#ifndef SUPPRESS_PRAGMA_INTERFACE
#pragma interface "cfg/node.h"
#endif

#include <machine/machine.h>
#include <cfg/cfg_ir.h>
#include <cfg/graph.h>

<<[[CfgNode]] creation>>

<<[[CfgNode]] identification>>

<<[[CfgNode]] bodily functions>>

<<[[CfgNode]] function nicknames>>

<<[[CfgNode]] control-neighbor functions>>

<<[[CfgNode]] control-instruction functions>>

<<[[CfgNode]] layout functions>>

<<[[CfgNode]] boundary functions>>

#endif /* CFG_NODE_H */
@
