\section{Class [[Bvd]]}
\label{sec-class-bvd}

The classical iterative, bit-vector-based algorithm for data-flow
analysis is covered in many general-purpose texts on optimizing
compilation.  We assume you are familiar with the basic approach, and we
concentrate on describing how to use the BVD library to develop
instances of this useful paradigm.  Our running example will be the
liveness analyzer, the details of which are described in
Sections~\ref{sec-class-liveness} and~\ref{sec-operand-catalogs}.

\paragraph{Terminology.}

The data-flow analyzers covered here do two things: they identify an
interesting class (or \emph{universe}) of syntactic or semantic program
elements, and for each such element, they identify the points in the
program to which the element, or some aspect of the element, ``flows''.
For an available-expressions problem, the elements are syntactic
expressions evaluated by the program, and an expression $e$ flows to
point $p$ in the program if $e$ is computed (\emph{generated}) on every
path to $p$ without being invalidated (\emph{killed}) by an assignment
to some variable in $e$ before $p$ is reached.  For the
reaching-definitions problem, the universe of interesting elements
consists of the statements that assign to, or otherwise side-affect
storage locations, and these definitions flow to all the program points
reached by paths that don't contain an overriding assignment.  For a
liveness problem, the universe consists of storage cells, such as
variables and registers.  The liveness of a variable is generated by a
use of the variable, and it (the liveness) flows backward along control
paths until killed by a definition of (e.g., an assignment to) the
variable.

In the examples just mentioned, the data-flow behavior of each element
in the chosen universe is independent of the others.  Iterative
data-flow analysis, as implemented in the BVD library, exploits this
fact by dealing with all the elements in parallel.  For each element $e$
and each program point $p$, it wants to produce one bit of information:
true if $e$ flows to $p$ and false if it does not.  The parallelism is
achieved by packing these bits into bit vectors, with one bit position,
or \emph{slot}, per universe element, and then solving the data-flow
equations for all elements at once by computing on the bit vectors
instead of the individual slots.

To avoid the storage cost of allocating a different bit vector for every
point in the program, it is customary to associate them only with the
entry and/or exit points of nodes in the CFG of the program.  It is easy
to propagate from one of these node boundaries through the linear
sequence of instructions of the node when necessary.

We take exactly this classical approach.  We assume that every data-flow
solver determines the universe of interesting elements and assigns a
small non-negative number as the identifier of each element.  We call
this number the \emph{slot} of the element.  Instead of using the
bit-vector or bit-string metaphor when describing flow-analysis results,
we use sets of slots.  That is, instead of describing computations in
terms of bitwise boolean operations on vectors, we use set operations on
sets of small integers.  The two metaphors are conceptually equivalent,
and of course, we make sure that the sets used for data-flow analysis
have the complexity properties that have made ``bit-vector-based''
data-flow analysis effective and popular.  The advantage of the set
metaphor is simply that much of the machinery that works for sets based
on bit vectors carries over smoothly to sets with other performance
properties.

The slot-set types used in this library are derived from [[NatSet]], a
natural-number set class defined in the Utilities section of the
[[machine]] library document \cite{bibmachine}.  You should familiarize
yourself with the properties of those set types if you haven't already.
For data-flow analysis results, we use the class [[NatSetDense]], which
has bit-vector-like complexity traits.  Where a sparse representation is
more suitable, we use the list-based set class [[NatSetSparse]].  The
operations on all the [[NatSet]] classes are the same.  There is an
iterator class for these set types called [[NatSetIter]].


\paragraph{The base class for data-flow solvers.}

The abstract base class from which you derive a data-flow
solver is [[Bvd]]:

<<class [[Bvd]]>>=
class Bvd {
  public:
    virtual ~Bvd() { }

    const NatSet* in_set(CfgNode*)  const;
    const NatSet* out_set(CfgNode*) const;

    virtual void find_kill_and_gen(Instr *) = 0;

    virtual const NatSet* kill_set() const = 0;
    virtual const NatSet* gen_set() const = 0;

    virtual int num_slots() const = 0;
    void print(CfgNode*, FILE* = stdout) const;

  protected:
    enum direction { forward, backward };
    enum confluence_rule { any_path, all_paths };

    Bvd(Cfg*,
	direction = forward,
	confluence_rule = any_path,
	FlowFun *entry_flow = NULL,
	FlowFun *exit_flow = NULL,
	int num_slots_hint = 0);

    bool solve(int iteration_limit = INT_MAX);

<<[[Bvd]] details>>
};
@

\subsection{Accessing data-flow results}

Note that most methods of [[Bvd]] are protected from public use.  Apart
from its destructor, the public methods are all about accessing the
results of data-flow analysis.

The [[num_slots]] method (which must defined by a concrete subclass)
returns the total number of allocated slots once the solver has been
run.  In other words, it is the size of the universe of items found in
the program during data-flow analysis.  At present, this and the
[[print]] method are mainly used for debugging.  The essential public
methods are those that access the sets associated by [[Bvd]] with each
CFG node.

\begin{tabular}{l@{\hspace*{2em}}p{.7\linewidth}}
[[in_set(]]$n$[[)]]	& Returns the slot set for the (control) entry
			  point of node $n$. \\ 
[[out_set(]]$n$[[)]]	& Returns the slot set for the (control) exit
			  point of node $n$.
\end{tabular}

The sets returned are owned by the [[Bvd]] object; you don't need to
worry about deleting them.

To use liveness analysis as an example, [[in_set(]]\Mit{node}[[)]]
indicates which items are live on entry to \Mit{node} and
[[out_set(]]\Mit{node}[[)]] indicates those live at its exit.  (Note
that, even though liveness is a ``backward'' data-flow problem, the
[[in_set]] method refers to a node's control-flow entry, not its exit.)


\subsection{Constructing a data-flow analyzer}

The concrete subclass that derives a specific data-flow solver from
[[Bvd]] supplies the following parameters to its constructor.

\begin{itemize}
\item The CFG of the procedure to be analyzed.

\item The \emph{direction} of the data-flow problem: either [[forward]],
      following edges in the direction of control flow from the entry
      node, or [[backward]], following edges in reverse from the exit
      node.

      Liveness, for example, is a backward problem.  Liveness
      information is \emph{generated} by use occurrences of variables
      (say), and propagates backward against the direction of control
      flow until it is \emph{killed} by definition occurrences (e.g.,
      assignments).

\item The \emph{confluence rule} to be used when data flow from
      multiple nodes is combined: either [[all_paths]] or [[any_paths]].

      The [[all_paths]] rule means that to obtain the slot describing
      data flow entering node $n$, we use \emph{intersection} over the
      sets that represent flow out of $n$'s data-flow predecessors:

      \ifhtml
	\[
	  before[n] = INTERSECTION(p~\in~pred(n))~~after[p]
	\]
      \else
      \begin{displaymath}
        \mathit{before}[n] = \bigcap_{p \in \mathit{pred}(n)}\mathit{after}[p]
      \end{displaymath}
      \fi	

      The [[any_path]] rule is the same, except that it uses
      \emph{union} instead of intersection:

      \ifhtml
	\[
	  before[n] = UNION(p~\in~pred(n))~~after[p]
	\]
      \else
      \begin{displaymath}
        \mathit{before}[n] = \bigcup_{p \in \mathit{pred}(n)}\mathit{after}[p]
      \end{displaymath}
      \fi	

      The set of data-flow predecessors $\mathit{pred}(n)$ depends on
      the direction of the problem.  In a forward problem, they are the
      control-flow predecessors; in a backward problem, the control-flow
      successors.  In a forward problem, $\mathit{before}[n]$ is the
      information at the control-flow entry of $n$ and
      $\mathit{after}[n]$ is that at its exit.  In a backward problem,
      it's the other way around.

      For all nodes $n$, the initial value of $\mathit{before}[n]$
      depends on the confluence rule.  For an any-path problem, these
      sets start empty and accrue information through set union.  With
      an all-paths (intersection) problem, the $\mathit{before}[n]$
      start as the universal set and the final solution is carved out by
      intersection.

      The liveness example is an any-path problem: a variable is live at
      the exit of node $n$ (i.e., the point before data flow propagates
      backward through $n$) if it is live at the entry of any of $n$'s
      control-flow successors (i.e., its data-flow predecessors).

\item Special flow functions for the entry ([[entry_flow]]) and exit
      ([[exit_flow]]) nodes.  Since these nodes contain no code, their
      local data flow is by default represented using the identity
      function on slot sets (Section~\ref{sec-flow-functions}).  If
      there are special boundary conditions for a particular data-flow
      problem, however, use ([[entry_flow]]) and/or
      ([[exit_flow]]) to express them.

\item An optional estimate, [[num_slots_hint]], of the number of slots
      in each slot set of the analysis.  This can be used by the
      implementation to preallocate storage for data-flow results.
\end{itemize}


\subsection{Solving a data-flow problem}

In addition to its constructor, class [[Bvd]] defines the protected
method [[solve]] for use by its subclasses, typically in their own
constructor methods.  [[solve]] initializes the slot sets associated
with nodes and then pre-computes the net effect of each node as a
flow function from slot sets to slot sets.

Next, [[solve]] computes the local flow functions that capture the
effects of the individual nodes.  A flow function is a slot-set
transformer; it captures the total effect of a CFG node $n$ in one data
structure that is applied to $\mathit{before}[n]$, yielding
$\mathit{after}[n]$.  Starting with the plain identity function as the
flow function for the node, [[solve]] scans its instructions in forward
(backward) order for a forward (backward) problem.  It first uses the
$\mathit{kill}[i]$ set for instruction $i$ to alter the flow function so
that it removes the corresponding slots from the argument set.  Then it
uses $\mathit{gen}[i]$ to make the flow function insert the generated
slots.  For the entry (exit) node, it uses the [[entry_flow]]
([[exit_flow]]) parameter to accommodate boundary conditions, as
discussed above.

Finally, [[solve]] runs an iterative propagation algorithm on the slot
sets until it converges at a fixed point or until a caller provided
iteration limit is reached.  In the latter case, [[solve]] returns
[[false]] to indicate abnormal termination.


\subsection{Analyzing individual instructions}

The remaining public member functions are pure virtual methods to be
defined by the subclass for a specific data-flow problem.  They handle
the analysis of single instructions.

\begin{tabular}{l@{\hspace*{2em}}p{.7\linewidth}}
{\tt find\_kill\_and\_gen($i$)}& Prepares to enumerate the \Mit{kill}
			  and \Mit{gen} sets of instruction $i$. \\
[[kill_set()]]		& Returns the \Mit{kill} set
			  of the instruction analyzed
			  by [[find_kill_and_gen]]. \\
[[gen_set()]]		& Returns the \Mit{gen} set
			  of the instruction analyzed
			  by [[find_kill_and_gen]].
\end{tabular}

When the flow function that represents the net effect of a node is being
constructed, the solver calls [[find_kill_and_gen]] once on each
instruction in the node.  It then uses the [[kill_set]] and
[[gen_set]] methods to obtain the results that [[find_kill_and_gen]]
has found, i.e., to scan the slots killed by and those generated by the
current instruction.

In the liveness example,  the slots killed by an instruction are those
for variables or registers that the instruction defines, i.e., writes
to.  The slots generated by an instruction are those for variables or
registers that it uses.  [[find_kill_and_gen]] determines the definition
set and the use set for the instruction and [[kill_set]] and
[[gen_set]] allow the solver to access them.


\subsection{Implementation}

Our implementation uses the following storage:

\begin{itemize}
\item A [[FlowFun]] for each node in the CFG, reclaimed after [[solve]].
\item A [[NatSetDense]]  for the entry of each node.
\item A [[NatSetDense]]  for the exit of each node.
\end{itemize}


The non-public members of [[Bvd]] are declared as follows.

<<[[Bvd]] details>>=
  protected:
    Cfg* _graph;
    FlowFun* _entry_flow;
    FlowFun* _exit_flow;
    int _num_slots_hint;

    direction _dir;
    confluence_rule _rule;

    Vector<FlowFun> _flow;		// slot set transformer for each node
    Vector<NatSetDense> _in;		// node-entry slot set for each node
    Vector<NatSetDense> _out;		// node-exit  slot set for each node

    Vector<NatSetDense> *_before;	// &_in  if forward, &_out if backward
    Vector<NatSetDense> *_after;	// &_out if forward, &_in  if backward

    void compute_local_flow(int);	// subroutines ...
    void combine_flow(CfgNode *);	//   ... of method solve
@


\subsection{Header file [[solve.h]]}

Class [[Bvd]] is defined in header file [[solve.h]], which also
declares the initialization and finalization functions for the
BVD library:

<<BVD library initialization>>=
extern "C" void enter_bvd(int *argc, char *argv[]);
extern "C" void exit_bvd(void);
@

<<solve.h>>=
/* file "bvd/solve.h" -- Iterative bit-vector data-flow solver */

<<Machine-SUIF copyright>>

#ifndef BVD_SOLVE_H
#define BVD_SOLVE_H

#include <machine/copyright.h>

#ifndef SUPPRESS_PRAGMA_INTERFACE
#pragma interface "bvd/solve.h"
#endif

#include <machine/machine.h>
#include <cfg/cfg.h>

#include <bvd/flow_fun.h>


<<class [[Bvd]]>>

<<BVD library initialization>>

#endif /* BVD_SOLVE_H */
@
