\section{Machine Instructions}
\label{sec:machine-instructions}

The central classes in the representation of machine-level code are those
for instruction lists ([[InstrList]]), instructions ([[Instr]]), and
operands ([[Opnd]]).  This section and the next introduce these classes and
the OPI functions that relate to them.

\subsection{Instruction constituents}

Every machine instruction is an instance of [[Instr]].  Each has
an opcode and most have operands, representing their sources
(``arguments'') and destinations (``results'').

\paragraph{Opcodes.}

An opcode is a non-negative integer.  With two exceptions, the opcode of a
machine instruction is meaningful only when interpreted with respect to an
architecture family.  (The exceptions are [[opcode_null]] and
[[opcode_label]], which are the opcodes of any null instruction or label
instruction, respectively.)  The particular numbers used as opcodes are not
dictated by ISAs; they are arbitrarily assigned small integers, useful for
accessing instruction properties by table lookup.  The same integer that
represents [[mov]] in an $x$86 instruction may be used for [[addq]] in an
Alpha instruction.  The architecture-specific libraries assign identifiers
to opcodes; for example, [[MOV]] stands for [[mov]] in the $x$86 library
and [[ADDQ]] stands for [[addq]] in the Alpha library.  (See
Section~\ref{sec:machine-opcodes} for more discussion of opcodes.)

The [[get_opcode]] function returns an instruction's opcode, which is
always set when the instruction is created.

\paragraph{Operands.}

In general, an instruction may have source operands and
destination operands; together, these are its {\em direct} operands.
However, an operand that represents an effective address calculation may
itself contain operands; so an instruction can also involve indirect
operands.

Machine SUIF imposes the constraints that locations read by an
instruction must be explicit in its operands (though possibly
indirectly), and that locations written by an instruction must be
explicit destination operands.  We distinguish the {\em input} operands
of an instruction, which are all the values read during the
instruction's execution, from the {\em source} operands of the
instruction, which simply identify its direct arguments.

As an example, consider an $x$86 instruction that adds the contents of the
[[%eax]] register to the memory location specified by the effective address
(EA) expression [[40(%esp)]].  Even though $x$86 has a two-address ISA, the
Machine-SUIF representation of this instruction has two source operands, an
address-expression operand [[40(%esp)]] and a register operand [[%eax]],
and one destination operand, an address-expression operand [[40(%esp)]].%
\footnote{It actually has two destination operands.  The second is a
   register operand corresponding to the $x$86 condition-code register.}
But this add instruction has five inputs: the three register operands (the
direct source operand [[%eax]] and the two occurrences of [[%esp]] in the
address expressions) and two immediate operands (also in the address
expressions).

The library provides facilities for enumerating and possibly changing all
the input operands of an instruction.  We begin with functions that access
and alter the direct operands of an instruction.

<<[[Instr]] operand accessors and mutators>>=

Opnd	    get_src(Instr*, int pos);
Opnd	    get_src(Instr*, OpndHandle);
void	    set_src(Instr*, int pos, Opnd src);
void	    set_src(Instr*, OpndHandle, Opnd src);
void	    prepend_src(Instr*, Opnd src);
void	    append_src(Instr*, Opnd src);
void	    insert_before_src(Instr*, int pos, Opnd src);
void	    insert_before_src(Instr*, OpndHandle, Opnd src);
void	    insert_after_src(Instr*, int pos, Opnd src);
void	    insert_after_src(Instr*, OpndHandle, Opnd src);
Opnd	    remove_src(Instr*, int pos);
Opnd	    remove_src(Instr*, OpndHandle);
int	    srcs_size(Instr* instr);
OpndHandle  srcs_start(Instr*);
OpndHandle  srcs_last(Instr*);
OpndHandle  srcs_end(Instr*);

Opnd	    get_dst(Instr*);
Opnd	    get_dst(Instr*, int pos);
Opnd	    get_dst(Instr*, OpndHandle);
void	    set_dst(Instr*, Opnd dst);
void	    set_dst(Instr*, int pos, Opnd dst);
void	    set_dst(Instr*, OpndHandle, Opnd dst);
void	    prepend_dst(Instr*, Opnd dst);
void	    append_dst(Instr*, Opnd dst);
void	    insert_before_dst(Instr*, int pos, Opnd dst);
void	    insert_before_dst(Instr*, OpndHandle, Opnd dst);
void	    insert_after_dst(Instr*, int pos, Opnd dst);
void	    insert_after_dst(Instr*, OpndHandle, Opnd dst);
Opnd	    remove_dst(Instr*, int pos);
Opnd	    remove_dst(Instr*, OpndHandle);
int	    dsts_size(Instr*);
OpndHandle  dsts_start(Instr*);
OpndHandle  dsts_last(Instr*);
OpndHandle  dsts_end(Instr*);
@

In the capsule summaries of the above functions, [[place]] represents
a sequence position, which is either a zero-based integer or an
[[OpndHandle]]:

\begin{quote}
[[get_src(instr, place)]] returns the source of [[instr]] at [[place]]. \\
[[set_src(instr, place, src)]] makes [[src]] the source of [[instr]] at
	[[place]]. \\
[[prepend_src(instr, src)]] inserts [[src]] at the beginning of
	[[instr]]'s sources. \\
[[append_src(instr, src)]] inserts [[src]] at the end of [[instr]]'s
	sources. \\
[[insert_before_src(instr, place, src)]] inserts [[src]] before [[place]] in
	[[instr]]'s sources. \\
[[insert_after_src(instr, place, src)]] inserts [[src]] after [[place]] in
	[[instr]]'s sources. \\
[[remove_src(instr, place)]] removes and returns the operand at
	[[place]] in [[instr]]'s sources. \\
[[srcs_size(instr)]] returns the number of source operands of [[instr]].\\
[[srcs_start(instr)]] returns a handle on the first source of [[instr]]. \\
[[srcs_last(instr)]] returns a handle on the last source operand of
	[[instr]]. \\
[[srcs_end(instr)]] returns the sentinel handle for [[instr]]'s sources. \\[1ex]
[[get_dst(instr, pos)]] returns the destination of [[instr]] at
	position [[pos]].  ([[pos]] defaults to 0 if omitted.) \\
[[set_dst(instr, pos, dst)]] replaces the destination of
	[[instr]] at position [[pos]] by [[dst]]. ([[pos]] defaults to 0
	if omitted.)\\
[[prepend_dst(instr, dst)]] inserts [[dst]] at the beginning of
	[[instr]]'s destinations. \\
[[append_dst(instr, dst)]] inserts [[dst]] at the end of [[instr]]'s
	destinations. \\
[[insert_before_dst(instr, place, dst)]] inserts [[dst]] before [[place]] in
	[[instr]]'s destinations. \\
[[insert_after_dst(instr, place, dst)]] inserts [[dst]] after [[place]] in
	[[instr]]'s destinations. \\
[[remove_dst(instr, place)]] removes and returns the operand at
	[[place]] in [[instr]]'s destinations. \\
[[dsts_size(instr)]] returns the number of destination operands of
	[[instr]]. \\
[[dsts_start(instr)]] returns a handle on the first destination of
	[[instr]]. \\
[[dsts_end(instr)]] returns the sentinel place for [[instr]]'s
	destinations.
\end{quote}

The [[srcs_size]] and [[dsts_size]] functions simply return the size of
the underlying operand sequences.  They do not necessarily represent
the semantics of the instruction's opcode.  If [[set_src]] or
[[set_dst]] is called with an index greater than or equal to the
current sequence size, the sequence is extended automatically.

Functions [[get_dst]] and [[set_dst]] each have variants in
which the operand number can be omitted because the case of
single-destination instructions is so common.

The indirect operands of an instruction can be accessed or changed by
first fetching a direct address operand and then operating on that,
using methods to be described in Section~\ref{sec:machine-operands}.

\paragraph{Other constituents.}

Some instructions have constituents that are neither sources nor
destinations.  These are managed by functions in the OPI that operate on
instructions.  Control-transfer instructions usually have target labels,
for example; they are not operands, but separate attributes.  These target
symbols are accessed and modified by the OPI functions [[get_target]] and
[[set_target]] (Section~\ref{sec:machine-instr-accessors-mutators}).  A
code label is represented by a special instruction that marks its position
in the instruction sequence.  It has no operands.  Its constituent label is
accessed and modified by the OPI functions [[get_label]] and [[set_label]].


\subsection{Creators for [[Instr]]}
\label{sec:machine-instr-creators}

We refer to functions in the OPI that create new instances of objects as
``creation functions'', or simply {\em creators}.  A machine instruction
creator takes an opcode and possibly some operands, and it produces a
machine instruction.
						   
We divide instructions into broad categories and provide creators for each.
Here's a summary of the breakdown and the mnemonics used for the
categories.

\begin{itemize}
\item Active instructions (real machine operations)
      \begin{itemize}
      \item ([[alm]]) Arithmetic, logical, and memory instructions
	    \begin{itemize}
	    \item ([[alu]]) Arithmetic and logical
	    \item ([[mem]]) Load and store
	    \end{itemize}
      \item ([[cti]]) Control-transfer instructions
      \end{itemize}
\item Inactive instructions
      \begin{itemize}
      \item ([[label]]) Label instructions
      \item ([[dot]]) Assembler pseudo-operations\footnote{%
	    Many assemblers use a leading period (``[[.]]'') to
	    identify non-code directives.}
      \end{itemize}
\end{itemize}

The purpose of defining categories is two-fold.  It allows us to provide
creators that are convenient to use because their argument types and
numbers are appropriate for the kind of instruction being created.  It also
allows for an implementation in which different categories may have
different representations.

Not every real machine instruction falls neatly into one of the above
categories, of course.  The control-transfer instructions ([[cti]]) are
the instructions that modify the program counter non-trivially.  They
include conditional and unconditional branches and function calls.  Each
contains a symbol that represents the transfer target.

The instruction categories aren't exclusive.  On some machines, a
control-transfer instruction might perform arithmetic, so it might have
side effects other than modifying the program counter.  Furthermore, a
program transformation may change an instruction from a pure ALU
operation to one that both computes a value and stores it in memory.
When creating an instruction, you pick the category that most closely
matches the need.  If an instruction needs more operands than the
provided creation function accepts, you add them separately.  The only
constraints to be aware of are pretty obvious: an active instruction can
never acquire a label, an instruction that needs a transfer target
(i.e., a [[target]] symbol) must be created as a control-transfer
instruction, and so on.

Here's are the prototypes of the instruction creators provided in the
[[machine]] library.  Their names include the category mnemonics listed
above.  Each returns a result of type [[Instr*]].

<<[[Instr]] creation functions>>=
Instr* new_instr_alm(int opcode);
Instr* new_instr_alm(int opcode, Opnd src);
Instr* new_instr_alm(int opcode, Opnd src1, Opnd src2);
Instr* new_instr_alm(Opnd dst, int opcode);
Instr* new_instr_alm(Opnd dst, int opcode, Opnd src);
Instr* new_instr_alm(Opnd dst, int opcode, Opnd src1, Opnd src2);

Instr* new_instr_cti(int opcode, Sym *target);
Instr* new_instr_cti(int opcode, Sym *target, Opnd src);
Instr* new_instr_cti(int opcode, Sym *target, Opnd src1, Opnd src2);
Instr* new_instr_cti(Opnd dst, int opcode, Sym *target);
Instr* new_instr_cti(Opnd dst, int opcode, Sym *target, Opnd src);
Instr* new_instr_cti(Opnd dst, int opcode, Sym *target,
		     Opnd src1, Opnd src2);

Instr* new_instr_label(LabelSym *label);

Instr* new_instr_dot(int opcode);
Instr* new_instr_dot(int opcode, Opnd src);
Instr* new_instr_dot(int opcode, Opnd src1, Opnd src2);
@

Note that the opcode always separates the destination from the sources, if
any.  The only difference between two overloadings with the same name is
that they allow different numbers of operands to be put in the created
instruction.  Of course, the number can always be adjusted later.  To
obtain an instruction with more than one destination or more than two
sources, you must use one of the above creators and then add operands to
the instruction that it returns.

Recall our example of an $x$86 add instruction.  Suppose [[value]] is the
operand representing register [[%eax]] (which holds the value to be added
in) and [[addr]] is the address operand standing for [[40(%esp)]].
Suppose that [[body]] is an [[InstrList*]] that we are extending as we
generate code, created perhaps as follows:

\begin{verbatim}
    InstrList *body = new_instr_list();
\end{verbatim}

Then the add instruction might be created and appended like this:%
\footnote{Still ignoring the setting of the condition-code register.}

\begin{verbatim}
    append(body, new_instr_alm(addr, ADD, addr, value));
\end{verbatim}

(Here [[ADD]] is the $x$86 opcode for addition.)

On a load/store machine like the Alpha, the same operation might take three
instructions and an extra register.  Let [[tmp]] be the operand
representing a scratch register.  Then the sequence to accomplish the same
add might be:

\begin{verbatim}
    append(body, new_instr_alm(tmp,  LDQ,  addr));
    append(body, new_instr_alm(tmp,  ADDQ, tmp, value));
    append(body, new_instr_alm(addr, STQ,  tmp));
\end{verbatim}



\subsection{Predicates for [[Instr]]}
\label{sec:machine-instr-predicates}

The OPI provides Boolean functions to use when your analysis or
optimization pass is trying to detect a particular kind of instruction
(e.g., a move instruction).  Some of these have target-independent
semantics.

Others make use of the prevailing target context to inspect the instruction
passed to them in a machine-specific way.  As an OPI user, you don't have
to know which is which.  As an extender, you may, and this is covered in
Section~\ref{sec:Contexts}.

<<[[Instr]] predicates>>=
bool is_null(Instr*);
bool is_label(Instr*);
bool is_dot(Instr*);
bool is_mbr(Instr*);
bool is_indirect(Instr*);
bool is_cti(Instr*);
bool reads_memory(Instr*);
bool writes_memory(Instr*);
bool is_builtin(Instr*);

bool is_ldc(Instr*);
bool is_move(Instr*);
bool is_cmove(Instr*);
bool is_line(Instr*);
bool is_ubr(Instr*);
bool is_cbr(Instr*);
bool is_call(Instr*);
bool is_return(Instr*);
bool is_binary_exp(Instr*);
bool is_unary_exp(Instr*);
bool is_commutative(Instr*);
bool is_two_opnd(Instr*);
@

Their capsule summaries:

\begin{quote}
[[is_null(instr)]] returns true if [[instr]] is a null instruction. \\
[[is_label(instr)]] returns true if [[instr]] is a label
	instruction. \\
[[is_dot(instr)]] returns true if [[instr]] is a pseudo-op
	instruction. \\
[[is_mbr(instr)]] returns true if [[instr]] is a multi-way branch
	instruction. \\
[[is_indirect(instr)]] returns true if [[instr]] is an indirect-jump
	instruction. \\
[[is_cti(instr)]] returns true if [[instr]] is a control-transfer
	instruction. \\[1ex]
[[reads_memory(instr)]] returns true if [[instr]] reads memory. \\
[[writes_memory(instr)]] returns true if [[instr]] writes
	memory. \\[1ex]
[[is_builtin(instr)]] returns true if [[instr]] requires special
	implementation on each target platform. \\[1ex]
[[is_ldc(instr)]] returns true if [[instr]] is a load-constant
	instruction. \\
[[is_move(instr)]] returns true if [[instr]] is a register-move
	instruction. \\
[[is_cmove(instr)]] returns true if [[instr]] is a conditional move
	instruction. \\
[[is_line(instr)]] returns true if [[instr]] is a source-code location
	instruction. \\[1ex]
[[is_ubr(instr)]] returns true if [[instr]] is an unconditional branch
	instruction. \\
[[is_cbr(instr)]] returns true if [[instr]] is a conditional branch
	instruction. \\
[[is_call(instr)]] returns true if [[instr]] is a call
	instruction. \\
[[is_return(instr)]] returns true if [[instr]] is a return
	instruction. \\
[[is_binary_exp(instr)]] returns true if [[instr]] is a side-effect-free
	binary expression. \\
[[is_unary_exp(instr)]] returns true if [[instr]] is a side-effect-free
	unary expression. \\
[[is_commutative(instr)]] returns true if [[instr]] has two source operands
	that can be exchanged without changing its meaning. \\
[[is_two_opnd(instr)]] returns true if [[instr]] is required to be in
	\emph{two-operand} (or ``two-address'') form, i.e., if its first
	source operand must equal its (first) destination operand.
\end{quote}

A note about [[is_cbr]] above: a ``conditional branch'' instruction is
one that has two targets, one of which is an implicit fall-through
target.  The latter characteristic distinguishes it from a multiway
branch that happens to have two targets.  An instruction may satisfy
[[is_cbr]] even if its branch condition is statically known or if its
fall-through target is the same as its taken target.

A typical instruction satisfying [[is_builtin]] is one representing an
action like [[va_start]] in C, which is often implemented by macro
expansion.

The two-operand constraint is typical of arithmetic instructions on CISC
machines like $x$86.  While the physical instruction uses one operand field
to express both a source and a destination, Machine SUIF uses distinct
source and destination operand fields, but requires the operands they hold
be equal.


\subsection{Accessors and mutators for [[Instr]]}
\label{sec:machine-instr-accessors-mutators}

In addition to the functions of [[Instr]] for accessing and
modifying operands, the OPI provides functions for getting and
setting the fields of particular kinds of instructions:

<<[[Instr]] field accessors>>=
int get_opcode(Instr *instr);
void set_opcode(Instr *instr, int opcode);

Sym* get_target(Instr *instr);
void set_target(Instr *instr, Sym *target);

LabelSym* get_label(Instr *instr);
void set_label(Instr *instr, LabelSym *label);
@

Their summaries:

\begin{quote}
[[get_opcode(instr)]] returns the opcode of [[instr]].	 \\
[[set_opcode(instr, opcode)]] replaces the opcode of [[instr]] with
	[[opcode]].							 \\
[[get_target(instr)]] returns the target of a [[CTI]] instruction.	 \\
[[set_target(instr, target)]] replaces the target of [[instr]] with
	[[target]].							 \\
[[get_label(instr)]] returns the label of a label instruction.		 \\
[[set_label(instr, label)]] replaces the label of [[instr]] with
	[[label]].
\end{quote}


\subsection{Print Function for [[Instr]]}
\label{sec:machine-instr-print}

Printing of instructions in a form acceptable to an assembler is naturally a
target-specific task.  It's the subject of Section~\ref{sec:printer}.
Here's a function that uses the target-specific mechanism to print a single
instruction, perhaps for debugging purposes.

<<[[Instr]] print function>>=
void fprint(FILE*, Instr*);
@


\subsection{Machine-instruction lists}
\label{sec:instr-list}

[[InstrList]] represents a simple sequence of instructions, so most of the
functions related to class [[InstrList]] are sequence manipulators.

<<[[InstrList]] functions>>=
InstrList* new_instr_list();
InstrList* to_instr_list(AnyBody*);
int instrs_size(InstrList *list);
InstrHandle instrs_start(InstrList *list);
InstrHandle instrs_last(InstrList *list);
InstrHandle instrs_end(InstrList *list);
void prepend(InstrList *list, Instr *instr);
void append(InstrList *list, Instr *instr);
void replace(InstrList *list, InstrHandle handle, Instr *instr);
void insert_before(InstrList *list, InstrHandle handle, Instr *instr);
void insert_after(InstrList *list, InstrHandle handle, Instr *instr);
Instr* remove(InstrList *list, InstrHandle handle);
@

Their summaries:

\begin{quote}
[[new_instr_list()]] returns a new [[InstrList*]] containing no
		instructions. \\
[[to_instr_list(body)]] returns a new [[InstrList*]] after moving the
	contents of [[body]] into that new list, leaving [[body]] empty. \\
[[instrs_size(list)]] returns the number of elements in [[list]]. \\
[[instrs_begin(list)]] returns a handle on the first element of [[list]]. \\
[[instrs_end(list)]] returns the sentinel handle for [[list]]. \\
[[prepend(list, instr)]] inserts [[instr]] at the beginning of [[list]]. \\
[[append(list, instr)]] inserts [[instr]] at the end of [[list]]. \\
[[replace(list, instr, handle)]] replaces the element at [[handle]] in
		[[list]] by [[instr]]. \\
[[insert_before(list, handle, instr)]] inserts [[instr]] before [[handle]] in
		[[list]]. \\
[[insert_after(list, handle, instr)]] inserts [[instr]] after [[handle]] in
		[[list]]. \\
[[remove(list, handle)]] removes and returns the instruction at
		[[handle]] in [[list]].
\end{quote}

Type [[InstrHandle]] is an abbreviation for a C++ sequence ``iterator''.
So it behaves like a pointer into a list of instructions.

<<[[InstrHandle]] definition>>=
typedef list<Instr*>::iterator InstrHandle;
@

Function [[to_instr_list]] serves to produce a ``go-between''
representation for the bodies of optimization units.  To convert the body
of an optimization unit, the generic type for which is [[AnyBody*]], to a
specific form that you need (such as [[Cfg*]]), you can apply
[[to_instr_list]] and then build your specific form from the resulting
linear list.

The [[instrs_]] functions are so named because the sequence of instruction
pointers in an [[InstrList]] is called [[instrs]].  These functions are
frequently used; since an [[InstrList]] contains only one sequence, we
can provide shorter names for these handle-related functions without
ambiguity.

<<[[InstrList]] function nicknames>>=
inline int
size(InstrList *instr_list)
{
    return instrs_size(instr_list);
}

inline InstrHandle
start(InstrList *instr_list)
{
    return instrs_start(instr_list);
}

inline InstrHandle
last(InstrList *instr_list)
{
    return instrs_last(instr_list);
}

inline InstrHandle
end(InstrList *instr_list)
{
    return instrs_end(instr_list);
}
@

To print the instructions of an [[InstrList]] using the conventions of the
prevailing target:

<<[[InstrList]] print function>>=
void fprint(FILE*, InstrList*);
@


\subsection{Header file [[instr.h]]}

The header file for instructions and instruction lists has the following
outline:

<<instr.h>>=
/* file "machine/instr.h" */

<<Machine-SUIF copyright>>

#ifndef MACHINE_INSTR_H
#define MACHINE_INSTR_H

#include <machine/copyright.h>

#ifndef SUPPRESS_PRAGMA_INTERFACE
#pragma interface "machine/instr.h"
#endif

#include <machine/substrate.h>
#include <machine/opnd.h>
#include <machine/machine_ir.h>

<<[[Instr]] operand accessors and mutators>>

<<[[InstrHandle]] definition>>

<<[[Instr]] creation functions>>

<<[[Instr]] field accessors>>

<<[[Instr]] predicates>>

<<[[Instr]] print function>>

<<[[InstrList]] functions>>

<<[[InstrList]] function nicknames>>

<<[[InstrList]] print function>>

#endif /* MACHINE_INSTR_H */
@
