Catacomb home page

This document will describe the use of Catacomb.

What is Catacomb?

Advanced compilers, such as parallelizing compilers, often emit calls to runtime libraries to perform complicated tasks. In a parallelizing compiler, one such complicated task is doing the communication analysis and buffer packing/unpacking. The compiler emits a call to a runtime library because the library is much easier to maintain as standalone C code than as code emitted by the compiler, and because the library interface is simple enough that it can be invoked with relatively few parameters. But because the library must be general purpose, it cannot normally be optimized as much as customized code that takes advantage of all the parameters known at compile time. More important, a runtime library is often not a suitable compiler interface for general source language constructs. [talk about the problems with composing multiple invocations of the runtime library]

Catacomb is a tool for generating and optimizing customized runtime libraries. The customized code takes advantage of parameters known at compile time, and solves the problem of composing multiple invocations of a runtime library. Catacomb is a compile-time module that interfaces with the source compiler. Instead of generating a call to a runtime library, the source compiler passes the parameters to Catacomb. Catacomb generates a customized C function to do the operation at run time, and the source compiler emits a call to this C function.

To generate the customized C code, Catacomb reads in external "stub files". The stub files contain both code to be generated and control constructs to direct the code generation. The control constructs, which Catacomb executes at compile time, can test properties of the input parameters, such as whether they are constants, symbols, array references, etc.; the control constructs also do the work (that the source compiler previously had to do) of composing multiple invocations of the algorithm.

The next section describes how to program in the Catacomb language. This assumes that the source compiler has already invoked a Catacomb stub with the appropriate arguments. The section thereafter describes the details of how the source compiler interfaces with Catacomb.

Quick overview

The source compiler invokes Catacomb by calling it with the name of a stub and a list of arguments to be passed to the stub. Catacomb executes the control constructs in the stub, leaving only C code to be emitted and executed at run time.

Catacomb executes the control constructs in a stub in the context of control variables and data variables. Data variables are the normal C variables that will be part of the emitted code. Control variables are only used during Catacomb's execution of the control constructs.

Control constructs consist of the control assignment statement (setting the value of a control variable), the control-IF statement (executing one branch and throwing out the other), the control-WHILE statement (repeat control execution of some code until the condition becomes false), the INCLUDE statement (execute another stub, passing arguments to it), and declarations (declare the type of a data variable, possibly in relation to the type of another variable).

Control variables are either global or local. Local variables are the symbols declared as LOCAL, and the stub arguments. References to other control variables are global. Control variables are statically scoped. When an INCLUDE is executed, Catacomb creates a new local symbol table, and the original local table is restored at the end of the INCLUDE execution. A stub can return values to the caller through "var" parameters. Var parameters are passed using "call by value-result" (i.e., copied in at the beginning and copied out at the end), and non-var parameters are passed using "call by value".

Catacomb language

Catacomb uses stubs to direct the generation of C code. Each stub file contains 0 or more stubs. If a syntax error occurs while reading and parsing a stub, the stub is discarded, but the rest of the stubs in the file are still parsed. As in C, whitespace separates tokens, comments (/* ... */) are ignored, and case is significant.

Catacomb stubs contain normal C code surrounded by control constructs. Not all C constructs are supported, but the syntax and semantics of the supported C constructs are the same as in the C language. The following C constructs are absent from Catacomb.

Stub wrapper

A single stub is a sequence of statements surrounded by a stub wrapper. Here is the syntax for the stub wrapper.
stub:    "STUB" identifier ( stubvar-list ) depth-opt local-opt { stmts }
stubvar: identifier
       | "var" identifier
depth: 	 "DEPTH" integer ;
local: 	 "LOCAL" identifier-list ;
Example:
STUB example(input1, input2, var result)
DEPTH 5;
LOCAL tmp1, tmp2, idx;
{
  /* lines of code */
}
The stub is declared on the first line. The stub name must be a normal identifier, as must its arguments. "var" arguments are copied back to the caller at the end of stub execution. An optional "depth" line specifies the maximum depth that this stub may be recursively included (see the include section below for details). The last line declares an optional list of scratch control variables, which are local to the stub. The "LOCAL" variables plus the stub arguments constitute the set of local control variables.

Control and data variables

Variables referenced in a stub are either control or data variables. Simply put, control variables are variables that appear on the LHS of a control assignment statement, and data variables are variables that appear on the LHS of a regular C assignment statement. Control variables are used to pass expressions as arguments to and from a stub, and they are set, updated, and tested during the course of stub execution. Control variables are either local (declared as LOCAL or a stub argument) or global (all others). Static scoping is used for control variables. Data variables, on the other hand, are considered to be global during the stub execution: two references to a data variable in different stubs actually refer to the same runtime variable.

Control constructs

control_construct: cifblock
                 | cwhileblock
                 | include
                 | decl
                 | ctrlassn
cifblock: "cif" ( expr ) stmt
        | "cif" ( expr ) stmt "else" stmt
cwhileblock: "cwhile" ( expr ) stmt
include: "include" simplevar ( expr_no_comma-list ) ;
decl: <see below>
ctrlassn: expr := expr ;
Examples:
int a[10][20];
int fun1();
double b;
void fun2();
cif (i == 3)
  {
    j := 0;
    cwhile (j < 10)
      {
        include teststub(i*2,j);
        j := j + 1;
      }
  }

Control assignment statement

The control assignment statement simply assigns a value to a control variable. The LHS must be a control variable, but the RHS expression can contain a combination of control and data variables.

Control IF statement

When executing a control-IF statement, the condition is first evaluated. The condition must evaluate to a constant; if not, Catacomb reports an error and exits. Depending on the value of the condition (either zero or nonzero), one branch of the control-IF is thrown out and stub execution continues in the other branch.

Control WHILE statement

When executing a control-WHILE statement, the condition is first evaluated. Like the control-IF, the condition must evaluate to a constant. If the condition evaluates to false, stub execution continues with the next statement after the control-WHILE construct. If the condition evaluates to true, Catacomb executes the body of the control-WHILE loop, and repeats until the condition evaluates to false. If any C code appears in the body, it will appear multiple times in the output file, depending on the number of iterations of the control loop. Note that a small programming error can lead to infinite execution of the loop, which Catacomb cannot in general detect.

A typical use of the control-WHILE construct is to loop over the dimensions of an array, producing a similar piece of code for each dimension.

Include statement

An INCLUDE statement can be thought of as a function call during stub execution. When executing an INCLUDE statement, Catacomb first evaluates the parameter list. Next, it looks up the code for the stub that is being included. If no such stub exists, Catacomb reports an error and exits. Otherwise, the current local control symbol table is put aside, and a new local symbol table is created consisting of the new stub's argument list plus its set of variables declared as LOCAL. The arguments are set to the corresponding values of the input parameter list. If the wrong number of input parameters is present, Catacomb reports an error and exits. Otherwise, it executes the new stub. Afterwards, it copies the "var" arguments back into the original parameters; if any of these parameters is not a simple control variable (e.g., a constant or an expression), Catacomb reports an error and exits. Execution of the original stub then continues where it left off.

A stub is allowed to include itself recursively. (In fact, this is the only way to create a loop nest whose depth is dependent on an input parameter.) Such a recursive include needs to be guarded by a control construct, usually a control-IF. But it is likely that once in a while, the stub programmer will accidentally use a regular IF statement to guard the recursion, which does not guard it at all. This programming error results in infinite inclusion. As a safeguard against infinite inclusion, Catacomb imposes an upper limit on how many instances of a stub may be recursively active. The default maximum depth is 20. Note that this limit applies separately to each stub; e.g., if two stubs are involved in a recursive include chain, the total include depth for the two of them combined can be up to 40. If the default of 20 is too small (or too large), the stub header allows the DEPTH specification, which sets the maximum number of times that this stub may be recursively active.

Declarations

By default, data variables have type int. Control variables are typeless, since they never appear in the output file. To give a variable a different type, use a declaration. Catacomb declarations are different from normal C declarations in the following respects:

Control expressions

The control constructs described above are all statement-level constructs. Catacomb also has two expression-level control constructs. It processes these constructs wherever it finds them, regardless of whether they are part of control statements or normal C statements.

Concatenation operator

Catacomb has an additional operator not present in C: the # operator, or the concatenation operator. It is a left-associative binary operator whose precedence is higher than any other C operator. This operator allows the stub writer to dynamically create new variable names. It is especially useful when looping over the dimensions of an array, creating a separate instance of a variable for each dimension.

The left portion of a concatenation expression must evaluate to a variable name. The right portion must evaluate to either an integer constant or a string constant. If the right portion is an integer constant, a new variable is created by taking the left variable name and appending an underscore character and the integer. If the right portion is a string constant, a new variable is created by taking the left variable name and appending the string. For example, x#3 becomes x_3, x#"blah" becomes xblah, and x#(1+2)#"blah" becomes x_3blah.

External functions

The control constructs presented above are missing an important functionality: the ability to test properties of the internal expression data structures. For example, a stub may need to test whether an expression is actually an array reference, and if so, how many dimensions the array has or the size of a given dimension. Or the stub might want to extract an individual subscript of an array reference. More often, a stub needs to test whether an expression evaluates to a constant, and produce special-case code if so.

Catacomb provides several external functions to handle this need. Each external function has a name (a normal identifier). When Catacomb sees a function call in a stub, it checks whether the function name matches one of the external functions. If so, it processes the function arguments, evaluates the external function with the arguments, and replaces the function call with the result of the external function evaluation. For example, CONSTANT(3) would be replaced by 1, and CONSTANT(x[3]) would be replaced by 0.

A list of the standard external functions, along with documentation, is provided elsewhere. Note that this is not necessarily a complete list, because Catacomb is designed so that new data structures, along with associated external functions for accessing the data structures, can be easily added.

Catacomb execution semantics

When Catacomb executes a stub, it starts with the first line of the stub. It executes the stub statement by statement until the end of the stub. There are only three ways that this control flow can be modified: As the stub is being processed, if control variables appear in any expression, their control values are substituted in, and if any control expressions appear, they are evaluated and substituted.

Even if a control construct appears within a non-control construct that is obviously not executed, Catacomb still executes the control construct. For example, in the following code:

x := 0;
if (0) {
  ...
  x := 1;
  }
The value of control variable x at the end is 1, even though the code in the IF branch clearly will not be executed at run time.

In this sense, Catacomb acts much like a macro processor, in that it follows its own sequential flow of control when doing substitutions. In fact, the INCLUDE construct is similar to the C preprocessor's #include statement, the control assignment statement is similar to #define, and the control-IF is similar to #if or #ifdef.

But Catacomb has a feature that makes it much more powerful than a macro processor. In conjunction with the processing of the control constructs, Catacomb also analyzes the C code and does constant and copy propagation. If a data variable appears within the condition of a control construct, and Catacomb can prove that the data variable has a valid propagation value at that point, the propagation value will be used in the condition instead. For example, consider the code:

x = 3;
cif (CONSTANT(x))
  include special_case(x);
else
  include common_case(x);
In the CONSTANT(x) test, the value of 3 will be substituted for x and the special_case stub will be included, even though x is not a control variable. (CONSTANT() is an external function that tests whether its input evaluates to a constant. It always returns either 0 or 1.)

A useful construct to use in stub programming is:

cif (CONSTANT(v) && v == 3)
  /* Do code for the special v==3 case. */
else
  /* Do the general-purpose code. */
We can't omit CONSTANT(v) from the condition, because if v is not a constant, then "v==3" also will not evaluate to a constant.

Appendix: Writing stub files with Emacs

This information is only valid when using cc-mode under xemacs.

The C-like syntax of the stub files makes it relatively simple to use emacs for automatically indenting and highlighting the stub code. To make this work correctly using cc-mode under xemacs, put the following in your .emacs file.

(setq auto-mode-alist (append '(("\\.ccom$" . c-mode)) auto-mode-alist))
(require 'font-lock)
(add-hook 'c-mode-hook 
  '(lambda ()
     (setq c-conditional-key
       "\\b\\(for\\|if\\|cif\\|do\\|else\\|while\\|cwhile\\|switch\\)\\b[^_]")
     ))
(setq c-font-lock-keywords-2
      (cons
       "[ \t]\\(cwhile\\|cif\\|else cif\\)[ \t\n(){};,]\\|[ \t]include[ \t]"
       c-font-lock-keywords-2))

(defun c-backward-to-start-of-if (&optional lim)
  ;; Move to the start of the last "unbalanced" if and return t.  If
  ;; none is found, and we are looking at an if clause, nil is
  ;; returned.  If none is found and we are looking at an else clause,
  ;; an error is thrown.
  (let ((if-level 1)
	(here (c-point 'bol))
	(case-fold-search nil)
	(lim (or lim (c-point 'bod)))
	(at-if (looking-at "\\(c\\|\\)if\\b[^_]")))
    (catch 'orphan-if
      (while (and (not (bobp))
		  (not (zerop if-level)))
	(c-backward-syntactic-ws)
	(condition-case nil
	    (backward-sexp 1)
	  (error
	   (if at-if
	       (throw 'orphan-if nil)
	     (error "No matching `if' found for `else' on line %d."
		    (1+ (count-lines 1 here))))))
	(cond
	 ((looking-at "else\\b[^_]")
	  (setq if-level (1+ if-level)))
	 ((looking-at "\\(c\\|\\)if\\b[^_]")
	  ;; check for else if... skip over
	  (let ((here (point)))
	    (c-safe (forward-sexp -1))
	    (if (looking-at "\\<else\\>[ \t]+\\<\\(c\\|\\)if\\>")
		nil
	      (setq if-level (1- if-level))
	      (goto-char here))))
	 ((< (point) lim)
	  (setq if-level 0)
	  (goto-char lim))
	 ))
      t)))