Translating ANSI C Into Asynchronous Circuits
Async 2004 Tutorial

April 19, 2004, Hersonissos, Crete, Greece



Presented by: Mihai Budiu, Girish Venkataramani, Tiberiu Chelcea and Seth Goldstein
Computer Science Department, Carnegie Mellon University
{mihaib,girish,tibi,seth}@cs.cmu.edu

Abstract: In this tutorial we present a compilation framework for automatically translating ANSI C programs into pipelined asynchronous circuits. The framework is embodied in the CASH compiler, a Compiler for Application-Specific Hardware. CASH generates dataflow machines, realized as asynchronous circuits, that directly implement the source program without using any interpretive structures.



Tutorial Structure

This half-day tutorial is composed of three parts.

1. High-level compilation

The first part describes the compilation methodology and internal representation of CASH. CASH uses Suif to parse the C source files, but its internal representation is Pegasus, a custom representation that expresses the C program as a dataflow machine; the ordering of operations with side effects is enforced by explicit synchronization handshakes. CASH relies extensively on predication and speculation to exploit instruction-level parallelism. It performs a wide range of program optimizations, including traditional scalar optimizations (common-subexpression elimination, dead-code elimination, strength reduction, etc.), memory optimizations (partial redundancy elimination for memory, register promotion), low-level optimizations (Boolean simplification using Espresso, data-width analysis and reduction), and asynchronous-circuit optimizations (pipeline balancing, lenient operation implementation).
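As a conceptual illustration of predication and speculation (not actual CASH output or Pegasus syntax), the two functions below compute the same result. The second evaluates both arms of the conditional unconditionally and lets a predicate select one of them, much as a multiplexer would in a dataflow circuit. The function names `abs_diff_branch`, `abs_diff_speculative` and `check_same` are hypothetical.

```c
#include <assert.h>

/* Control-flow form: only one arm of the if-then-else executes. */
int abs_diff_branch(int a, int b)
{
    int r;

    if (a > b)
        r = a - b;
    else
        r = b - a;
    return r;
}

/* Speculative, predicated form (illustrative only): both arms are
 * evaluated unconditionally, which is safe because neither has side
 * effects, and the predicate selects one result. An operation with side
 * effects, such as a store, could not be speculated this way; it would
 * instead have to be guarded by the predicate. */
int abs_diff_speculative(int a, int b)
{
    int p  = a > b;      /* predicate */
    int t1 = a - b;      /* "then" arm, computed speculatively */
    int t2 = b - a;      /* "else" arm, computed speculatively */

    return p ? t1 : t2;  /* ?: stands in for the hardware multiplexer */
}

/* Sanity check: both forms must agree on every input. */
void check_same(int a, int b)
{
    assert(abs_diff_branch(a, b) == abs_diff_speculative(a, b));
}
```

Calling `check_same` on a few inputs, for example `check_same(2, 9)`, confirms that removing the branch does not change the result; what changes is that the subtractions no longer wait for the comparison.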

2. Asynchronous back-end

The second part describes CAB, the CASH Asynchronous Back-end, which translates the Pegasus intermediate representation into asynchronous circuits. CAB targets medium-grained, non-linear micropipelined implementations, which communicate using four-phase, bundled-data handshaking. CAB also performs some peephole optimizations, builds a memory access network (which arbitrates operations on the global program memory), and performs technology mapping for selected parts of the circuit (such as the control part of each pipeline stage and the memory access networks). The output of CAB is a mixture of gate-level, technology-mapped circuits (currently targeted only at the ST Micro 0.18 µm commercial library) and behavioral descriptions; the latter are used exclusively for datapath operations, which are synthesized using Synopsys Design Compiler (running on Solaris). The circuits are placed and routed using Cadence Silicon Ensemble Ultra (also running on Solaris). Low-level Verilog simulations show that circuits synthesized from Mediabench kernels sustain high performance (up to 1000 million useful, i.e. non-speculative, arithmetic operations per second) at extremely low energy (up to 100 arithmetic operations per nanojoule).
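The following is a software sketch, not CAB's circuit implementation, of the event ordering in one four-phase, bundled-data transfer between two pipeline stages. The `channel` type and `four_phase_transfer` function are hypothetical. In hardware the sender and receiver run concurrently and the request is typically delayed by a matched delay so the data is stable when the receiver latches it; here the events are simply executed in protocol order and checked with assertions.

```c
#include <assert.h>

/* Wires of one bundled-data channel between two pipeline stages. */
typedef struct {
    int req;   /* request, driven by the sender (0 or 1) */
    int ack;   /* acknowledge, driven by the receiver (0 or 1) */
    int data;  /* bundled data; must be stable while req is high */
} channel;

/* One four-phase (return-to-zero) transfer, events in protocol order:
 *   req+ : sender has placed data on the bus and raises req
 *   ack+ : receiver has latched the data and raises ack
 *   req- : sender lowers req (it may now change the data)
 *   ack- : receiver lowers ack; the channel is idle again
 * Returns the value latched by the receiver. */
int four_phase_transfer(channel *ch, int value)
{
    int latched;

    assert(!ch->req && !ch->ack);   /* transfer starts from idle */

    ch->data = value;               /* data set up first ...          */
    ch->req = 1;                    /* ... then req+ (bundling order) */

    assert(ch->req && !ch->ack);
    latched = ch->data;             /* receiver captures the data */
    ch->ack = 1;                    /* ack+ */

    assert(ch->req && ch->ack);
    ch->req = 0;                    /* req- */

    assert(!ch->req && ch->ack);
    ch->ack = 0;                    /* ack- */

    return latched;
}
```

Consecutive transfers on the same channel succeed because each one returns both wires to zero; that return-to-zero phase is what distinguishes four-phase from two-phase signaling.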

3. Demo

The third part of the tutorial is a hands-on demonstration of the capabilities of CASH. Attendees are shown how to carry selected C kernels through all compilation steps, and how compilation options influence the output. Attendees will also compile and simulate their own C implementation of a DSP application. Documentation about the intended behavior of the DSP application will be provided with the tutorial notes, and the presenters will be available to help the attendees. Visual debugging is facilitated by the graphical CASH back-end and by a series of trace-processing Perl scripts, which can produce high-level animations of the resulting circuits. After synthesis, circuit-level simulation can be used to estimate speed (with Model Technologies' Verilog simulator, vsim) and power (with Synopsys Design Compiler's dc_shell).

4. Hardware and Software Resources

The tools originally developed as part of the CASH project currently run on Linux PC workstations; compiling and installing them requires gcc-2.95. In addition, for synthesis, place and route, and Verilog simulation, the CASH design flow uses Synopsys Design Compiler, Cadence Silicon Ensemble, and Model Technologies' vsim, all of which run on Sun Solaris workstations.

Invitation for demo examples

Tutorial participants are invited to submit their own examples to be compiled into asynchronous circuits with our tool chain. Examples should be submitted at least one week before the tutorial. Each example should be a complete ANSI C application, with reference input and output. Participants can indicate one leaf function to be translated to hardware (alternatively, we can support a set of functions that can be inlined to generate a single leaf function). We are placing the following restrictions on the input:

Send your examples by email to mihaib+async@cs.cmu.edu.
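To illustrate the leaf-function requirement, here is a hypothetical kernel; the names `mac` and `fir4` are made up for this sketch and do not come from any submitted example. `fir4` calls only `mac`, which itself calls nothing, so inlining `mac` yields a single leaf function. The code is written in C89 style to stay within ANSI C.

```c
/* Hypothetical helper: multiply-accumulate. It calls no other function,
 * so it can be inlined into its caller. */
static int mac(int acc, int x, int c)
{
    return acc + x * c;
}

/* Hypothetical 4-tap FIR filter, the function marked for hardware.
 * Once mac() is inlined, fir4() makes no calls and is a leaf function.
 * Writes n - 3 outputs to out (nothing if n < 4). */
void fir4(const int *in, int *out, int n, const int coeff[4])
{
    int i, k, acc;

    for (i = 0; i + 3 < n; i++) {
        acc = 0;
        for (k = 0; k < 4; k++)
            acc = mac(acc, in[i + k], coeff[k]);
        out[i] = acc;
    }
}
```

The rest of the submission, such as `main` and the code that reads the reference input and compares the results against the reference output, would make up the remainder of the complete application around the marked function.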

Bibliography