Reconfigurable Computing: Lab1

Reconfigurable Computing Seminar, Carnegie Mellon University

15-828/18-847	Spring 1998	January 12

Lab1

Due: 11:59 PM January 30, 1998

Outline:

1. Introduction
2. Tool Flow and Setup
3. Multiplier HDL and Test Harness
4. Redesigning the Two-Operand Multiplier
5. Designing a Constant Multiplier
6. How You Should Hand In Your Work

1. Introduction:

The purpose of this lab is to give everyone an introduction to the design flow for commercial FPGAs, including work with hardware description languages, simulation tools, synthesis tools, and place and route tools. In addition, you will explore different multiplier structures and witness the benefits of pipelining and constant propagation.

2. Tool Flow and Setup:

The basic tool flow for this lab is as follows: simulation, synthesis, and physical design. There are two tools available for each of these steps, and two platforms that they run on: the rs_aix boxes that are in HH1000 and on many ECE grad students' desks, and the SPARCs (running Solaris) that are in the Andrew clusters. The tools that we have for each stage, and the platforms on which each is supported, are listed in the table below. (Note: we have not tested the lab on the WINTEL boxes.)

Tool              Vendor       AIX   Solaris   WINTEL
Simulation
Verilog-XL        Cadence      X     X
Leapfrog VHDL     Cadence      X     X
Synthesis
synplify          Synplicity         X         X
design_analyzer   Synopsys     X
FPGA Physical Design
dsgnmgr           Xilinx             X         X
XDM               Xilinx       X

Why all the choices?

To spread the compute load.
To vary the results obtained.
To allow people to do as much as possible on their own machine, if they have one.

We will describe how to use each one of these tools. Here is a broad brush comparison of the compatible tools.

The Bottom Line: If you're new to this, and you don't have an AIX box on your desk, we recommend the Solaris tools for synthesis and physical design. You should be able to swap back and forth between platforms: both Synopsys and Synplify accept VHDL and Verilog, and both Xilinx tools should accept the Xilinx Netlist Format (XNF) files that are output by both synthesis tools. This SHOULD work, but in preparing this assignment we only tested the interface between the tools on one platform. You're on your own if you play on both platforms.

To set yourself up for the two platforms, we've developed two shell scripts that should set your paths and environment variables correctly to run all these programs. Save these scripts from your browser into your home directory.

For Solaris Machines:
setvar847.sun4_55

For AIX Machines:
setvar847.rs_aix41

If you plan to use Synopsys, copy the following file to your working directory:
.synopsys_dc.setup

Every time you start working, you should source the setvar file by typing

% source setvar847.`sys`

from your unix prompt. If you're running remotely, make sure you setenv DISPLAY and xhost properly to allow X-windows interfaces to work. If you don't know how to do this, see Appendix B.

3. Multiplier HDL and Test Harness

In the first part of this lab, you will take a 2-operand 12-bit multiplier through the entire simulation and synthesis path. There is minimal design in this section. It's primarily intended to teach the flow through these tools, for those of you who may have never done this before.

A. Simulation: Circuit under test and test harness.

Most of the time when you want to create a simulation model for a design, you create a description of the design, as well as a description of a tester for the design that makes sure things are working correctly. We will break these up into two files:

Verilog:

harness.v
mult1.v

VHDL:

Mult_VHDL.tar.gz

Copy these files into your working directory. If you're doing VHDL, uncompress (gunzip Mult_VHDL.tar.gz) and then untar (tar xvf Mult_VHDL.tar) the archive.

The interface between the Harness and Mult1 is simple. Mult1 has four inputs: a and b, which are twelve bits each; valid, which is a single bit indicating (when it is one) that the operands a and b are valid; and clk, which is the clock for the design. Mult1 has one output: c, which is the 24-bit product of a and b. Harness drives all the inputs of Mult1, so it has four outputs, and it receives the single output of Mult1 as its one input.

The harness has two parameters: latency and period. Latency describes how long it takes the design to output a result of a multiplication, and period describes how often, in clock cycles, the multiplier can be expected to accept operands. Mult1 has a period of one and a latency of four. In general every "period" cycles, the harness asserts valid, indicating that the multiplier should multiply the values that are on the a and b pins. "Latency" cycles later, Harness makes sure that the values coming out of the multiplier on the c bus equal the product of those previous a and b values. Note that if latency is greater than period, there are multiple multiplications going on in the multiplier at the same time. For now, you don't have to modify the values of latency and period, but you will later in the lab.

The Mult1 file describes a multiply circuit with a latency of four. This latency is implemented by having four cascaded registers that delay the product. The multiplication is described using the (*) operator.
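The interplay of latency and period can be sketched as a small cycle-based software model. This is a Python illustration only, not the contents of harness.v or mult1.v: the names Mult1Model and run_harness, the random operands, and the convention that c is sampled just before each clock edge are all assumptions made for the sketch.

```python
import random

WIDTH = 12  # operand width in bits, as in the lab


class Mult1Model:
    """Hypothetical stand-in for Mult1: multiply on arrival, then delay.

    The product passes through `latency` cascaded registers, so it
    appears on c `latency` clock edges after the operands were applied.
    """

    def __init__(self, latency=4):
        self.regs = [0] * latency

    @property
    def c(self):
        # The output bus is the last register in the chain.
        return self.regs[-1]

    def clock(self, a, b, valid):
        # One rising clock edge: every register takes its neighbour's value.
        product = a * b if valid else 0
        self.regs = [product] + self.regs[:-1]


def run_harness(dut, latency, period, n_vectors, seed=0):
    """Drive n_vectors operand pairs and return how many products matched."""
    rng = random.Random(seed)
    expected = {}  # cycle number -> product that must be on c in that cycle
    checked = 0
    last_issue = (n_vectors - 1) * period
    for cycle in range(last_issue + latency + 1):
        # Check first: c still holds the value from before this clock edge.
        if cycle in expected:
            want = expected.pop(cycle)
            assert dut.c == want, f"cycle {cycle}: got {dut.c}, want {want}"
            checked += 1
        # Assert valid once every `period` cycles.
        valid = cycle % period == 0 and cycle <= last_issue
        a = rng.randrange(1 << WIDTH) if valid else 0
        b = rng.randrange(1 << WIDTH) if valid else 0
        if valid:
            expected[cycle + latency] = a * b
        dut.clock(a, b, valid)
    return checked


if __name__ == "__main__":
    n = run_harness(Mult1Model(latency=4), latency=4, period=1, n_vectors=300)
    print(n, "products checked")
```

With latency 4 and period 1, the expected dictionary holds up to four pending products at a time, which is the "multiple multiplications going on at the same time" case described above.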

Running the simulator:

For Verilog

For VHDL

Questions:

1. Currently, the multiplication takes place in the same cycle that the inputs are received. Rewrite this module so that the multiplication happens on the second cycle after operands arrive. What effect should this modification have on the maximum clock speed of the implementation? Will it have any effect on the size of the implementation?
2. Blocking and non-blocking assignments (answer only if you used Verilog): Change all the procedural assignments in the multiplier module to blocking ones (use the = operator rather than the <=). Why doesn't the simulation work? Can you re-write this description so that it works with blocking assignments? (Hint: no new code is needed.) Are there any risks to creating descriptions in this way? (See the on-line documentation for a discussion on non-blocking assignment.)
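As background for the blocking versus non-blocking question, the two assignment semantics can be mimicked in software. The sketch below is a Python model under stated assumptions: the register names c1, c2, c3 and c, and the statement order c1, c2, c3, c, are guesses at how mult1.v is laid out, and step_nonblocking and step_blocking are hypothetical helper names.

```python
def step_nonblocking(state, p):
    """Model one clock edge with non-blocking (<=) assignments.

    Every right-hand side is read before any register is updated,
    so each stage receives its neighbour's old value.
    """
    old = dict(state)
    return {"c1": p, "c2": old["c1"], "c3": old["c2"], "c": old["c3"]}


def step_blocking(state, p):
    """Model one clock edge with blocking (=) assignments.

    Each statement takes effect immediately, in source order, so a
    later statement sees the value written by an earlier one.
    """
    s = dict(state)
    s["c1"] = p          # assumed order: c1 first ...
    s["c2"] = s["c1"]
    s["c3"] = s["c2"]
    s["c"] = s["c3"]     # ... c last
    return s


if __name__ == "__main__":
    reset = {"c1": 0, "c2": 0, "c3": 0, "c": 0}
    print(step_nonblocking(reset, 42))
    print(step_blocking(reset, 42))
```

Comparing the two printed states after a single edge shows how the order of evaluation changes what ends up in each register.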

What you should hand in:

The PostScript of the simwave output in simwave.ps
Answers to question 1 in written.txt (head your answer as part 3A.1)
Answers to question 2 in written.txt (head your answer as part 3A.2)
A copy of the Verilog using blocking assignment in the file mult2.v

B. Synthesis:

How it works: The * operator in Mult1 gets mapped to internal module generators that generate the netlist for a multiplier. This mapping may vary based on the path used. The c1, c2 and c3 elements get mapped to internal registers. Therefore, the structure of the implementation will depend a great deal on the structure of the specification. This may not be the right thing for reconfigurable computing, but in the next section you will use this fact to modify the structure of the multiplier.

More detailed instructions for synthesis depend on the tool you are using.

Instructions for Synplify
Instructions for Synopsys

What you should hand in:

In the file written.txt as part 3B: Report the estimated number of CLBs and registers (plus FMAPs and HMAPs if you ran Synplify), and the estimated length of the critical path. The number of CLBs is an estimate because the synthesis tool does not know what is inside a Xilinx CLB; instead it maps the design to four-input LUTs (FMAPs) and three-input LUTs (HMAPs), leaving the packing of the CLBs to the Xilinx technology mapper.
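To see why the CLB count can only be estimated before packing, it may help to compute a simple lower bound from the LUT and register counts. The sketch below is a Python illustration, assuming the XC4000-family CLB organisation of two four-input function generators, one three-input function generator, and two flip-flops per CLB. The function name clb_lower_bound and the example counts are hypothetical, and the real packer will usually need more CLBs than this bound.

```python
def clb_lower_bound(fmaps, hmaps, registers):
    """Smallest CLB count that could possibly hold the given resources.

    Assumes each CLB offers two 4-input LUTs (FMAPs), one 3-input LUT
    (HMAP) and two flip-flops. Routing and packing constraints are ignored.
    """
    by_fmaps = -(-fmaps // 2)          # ceiling of fmaps / 2
    by_registers = -(-registers // 2)  # ceiling of registers / 2
    return max(by_fmaps, hmaps, by_registers)


if __name__ == "__main__":
    # Made-up counts, purely to show the call.
    print(clb_lower_bound(fmaps=100, hmaps=10, registers=96))
```

Whichever resource is scarcest sets the bound, which is one reason a register-heavy pipelined design and a logic-heavy unpipelined one can report similar CLB counts.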

C. Xilinx Physical Design:

In this final stage of design, the Xilinx tools do a final mapping of logic to CLBs, and place and route the design. You'll do this, as well as run analysis tools to determine the maximum clock frequency at which your design can operate.

Instructions for dsgnmgr

Instructions for xmake and XDM

What you should hand in:

In written.txt as part 3C, report the number of FMAPs, HMAPs, Packed CLBs, registers, and the length of the critical path in nanoseconds.
Extra credit: Resynthesize, place and route the description you wrote for question one. How fast and big is the new design? (Call this file mult3.v and place answers in written.txt as part 3C-extra).

4. Redesigning the Two-Operand Multiplier

Create one redesign of the two-operand multiplier. The multiplier you design must run with the test harness from section 3, although you may select the latency and period of the multiplier. You may target any Xilinx 4000E series device, with speed grade -3, that is supported by the tools. We want some variety of implementations. We plan to produce a graph of throughput vs. area of every implementation generated by the class. To encourage variety, the grading for this section will be based, in part, on how far away you are from the convex hull of the solutions generated by the whole class. Therefore, if you're off in a lunatic portion of the design space (super fast, super small), you'll get a better grade than if you do something more conventional. Your design should be significantly faster or smaller than the Mult1 built in the previous section.
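One way to place a design on a throughput axis is to combine the critical path reported by the Xilinx tools with the harness period. The Python sketch below assumes throughput means products completed per second with the clock running at the critical-path limit. That definition, the function name throughput_mops, and the example numbers are assumptions for illustration, not the formula that will be used for grading.

```python
def throughput_mops(critical_path_ns, period):
    """Millions of products per second at the maximum clock rate.

    critical_path_ns: length of the critical path in nanoseconds.
    period: clock cycles between accepted operand pairs (harness PERIOD).
    """
    clock_mhz = 1000.0 / critical_path_ns
    return clock_mhz / period


if __name__ == "__main__":
    # Made-up figures: a fully pipelined design and a 12-cycle serial one.
    print(throughput_mops(critical_path_ns=25.0, period=1))
    print(throughput_mops(critical_path_ns=10.0, period=12))
```

Under this definition, a serial design can clock much faster than a parallel one and still deliver far fewer products per second, which is the trade against area that the class graph is meant to expose.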

Suggestions:

Highly pipelined array multiplier: In both synthesis tools, the + operator uses the fast carry logic present in Xilinx FPGAs. Build an array multiplier using a set of adders. You may want to experiment with the width of the adders that you use (12 bit adders might not be optimal.)
Pipelined Wallace tree multiplier: Generate the logic for a Wallace tree multiplier and pipeline it. Use AND gates for the partial product generation, full adders for the partial product reduction, and a large adder (using the + operator) for the final adder.
A pipelined Ferrari-Stefanelli multiplier. This uses a bigger multiplier (say 2x2) to generate partial products. Then you can use a Wallace tree reduction and final adder.
Shift-and-add serial multiplier.
Serial-serial multiplier.
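As a reference point for the serial end of the design space, the shift-and-add scheme can be modelled in a few lines. This Python sketch is a behavioural model, not HDL: the function name shift_add_multiply is hypothetical, and treating one loop iteration as one clock cycle is an assumption about how such a design would be scheduled.

```python
def shift_add_multiply(a, b, width=12):
    """Unsigned shift-and-add multiply; returns (product, cycles used).

    Each iteration models one clock cycle: examine one bit of b and,
    if it is set, add the correspondingly shifted a to the accumulator.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    acc = 0
    for i in range(width):
        if (b >> i) & 1:
            acc += a << i
    return acc, width


if __name__ == "__main__":
    product, cycles = shift_add_multiply(4095, 4095)
    assert product == 4095 * 4095
    print(product, "in", cycles, "cycles")
```

A design of this shape needs only one adder and an accumulator, but it accepts new operands once every width cycles, so the harness PERIOD would have to be raised to match.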

The test criteria:

Your Verilog or VHDL MUST run with Harness.v or Harness.vhd. The only thing you may change in Harness is the two parameters: PERIOD and LATENCY.

What you should hand in:

Verilog (or VHDL) description of the multiplier in mult4.v. Include as a comment the LATENCY and PERIOD settings for the test harness.
PostScript output of the simwave timing diagram in simwave2.ps
Describe any variations to the synthesis or P&R flow that you used in written.txt part 4. Also report the target FPGA, the number of CLBs, registers, and the length of the critical path. Basically, give us the results, plus enough information to duplicate the work that you did.
Turn in any software that you wrote to assist your design of this task.

5. Designing a Constant Multiplier

If one operand to a multiplier is a constant, the logic required to perform a multiplication is significantly reduced.

Design the Verilog for a multiplier for each of the following twelve-bit constants:

3171 (binary: 1100 0110 0011)
2426 (binary: 1001 0111 1010)

You should re-write both the harness and the multiplier files, and simulate with at least 300 random vectors. As in the last section, you determine the period, the latency and the targeted FPGA (as long as it is in the Xilinx 4000E family with -3 speed grade). We will again give better grades to more extreme designs in the throughput and area space. Here are some implementation suggestions:

Use multiple adders, taking advantage of the fast-carry logic.
Use look-up tables to create small (4-bit input) constant multipliers, and add the results up.
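Both suggestions can be checked in software before any HDL is written. The Python sketch below models them under stated assumptions: adder_multiply sums one shifted copy of the input per set bit of the constant, and lut_multiply splits the 12-bit input into three 4-bit nibbles that index a 16-entry table of multiples. The function names are hypothetical, and neither function is claimed to be the intended hardware structure.

```python
CONSTANTS = (3171, 2426)


def shift_terms(k):
    """Bit positions that are set in the constant k (LSB is position 0)."""
    return [i for i in range(k.bit_length()) if (k >> i) & 1]


def adder_multiply(x, k):
    """x * k as a sum of shifted copies of x, one per set bit of k."""
    return sum(x << i for i in shift_terms(k))


def lut_multiply(x, k):
    """x * k for a 12-bit x, using a 16-entry table per 4-bit nibble."""
    table = [k * n for n in range(16)]  # multiples of k for a 4-bit input
    n0 = x & 0xF
    n1 = (x >> 4) & 0xF
    n2 = (x >> 8) & 0xF
    # Each nibble's partial product is weighted by its position in x.
    return table[n0] + (table[n1] << 4) + (table[n2] << 8)


if __name__ == "__main__":
    for k in CONSTANTS:
        for x in range(1 << 12):
            assert adder_multiply(x, k) == x * k
            assert lut_multiply(x, k) == x * k
        print(k, "set bits at", shift_terms(k))
```

In the plain sum, 3171 has six set bits (five two-input adders) and 2426 has seven (six adders). Factoring can reduce this: for example, 3171 = 3 x (1 + 2^5 + 2^10), which needs one adder to form 3x and two more to combine the shifted copies.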

You will need to modify the test harness so that it tests for constant multiplication.

What you should hand in:

Verilog (or VHDL) description of the multiplier in mult4.v. Include as a comment the LATENCY and PERIOD settings for the test harness.
PostScript output of the simwave timing diagram in simwave2.ps
Describe any variations to the synthesis or P&R flow that you used in written.txt part 4. Also report the target FPGA, the number of CLBs, registers, and the length of the critical path. Basically, give us the results, plus enough information to duplicate the work that you did.
Turn in any software that you wrote to assist your design of this task.

6. How You Should Hand In Your Work

Run the program /afs/cs/academic/15828/bin/handin -lab 1 [path]. If [path] is a directory, the entire directory will be turned in. If it is a file, the file will be turned in. You can run handin as many times as you want; the last copy will be what we evaluate.

Appendices:

A. On-line Documentation:

For Verilog and VHDL information run openbook. The following volumes may be relevant:

Verilog-XL Reference
Verilog-XL Tutorial
Verilog-XL Users Guide
LeapFrog VHDL Simulator Reference
LeapFrog VHDL Simulator User Guide

For Synopsys tools, run iview.

For help with dsgnmgr, run hyperhelp:

hyperhelp (xdsgn) /afs/ece/common/local/usr/supported/xilinx/M13/sol/usenglish/*.hlp

B. Xhosting

Telnet to a Solaris box (machine1) from your X terminal (machine2):

machine2% telnet far-sun4.andrew.cmu.edu
In order to find out the name of the Solaris box (machine1):
machine1% hostname

Klog into ECE:

machine1% klog UID@ece.cmu.edu
Where UID is your ECE user id.

Source the setup file:

machine1% source setvar847.sun4_55

Set the DISPLAY variable:

machine1% setenv DISPLAY machine2:0.0
Remember, machine2 is the hostname of the machine that runs your X display. You may have to add the ".ece.cmu.edu" suffix.

Enable xhosting:

machine2% xhost +machine1
Where machine1 is the hostname of the Solaris box. You may have to add the .andrew.cmu.edu suffix to the name you get from hostname.