Reconfigurable Computing Seminar
Carnegie Mellon University
 
15-828/18-847 Spring 1998 January 12
 
Lab1
 
Due: 11:59 PM January 30, 1998


 
 

Outline:

  1. Introduction
  2. Tool Flow and Setup
  3. Multiplier HDL and Test Harness
  4. Redesigning the 2-operand Multiplier
  5. Designing a Constant Multiplier
  6. How You Should Hand In Your Work


1. Introduction:

The purpose of this lab is to give everyone an introduction to the design flow for commercial FPGAs, including work with hardware description languages, simulation tools, synthesis tools, and place and route tools. In addition, you will explore different multiplier structures and see the benefits of pipelining and constant propagation.


2. Tool Flow and Setup:

The basic tool flow for this lab has three stages: simulation, synthesis, and physical design. Two tools are available for each stage, and they run on two platforms: the rs_aix boxes in HH1000 and on many ECE grad students' desks, and the SPARCs (running Solaris) in the Andrew clusters. The table below lists the tools for each stage and the platforms each one is supported on. (Note: we have not tested the lab on the Wintel boxes.)
 
Tool                   Vendor       AIX   Solaris   Wintel
Simulation
  Verilog-XL           Cadence      X     X
  Leapfrog (VHDL)      Cadence      X     X
Synthesis
  synplify             Synplicity         X         X
  design_analyzer      Synopsys     X
FPGA Physical Design
  dsgnmgr              Xilinx             X         X
  XDM                  Xilinx       X
Why all the choices?
  1. To spread the compute load.
  2. To vary the results obtained.
  3. To allow people to do as much as possible on their own machine, if they have one.
We will describe how to use each one of these tools. Here is a broad brush comparison of the compatible tools.

The Bottom Line: If you're new to this and you don't have an AIX box on your desk, we recommend the Solaris tools for synthesis and physical design. You should be able to swap back and forth between platforms: both Synopsys and Synplify accept VHDL and Verilog, and both Xilinx tools should accept the Xilinx Netlist Format (XNF) files that both synthesis tools output. This SHOULD work, but in preparing this assignment we tested the interface between the tools on only one platform. You're on your own if you play on both platforms.

To set yourself up for the two platforms, we've developed two shell scripts that should set your paths and environment variables correctly to run all these programs. Save these scripts from your browser into your home directory.

For Solaris Machines:
setvar847.sun4_55

For AIX Machines:
setvar847.rs_aix41

If you plan to use Synopsys, copy the following file to your working directory:
.synopsys_dc.setup

Every time you start working, source the setvar file by typing

% source setvar847.`sys`
at your Unix prompt. The backquoted `sys` command prints your platform name (sun4_55 or rs_aix41), so the same command picks the right script on either platform. If you're running remotely, make sure you set DISPLAY (with setenv) and run xhost so that the X Window interfaces work. If you don't know how to do this, see Appendix B.


3. Multiplier HDL and Test Harness

In the first part of this lab, you will take a 2-operand 12-bit multiplier through the entire flow: simulation, synthesis, and physical design. There is minimal design in this section; it is primarily intended to teach the flow through these tools to those of you who have never done this before.

A. Simulation: Circuit under test and test harness.

Most of the time when you want to create a simulation model for a design, you create a description of the design, as well as a description of a tester for the design that makes sure things are working correctly. We will break these up into two files:

Verilog:

harness.v
mult1.v
VHDL:
Mult_VHDL.tar.gz
Copy these files into your working directory. If you're using VHDL, uncompress and untar the archive (gunzip Mult_VHDL.tar.gz, then tar xvf Mult_VHDL.tar).

The interface between Harness and Mult1 is simple. Mult1 has four inputs: a and b, which are twelve bits wide; valid, a single bit that, when it is one, indicates that the operands on a and b are valid; and clk, the clock for the design. Mult1 has one output: c, the 24-bit product of a and b. Harness drives all the inputs of Mult1, so it has four outputs, plus one input that receives c from Mult1.

The harness has two parameters: latency and period. Latency describes how long it takes the design to output a result of a multiplication, and period describes how often, in clock cycles, the multiplier can be expected to accept operands. Mult1 has a period of one and a latency of four. In general every "period" cycles, the harness asserts valid, indicating that the multiplier should multiply the values that are on the a and b pins. "Latency" cycles later, Harness makes sure that the values coming out of the multiplier on the c bus equal the product of those previous a and b values. Note that if latency is greater than period, there are multiple multiplications going on in the multiplier at the same time. For now, you don't have to modify the values of latency and period, but you will later in the lab.

The Mult1 file describes a multiply circuit with a latency of four. This latency is implemented by having four cascaded registers that delay the product. The multiplication is described using the (*) operator.
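To make the latency/period relationship concrete, here is a minimal cycle-level Python sketch. It is not the course's harness.v or mult1.v: it models a multiplier whose product passes through four cascaded registers, plus a checker that presents operands every PERIOD cycles and compares c against the product LATENCY cycles later. The class and function names, and the choice to drive zeros when valid is low, are assumptions made for illustration.

```python
import random
from collections import deque

class DelayLineMultiplier:
    """Cycle-level sketch of a Mult1-style design: the product is formed
    combinationally, then passes through `stages` cascaded registers."""

    def __init__(self, stages=4):
        self.regs = [0] * stages  # regs[-1] drives the output c

    def step(self, a, b):
        """Return c as seen during this cycle, then apply the clock edge."""
        c = self.regs[-1]
        # Every register loads its predecessor's pre-edge value.
        self.regs = [a * b] + self.regs[:-1]
        return c

def run_harness(latency=4, period=1, cycles=200, dut_stages=None, seed=1):
    """Present random 12-bit operands every `period` cycles (valid high) and
    check c against their product `latency` cycles later. Returns the number
    of products checked; raises AssertionError on a mismatch."""
    rng = random.Random(seed)
    dut = DelayLineMultiplier(dut_stages if dut_stages is not None else latency)
    pending = deque()  # (cycle the result is due, expected product)
    checked = 0
    for cycle in range(cycles):
        valid = cycle % period == 0
        # Operands are don't-cares when valid is low; this sketch drives zeros.
        a, b = (rng.randrange(4096), rng.randrange(4096)) if valid else (0, 0)
        if valid:
            pending.append((cycle + latency, a * b))
        c = dut.step(a, b)
        if pending and pending[0][0] == cycle:
            _, want = pending.popleft()
            assert c == want, f"cycle {cycle}: c={c}, expected {want}"
            checked += 1
    return checked

print(run_harness(latency=4, period=1))  # 196 products checked
```

With period=3 the same call checks 66 products. Passing dut_stages=3 while keeping latency=4 makes the assertion fire, which mimics what happens when the harness LATENCY parameter does not match the design. Because latency (4) exceeds period (1) here, four products are in flight at once, as the text above describes.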

Running the simulator:

For Verilog

For VHDL

Questions:

  1. Currently, the multiplication takes place in the same cycle that the inputs are received. Rewrite this module so that the multiplication happens on the second cycle after operands arrive. What effect should this modification have on the maximum clock speed of the implementation? Will it have any effect on the size of the implementation?
  2. Blocking and non-blocking assignments (answer only if you used Verilog): Change all the procedural assignments in the multiplier module to blocking ones (use the = operator rather than the <=). Why doesn't the simulation work? Can you re-write this description so that it works with blocking assignments? (Hint: no new code is needed.) Are there any risks to creating descriptions in this way? (See the on-line documentation for a discussion on non-blocking assignment.)
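The difference between the two assignment styles can be modeled in a few lines of Python. This is a simplified sketch of the semantics for a chain of registers updated in one always block on the same clock edge, not of Verilog-XL's actual event scheduler. With non-blocking assignment, every right-hand side is sampled before any register changes; with blocking assignment, each register updates immediately, so later statements in the block see the new values. The register names r1 to r4 are hypothetical.

```python
def clock_nonblocking(regs, d):
    """Models: r1 <= d; r2 <= r1; r3 <= r2; r4 <= r3;
    All right-hand sides are read before any register changes."""
    old = dict(regs)
    regs["r1"] = d
    regs["r2"] = old["r1"]
    regs["r3"] = old["r2"]
    regs["r4"] = old["r3"]

def clock_blocking(regs, d):
    """Models: r1 = d; r2 = r1; r3 = r2; r4 = r3;
    Each assignment takes effect immediately, in statement order."""
    regs["r1"] = d
    regs["r2"] = regs["r1"]
    regs["r3"] = regs["r2"]
    regs["r4"] = regs["r3"]

def trace(clock_fn, inputs):
    """Apply one clock edge per input value and record r4 after each edge."""
    regs = {"r1": 0, "r2": 0, "r3": 0, "r4": 0}
    out = []
    for d in inputs:
        clock_fn(regs, d)
        out.append(regs["r4"])
    return out

print(trace(clock_nonblocking, [1, 2, 3, 4, 5]))  # [0, 0, 0, 1, 2]
print(trace(clock_blocking, [1, 2, 3, 4, 5]))     # [1, 2, 3, 4, 5]
```

In this model the blocking version passes each new value straight through to r4 on the same edge, so the four-stage delay behaves like a single register.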

What you should hand in:

  1. The PostScript of the simwave output in simwave.ps
  2. Answers to question 1 in written.txt (head your answer as part 3A.1)
  3. Answers to question 2 in written.txt (head your answers as part 3A.2)
  4. A copy of the Verilog using blocking assignments in the file mult2.v

B. Synthesis:

How it works: The * operator in Mult1 gets mapped to internal module generators that generate the netlist for a multiplier. This mapping may vary based on the path used. The c1, c2 and c3 elements get mapped to internal registers. Therefore, the structure of the implementation will depend a great deal on the structure of the specification. This may not be the right thing for reconfigurable computing, but in the next section you will use this fact to modify the structure of the multiplier.

More detailed instructions for synthesis depend on the tool you are using.

Instructions for Synplify
Instructions for Synopsys
 

What you should hand in:

In the file written.txt, as part 3B: report the estimated number of CLBs and registers (plus the number of FMAPs and HMAPs if you ran Synplify), and the estimated length of the critical path. The CLB count is only an estimate because the synthesis tool does not know what is inside a Xilinx CLB; instead it maps the design to four-input LUTs (FMAPs) and three-input LUTs (HMAPs), leaving the packing into CLBs to the Xilinx technology mapper.
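To sanity-check a synthesis estimate before running the Xilinx mapper, you can compute a rough CLB count from the resource totals. The sketch below assumes the XC4000-family CLB structure (two four-input function generators F and G, one H function generator, and two flip-flops per CLB) and perfect packing; it ignores IOB flip-flops and routing constraints, so treat it as a ballpark figure, not what the mapper will report. The function name and the example numbers are hypothetical.

```python
import math

def estimate_clbs(fmaps, hmaps, flip_flops):
    """Rough CLB count for an XC4000-style CLB, which holds two 4-input
    function generators (F, G), one H function generator and two
    flip-flops. Assumes perfect packing and no IOB registers."""
    by_fmaps = math.ceil(fmaps / 2)   # two F/G generators per CLB
    by_hmaps = hmaps                  # one H generator per CLB
    by_ffs = math.ceil(flip_flops / 2)  # two flip-flops per CLB
    return max(by_fmaps, by_hmaps, by_ffs)

# Hypothetical synthesis report: 150 FMAPs, 20 HMAPs, 96 registers.
print(estimate_clbs(fmaps=150, hmaps=20, flip_flops=96))  # 75
```

The largest of the three terms tells you which resource limits packing; a design that is register-bound (as deeply pipelined designs can be) will not shrink by reducing logic alone.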

C. Xilinx Physical Design:

In this final stage of design, the Xilinx tools do the final mapping of logic to CLBs and place and route the design. You'll do this, and also run analysis tools to determine the maximum clock frequency at which your design can operate.

Instructions for dsgnmgr

Instructions for xmake and XDM
 

What you should hand in:

  1. In written.txt as part 3C, report the number of FMAPs, HMAPs, Packed CLBs, registers, and the length of the critical path in nanoseconds.
  2. Extra credit: Resynthesize, place, and route the description you wrote for question 1 in part 3A. How fast and how large is the new design? (Call this file mult3.v and place your answers in written.txt as part 3C-extra.)
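The critical path you report converts directly into a maximum clock frequency, and, with the harness PERIOD, into throughput, which is the quantity section 4 plots against area. This short sketch makes that arithmetic explicit; the function names and the 40 ns figure are hypothetical, not measured results.

```python
def fmax_mhz(critical_path_ns):
    """Maximum clock frequency (MHz) implied by a register-to-register
    critical path given in nanoseconds."""
    return 1000.0 / critical_path_ns

def throughput_mmults(critical_path_ns, period_cycles):
    """Millions of multiplications per second when the design accepts a
    new operand pair every `period_cycles` clock cycles."""
    return fmax_mhz(critical_path_ns) / period_cycles

# Hypothetical: a 40 ns critical path.
print(fmax_mhz(40.0))              # 25.0
print(throughput_mmults(40.0, 1))  # 25.0
print(throughput_mmults(40.0, 2))  # 12.5
```

Note that latency does not appear: adding pipeline stages raises throughput only insofar as it shortens the critical path or lets you lower the period.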


4. Redesigning the Two-Operand Multiplier

Create one redesign of the two-operand multiplier. Your multiplier must run with the test harness from section 3, although you may select its latency and period. You may target any Xilinx 4000E series device, with speed grade -3, that is supported by the tools. We want some variety of implementations: we plan to produce a graph of throughput vs. area for every implementation generated by the class. To encourage variety, the grading for this section will be based, in part, on how far you are from the convex hull of the solutions generated by the whole class. Therefore, if you're off in a lunatic portion of the design space (super fast, super small), you'll get a better grade than if you do something more conventional. Your design should be significantly faster or smaller than the Mult1 you built in the previous section.
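The exact grading metric is not spelled out here, but a closely related idea, Pareto dominance, is a quick way to see whether a candidate design is interesting at all: a design is dominated if some other design is no larger and at least as fast, and strictly better in one of the two. The sketch below is that simpler test, not the course's convex-hull computation, and the design points are hypothetical.

```python
def pareto_front(designs):
    """Return the (area_clbs, throughput) points not dominated by any other,
    sorted by area. Smaller area and higher throughput are better."""
    def dominates(p, q):
        no_worse = p[0] <= q[0] and p[1] >= q[1]
        strictly_better = p[0] < q[0] or p[1] > q[1]
        return no_worse and strictly_better
    return sorted(d for d in designs if not any(dominates(o, d) for o in designs))

# Hypothetical (CLBs, million multiplies per second) points.
points = [(150, 25.0), (80, 10.0), (160, 20.0), (300, 60.0)]
print(pareto_front(points))  # [(80, 10.0), (150, 25.0), (300, 60.0)]
```

Here (160, 20.0) is dominated by (150, 25.0): it is both larger and slower, so it would sit inside the frontier rather than extend it.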

Suggestions:

The test criteria:

Your Verilog or VHDL MUST run with Harness.v or Harness.vhd. The only things you may change in Harness are the two parameters PERIOD and LATENCY.

What you should hand in:

  1. Verilog (or VHDL) description of the multiplier in mult4.v. Include as a comment the LATENCY and PERIOD settings for the test harness.
  2. PostScript output of the simwave timing diagram in simwave2.ps
  3. Describe any variations to the synthesis or P&R flow that you used in written.txt part 4. Also report the target FPGA, the number of CLBs, registers, and the length of the critical path. Basically, give us the results, plus enough information to duplicate the work that you did.
  4. Turn in any software that you wrote to assist your design of this task.


5. Designing a Constant Multiplier

If one operand to a multiplier is a constant, the logic required to perform a multiplication is significantly reduced.

Design the Verilog for a multiplier by each of the following twelve-bit constants:

3171 (binary: 1100 0110 0011)
2426 (binary: 1001 0111 1010)
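One way to see how much logic a constant saves is to count the shift-and-add terms it needs. The Python sketch below compares the plain binary expansion with canonical signed-digit (CSD) recoding, which allows digits of -1 and never puts two nonzero digits next to each other, and also shows a factored form that reuses a shared subterm. These are standard constant-multiplication techniques offered as one possible starting point, not the required design; the function names are hypothetical.

```python
def binary_terms(k):
    """(shift, sign) terms of the plain binary expansion of k."""
    return [(i, 1) for i in range(k.bit_length()) if (k >> i) & 1]

def csd_terms(k):
    """(shift, sign) terms of the canonical signed-digit form of k:
    digits in {-1, 0, +1} with no two adjacent nonzero digits."""
    terms, shift = [], 0
    while k:
        if k & 1:
            digit = 2 - (k & 3)  # +1 if k mod 4 == 1, -1 if k mod 4 == 3
            terms.append((shift, digit))
            k -= digit
        k >>= 1
        shift += 1
    return terms

def shift_add(x, terms):
    """Multiply x by the constant encoded in terms using only shifts and
    adds/subtracts; without sharing this takes len(terms) - 1 adders."""
    return sum(sign * (x << shift) for shift, sign in terms)

def times_3171_factored(x):
    """3171 = 3 * (2**10 + 2**5 + 1): form 3x once, then reuse it (3 adders)."""
    t = x + (x << 1)
    return (t << 10) + (t << 5) + t

for k in (3171, 2426):
    print(k, len(binary_terms(k)), len(csd_terms(k)))
# 3171 6 6
# 2426 7 5
```

For 2426, CSD (2^11 + 2^9 - 2^7 - 2^3 + 2^1) cuts the adder count from 6 to 4. For 3171, CSD gives no saving, but factoring out 3 drops it from 5 adders to 3. In hardware, intermediate subtractions can be done in a 24-bit two's-complement datapath: since every 12-bit product fits in 24 bits, the final result is correct even if a partial sum wraps.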

You should rewrite both the harness and the multiplier files, and simulate with at least 300 random vectors. As in the last section, you choose the period, latency, and target FPGA (as long as it is in the Xilinx 4000E family with -3 speed grade). We will again give better grades to more extreme designs in the throughput and area space. Here are some implementation suggestions:

You will need to modify the test harness so that it tests for constant multiplication.
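If you would rather precompute stimulus than generate it inside the harness (for example with Verilog's $random), a Python golden model like the one below produces the required 300 random operands together with their expected constant products. The seed, the "aaa pppppp" hex line format, and how your modified harness would read such a file are all hypothetical choices; you may also want to add corner cases such as 0 and 4095.

```python
import random

def make_vectors(k, count=300, width=12, seed=847):
    """Random `width`-bit operands for a constant-k multiplier, paired with
    the expected products. Seeded so a failing run can be reproduced."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(count):
        a = rng.randrange(1 << width)
        vectors.append((a, a * k))
    return vectors

def to_hex_lines(vectors):
    """One 'aaa pppppp' line per vector: 12-bit operand, 24-bit product."""
    return [f"{a:03x} {p:06x}" for a, p in vectors]

lines = to_hex_lines(make_vectors(3171))
print(len(lines))  # 300
```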

What you should hand in:

  1. Verilog (or VHDL) description of the multiplier in mult4.v. Include as a comment the LATENCY and PERIOD settings for the test harness.
  2. PostScript output of the simwave timing diagram in simwave2.ps
  3. Describe any variations to the synthesis or P&R flow that you used in written.txt part 4. Also report the target FPGA, the number of CLBs, registers, and the length of the critical path. Basically, give us the results, plus enough information to duplicate the work that you did.
  4. Turn in any software that you wrote to assist your design of this task.

6. How You Should Hand In Your Work

Run the program /afs/cs/academic/15828/bin/handin -lab 1 [path]. If [path] is a directory, the entire directory will be turned in; if it is a file, just that file will be turned in. You can run handin as many times as you want; we will evaluate the last copy.


Appendices:

A. On-line Documentation:

For Verilog and VHDL information run openbook.  The following volumes may be relevant: For Synopsys tools, run iview.

For help with dsgnmgr, run hyperhelp:

hyperhelp (xdsgn) /afs/ece/common/local/usr/supported/xilinx/M13/sol/usenglish/*.hlp

B. Xhosting

  1. Telnet to a Solaris box (machine1) from your X terminal (machine2):

       machine2% telnet far-sun4.andrew.cmu.edu

     To find out the name of the Solaris box (machine1), run:

       machine1% hostname

  2. Klog into ECE:

       machine1% klog UID@ece.cmu.edu

     where UID is your ECE user id.

  3. Source the setup file:

       machine1% source setvar847.sun4_55

  4. Set the DISPLAY variable:

       machine1% setenv DISPLAY machine2:0.0

     Remember, machine2 is the hostname of the machine that runs your X display. You may have to add the ".ece.cmu.edu" suffix.

  5. Enable xhosting:

       machine2% xhost +machine1

     where machine1 is the hostname of the Solaris box. You may have to add the ".andrew.cmu.edu" suffix to the name you get from hostname.