18-760 VLSI CAD Project #3
laPlace *: A Recursive Min-Cut Placement Tool

Suraj Sudhir
Dept. of Electrical and Computer Engg
Carnegie Mellon University
Pittsburgh PA 15213
ssudhir@ece.cmu.edu

 
Table of Contents

Problem Formulation
    Introduction
    Design Goals
    Optimizations

Data Structures
    Gates
    Nets
    Pads
    Timing Paths

Min-Cut Strategy
    Component Setup
    Partitioning
    Legalization
       Basic Strategies
       Intelligent Strategies

Four way Partitioning

Implementation

Performance Results and Analysis

Something Cool

Source Code

Contact Information

References


Problem Formulation

Introduction
  The goal of this project is to implement a placer for ASIC designs. There are three basic categories of placement tools for this purpose - iterative improvement, quadratic placement and recursive mincut placement. For this project I implemented a recursive mincut placement technique for the placement of a given netlist. The netlist specification comprises the description of the chip including details such as its dimensions, number of gates, nets and pads, as well ad connectivity information.

    Various placement requirements and constraints have to be taken into account. Constraints include the number of gates that can occupy a single grid site on the chip and the number of pins that can fit into a single pinsite on the corner of the chip. Placement requirements include minimizing the wirelength, delay and congestion. Different chip placement goals may require placement tailored to its needs.

Design Goals
    The program was written in C with a GUI based output display in Tcl/Tk. It can be run in normal or stepping mode. In the normal mode the completely placed chip is displayed, whereas with stepping, on each step of recursion the position of the gates is shown, indicating how the placement takes place. Many configuration parameters were available to control the placement parameters like wirelength, delay and congestion minimization.

     I implemented laPlace with a number of requirements in mind. My initial target was to implement a basic recursive mincut algorithm that placed all gates in legal positions, without concern for optimization of any parameter. I felt that this approach, as opposed to first optimizing and then legalizing, would provide much more flexibility to add other features, without having to legalize taking into account all the different optimization strategies used.

Optimizations
    Once the initial requirement was satisfied, I concentrated on implementing a placer that would handle the bigger benchmarks well, rather than focus on optimizing the smaller ones. For this I defined a set of cost parameters and set up the hypergraph partitioner to generate partitions that yield a placement satisfying the specific requirement.

    The rationale for this is that bigger benchmarks may have special 'structure' in their placement that could not be extracted by a simplified placer that would make too many assumptions to recognize that some benchmarks can be placed better with parameters that may not work well with others.

    For example a benchmark may have a high degree of clustering in 4 different sections of the chip. Such a benchmark would be best placed with a 4 way partitioning scheme together with some clustering techniques than by a naive bipartitioning technique that would not see this structure.

    This kind of view makes sense, because chips are not a homogenous bunch of gates and nets but are definitely partitioned into sections, such as CPU, register files and on-chip caches, all connected by nets. Here's a picture of the Intel Pentium II that shows this. A partitioning approach that can see such a structure better would be more capable of returning a good placement.

    Identifying this structure need not be totally a case of trial and error. It is possible to use some intelligent techniques and tradeoffs that will help achieve e good placement result. The key is to have some way of extracting the locality or groupings of the gates in order to be able to place them close to each other.

    Here are two figures that show how this can matter. Figure A is a placement for fract generated by laPlace with a naive set of parameters, while Figure B is that of a placement with a more detailed set of parameters, which does 4 way partitioning and clustering optimization. For clarity only the critical timing path lines are shown, but the more structured layout in Figure B is quite visible. The placement in Figure B generated much superior results compared to Figure A (wirelength 884 vs 955).

   Thus laPlace allows a number of parameterized specifications, described in Implementation,  that allow the user to generate placements that handle various requirements, from minimizing wirelength to obtaining the best possible congestion flatness. Having many different parameters allowed laPlace to improve placement in benchmarks where the default settings would have resulted in inferior placement.

Data Structures
  This section describes the data structures utilized in laPlace. There are three basic structures which hold data for the the three distinct components comprising the placement task - gates, nets and pads. Further I have implemented another data structure that holds the information about the critical timing paths in the circuit. I describe the data used for each component here.

Gates
The data structure for gates holds the following information about each gate on the chip

  •     partitionID : partition where gate is located during mincutting
  •     gateX, gateY : (x,y) position of gate for wirelength/delay computation
  •     numNets : number of nets connected to this gate
  •     gateNets : list of all nets connected to this gate
  • Nets
    The data structure for nets holds the following information:
  •     numVertices : number of vertices (gates and pads) connected to this net
  •     vertexList : list of all vertices connected. This comprises -

  •           vertexID : index of the vertex
              vertexType : type of vertex ( gate or pad )
  •     netLen : length of the net
  •     netDelay : delay of the net
  • Pads
    The data structure for the pads holds this information:
  •     padX, padY : (x,y) position of the pad
  •     numNets : number of nets connected to this pad
  •     netList : list of all the nets connected to this pad
  • Timing Paths
    The following information about the timing paths is stored in its structure:
  •     numComp : number of components (nets, gates and pads) on this path
  •     pathList : list of component IDs of all devices on this path
  •     Each of these structures are organized as one-dimensional array of structures containing information about each component. As a result access time for any component in its array is O(1). Memory requirement scales linearly with the number of components, or as O(N) for each component, excluding the connected list of each component, which varies in size.

        I made a tradeoff by keeping the computed values of net lengths and delays in the structure (rather than computing them each time) for two reasons:

  • They are needed frequently. The access pattern indicated that they are read more often than they're written. So it made sense not to compute them every time.
  • Though they occupy nearly half of the total space for each net, the total memory usage for this structure is reasonable in comparison to the total available memory. For example the industry2 benchmark has 13000 odd nets which translates to 156KB of storage for this structure, excluding the list of connected gates.

  • Min-Cut Strategy

    laPlace implements both a bipartitioning and a 4 way or quadpartitioning strategy. They are essentially the same, except that how many partitions are created at each step varies. The basic bipartitioning based min-cut strategy is outlined below:

    minCut (Partition, sX, sY, eX, eY)
    {

        if (no gates in Partition)
           return;

        if ((sX == sY) && (eX == eY))
           return;

        setupGatesforVerticalCut;
        setupNets;
        setupPadsforVerticalCut;

        bipartition(Partition);

        wireLen1 = computeWireLength;

        setupGatesforVerticalCut;
        setupNets;
        setupPadsforVerticalCut;

        bipartition(Partition);

        wireLen2 = computeWireLength;

        midX = sX + (eX - sX)/2;
        midY = sY + (eY - sY)/2;

        fix capacity violations within partitions;

        if ( wireLen1 > wireLen2 )
          {
            minCut(Partition*2+1, sX, sY, midX, eY);
            minCut(Partition*2+2, midX+1, sY, eX, eY);
          }
        else if ( wireLen1 < wireLen2 )
          {
            minCut(Partition*2+1, sX, sY, ex, midY);
            minCut(Partition*2+2, sX, midY+1, eX, eY);
          }
        else
          {
            if (previous cut was vertical)
             {
                minCut(Partition*2+1, sX, sY, ex, midY);
                minCut(Partition*2+2, sX, midY+1, eX, eY);
             }
            else
             {
                minCut(Partition*2+1, sX, sY, midX, eY);
                minCut(Partition*2+2, midX+1, sY, eX, eY);
             }
          }

    }


            This bipartitioning mincut algorithm performs both a vertical and horizontal cut and then chooses the cut that gives the best match against requirement (wirelength or delay). If both cuts yield the same length, then the cut chosen depends on the direction of the cut during the previous recursion. Wirelengths were computed using the half-perimeter technique.

          If the previous step had made a vertical cut, the current step will make a horizontal cut, and vice versa. This hopefully, gives a more 'square' cutting and reduces the possibility of partitions being the shape of stripes due to repeated cutting in one direction.

            The partition numbering scheme enumerates the first partition as 0. Its subpartitions are 1 and 2. Further subpartitions are numbered as (n*current partition + i) where n is the number of subpartitions and i = 0..(number of subpartitions-1). This guarantees to give a unique number for each partition for any k-way partitioning scheme.

            The min-cut function has three separate parts - component setup, partitioning and legalization. the component setup phase sets up the cost functions. The partitioning step does the actual hypergraph partitioning and the legalization stage ensures that the gate capacity is not exceeded.

    Component Setup

          The performance of this min-cut algorithm depends greatly on the setupGates, setupNets and setupPads parts. In these functions the weights of the gates, nets and pads are set. How this is done affects how the hypergraph is partitioned and therefore the nature of the result.

          The versatility of the hMetis hypergraph partitioning tool permits a number of tweaks to handle the different partitioning requirements dictated by what we're looking for in the current placement. Further it allows a good implementation of 4-way partitioning.

           laPlace permits the net weights to be set to different values like fanout, wirelength or delay depending on the optimization strategy. These are described in more detail in Implementation. Gate weights were set to a constant value depending on the size of the chip, in order to give the best results.

    Partitioning

          During this step the actual partitioning of the network takes place. For both horizontal and vertical cuts, a cut line is placed at the center. Such a strategy simplifies the partitioning. The 4-way partitioning differs in that it makes two perpendicular cuts and doesn't do the partitioning twice. In order to obtain a partitioning that results in a legal number of gates in each subpartition, I used the UBFactor parameter of hMetis to define the maximum tolerance of disparity while balancing the partitions. This measure was simply

         (largest subpartition area - smallest subpartition area)*100/(average subpartition area) .

          The UBFactor setting gave good partitions that seldom resulted in any subpartition exceeding its capacity. When they did I used legalization strategies outlined in the next section.

    Legalization

          The fix capacity violations in partitions function ensures that the maximum capacity of the partition, defined by its total area in grid site multiplied by the number of gates per site, is always more than or equal to the number of gates currently in that partition. A violation may occur if the partitioning step yields a result that has more gates in a partition than its capacity.

           Different strategies are implemented to handle legalization and the user can pick any one of them in his placement run. Each strategy selects a gate in the current partition that will be moved to the adjacent partition to fit the capacity constraints. The strategies are:

    Basic Strategies
    1. First : pick the first gate in the numerical order of gate IDs in the netlist, that belongs to the current partition. While this is a naive pick, it proved to be quite effective in the smaller toy benchmarks, but showed itself to be less than capable with larger netlists.
    2. Last : pick the last gate in the numerical order of gate IDs belonging to this partition.

    Intelligent Strategies
    3. Least Connected : This strategy has proved to be the most effective in larger benchmarks. It picks the gate that has fewest other gates connected to it AND no pads (or fewest pads) connected to it. This ensures that I dont pull a gate thats part of a cluster accross the partition if I can find another.
    4. Least Decrease in Wirelength : picks the gate that results in the least decrease in wirelength when moved from this partition to the adjacent one.
    5. Most Decrease in Wirelength : picks the gate that results in most decrease in wirelength when moved from this partition to the adjacent one.
    6. Least Connected and Most Decrease in Wirelength : this is a hybrid of strategies 3 and 5, which were found to perform the best. It tries to find gates that satisfies the requirements of both 3 and 5, in order to pick the best gate to move. This strategy was found to yield the best results in nearly all large benchmarks.

           A point to be noted is that at each partitioning step the legalization function ensures that no partition exceeds its capacity. But it does NOT, and need not ensure that none of the gate sites violate their capacity either. This is because the complete recursion will converge to a point where the partition is a gatesite and the nature of the algorithm will ensure that its capacity is not violated. So I implement a simplified scheme where at each partitioning step the gates in that partition are at the 'Center of Gravity' of that partition.

          Various parameters can be set up to configure the hMetis partitioner to yield an effective cut that optimizes congestion while concurrently improving the placement requirement. For example defining the weights of the nets as a function of its length allows laPlace to generate a placement that minimizes wirelength, since it will take the cut with the least weight. The different parameters and their implications are described in Implementation .

    Four-way Partitioning

            While the quadpartitioning strategy comes under the 'Something Cool' category of this report, I will describe its algorithm and implementation here, since it reflects a continuation of the implementation of the basic bipartitioner. Its results and implications are described in detail in the Something Cool section.

            The implementation of quadpartitioning in laPlace is simplified by the structure of the original min-cut algorithm. To implement it, I first define partition information for both vertical and horizontal cuts to set up the pin propagation of pads and gates outside the current partition. With this in place I perform the actual 4-way cut.

          Now the additional problem occurs that there might be a capacity violation among one or more partitions that needs to be rectified. Once again I call one of the 5 legalization strategies mentioned above for bipartitioning. But 4-way partitioning adds a new twist - which partition do I move the gate to ? laPlace implements the following algorithm to settle this:

    legalizeQuadPartition ( Partition )
    {
        while ( 1 )
         {
            if ( gatesInPartition1 <= capacityOfPartition1 )
              {
                 if ( gatesInPartition2 <= capacityOfPartition2 )
                   {
                      if ( gatesInPartition3 <= capacityOfPartition3 )
                        {
                           if ( gatesInPartition4 <= capacityOfPartition4 )
                             {
                                return;
                             }
                           else
                             {
                                victim = pickGateToMove;
                                move to nearest partition whose capacity will not be exceeded;
                             }
                        }
                       else
                        {
                            victim = pickGateToMove;
                            move to nearest partition whose capacity will not be exceeded;
                        }
                   }
                 else
                   {
                      victim = pickGateToMove;
                      move to nearest partition whose capacity will not be exceeded;
                   }
              }
            else
              {
                 victim = pickGateToMove;
                 move to nearest partition whose capacity will not be exceeded;
              }
         }
    }

        In summary, the above algorithm ensures that none of the 4 new partitions during the 4-way partitioning step willl violate its capacity constraint, while also trying to pick those gates that will increase wirelength the least (or decrease it the most). It might be possible to move the gate to the nearest partition without worrying about its capacity being exceeded, but this may result in a race condition in the algorithm with the gates being 'cycled' around. This can be avoided with further algorithm complexity (such as fixing the gate after moving to prevent races) that I preferred to avoid for simplicity.

        Another issue is that 4-way partitioning may not always be possible. Often the current partition may be composed of just one row or column, in which case a 4-way partition does not make sense. laPlace detects this condition and performs the usual 2-way partition just once in this case. If the partition has only one column, it does a horizontal cut, while a one-row partition is cut vertically.

    Implementation
      laPlace is invoked at the UNIX command line as follows:
                laplace -p pinfile -i netlist_file -s step -c legalizing_technique -t num_iterations -I initial_ordering
                           -O optimization_parameter -n net_weighing_scheme -d seed -w num_partitions
                           -C minimize_critical_path -o output_file

        The command line parameters are summarized below:

  • -p : specify the netlist pinfile position relative to the current directory position. no default setting
  • -i  : specify the netlist file. no default setting
  • -s : setting this to 0 will cause only the final placement layout to be exhibited on the GUI display. If this is set to 1

  •       then at each step the intermediate placement is shown. default : 0
  • -c : this specifies the legalization parameter. It can be specified values ranging from 1 to 6, in the same order as

  •       they are defined in the Legalization section. This means, -c 1 uses pick first, -c 2 uses pick last etc. default
          : 3 (least connected)
  • -t : number of times the entire mincut procedure is to be performed. default: 1
  • -I : specifies the initial ordering of the gates before mincut. The strategies are -

  •      1 - First : all gates in gate site (1,1)
         2 - Last : all gates in gate site (n,n)
         3 - Distribute : all gates distributed over the chip, with each gatesite holding as many (or fewer) gates as
                                 its capacity permits.
         4 - Middle : all gates in gate site (n/2, n/2)
       default : 1
  • -O : specifies what to try to optimize for while performing placement. The legal settings are:

  •        1 - Congestion : handle congestion minimization as primary objective.
           2 - WireLength : wirelength minimization is the goal
           3 - WireDelay : shoot for delay minimization
  • -n : specifies how to weight the nets during the Component Setup stage.

  •        1 -Unconstrained NetLength : weights of nets are their lengths
           2 -Constrained NetLength : weight of net is netLength if netlength < 7. Otherwise it is 6.
           others - Constant : all nets are weighted by a constant amount equal to the parameter value
       default : 6.
  • -d : seed value for hMetis partitioner. default: 654321
  • -w : how many partitions. The allowable settings are:

  •        2 - Two-way : perform bipartitioning
           4 - Four-way : perform quadpartitioning
  • -C : whether or not to minimize the critical paths specified in the input netlist. Setting this to 0 will not treat

  •        those paths specially. A '1' setting will give them higher weightage, making them critical.
  • -o : name of the output file to which to write the placement results for the checker.
  •     The necessity and implications of the parameters will now be described. The legalization parameter has a very great effect on the quality of the results obtained, especially with larger benchmarks. The basic legalization strategies carry themselves well for small toy benchmarks but perform badly for larger benchmarks because they are incapable of identifying the connectedness of the gate they pick or implication of moving it.

        The initial_ordering affects the output more in the case of the smaller benchmarks. This is due to the fact that the partitioning algorithm does not always utilize the effects of moving gates in one partition on those in another partition in order to get a good placement. Techniques like the ping-pong technique may help this, but I did not implement them. I however found that in these circumstances increasing the num_iterations parameter caused an improvement in the result often, because this causes the end placement from one iteration to be 'built-upon' for the next iteration, allowing some of the partitioning effects to be seen accross the chip.

        The optimization parameter defines what to optimize for. This parameter works best not in isolation but together with some other parameters, notably legalization and net_weighting_scheme. Optimizing for congestion causes laPlace to set the net weights with a value equal to the number of gates and pads connected to that net. This is motivated by the desire not to cut a clustered set of gates, which will cause the 'guts' of the cluster to fall accross the partition.

        This works especially well with larger benchmarks that have explicitly clustered sets of gates that need to be placed close to get a good wirelength and delay. Setting the optimization parameter to wirelength or delay optimization causes net weights to be set to these values. Further, the net_weighting_scheme affects the result, since hMetis often gave better results with a constrained or constant setting for all net weights.

        The num_partitions setting too affects the result greatly. For larger benchmarks quadpartitioning was found to yield much superior results as compared to bipartitioning. However it also sometimes caused the maximum delay to be very large. This is explained by the fact that more aggresive identification of clustering causes some wires between clusters to be so long as to cause the maximum delay to be high, even though the overall wirelength is quite competitive, as results showed.

        Finally, the critical_path_minimize parameter causes the critical path defined in the netlist to be weighted higher than the other parts of the network, thus causing these paths to have a better wirelength/delay than otherwise. When using this setting is of course makes sense to optimize for wirelength or delay since optimizing for congestion along with this parameter will not be likely to do too well.

    Performance Results and Analysis

    Consolidated Results of Benchmarks:
      The following table shows the checker output for the various project benchmarks :
     

    Benchmark Wirelength CapacityViolations Max Delay Max Congestion Cong. Flatness
    toy1 56.5685 0 123.223 7 208
    toy2 158.761 0 169.552 9 183
    fract 884.266 85.3036 36 154
    primary1 8010.19 0 180.056 150 130
    primary2 40407.6 0 138.071 367 112
    struct  12403.3 0 87.0012 123 127
    industry1 34797.7 0 302.089 287 68
    industry2 227527 0 213.512 867 117
    biomed 58384.7 0 704.886 244 107
     
        These results were obtained with optimization for low wirelength and clustering identification. As described below, these resulted in often higher delay and congestion disparity in obtaining least wirelength. Further most larger benchmarks performed well in comparision to the peer group while smaller benchmarks did not perform as well in giving low wirelengths. This reflects the design tradeoffs made in laPlace as I described earlier. All benchmarks were run on zeta.ece.cmu.edu, a Sun Ultra10.

        Now I examine the effect of the two most dominant parameters of laPlace - the num_partitions and optimization parameters. These two parameters show the most significant effect on the final placement result. To demostrate their effect I set other parameters to the setting that gave best performance.

    Effect of Partitioning on Wirelength and Congestion
        The next graph indicates the effect of quadpartitioning on the wirelength result. Other parameter settings are taken to be same for each benchmarks. All results are normalized by scaling the higher wirelength to 100 because differnet benchmarks have very different total wirelengths and a proper view of the difference in wirelength is not possible without such scaling.


          Figure.1: 2-way and 4-way partitioning

        Fig.1 shows how 4-way partitioning improves the wirelength result for the bigger benchmarks. For the smallest benchmark 4-way cutting was found to be slightly worse probably because of a combination of the other settings and the fact that 2-way gives a better 'granularity' of partitioning so that the end placement is better. On the other hand for larger benchmarks the 4-way cutting detects clustering better and yields a better result for different combinations of other parameters. It was also much faster than bipartitioning.

        Partitioning also affects congestion flatness, but not as directly as wirelength. There is often greater disparity with 4-way partitioning because it keeps gates that are part of a cluster closer together so that some congestion cutlines are 'dense' where it goes thru a cluster, while in other cutlines are 'sparse'. In 2-way partitioning often the clusters are not as closely packed and therefore the cluster is more 'spread out' causing higher wirelength, but also less disparity. This is indicated in the following figure


        Figure.2: Effect of Partitioning on congestion disparity

        Figure 2 indicates how 4-way partitioning results in higher congestion disparity than 2-way partitioning. This holds for all benchmarks except toy1 and fract. For toy1, 4-way results in superior reduction on congestion because it can more effectively partition the circuit. Since it is so small, there are no effective clusters that result in 'dense' areas that cause higher congestion disparity. As a result the better partitioning due to 4-way cutting results in better congestion flatness.

        For all other benchmarks the graph clearly indicates how their congestion flatness measure (congestion disparityx + congestion disparityy) is higher for 4-way cutting than 2-way.

    Effect of Optimization Parameter on WireLength & Congestion
        The next analysis involves the effect of the -O parameter on the results of the placement. Optimizing for congestion gives improved congestion disparity accross all benchmarks without exception. Similarly optimizing for wirelength improves the wirelength figure for all benchmarks. This is indicated in Figures 3 and 4. To avoid cluttering the graphs I am showing only 4 benchmarks in each of them. Other benchmarks followed the same trend.
     

    Figure 3: effect of optimization technique on wirelength
    Figure 4: effect of optimization technique on congestion 
                 disparity
     
        The two figures above indicate clearly how laPlace uses the optimization parameter effectively to reduce the wirelength or the congestion depending on what needs to be minimized. Also improvement in congestion disparity is more marked than the improvement in wirelength when the optimization parameter is changed. This is due to two reasons. First, laPlace tries to achieve a good wirelength even if its optimizing for congestion. As a result optimizing for congestion doesnt significantly hurt wirelength, but it does improve congestion disparity greatly. Secondly, wirelength measures are much higher so the scaled wirelengths shows only a small change, when the actual difference may run into thousands of units (e.g. 227527 vs 241394 for industry2).

    Effect of Number of Iterations
        Increasing the number of iterations of min-cut done was not found to significantly affect the solution, except for smaller benchmarks. For larger benchmarks it made little or no difference. This is because smaller benchmarks are more likely to gain from extra iterations that can exchange gates that may have been initially placed in a less than optimal position.

        For larger benchmarks, I feel further performance enhancement would be possible only by further tuning the partitioning algorithm or by implementing better intermediate- and end-placement using techniques like simulated annealing or quadratic placement. For fract and primary, multiple iterations (2-6) did not give any difference in wirelength, while for industry1 it decreased by an insignificant amount.

    Something Cool

    Some of the cool things in laPlace are:
    Balancing and Legalizing Techniques for Partitions:
        The three intelligent balancing techniques were found to yield far superior placement results compared to the basic legalizing strategies. In most cases the least connected and most decrease in wirelength technique yielded the best performance, while in others, either least connected or most decrease in wirelength was the best technique. These techniques were implemented with clustering in mind. Their effectiveness is seen from the fact that they showed themselves to be much more capable at extracting better placement with larger benchmarks than with smaller ones.

    Clustering:
        The parameters for laPlace can be configured to partition while taking into account clustering within the netlist. This is done by optimizing for congestion while leaving the weighting of nets as unconstrained, which causes all the nets to be scaled by the number of gates and pads connected to them, causing clusters of gates to be as close to each other as possible. The performance of this is further enhanced by 4-way partitioning in most cases, especially with bigger benchmarks.

         When laPlace handles parameters for clustering as above, it also modifies the hMetis partitioning parameters to partition based on clustered vertices. By setting the net weights appropriately using the parameters, hMetis can effectively partition the netlist by taking into account clustered groups of gates which will be placed as closed to each other as possible.

    Four-way partitioning:
        Quadpartitioning was found to significantly improve the placement of most benchmarks. Besides improving wirelength, it also resulted in much faster placement than with bipartitioning. This is attributed to the fact that in bipartitioning the partitioning step is done twice. However even though each 4-way partitioning stage takes much more time than a 2-way cut, the overall speedup is significant, especially for larger benchmarks.

        As mentioned earlier, quadpartitioning gave better wirelengths but worse congestion disparity. It is possible for laPlace to reduce this disparity by using the -I 3 parameter described in Implementation which causes gates to be more distributed on each round. This is however a simple strategy which results in better disparity measure at the cost of worse wirelength. It is possible to extend laPlace to perform a more optimized intermediate placement using quadratic placement, which would probably solve this issue.

        The effect of four-way partitioning and clustering are seen for fract in Figure A which shows a placement that doesnt utilize these features, and Figure B which utilizes these. The second placement yielded an 8% improvement in wirelength for fract.

    Graphics :
        laPlace has a GUI that displays the output of the partitioning. It is also capable of providing a step by step view of the partitioning. The interface is written in Tcl/Tk and is capable of checking whether the output file has any capacity violations and also displays the checker summary. Below is a graphic of the fract benchmark on the GUI. For clarity only the critical timing path nets are shown in this figure.

             The red boxes in the corners are the pads, while the numbers in the squares are the gate numbers.

    Source Code
          laPlace was written in C. The GUI display is in Tcl/Tk and is spawned by a C++ class that sets up a WISH shell and communicates with it through named pipes. Follow this link to see the hyperlinked source code files. The benchmark files and a Sun Solaris executable are also available. A description of the benchmark file organization can be found in the Project Document.

    Contact Information
        Suraj Sudhir
        7108 Wean Hall
        Carnegie Mellon University
        Pittsburgh, PA 15213
        (412) 268 4973
        ssudhir@ece.cmu.edu

    References
    1. G.Karypis and V.Kumar, "hMetis : A Hypergraph Partitioning Package Version 1.5.3" ,  http://www-users.cs.umn.edu/~karypis/metis/hmetis
    2. A.E.Dunlop and B.W.Kernighan, "A Procedure for Placement of Standard-Cell VLSI Circuits", IEEE TCAD 1985
    3. A.E.Caldwell, A.B.Kahng, I.L.Markov, "Can Recursive Bisection Alone Produce Routable Placements", DAC 2000
     

    * The name of this placer was thought up during a momentary fit of creative insanity. It has nothing to do with the more well known Laplace transform i.e !!!