			Documents about
	"Parallel Standard Cell Placement Experimental System"

						Date: 1992-Jul-17

(1) Program Name: Parallel Standard Cell Placement Experimental System

(2) Runs on: Multi-PSI / Pseudo Multi-PSI system with PIMOS version 3.2.

(3) Description of the Problem:

The program solves the placement problem for standard-cell LSI.
The placement problem is to find a minimum-cost configuration of the cells.
The cost value is the total estimated length of the wires that connect
the cells.
This problem is known to be NP-hard, so a stochastic method is needed to
solve it effectively.
We applied a new parallel algorithm based on the Simulated Annealing
algorithm to solve this problem.
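
As an illustration of the cost value, the sketch below estimates wire length with the half-perimeter bounding-box measure that is common in placement tools. This document does not say which estimate the program uses, so the half-perimeter rule, the (x, y) cell coordinates, and the names `net_hpwl` and `total_wirelength` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# All names here are hypothetical; the wire-length estimate is an assumption.
# Data model: cell name -> (x, y) position of the cell.
Placement = Dict[str, Tuple[float, float]]


def net_hpwl(net: List[str], placement: Placement) -> float:
    """Half-perimeter of the bounding box around all cells on one net (assumed estimate)."""
    xs = [placement[cell][0] for cell in net]
    ys = [placement[cell][1] for cell in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets: List[List[str]], placement: Placement) -> float:
    """Cost of a configuration: the sum of the estimated lengths of all nets."""
    return sum(net_hpwl(net, placement) for net in nets)


placement = {"a": (0, 0), "b": (3, 1), "c": (1, 4)}
nets = [["a", "b"], ["a", "b", "c"]]
print(total_wirelength(nets, placement))  # 4 + 7 = 11
```

Moving a single cell changes only the nets attached to it, so a real implementation would typically update the cost incrementally rather than re-summing every net.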

(4) Algorithm:

The algorithm used in the program to solve the standard-cell LSI placement
problem is as follows.

This parallel algorithm is based on the sequential Simulated Annealing (SA)
algorithm, which is known to be effective for solving combinatorial
optimization problems such as cell placement [2].
The most difficult problem in sequential SA is how to control the parameter
called "temperature"; this is called the "cooling schedule problem".
A new parallel algorithm [1] is proposed to solve this cooling schedule
problem; it constructs a cooling schedule automatically.
The details are as follows.

Each process, with its own distinct temperature value, starts from a random
initial configuration and then executes iterative improvement.
In each iteration, a new configuration is generated and tested.
The cost function is the total estimated wiring length.
When the cost value of the new configuration is better (lower) than that of
the old one, the new configuration is accepted.
Otherwise, the new configuration is accepted probabilistically.
The probability is calculated from the process's temperature parameter and
the difference in cost value between the two configurations.
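
The acceptance test and one process's iterative improvement can be sketched as follows. The document only says that the probability depends on the temperature and the cost difference; the Metropolis form exp(-delta / T), the swap-two-cells move, the list encoding of a configuration, and the names `accept` and `iterate` are assumptions made for this example. The temperature is held fixed inside `iterate`, so the loop has no cooling step.

```python
import math
import random
from typing import Callable, List, Tuple

# All names here are hypothetical.
Config = List[int]  # assumed encoding: config[slot] = id of the cell in that slot


def accept(delta: float, temperature: float, rng: random.Random) -> bool:
    """Assumed Metropolis rule: accept improvements; accept a worse move with prob exp(-delta/T)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def iterate(config: Config, cost: Callable[[Config], float], temperature: float,
            n_iters: int, rng: random.Random) -> Tuple[Config, float]:
    """Iterative improvement at one fixed temperature, as performed by a single SA process."""
    current = list(config)  # work on a copy; the caller's list is not modified
    current_cost = cost(current)
    for _ in range(n_iters):
        # Assumed move: swap the cells in two randomly chosen slots.
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = cost(current)
        if accept(new_cost - current_cost, temperature, rng):
            current_cost = new_cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the rejected move
    return current, current_cost
```

At a very low temperature this reduces to greedy descent; at a high temperature almost every move is accepted, which is why a single fixed temperature is rarely good on its own.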

Occasionally, processes with adjacent temperature values try to exchange their
configurations.
If the process with the higher temperature value has the better configuration,
the exchange is executed.
Otherwise, the exchange is executed probabilistically.
The probability is calculated from the two temperature parameters and the
difference in cost value between the two configurations.
As a result, good configurations found at higher temperatures are passed to
lower temperatures.
After many iterations, a well-optimized configuration is expected to be found
by the process with the lowest temperature.
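
A sketch of the exchange test between two adjacent processes follows. It assumes the usual replica-exchange form min(1, exp((1/T_low - 1/T_high)(E_low - E_high))), which gives probability 1 whenever the higher-temperature process holds the better configuration, as described above; the program's exact formula is not given in this document and may differ. The names `exchange_probability` and `try_exchange` are hypothetical.

```python
import math
import random
from typing import List


def exchange_probability(cost_low: float, cost_high: float,
                         t_low: float, t_high: float) -> float:
    """Assumed replica-exchange rule for two processes with t_low < t_high.

    Returns 1 when the higher-temperature process holds the better (lower-cost)
    configuration; otherwise exp((1/t_low - 1/t_high) * (cost_low - cost_high)).
    """
    x = (1.0 / t_low - 1.0 / t_high) * (cost_low - cost_high)
    return 1.0 if x >= 0 else math.exp(x)


def try_exchange(configs: List[list], costs: List[float], temps: List[float],
                 k: int, rng: random.Random) -> bool:
    """Try to swap the configurations of process k and process k + 1 (temps sorted ascending)."""
    p = exchange_probability(costs[k], costs[k + 1], temps[k], temps[k + 1])
    if rng.random() < p:  # always true when p == 1.0, since random() < 1.0
        configs[k], configs[k + 1] = configs[k + 1], configs[k]
        costs[k], costs[k + 1] = costs[k + 1], costs[k]
        return True
    return False
```

Only configurations and costs move; each process keeps its own temperature, which is how a good configuration found at a high temperature travels down toward the lowest one.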

(5) Process Structure:

When the program is started, the main manager process and the I/O
manager process are created.  These processes remain until the
program terminates.  The main manager process manages the whole
execution of the program.  The I/O manager process manages all I/O
devices and performs I/O in response to requests from the main manager
process.

In the distributed algorithm, each SA process is connected to its
neighboring SA processes by streams in both directions.

The solution manager keeps streams to all SA processes, gathers the costs of their configurations, and passes them to the main manager process on request.
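
The process structure can be imitated sequentially to show how the pieces fit together. In this sketch, which is an assumption rather than a transcription of the KL1 code, each `SAProcess` object stands for one SA process and holds references to its neighbors in place of the two streams, and `SolutionManager` stands for the solution manager that gathers the costs on request. The Metropolis and exchange rules are the same assumed forms as above, and exchanges alternate between even and odd neighbor pairs so that each process takes part in at most one exchange per round; this pairing, like every name here, is hypothetical.

```python
import math
import random
from typing import Callable, List, Optional

# All names here are hypothetical stand-ins for the KL1 processes.
Config = List[int]  # assumed encoding: config[slot] = id of the cell in that slot


class SAProcess:
    """Stand-in for one SA process running at a fixed temperature."""

    def __init__(self, temperature: float, config: Config,
                 cost_fn: Callable[[Config], float], seed: int) -> None:
        self.temperature = temperature
        self.config = config
        self.cost_fn = cost_fn
        self.cost = cost_fn(config)
        self.rng = random.Random(seed)
        # References to the neighbors play the role of the streams in both directions.
        self.lower: Optional["SAProcess"] = None
        self.higher: Optional["SAProcess"] = None

    def step(self) -> None:
        """One iteration: swap two cells, then apply the assumed Metropolis test."""
        i, j = self.rng.sample(range(len(self.config)), 2)
        self.config[i], self.config[j] = self.config[j], self.config[i]
        new_cost = self.cost_fn(self.config)
        delta = new_cost - self.cost
        if delta <= 0 or self.rng.random() < math.exp(-delta / self.temperature):
            self.cost = new_cost
        else:
            self.config[i], self.config[j] = self.config[j], self.config[i]

    def exchange_with_higher(self) -> None:
        """Try to swap configurations with the next-higher-temperature process."""
        other = self.higher
        if other is None:
            return
        x = (1 / self.temperature - 1 / other.temperature) * (self.cost - other.cost)
        if x >= 0 or self.rng.random() < math.exp(x):
            self.config, other.config = other.config, self.config
            self.cost, other.cost = other.cost, self.cost


class SolutionManager:
    """Stand-in for the solution manager: reaches every SA process and reports costs."""

    def __init__(self, processes: List[SAProcess]) -> None:
        self.processes = processes

    def costs(self) -> List[float]:
        return [p.cost for p in self.processes]


def build_system(temps: List[float], n_cells: int,
                 cost_fn: Callable[[Config], float], seed: int = 0) -> SolutionManager:
    rng = random.Random(seed)
    procs = []
    for k, t in enumerate(sorted(temps)):
        config = list(range(n_cells))
        rng.shuffle(config)  # random initial configuration
        procs.append(SAProcess(t, config, cost_fn, seed + k + 1))
    for a, b in zip(procs, procs[1:]):  # link neighbors in both directions
        a.higher, b.lower = b, a
    return SolutionManager(procs)


def run(manager: SolutionManager, inner_loops: int, exchange_every: int) -> None:
    procs = manager.processes
    for it in range(1, inner_loops + 1):
        for p in procs:
            p.step()
        if it % exchange_every == 0:
            parity = (it // exchange_every) % 2  # alternate even and odd pairs
            for k in range(parity, len(procs) - 1, 2):
                procs[k].exchange_with_higher()
```

Here `exchange_every` plays the role of the inverse of the exchange frequency in the measurement results (for example, 100 for 1/100), under the assumption that the frequency counts exchange attempts per inner-loop iteration.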

(6) Load Balancing Scheme:

In our system, each SA process, with its own distinct temperature parameter, is allocated to its own processor element.
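
A minimal sketch of this allocation follows, assuming a geometric ladder of temperatures between hypothetical bounds `t_min` and `t_max` and assigning the k-th lowest temperature to processor element k. The document does not state how the temperature values are chosen or how processes are numbered on the processor elements, so both choices, and the names `temperature_ladder` and `allocate`, are assumptions.

```python
from typing import Dict, List

# All names here are hypothetical; the geometric spacing is an assumption.


def temperature_ladder(t_min: float, t_max: float, n: int) -> List[float]:
    """Geometric ladder of n distinct temperatures from t_min to t_max."""
    if n == 1:
        return [t_min]
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio ** k for k in range(n)]


def allocate(temps: List[float], n_pes: int) -> Dict[int, float]:
    """One SA process per processor element: PE k runs the k-th lowest temperature."""
    if len(temps) > n_pes:
        raise ValueError("more SA processes than processor elements")
    return {pe: t for pe, t in enumerate(sorted(temps))}
```

Because every processor element runs exactly one process with the same amount of work per iteration, this static one-to-one mapping needs no dynamic load balancing.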
    
(7) Program:

This program consists of the following 14 ESP files and 8 KL1 files.
Brief explanations of them follow.  Further explanation is
available as comments in the source programs.

(7.1) KL1 Modules

   (7.1.1) placement
		describes the I/O manager process for the Parallel Standard Cell
		Placement Experimental System
   (7.1.2) main
		describes the main manager and the log manager, which handle
		the requests between the annealing processes and the I/O manager process
   (7.1.3) annealing
		main module for the annealing routine
   (7.1.4) an
		sub-module for the annealing routine
   (7.1.5) initialize
		initializes the database and forks the database processes
   (7.1.6) remove_overlap
		removes overlaps between placement modules
   (7.1.7) my_util
		modules for trivial utilities
   (7.1.8) rn0
		generates pseudo-random numbers

(7.2) ESP Classes

   (7.2.1) std_cell_placement_system
		driver device to execute the placement system without KL1
   (7.2.2) std_cell_parallel_annealing
		support the essential_window for the bestpath_demo_window
   (7.2.3) as_std_cell_placement
		make a new instance database and copy the values from the old one
   (7.2.4) std_cell_graph_manipulator
		manipulate the energy graph
   (7.2.5) std_cell_graph_menu_reader
		handle the inputs from the graph menu
   (7.2.6) std_cell_data_handler
		load placement data from the input file and save placement data
		in the output file
   (7.2.7) with_initial_parameter
		initialize the parameter value for the system
   (7.2.8) std_cell_initial_placement
		initialize the placement using pseudo random numbers
   (7.2.9) std_cell_placement_program
		manipulate the I/O between SIMPOS and PIMOS
   (7.2.10) std_cell_position_manipulator
		manipulate the position of the cells
   (7.2.11) std_cell_data_parser
		check and parse the inputs from the user
   (7.2.12) with_std_cell_io_log
		parse the I/O log between SIMPOS and PIMOS
   (7.2.13) vpnr_parser
		parse the VPNR data and make the output file for the KL1 program
   (7.2.14) std_cell_base_window
		support the main window for the system
   (7.2.15) std_cell_pmacs_window
		support the pmacs_window for the user input
   (7.2.16) std_cell_text_window_editor
		support the text_window_editor for the user input
   (7.2.17) std_cell_manipulator_window
		support the manipulator_window for the graph
   (7.2.18) std_cell_scrolling_on_off_menu
		support the scrolling_on_off_menu for the temperature parameters

(8) Measurement Results:

   We conducted two types of experiments.
   The first examined the relation between the exchange frequency and the
   quality of the solution or the execution time.
   The second examined the relation between the number of temperatures and
   the quality of the solution.
   These results reconfirmed the effectiveness of the new parallel algorithm.


 Relation between exchange frequency and quality of solution or execution time
	(Number of temperatures = 63, Execution time = 4800 [sec])
--------------------------------------------------------------------------------------
 Exchange frequency    |   1/10 |   1/50 |  1/100 |  1/200 |  1/500 | 1/1000 | 1/5000
-----------------------+--------+--------+--------+--------+--------+--------+--------
 Estimated area [mm^2] |  0.652 |  0.626 |  0.615 |  0.621 |  0.627 |  0.634 |  0.652
 Energy value          | 405440 | 401442 | 401040 | 396160 | 403200 | 409307 | 407520
 Inner loop count      |  20000 |  41000 |  46700 |  50000 |  51000 |  52000 |  55000
--------------------------------------------------------------------------------------

	Relation between number of temperatures and quality of solution
	   (Inner loop count = 20,000, Exchange frequency = 1/100)
	--------------------------------------------------------------------
	Number of temperatures |      8 |     16 |     32 |     48 |     63
	-----------------------+--------+--------+--------+--------+--------
	Estimated area [mm^2]  |  0.692 |  0.664 |  0.608 |  0.646 |  0.615
	Energy value           | 471120 | 436638 | 430320 | 431842 | 424478
	--------------------------------------------------------------------

(9) References:

[1] K. Kimura and K. Taki, "Time-homogeneous Parallel Annealing Algorithm,"
in Proc. IMACS'91, 1991, pp. 827-828.

[2] C. Sechen and A. Sangiovanni-Vincentelli, "The TimberWolf Placement and
Routing Package," IEEE Journal of Solid-State Circuits, vol. SC-20, no. 2,
1985, pp. 510-522.