
STAGE

 

In their simulated-annealing approach to the channel-routing problem (described in Section 5.1), Wong et al. say:

Clearly, the objective function to be minimized is the channel width $w$. However, $w$ is too crude a measure of the quality of intermediate solutions. Instead, for any valid partition $P$, the following cost function is used:

$$ C \;=\; w^2 \;+\; \lambda_p \, p^2 \;+\; \lambda_U \, U \eqno(7) $$

where $p$ is the longest path length of $G_P$ [a graph induced by the partitioning], both $\lambda_p$ and $\lambda_U$ are constants, and ... $U = \sum_{i=1}^{w} u_i^2$, where $u_i$ is the fraction of track $i$ that is unoccupied. [Wong et al. 1988]

They hand-tuned the coefficients $\lambda_p$ and $\lambda_U$.

To apply STAGE to this problem, we began not with the contrived function $C$ but with the natural objective function $f(x) = w$. The additional terms used in Eq. 7, $p$ and $U$, along with $w$ itself, were given as the three input features to STAGE's function approximator.
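To make this setup concrete, here is a minimal Python sketch of the cost terms and the feature map, under an assumed interface in which a routing state exposes its width $w$, longest path length $p$, and per-track unoccupied fractions $u_i$; the coefficient values below are placeholders, not Wong et al.'s tuned constants:

    def wong_cost(w, p, u, lam_p=1.0, lam_U=1.0):
        # Contrived cost of Eq. 7: C = w^2 + lam_p*p^2 + lam_U*U,
        # where U = sum_i u_i^2 over the unoccupied track fractions.
        # lam_p and lam_U are placeholders for the hand-tuned constants.
        U = sum(ui ** 2 for ui in u)
        return w ** 2 + lam_p * p ** 2 + lam_U * U

    def stage_features(w, p, u):
        # The three input features given to STAGE's function approximator:
        # the true objective w plus the auxiliary terms p and U from Eq. 7.
        U = sum(ui ** 2 for ui in u)
        return [w, p, U]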

Table 2 summarizes the results we have obtained thus far, using STAGE to predict and improve upon the results of stochastic hillclimbing optimization on a large channel-routing problem (YK4). For prediction, we have used simple models: linear regression in the batch-update case, and backprop on MLPs (2 hidden units) in the TD($\lambda$) case.
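For concreteness, the batch-update variant can be sketched as follows. This is a simplified rendering under assumed interfaces, not the exact implementation: hillclimb(start, evaluate) is a search routine returning the trajectory of accepted states (one possible sketch, with its move generator bound in, appears after Table 2), f(x) = w is the true objective, and features(x) maps a state to the three-feature list [w, p, U]:

    import numpy as np

    def stage(start, hillclimb, f, features, n_iterations=10):
        # Alternate hillclimbing on the true objective f with hillclimbing
        # on a learned evaluation function V that predicts, from any state,
        # the outcome f-hillclimbing will reach if started there.
        X, y = [], []
        x = start
        for _ in range(n_iterations):
            traj = hillclimb(x, evaluate=f)        # stage 1: optimize f
            outcome = f(traj[-1])
            for s in traj:                         # label every visited state
                X.append(features(s) + [1.0])      # with the eventual outcome
                y.append(outcome)                  # ([1.0] is a bias input)
            # Batch fit: V(x) = theta . [features(x), 1] by least squares.
            theta, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
            V = lambda s, th=theta: float(np.dot(th, features(s) + [1.0]))
            x = hillclimb(traj[-1], evaluate=V)[-1]  # stage 2: optimize V to
        return x                                     # pick the next restart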

 

    Algorithm                                        Performance     Typical traj. len
                                                     (best, mean)    (taken / considered)
    -----------------------------------------------  --------------  --------------------
    (*) Global optimum for problem YK4               10              --
    (A) Hillclimbing, patience=100 (N=30)            44, 52.4        93 / 696
    (B) Hillclimbing, patience=1000 (N=30)           43, 47.9        99 / 2,591
    (C) Simulated annealing, f(x)=C (N=3)            35, 35          3,836 / 126,202
    (D) Simulated annealing, f(x)=w (N=3)            14, 14.7        163,401 / 279,073
    (E) Modified hillclimbing, patience=1000 (N=5)   19, 19.8        7,765 / 11,342
    (F) STAGE, TD(0.6) + backprop (N=3)              14, 22.7        388 / 2,779
    (G) STAGE, batch linear regression (N=28)        13, 17.1        340 / 2,293
    (H) STAGE, random fit (for validation)           50, 55          190 / 1,704

Table 2: Results on routing problem YK4. The first pair of columns gives the best and mean channel width attained over N runs of each algorithm. The second pair counts the moves taken versus moves considered in a typical search trajectory from the starting state to a local optimum (in the case of STAGE, after learning).

 

Our results provide some interesting insights. Experiments (A) and (B) show that hillclimbing easily gets stuck in very poor local optima, even when the ``patience'' parameter is set high (allowing it to evaluate more random moves before terminating). Experiment (C) shows that simulated annealing, as used in [Wong et al. 1988], does considerably better. Surprisingly, when given plenty of computing time, the annealer of experiment (D) does better still. It seems that the ``crude'' evaluation function f(x)=w lets simulated annealing perform what is effectively a random walk along the ridge of all solutions of equal cost w; given enough time, it fortuitously finds a hole in the ridge. Hillclimbing modified to accept equi-cost moves (E) performed nearly as well.
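A sketch of the patience-limited hillclimber behind experiments (A), (B), and (E), again under an assumed problem-specific move generator random_neighbor; setting accept_equal=True gives the modified hillclimber of (E), which wanders along equi-cost ridges:

    def hillclimb(start, evaluate, random_neighbor, patience=100,
                  accept_equal=False):
        # Stochastic hillclimbing: sample random moves, accept improvements,
        # give up after `patience` consecutive non-accepted evaluations.
        x, fx = start, evaluate(start)
        trajectory = [x]
        rejections = 0
        while rejections < patience:
            y = random_neighbor(x)
            fy = evaluate(y)
            if fy < fx:
                x, fx, rejections = y, fy, 0   # improvement: reset patience
                trajectory.append(x)
            elif accept_equal and fy == fx:
                x = y                          # equi-cost move: walk the ridge,
                rejections += 1                # but keep counting so the walk
                trajectory.append(x)           # still terminates
            else:
                rejections += 1                # rejected move
        return trajectory

In this sketch, accepted equi-cost moves still consume patience so that the ridge walk of (E) eventually terminates; whether such moves should instead reset the counter is a design choice the experiments above do not pin down.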

The hillclimber we gave to STAGE for learning was the fast but weakly performing one of experiment (A). The results show that STAGE learned to optimize superbly, not only bettering the performance of (A) but quickly finding solutions as good as those found by the long simulated-annealing runs. The batch fits with linear regression (G) were especially impressive, often producing evaluation functions that performed well after only two or three training trajectories. This seems too good to be true; did STAGE work according to its design, or did it somehow ``cheat''?

We considered several hypotheses. (1) Since STAGE concatenates two hillclimbing trajectories, perhaps it simply has twice as much time to explore. Experiment (B) rejects this explanation. (2) Perhaps the function approximator simply smooths f(x), helping eliminate local minima and plateaus. No: we tried training V(x) to smooth f(x) directly and running A(V);A(f), but this failed to produce improvement. We also trained V(x) to fit randomly generated training values from the same range as those generated by STAGE; again, this failed to improve over plain one-stage hillclimbing (Experiment H). (3) The input features happen to be excellent, providing an easy answer. This explanation turns out to be correct; I discuss how at the end of the next section.
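The control of experiment (H) amounts to the following sketch: fit V by the same least-squares machinery STAGE uses, but to uniformly random targets drawn from the range [lo, hi] spanned by STAGE's real training values (both bounds are placeholders here), and then run A(V);A(f):

    import random
    import numpy as np

    def random_fit_V(traj, features, lo, hi):
        # Experiment (H): same regression machinery as STAGE, but the
        # targets carry no information about where hillclimbing ends up.
        X = np.array([features(s) + [1.0] for s in traj])     # bias appended
        y = np.array([random.uniform(lo, hi) for _ in traj])  # random targets
        theta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return lambda s: float(np.dot(theta, features(s) + [1.0]))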

