
## Other things provided by local weighted models

There are many other useful things we can compute from local weighted models that are not available in version 1.0 of Vizier. In this section we discuss predicting the distribution of future responses, estimating the probability that a local minimum exists within a region of interest, and estimating the probability that the steepest gradient lies within a specified solid angle. If you are reading this tutorial strictly to learn about Vizier, you may skip the rest of this section.

Look at the data points in fig. 19a. At least 6 of the data points lie at or outside the confidence intervals, which might seem to contradict the claim that the function lies between the intervals with 95% confidence. Actually, there is no contradiction. When there are many data points with a lot of noise, the regression can be very confident about the average response for a particular input, even though the noise places most of the individual data points outside its confidence intervals. The mean of those points is still very likely to be inside the intervals. A different question, also meaningful, is: what are the confidence intervals on where future data points will be? These are specified such that we expect 95% of all future data points to lie within them. Of course, in the high-noise, many-data-points example, these future-data-point intervals would be wide. Such intervals would be useful if we wanted to build a controller that operates a system safely, even in the presence of high noise.
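The distinction between the two kinds of intervals can be seen numerically. The sketch below uses illustrative values (none of the numbers come from Vizier or fig. 19a): it draws noisy samples around a constant mean, then compares a 95% confidence interval on the *mean* response with a 95% prediction interval for *future* data points.

```python
# Sketch: confidence interval on the mean vs. prediction interval for
# future points. All values are illustrative, not from Vizier.
import math
import random
import statistics

random.seed(0)

# 200 noisy observations around a true mean of 5.0 (noise sd = 2.0).
data = [5.0 + random.gauss(0.0, 2.0) for _ in range(200)]
n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)

t95 = 1.97  # approximate two-sided 95% t critical value for ~200 points

# Confidence interval on the MEAN response: shrinks as n grows.
ci_half = t95 * sd / math.sqrt(n)

# Prediction interval for a FUTURE data point: stays about +/- 2 sd
# no matter how many points we collect.
pi_half = t95 * sd * math.sqrt(1.0 + 1.0 / n)

print(f"mean CI:             {mean - ci_half:.2f} .. {mean + ci_half:.2f}")
print(f"prediction interval: {mean - pi_half:.2f} .. {mean + pi_half:.2f}")
```

With lots of noisy data, the mean interval is narrow while the prediction interval remains wide — which is exactly why many points can fall outside the regression's confidence intervals without contradiction.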

Questions that may be of interest to an optimization routine are: Is there a local optimum in this region of interest? With what confidence is there one? These questions can be answered with locally weighted learning when quadratic models are used. For a quadratic model it is possible to determine its optimum in closed form (see a basic Calculus text to see how). In order to answer the first question above, it is only necessary to see whether an optimum lies in the region of interest. If we want a probabilistic estimate, we can use Monte Carlo sampling. We can look at the joint t distribution on the coefficients of the quadratic model, and then randomly choose coefficient values by sampling from that distribution. The whole algorithm is as follows:

1. Use locally weighted learning to determine the joint t distribution on the coefficients of a quadratic model in the center of the region of interest.

2. Choose a single value for each coefficient by sampling from the coefficient distribution.

3. Compute the local optimum given the chosen coefficients.

4. Determine whether it falls in the region of interest and increment a counter appropriately.

5. Return to step 2 and repeat a number of times.

6. Report the percentage of trials in which the optimum fell within the region of interest.
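The steps above can be sketched for a one-dimensional quadratic y = ax² + bx + c, whose stationary point is x* = -b/(2a). The coefficient distribution below is a stand-in: in practice the means and spreads come from the locally weighted regression's joint t distribution, which this sketch approximates with independent Gaussians.

```python
# Monte Carlo estimate of the probability that a local optimum lies
# in the region of interest, for a 1-D quadratic y = a*x^2 + b*x + c.
# Coefficient means/spreads are illustrative, not from a real fit.
import random

random.seed(0)

coef_mean = {"a": 1.0, "b": -2.0}   # nominal optimum at x* = 1.0
coef_sd = {"a": 0.3, "b": 0.5}      # independence assumed for simplicity

region = (0.0, 2.0)                 # region of interest
n_samples = 10000
hits = 0

for _ in range(n_samples):
    # Step 2: sample one set of coefficients from the distribution.
    a = random.gauss(coef_mean["a"], coef_sd["a"])
    b = random.gauss(coef_mean["b"], coef_sd["b"])
    if abs(a) < 1e-12:
        continue                    # degenerate draw: no stationary point
    # Step 3: closed-form stationary point of the sampled quadratic.
    x_star = -b / (2.0 * a)
    # Step 4: count it if it falls in the region of interest.
    if region[0] <= x_star <= region[1]:
        hits += 1

# Step 6: report the fraction of samples with an optimum in the region.
p_optimum = hits / n_samples
print(f"P(optimum in region) ~= {p_optimum:.3f}")
```

In higher dimensions the same idea applies: for y = c + bᵀx + xᵀAx the stationary point solves 2Ax* = -b, and each sampled coefficient set gives one candidate x* to test against the region.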

An optimization routine may also wish to estimate the probability that the steepest gradient lies within a certain solid angle, or that the gradient in a particular direction is greater than or less than a certain value. If the optimization routine follows a policy of collecting data in a region until it is confident about which direction to shift the region of interest in order to get better results (part of a technique called Response Surface Methodology), this is exactly the kind of information it needs. We have already seen that derivatives can be computed directly from a model. The algorithm for computing probabilistic estimates of those gradients is like the algorithm just given for estimating the probability of a local optimum.
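The same Monte Carlo recipe carries over directly: sample gradient vectors instead of quadratic coefficients, and count how often the sampled gradient falls inside the cone. In the sketch below the gradient's mean and spread are illustrative placeholders for what a locally weighted model's joint t distribution on the linear terms would supply.

```python
# Monte Carlo estimate of the probability that the gradient direction
# lies within a cone (solid angle) around a chosen axis, in 2-D.
# The gradient distribution here is illustrative, not from a real fit.
import math
import random

random.seed(0)

grad_mean = (1.0, 0.2)      # estimated gradient at the query point
grad_sd = (0.3, 0.3)        # per-component spread (independence assumed)

axis = (1.0, 0.0)           # cone axis: the direction we hope to move in
half_angle = math.radians(30.0)
cos_limit = math.cos(half_angle)

n_samples = 10000
hits = 0
for _ in range(n_samples):
    gx = random.gauss(grad_mean[0], grad_sd[0])
    gy = random.gauss(grad_mean[1], grad_sd[1])
    norm = math.hypot(gx, gy)
    if norm == 0.0:
        continue
    # Cosine of the angle between the sampled gradient and the axis;
    # the sample is inside the cone when that angle is small enough.
    cos_theta = (gx * axis[0] + gy * axis[1]) / norm
    if cos_theta >= cos_limit:
        hits += 1

p_cone = hits / n_samples
print(f"P(gradient within cone) ~= {p_cone:.3f}")
```

A Response Surface Methodology loop could keep sampling data in the current region until this probability exceeds a threshold, then shift the region along the axis.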


Jeff Schneider
Fri Feb 7 18:00:08 EST 1997