There are many other useful things we can compute from locally weighted models that are not available in version 1.0 of Vizier. In this section we discuss predicting the distribution of future responses, estimating the probability that a local minimum exists within a region of interest, and estimating the probability that the steepest gradient lies within a specified solid angle. If you are reading this tutorial strictly to learn about Vizier, you may skip the rest of this section.

Look at the data points in fig. 19a. At least 6
of the data points lie at or outside the confidence intervals, which
might seem to contradict the claim that we are 95% confident the
function lies between the intervals. Actually, there is no
contradiction. When there are lots of data points with lots of noise,
the regression can be very confident about what the average response
is for a particular input, even though the large amount of noise means
that most of the data points are outside its confidence intervals.
The mean of those points is still very likely to be inside the
confidence intervals. A different, equally meaningful question is
*what are the confidence intervals on where future data points
will be?* These are specified such that we expect 95% of all future
data points to lie within them. Of course, in the high-noise,
many-data-points example, these future-data-point confidence
intervals would be wide. Such intervals would be useful if we wanted
to build a controller that would operate a system safely, even in the
presence of high noise.
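The distinction between the two kinds of intervals can be sketched with a simple locally weighted linear fit. The helper below is illustrative, not part of Vizier, and uses a Gaussian critical value in place of the exact t distribution; the confidence interval covers the fitted mean, while the wider prediction interval covers future data points:

```python
import numpy as np

def local_fit_intervals(x, y, xq, bandwidth=0.3, level=1.96):
    """Weighted linear fit around query point xq.  Returns the fitted
    mean plus half-widths of (a) the confidence interval on the mean and
    (b) the prediction interval for a future data point.  Hypothetical
    helper; a Gaussian approximation stands in for the t distribution."""
    w = np.exp(-0.5 * ((x - xq) / bandwidth) ** 2)   # kernel weights
    X = np.column_stack([np.ones_like(x), x - xq])   # local linear model
    W = np.diag(w)
    A = X.T @ W @ X
    beta = np.linalg.solve(A, X.T @ W @ y)           # weighted least squares
    resid = y - X @ beta
    nu = w.sum() - X.shape[1]                        # effective degrees of freedom
    s2 = (w * resid ** 2).sum() / nu                 # weighted noise variance
    var_mean = s2 * np.linalg.inv(A)[0, 0]           # variance of the fitted mean at xq
    ci = level * np.sqrt(var_mean)                   # confidence interval (mean)
    pi = level * np.sqrt(var_mean + s2)              # prediction interval (future point)
    return beta[0], ci, pi
```

With many noisy points, `ci` shrinks toward zero while `pi` stays roughly as wide as the noise, which is exactly the behavior described above.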

Questions that may be of interest to an optimization routine are:
*Is there a local optimum in this region of interest, and with what
confidence?* These questions can be answered with
locally weighted learning when quadratic models are used. For a
quadratic model it is possible to determine its optimum in closed form
(see any basic calculus text). To answer the first
question above, it is only necessary to see whether an optimum lies in
the region of interest. If we want a probabilistic estimate, we can
use Monte Carlo sampling. We can look at the joint t distribution on
the coefficients of the quadratic model, and then randomly choose
coefficient values by sampling from that distribution. The whole
algorithm is as follows:

1. Use locally weighted learning to determine the joint t distribution on
the coefficients of a quadratic model at the center of the region of interest.
2. Choose a single value for each coefficient by sampling from the
coefficient distribution.
3. Compute the local optimum in closed form from the chosen coefficients.
4. Determine whether it falls in the region of interest and increment
a counter appropriately.
5. Return to step 2 a number of times.
6. Report the fraction of samples whose optimum fell within the region of interest.
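The loop above can be sketched in a few lines for the one-dimensional case, where the quadratic a + b x + c x^2 has its closed-form optimum at x* = -b/(2c). The coefficient mean and covariance are hypothetical inputs summarizing the fit, and a Gaussian stands in for the exact joint t distribution:

```python
import numpy as np

def prob_optimum_in_region(coef_mean, coef_cov, region, n_samples=10000, seed=0):
    """Monte Carlo estimate of the probability that the optimum of the
    1-D quadratic a + b*x + c*x**2 lies in the interval region=(lo, hi).
    coef_mean and coef_cov are illustrative summaries of the fitted
    coefficient distribution (Gaussian approximation to the joint t)."""
    rng = np.random.default_rng(seed)
    lo, hi = region
    samples = rng.multivariate_normal(coef_mean, coef_cov, n_samples)
    a, b, c = samples.T
    with np.errstate(divide="ignore", invalid="ignore"):
        x_opt = -b / (2.0 * c)                       # closed-form optimum of each sample
        inside = (c != 0) & (x_opt >= lo) & (x_opt <= hi)
    return inside.mean()                             # fraction of samples in the region
```

A tight coefficient distribution centered on a quadratic whose optimum sits inside the region yields a probability near one; shifting the region away drives it toward zero.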

An optimization routine may also wish to estimate the probability that the steepest gradient lies within a certain solid angle, or that the gradient in a particular direction is greater than or less than a certain value. If the optimization routine is using a policy of collecting data in a certain region until it is confident about which direction it should shift the region of interest to in order to get better results (part of a technique called Response Surface Methodology), this is exactly the kind of information it needs to know. We have already seen that derivatives can be computed directly from a model. The algorithm for computing probabilistic estimates of those gradients is like the algorithm just given for estimating the probability of a local optimum.
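The gradient version of this Monte Carlo test can be sketched the same way. The function and its inputs are illustrative, not Vizier's API: we sample gradient vectors from the fitted coefficient distribution (again using a Gaussian in place of the joint t distribution) and count how often a sample falls inside a cone of a given half-angle, which is the solid-angle test for the steepest direction:

```python
import numpy as np

def prob_gradient_in_cone(grad_mean, grad_cov, direction, half_angle,
                          n_samples=10000, seed=0):
    """Monte Carlo estimate of the probability that the local gradient
    lies within a cone of half_angle radians around `direction`.
    grad_mean/grad_cov are hypothetical summaries of the fitted gradient
    distribution (Gaussian approximation to the joint t)."""
    rng = np.random.default_rng(seed)
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)                           # unit reference direction
    g = rng.multivariate_normal(grad_mean, grad_cov, n_samples)
    norms = np.linalg.norm(g, axis=1)
    ok = norms > 0                                   # skip degenerate zero gradients
    cos_ang = (g[ok] @ d) / norms[ok]                # cosine of angle to the reference
    inside = cos_ang >= np.cos(half_angle)           # inside cone <=> angle <= half_angle
    return inside.sum() / n_samples
```

A Response Surface Methodology loop could keep sampling data in the current region until this probability exceeds a threshold, and only then shift the region in the indicated direction.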

Fri Feb 7 18:00:08 EST 1997