There are many other useful quantities we can compute from locally weighted models that are not available in version 1.0 of Vizier. In this section we discuss predicting the distribution of future responses, estimating the probability that a local minimum exists within a region of interest, and estimating the probability that the steepest gradient lies within a specified solid angle. If you are reading this tutorial strictly to learn about Vizier, you may skip the rest of this section.
Look at the data points in fig. 19a. At least 6 of them lie at or outside the confidence intervals, which might seem to contradict the claim that we are 95% confident the function lies within those intervals. In fact there is no contradiction. When there are many data points with a lot of noise, the regression can be very confident about the average response for a given input, even though the noise puts most of the individual data points outside its confidence intervals; the mean of those points is still very likely to lie inside them. A different, equally meaningful question is: what are the confidence intervals on where future data points will fall? These are specified such that we expect 95% of all future data points to lie within them. In the high-noise, many-data-points example, these future-data intervals would of course be wide. Such intervals would be useful if we wanted to build a controller that must operate a system safely even in the presence of high noise.
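To make the distinction concrete, the following sketch contrasts the two kinds of intervals for repeated noisy observations at a single input. The function name and the normal (z-value) approximation used in place of the t distribution are illustrative assumptions, not part of Vizier:

```python
import numpy as np

def mean_and_prediction_intervals(y, z=1.96):
    """Contrast the confidence interval on the mean response with the
    interval expected to contain future data points, for noisy repeated
    observations y at one input.  Uses a two-sided 95% normal quantile
    (z = 1.96) as a simplification; a real implementation would use the
    t distribution with n - 1 degrees of freedom.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    mean = y.mean()
    s = y.std(ddof=1)                      # sample standard deviation of the noise
    half_mean = z * s / np.sqrt(n)         # shrinks as n grows: the mean is pinned down
    half_pred = z * s * np.sqrt(1 + 1 / n) # stays wide: future points carry the full noise
    return ((mean - half_mean, mean + half_mean),
            (mean - half_pred, mean + half_pred))
```

As the number of points grows, the interval on the mean shrinks toward zero width, while the interval for future data points stays roughly as wide as the noise itself.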
Questions that may be of interest to an optimization routine are: Is there a local optimum in this region of interest? With what confidence is there one? These questions can be answered with locally weighted learning when quadratic models are used, because the optimum of a quadratic model can be found in closed form (set the gradient to zero and solve the resulting linear system). To answer the first question, it is only necessary to check whether that optimum lies in the region of interest. If we want a probabilistic estimate, we can use Monte Carlo sampling: consider the joint t distribution on the coefficients of the quadratic model, and randomly choose coefficient values by sampling from that distribution. The whole algorithm is as follows:

1. Fit the quadratic model and obtain the joint t distribution on its coefficients.
2. Draw a sample of coefficient values from that distribution.
3. Compute the closed-form optimum of the sampled quadratic and check whether it lies in the region of interest.
4. Repeat steps 2-3 many times; the fraction of samples whose optimum falls in the region estimates the desired probability.
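The sampling procedure described above can be sketched as follows, for a one-dimensional quadratic y = a + b*x + c*x^2. The function name, the 1-D parameterization, and the use of a multivariate normal as a stand-in for the joint t distribution are all illustrative assumptions, not Vizier's actual interface:

```python
import numpy as np

def prob_local_min_in_region(coef_mean, coef_cov, lo, hi,
                             n_samples=10000, rng=None):
    """Monte Carlo estimate of the probability that a local minimum of
    y = a + b*x + c*x**2 lies in [lo, hi], given the (approximate) joint
    distribution of the fitted coefficients (a, b, c).
    """
    rng = np.random.default_rng(rng)
    samples = rng.multivariate_normal(coef_mean, coef_cov, size=n_samples)
    a, b, c = samples[:, 0], samples[:, 1], samples[:, 2]
    # A quadratic has a local minimum only when its curvature is positive;
    # the stationary point is then x* = -b / (2c).
    has_min = c > 0
    safe_c = np.where(c != 0, c, 1.0)          # avoid division by zero
    x_star = np.where(has_min, -b / (2 * safe_c), np.nan)
    hits = has_min & (x_star >= lo) & (x_star <= hi)
    return hits.mean()
```

The same idea generalizes to higher dimensions: sample a coefficient vector, solve for the stationary point, test positive definiteness of the curvature matrix, and test region membership.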
An optimization routine may also wish to estimate the probability that the steepest gradient lies within a certain solid angle, or that the gradient in a particular direction is greater or less than a certain value. This is exactly the information needed by a routine that collects data in a region until it is confident about which direction to shift the region of interest in order to get better results (part of a technique called Response Surface Methodology). We have already seen that derivatives can be computed directly from a model. The algorithm for computing probabilistic estimates of those gradients is analogous to the one just given for estimating the probability of a local optimum.
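In two or more dimensions, the solid-angle question reduces to asking how often a sampled gradient vector falls within a given angle of a reference direction. A minimal sketch, again using a multivariate normal as a stand-in for the joint t distribution on the gradient (the function name and interface are hypothetical):

```python
import numpy as np

def prob_gradient_within_angle(grad_mean, grad_cov, direction,
                               max_angle_rad, n_samples=10000, rng=None):
    """Monte Carlo estimate of the probability that the gradient lies
    within max_angle_rad of `direction`, given the (approximate) joint
    distribution of the gradient components.
    """
    rng = np.random.default_rng(rng)
    samples = rng.multivariate_normal(grad_mean, grad_cov, size=n_samples)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    norms = np.linalg.norm(samples, axis=1)
    # Angle between each sampled gradient and the reference direction.
    cos = samples @ direction / np.where(norms > 0, norms, 1.0)
    angles = np.arccos(np.clip(cos, -1.0, 1.0))
    return (angles <= max_angle_rad).mean()
```

The probability that the directional derivative along `direction` exceeds a threshold is even simpler: replace the angle test with `samples @ direction > threshold`.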