Gradient estimate distributions

Gradient estimates are very useful for controllers and optimizers. Gradient descent is a popular optimization method that requires an estimate of the derivatives of the output with respect to each input at a given point. It uses those derivative estimates to determine which direction to move in the input space in order to improve the output. Similarly, an intelligent controller tries to drive the output of a system toward some target by adjusting the control variables. Again, the derivative of the output with respect to each control variable can be used to decide in which direction to move the controls.
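
As a concrete illustration (a minimal sketch, not code from this tool), the loop below shows how an optimizer uses derivative estimates. Here gradient_estimate is a hypothetical stand-in for whatever routine supplies the derivatives of the output with respect to each input; in this sketch it returns the analytic gradient of a simple test function.

  def gradient_estimate(x):
      # Hypothetical stand-in: analytic gradient of
      # f(x) = (x0 - 1)^2 + (x1 + 2)^2.
      return [2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)]

  def gradient_descent(x, step=0.1, iters=100):
      for _ in range(iters):
          g = gradient_estimate(x)
          # Step against the gradient to decrease the output.
          x = [xi - step * gi for xi, gi in zip(x, g)]
      return x

  print(gradient_descent([0.0, 0.0]))  # converges toward the minimum at (1, -2)

A controller follows the same pattern, except that the step is chosen to push the output toward a target value rather than a minimum.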

Many common function approximators have difficulty providing gradient estimates, and even more difficulty putting confidence intervals on those estimates. Often they must resort to making several predictions in the vicinity of the query point and approximating the gradient from the differences between those predictions as the input is changed. That technique can be computationally expensive, and it may produce wildly fluctuating gradient estimates when the predictions are not smooth or when the size of the region in which the predictions are made is poorly chosen.
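
The sketch below shows this finite-difference workaround; predict is a hypothetical stand-in for the approximator's prediction function. Note that each gradient estimate costs 2n extra predictions for n inputs, and the result is sensitive to the step size h, which is the weakness described above.

  def finite_difference_gradient(predict, x, h=1e-4):
      # Approximate each partial derivative with a central difference:
      #   d(output)/d(x_i)  ~  (predict(x + h*e_i) - predict(x - h*e_i)) / (2h)
      grad = []
      for i in range(len(x)):
          x_plus = list(x)
          x_minus = list(x)
          x_plus[i] += h
          x_minus[i] -= h
          grad.append((predict(x_plus) - predict(x_minus)) / (2.0 * h))
      return grad

  # Example with a smooth stand-in predictor: gradient at (1, 2) is (2, 3).
  print(finite_difference_gradient(lambda x: x[0] ** 2 + 3.0 * x[1], [1.0, 2.0]))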

Using regression analysis, we can get gradient estimates directly from the fitted model rather than requesting numerous additional predictions. An estimate of the derivative of the output with respect to one input is computed in much the same way as a prediction: a projection is done from the joint t distribution on the coefficients to a one-dimensional t distribution for the gradient. The simplest case is the gradient of the output with respect to a particular input in a linear model; there, the distribution of the gradient is exactly the distribution of the coefficient on that input. If we have an averaging model, the gradient estimate is always 0, since the model has no input-dependent terms. For a quadratic model, the computation of the gradient with respect to one input uses the coefficients on all the terms made from that input: its linear term, its squared term, and its cross terms with the other inputs.
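
To make the projection concrete, the sketch below computes the mean and variance of the gradient distribution for a hypothetical quadratic model in one input, y = b0 + b1*x + b2*x^2, so dy/dx = b1 + 2*b2*x. The coefficient estimates and their covariance are made-up numbers for illustration; in the actual analysis the projected distribution is a t distribution whose degrees of freedom come from the fit.

  import numpy as np

  beta_hat = np.array([0.5, 1.2, -0.3])   # illustrative fitted coefficients
  cov_beta = np.diag([0.04, 0.02, 0.01])  # illustrative coefficient covariance

  def gradient_distribution(x0):
      # The derivative is a linear combination a'beta of the coefficients,
      # so its distribution is the projection of their joint distribution:
      # mean a'beta_hat and variance a' Cov(beta) a.
      a = np.array([0.0, 1.0, 2.0 * x0])
      return float(a @ beta_hat), float(a @ cov_beta @ a)

  print(gradient_distribution(2.0))  # (mean, variance) of dy/dx at x = 2

The same projection vector idea extends to several inputs: the entries of a are the partial derivatives of each model term with respect to the input of interest, evaluated at the query point.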

You may experiment with gradient graphs on your own. Notice that the graph dialog box asks you to specify both an x attribute and a gradient attribute. Often they will both be the same input variable, indicating that you would like to see the derivative of the output with respect to an input as that input varies. However, with multi-dimensional input data, it makes just as much sense to plot the derivative of the output with respect to input 1 while input 2 is varied.



