Next: CONCLUSION Up: Quasi-Bayesian Strategies for Efficient Previous: BUILDING A NEW APPROACH

# PLANNING TO OBSERVE WITH GAUSSIAN MEASUREMENTS

ATTENTION:
Due to the difficulty of translating mathematical markup into HTML, this section was greatly summarized. For the whole content, please consult the compressed postscript version.

We now demonstrate our approach to decision-making on the planning to observe problem described as follows:

A series of independent real-valued observations $X$i is available to an agent; each observation costs $c$ units of loss and is normally distributed with known variance $1/r$ and unknown mean $θ$. We indicate this by $X$i   N(xi; θ,1/r). The agent wants to know whether $θ$ is larger or equal, or smaller than zero. At any point, the agent can take a new observation or stop and decide: Smaller ($d$0) and Larger ($d$1). When a decision is made, the loss $L(θ, d$i) is defined by Table 1.

 $θ<= 0$ $θ> 0$ $d$0 0 $θ$ $d$1 $- θ$ 0

Figure 2 illustrates the dynamics of the problem. At any point, the agent is facing the same question as to whether a decision should be made or an observation should be taken. We want to find a sequence of actions for the agent.

Figure: Planning to Observe with Gaussian Measurements

## THE CREDAL SET

Prior beliefs about $θ$ translate into a convex set of probability distributions, the credal set. A realistic model for prior beliefs would take a credal set large enough to represent the non-specificity of the agent's beliefs. Consider the convex set of distributions composed of Gaussian distributions with mean between $μ$1 and $μ$2, and variance $1/τ$:

0 = { Convex Combinations of N( θ μj, 1/τ), with { μj in[μ1, μ2] . }

To create a convex set of distributions, it may be necessary to use convex combinations of distributions. To build some intuition, consider the semi-plane $(τ, μ)$, $μ> 0$ (inverse of variance  mean). A Gaussian distribution can be mapped to a point in this semi-plane. The Gaussian distributions in the credal set above can be mapped to a vertical segment of line at $τ$. Call $Δ= | μ$2 - μ1 | the width of the credal set. After each measurement we use Bayes rule to update each distribution in the credal set; after $n$ measurements $x$i (with mean $x**$), the posterior credal set is [Giron and Rios1980] still a set of Gaussian distributions with reduced width. A fundamental property is that the width of the set shrinks to $(Δ τ/(τ+ nr))$ after $n$ measurements.

The New Quasi-Bayesian objective is to find an E-admissible plan and monitor robustness. E-admissible plans for ``pure'' Gaussians in the credal set will be convenient since the prior and the likelihood are then conjugate [Berger1985]. For each Gaussian distribution in the credal set $N$0, an E-admissible plan can be generated as follows:

Take the semi-plane $(τ, μ)$, $μ> 0$ (inverse of variance  mean). Divide the plane into three decision regions: a Continue region, a Stop0 region and a Stop1 region. The posterior density after measurement $x$n is $N$n, represented by a point in the semi-plane $(τ, μ)$. The plan is: check whether the posterior $N$n is in Continue, Stop0 or Stop1, and respectively take a new measurement, stop and pick $d$0, stop and pick $d$1. The plan is determined by the decision regions, which are created by dynamic programming (value iteration algorithm) [DeGroot1970].

The agent has a prior credal set defined by $μ$1, $μ$2 and τ; as measurements are collected, the decision regions must be constructed for τ + n r, where $n$ starts from zero.

The whole plan is defined by the upper and lower boundaries of the Continue region. The value iteration algorithm essentially brackets this region and converges to the correct boundary, but every iteration requires more effort than the previous iteration. Given any finite amount of computation, the agent, Bayesian or Quasi-Bayesian, faces an Indeterminate region, yet to be explored. The agent can shrink the size of the Indeterminate region at high computational cost, as discussed in Appendix A. A real agent has a region of computational indeterminacy: because a finite amount of effort is available, not all plans can be evaluated.

We have returned to the Archers Fable. We have the plane $(τ, μ)$, and we must hit the boundary of the Continue region.

The Bayesian archer wants to find a single curve. Any lack of precision in the prior models will require a sensitivity analysis or a meta-analysis. This may lead the Bayesian archer to spend a vast amount of computation if the archer is considering a point close to the boundary between Continue and Stop regions.

The New Quasi-Bayesian archer thinks differently. The New Quasi-Bayesian agent recognizes that as soon as there is a point in some region outside the Indeterminate region, the whole situation is characterized. If the credal set is inside the Stop0 region, then stop and pick $d$0; if the credal set is inside the Stop1 region, then stop and pick $d$1; if the credal set is inside the Continue region, then take another observation. But if the credal set intersects more than one region, a non-robust situation has been detected. The agent has an E-admissible action that can be chosen, but robustness has failed. So the New Quasi-Bayesian archer shrinks the Indeterminate region only if all distributions in the credal set are inside this region.

This strategy links the computational indeterminacy of the planning algorithm to the credal indeterminacy of the agent, formalizing a connection between search effort and model building. There are limits to the effort that is worthy spending in search for a given level of imprecision in a probability assessment. This work appears to be the first analysis of this trade-off with the tools of Quasi-Bayesian theory. Some new questions emerge. First, what are the methods that define the agent's behavior when two decisions are computable and admissible? Second, what are the approximation algorithms (value iteration in this case) that admit a relationship between computational and credal indeterminacies? Third, is the approximation algorithm biased, i.e., is it causing the agent to pick some regions more than others?

An analysis of this problem must take into account the possibility that the agent pre-computes the decision regions. In general, suppose the agent wants to pre-compute the relevant decision regions for prior width Δ and prior inverse variance τ. So the agent must pre-compute the regions for all τ + n r, where $n$ starts from zero.

The New Quasi-Bayesian archer has simply to guarantee that the band of ``pure'' Gaussians does not fall entirely inside the Indeterminate region. This can be done in a finite number of iterations given the monotone character of the value iteration algorithm (Appendix A) and the fact that the posterior set of Gaussians has a width that decreases as $Δτ/(τ+ n r)$.

For the New Quasi-Bayesian archer, the question arises as to whether or not we can determine the boundaries of the Indeterminate region without even iterating the value iteration algorithm. Of course, this must depend on the characteristics of the initial credal set. Suppose the agent starts with a given prior variance $1/$τ and a given prior width Δ.

ATTENTION:
We have derived new results that allow the Quasi-Bayesian agent to compute in closed-form the situations where the boundaries can be determined without iteration,. For the whole content, please consult the compressed postscript version.

We can summarize the New Quasi-Bayesian strategy for generating plans:

1. Theorem 1 verifies whether the Quasi-Bayesian plan can be stored in closed-form.
2. If not, then the value iteration algorithm shrinks the Indeterminate region. When the Indeterminate region is smaller than Δτ/τ for every $τ$ larger than τ, the decision regions are defined.

## EVALUATING THE SOLUTION IN A REAL PROBLEM

ATTENTION:
Here we just present the problem and discuss its solution. For the whole content, please consult the compressed postscript version.

Consider the construction of a robotic probe for classification of material based on acoustic signals [Krotkov and Klatzky1995]. The taks is for a robot to decide whether a material belongs to one of two classes based on the tangent of the angle of internal friction, $tanφ$, which is captured from acoustic analysis of impact sounds. This is equivalent to deciding whether a variable $θ$ (linearly related to $tanφ$) is larger or smaller than zero. The losses are given by table 1. The robot is used for a variety of tasks; when the robot is assigned to a task, a cost for robot operation is assigned based on the number of waiting tasks in a queue. So the act of striking a material costs a quantity $c$i which belongs to a finite set of possible costs $c$1, c2, ..., cm , corresponding to the size of the queue. Once the task is initiated with a cost $c$i, the cost remains fixed during that task.

Very sparse knowledge about $θ$ must be translated into a belief model. Trying to model this with a single prior leads to a number of arbitrary choices. Instead, take the prior model to be a Quasi-Bayesian set $N$0 with variance $1/$τ = 4, $μ$1 = -5, $μ$2 = 5 (so that Δ = 10).

For such a problem, very effective plans can be quickly generated using our scheme (for the actual plans, please consult the compressed postscript version. The plans have a satisficing character in which they are admissible in the light of prior beliefs, yet they are open to challenge: during operation, the robustness of any decision may be compared to other admissible decisions and better courses of action can be learned or experimented. There is no need to obtain a single ``best'' plan and then conduct sensitivity analysis on it: the overall strategy already encodes all robust and non-robust situations the agent may face. Since the credal set and the decision regions are available to the agent, robustness questions can be dealt with in a straightforward manner. From this simple example we notice how the questions of model precision, computational effort and robustness can be all tied together in the New Quasi-Bayesian framework.