next up previous
Next: Summary Up: Quantification and QPROP Previous: Subsearch and quantification

Inference and Learning

As in Section 3, working with a modified representation allows certain inference techniques that are not applicable in the Boolean case.

As an example, suppose that we are resolving

\begin{displaymath}\neg a(A,B) \vee \neg b(B,C) \vee c(C)\end{displaymath}

with

\begin{displaymath}\neg c(C) \vee d(C,D)\end{displaymath}

to conclude
\begin{displaymath}
\neg a(A,B) \vee \neg b(B,C) \vee d(C,D)
\end{displaymath} (45)

where the capital letters indicate ground elements of the domain as before and the resolvents are actually ground instances of
\begin{displaymath}
\neg a(x,y) \vee \neg b(y,C) \vee c(C)
\end{displaymath} (46)

and
\begin{displaymath}
\neg c(z) \vee d(z,w)
\end{displaymath} (47)

It is obviously possible to resolve (46) and (47) directly to obtain
\begin{displaymath}
\neg a(x,y) \vee \neg b(y,C) \vee d(C,w)
\end{displaymath} (48)

which is more general than (45). For a procedure that learns new nogoods and uses them to prune the resulting search, the impact of learning the more general (48) can be substantial and can easily outweigh the cost of the unification step required to conclude that $c(C)$ and $\neg c(z)$ resolve if $z=C$. We have also discussed this elsewhere [ParkesParkes1999].

There are two new difficulties that arise when we implement these ideas. The first is a consequence of the fact that resolution can be ambiguously defined for two quantified clauses. Consider resolving

\begin{displaymath}
a(A,x) \vee a(y,B)
\end{displaymath} (49)

with
\begin{displaymath}
\overline a(A,B) \vee b(A,B)
\end{displaymath} (50)

If we unify the first term in (50) with the first term in (49), we obtain $a(y,B) \vee b(A,B)$ as the resolvent; if we unify with the second term of (49), we obtain $a(A,x) \vee
b(A,B)$.

In practice, however, this need not be a problem:

Proposition 5.4   Let $c_1$ and $c_2$ be two lifted clauses, and $g_1$ and $g_2$ ground instances that resolve to produce $g$. Then there is a unique natural resolvent of $c_1$ and $c_2$ of which $g$ is a ground instance.

Proof. If there is more than one pair of resolving literals in $g_1$ and $g_2$ the result of the resolution will be vacuous, so we can assume that there is a single literal $l$ in $g_1$ with $\neg l$ in $g_2$. If $l$ is the $i$th literal in $g_1$ and $\neg l$ the $j$th literal in $g_2$, it follows that we can resolve the original $c_1$ and $c_2$ by unifying the $i$th literal in $c_1$ and the $j$th literal in $c_2$. It is clear that this resolution will be a generalization of $g$.         $\mathchoice{\vcenter{\hrule height.4pt
\hbox{\vrule width.4pt height3pt \kern ...
...vrule width.3pt height1.5pt \kern 1.5pt
\vrule width.3pt}
\hrule height.3pt}}$


What this suggests is that the reasons being associated with literal values be not the lifted nogoods that are retained as clauses, but ground instances thereof that were initially used to prune the search space and can subsequently be used to break ambiguities in learning.

The second difficulty is far more substantial. Suppose that we have the axiom

\begin{displaymath}\neg a(x,y) \vee \neg a(y,z) \vee a(x,z)\end{displaymath}

or, in a more familiar form, the usual transitivity axiom

\begin{displaymath}a(x,y) \wedge a(y,z) \rightarrow a(x,z)\end{displaymath}

This might be used in reasoning about a logistics problem, for example, if it gave conditions under which two cities were connected by roads.

Now suppose that we are trying to prove $a(A,B)$ for an $A$ and a $B$ that are ``far apart'' given the skeleton of the relation $a$ that we already know. It is possible that we use resolution to derive

\begin{displaymath}a(A,x) \wedge a(x,B) \rightarrow a(A,B)\end{displaymath}

as we search for a proof involving a single intermediate location, and then

\begin{displaymath}a(A,x) \wedge a(x,y) \wedge a(y,B) \rightarrow a(A,B)\end{displaymath}

as we search for a proof involving two such locations, and so on, eventually deriving the wonderfully concise
\begin{displaymath}
a(A,x_1) \wedge \cdots \wedge a(x_n,B) \rightarrow a(A,B)
\end{displaymath} (51)

for some suitably large $n$.

The problem is that if $d$ is the size of our domain, (51) will have $d^n$ ground instances and is in danger of overwhelming our unit propagation algorithm even in the presence of reasonably sophisticated subsearch techniques. Some technique needs to be adopted to ensure that this difficulty is sidestepped in practice. One way to do this is to learn not the fully general (51), but a partially bound instance that has fewer ground instances.

Procedure 5.5   To construct $\mbox{\tt learn}(c,g)$, the nogood that will be learned after a clause $c$ has been produced in response to a backtrack, with $g$ the ground reason associated with $c$:
\begin{codebox}
\li \While $c$\ has a ground instance that is $i$-irrelevant
\li...
...riable in $c$\li bind $v$\ to its value in $g$ \End
\li \Return $c$\end{codebox}

We may still learn a nogood with an exponential number of ground instances, but at least have some reason to believe that each of these instances will be useful in pruning subsequent search. Note that there is a subsearch component to Procedure 5.5, since we need to find ground instances of $c$ that are irrelevant. This cost is incurred only once when the clause is learned, however, and not at every unit propagation or other use.

It might seem more natural to learn the general (51), but to modify the subsearch algorithm used in unit propagation so that only a subset of the candidate clauses is considered. As above, the most natural approach would likely be to restrict the subsearch to clauses of a particular irrelevance or better. Unfortunately, this won't help, since irrelevant clauses cannot be unit. Restricting the subsearch to relevant clauses is no more useful in practice than requiring that any search algorithm expand only successful nodes.

Before moving on, let us note that a similar phenomenon occurs in the pseudo-Boolean case. Suppose we have a partial assignment $\{\neg b,c,\neg
d,e\}$ and constraints

$\displaystyle a+d+ \overline{e}$ $\textstyle \geq$ $\displaystyle 1$ (52)
$\displaystyle \overline{a} + b + c$ $\textstyle \geq$ $\displaystyle 2$ (53)

Unit propagation now causes the variable $a$ to be simultaneously true (by virtue of (52)) and false (because of (53)). Resolving these reasons together as in Proposition 3.12 gives us
\begin{displaymath}
b+c+d + \overline{e} \geq 2
\end{displaymath} (54)

The conflict set here is easily seen to be $\{\neg b,\neg d,e\}$, and this is indeed prohibited by the derived constraint (54). But (54) eliminates some additional bad assignments as well, such as $\{\neg c,\neg d,e\}$. Just as in the lifted case, we have learned something about a portion of the search space that has yet to be examined.


next up previous
Next: Summary Up: Quantification and QPROP Previous: Subsearch and quantification
Matt Ginsberg 2004-02-19