# 5. Minimal Unsolvable Subproblems

Our observations on classes of problems with restrictions on the number of solutions they may have shows that the common identification of the peak in solution cost with the algorithm-independent transition in solvability seen in general problem classes does not capture the full generality of the easy-hard-easy pattern.

For solvable problems, this explanation could be readily modified to use a transition in the existence of solutions beyond those specified by the construction of the class of problems and symmetries those problems might have that constrain the allowable range of solutions. This modification is a simple generalization of the existing explanation based on the competition between the number of solutions and pruning. However, our data for solvable problems do not support this explanation, in that the search cost peak and disappearance of the second to last solution coincide only roughly for n=10, and not at all for n=20.

Furthermore, when the number of solutions is held constant, competition between increased pruning and decreasing number of solutions cannot possibly be responsible for a peak in solution cost. The decrease in search cost for highly constrained problems (to the right of the peak) is adequately explained by the prevailing explanation, based on the increase in pruning with additional constraints. But this does not explain why weakly constrained problems are also found to be easy, at least for some search methods. The low cost of unsolvable problems in the underconstrained region is a new and unexpected observation in light of previous studies of the easy-hard-easy pattern and its explanation. This raises the question of whether there is a different aspect of problem structure that can account for the peak in search cost for problems with a fixed number of solutions.

One possibility that is often mentioned in this context is the notion of critically constrained problems. These are problems just on the boundary between solvable and unsolvable problems, i.e., neither underconstrained (with many solutions) nor overconstrained (with none). This notion forms the basis for another common interpretation of the cost peak. That is, these critically constrained problems will typically be hard to search (because most of the constraints must be instantiated before any unproductive search paths can be identified) and, since they are concentrated at the transition [19], give rise to the search peak. This explanation does not include any discussion of the changes in pruning capability as constraints are added. Taken at face value, this explanation would predict no peak at all for solvable problems or when the number of solutions is held constant, because such classes have no transition from solvable to unsolvable problems. Moreover, this description of critically constrained problems is not simply a characteristic of an individual problem but rather is partly dependent on the class of problems under consideration because the exact location of the transition depends on the method by which problems are generated. This observation makes it difficult to characterize the degree to which an individual problem is critically constrained purely in terms of structural characteristics of that problem. By contrast, a measure such as the number of solutions is well defined for individual problem instances, which facilitates using its average behavior for various classes of problems to approximately locate the transition region. Thus, as currently described, the notion of critically constrained problems does not explain our observations nor does it give an explicit way to characterize individual problems.

Figure 8: Mean size of smallest minimal unsolvable subproblem as a function of number of nogoods, for unsolvable problems generated using the ``hill-climbing'' (dotted line) and ``generate-select'' (solid line) methods. Each point is based on 1000 problems, except for the ``generate-select'' method at 30 nogoods, which is based on 100 problems. Error bars showing 95% confidence intervals are included.

A more precisely defined alternative characteristic is the size of minimal unsolvable subproblems. As we mentioned in Section 4.2, a minimal unsolvable subproblem is a subproblem that is unsolvable, but for which any subset of variables and their associated constraints is solvable.

Some problems have more than one minimal unsolvable subproblem. For example, a problem might have one minimal unsolvable subproblem of five variables, and another, different one, of say, six. We computed all minimal unsatisfiable subproblems for all of the 10-variable unsolvable problems we had generated. We found a monotonic positive relationship between mean number of minimal unsolvable subproblems and number of nogoods. For example, problems with 140 nogoods have an average of 35 minimal unsolvable subproblems (range 4 to 64, standard deviation 8.7); those with 90 nogoods have about six (range 1 to 23, standard deviation 3.6); and problems with 50 or fewer nogoods rarely have more than one minimal unsolvable subproblem. Similarly, [9] observed that unsatisfiable problems in the underconstrained region tend to have small and unique minimal unsatisfiable subsets.

The behavior of the size of the smallest minimal unsolvable subproblem as a function of the number of nogoods is shown in Figure 8. Comparing with Figure 2, we see that the peak in the minimum size of minimal unsolvable subproblems matches the location of the search cost peak for unsolvable problems. This result is independent of whether we plot the smallest minimal unsolvable subproblem size, as shown in Figure 8, or medians or means, which we have not shown here. Moreover, the location of the peaks in minimal unsolvable subproblem size for the different generation methods correspond to the location of their respective search cost peaks. The peak in both search cost and minimal unsolvable subproblem size occurs at around 40 nogoods for problems generated using the ``hill-climbing'' method, and significantly higher, around 60 nogoods, for problems generated using the ``generate-select'' method. The strong correspondence between minimal unsolvable subproblem size and search cost is very suggestive that minimal unsolvable subproblem size is a structural characteristic of problems that plays an important role in search cost. By contrast, number of minimal unsolvable subproblems does not match the pattern of search cost. As mentioned above, it increases monotonically with number of nogoods, suggesting that it does not play a primary role in explaining search cost for unsolvable problems.

The behavior of the minimal unsolvable subproblem size as a function of the number of constraints has a simple explanation. Unsolvable weakly constrained problems will generally need to concentrate most of the available constraints on a few variables in order to make all assignments inconsistent. This will tend to give one small minimal unsolvable subproblem. As more constraints are added, this concentration is no longer required and, since problems where most of the randomly selected constraints happen to be concentrated on a few variables are rare, we can expect more and larger minimal unsolvable subproblems. Finally, as more and more constraints are added, the increased pruning is equivalent to the notion that instantiating only a few variables is all that is required to find an inconsistency. This means we can expect a large number of small unsolvable subproblems. This qualitative description corresponds to what we observe in Figure 8.

Our observations of weakly constrained problems suggest that some search algorithms, such as dynamic backtracking, are able to rapidly focus in on one of the unsolvable subproblems and hence avoid the extensive thrashing, and high search cost, seen in other methods. In such cases, one would expect that the smaller the unsolvable subproblem, the easier it will be for the search to determine there are no solutions.

Figure 9: Mean solution cost as a function of size of smallest minimal unsolvable subproblem, for unsolvable problems with 60 nogoods generated using the ``generate-select'' method. Each point is the mean of the median solution costs, based on solving each problem 100 times, for the set of problems with the corresponding smallest minimal unsolvable subproblem size. The points are based on following numbers of problems for each smallest minimal unsolvable subproblem size, totaling 1000 problems: 3 - 1; 4 - 17; 5 - 71; 6 - 156; 7 - 253; 8 - 283; 9 - 165; 10 - 54. Error bars showing 95% confidence intervals are included, except for the single problem at size 3 for which confidence intervals cannot be calculated.

In order to examine the role of minimal unsolvable subproblem in search cost more closely, we plotted mean search cost versus size of smallest minimal unsolvable subproblem for unsolvable problems of 10 variables at each multiple of 10 nogoods from 30 to 140 nogoods. In every case, mean search cost increased as a function of size of smallest minimal unsolvable subproblem. Figure 9 shows an example of one of these plots, at the peak in solution cost for this class of problems, 60 nogoods. It makes sense that the smallest minimal unsolvable subproblem, being the easiest to detect, would play a significant role in search cost. However, the situation is surely more complicated than this, suggested by the fact that there is still variation among problems with the same size smallest minimal unsolvable subproblem. This could be due, for example, to one problem having several small minimal unsolvable subproblems, while another might have one minimal unsolvable subproblem, even smaller. Number and size of minimal unsolvable subproblems are both likely to play a role in search cost.

Number of minimal unsolvable subproblems does not seem to play as significant a role as size of smallest minimal unsolvable subproblem, but its effect can also be demonstrated. For the same sets of unsolvable problems as above, for each multiple of 10 nogoods from 80 to 140 nogoods, search cost correlates negatively with number of minimal unsolvable subproblems. However, for unsolvable problems with 30 to 70 nogoods, where variance in number of minimal unsolvable subproblems is lower (but variance in search cost is higher), there is no relationship between search cost and number of minimal unsolvable subproblems. Additional clarification of the role in search cost of both size and number of minimal unsolvable subproblems is left for further investigation. But size of smallest minimal unsolvable subproblem, which correlates strongly with search cost for (1) unsolvable problems taken as a whole (see Figures 2 and 8) and (2) unsolvable problems with a fixed number of nogoods over the full range of number of nogoods, appears to have the more primary effect.

This discussion of minimal unsolvable subproblems is also relevant to solvable problems: once a series of choices that precludes a solution is made during search, the remaining subproblem is now an unsolvable one. For example, in a 10-variable CSP, suppose values are given to the first two variables that are incompatible with all solutions to the problem. This means that in the context of these two assignments, the remaining eight variables constitute an unsolvable subproblem. The number of search steps required to determine that this subproblem is in fact unsolvable will be the cost added to the search before backtracking to the original two variables and trying a new assignment for one of them. Thus, the cost of identifying unproductive search choices for solvable problems is determined by how rapidly the associated unsolvable subproblem can be searched. As described above, when there are few constraints we can expect that such unsolvable subproblems will themselves have small minimal unsolvable subproblems and hence be easy to search with methods that are able to focus on such subproblems. While the unsolvable subproblems associated with incorrect variable choices in solvable problems may have a different structure, this argument suggests that changes in minimal unsolvable subproblems may explain the behavior of solvable problems with a fixed number of solutions as well. This could also explain observations of thrashing behavior for rare exceptionally hard solvable problems in the underconstrained region [7, 12]; we would expect such problems to have a relatively large unsolvable subproblem to detect given the initial variable assignments. Finally, it would be interesting to study the behavior of local repair search methods for problems with a single solution to see if they also are affected by the change in minimal subproblem size.

Previous: 4. Search Difficulty and Solvability
Next: 6. Conclusions