Configuration is an area where CSP technology has been particularly effective. A configuration problem can be viewed as trying to specify a product defined by a set of attributes, where attribute values can only be combined in predefined ways. Such problems can be modelled as CSPs, where variables correspond to attributes, the domains of the variables correspond to the possible values of the attributes, and constraints specify the predefined ways in which values can be combined. In many configuration problems the constraints are expressed extensionally through lists of allowed (or disallowed) combinations of values. Alternatively, constraints are expressed as rules which can easily be transformed into an extensional representation. Consider the following example adapted from the paper by Subbarayan, Jensen, Hadzic, Andersen, Hulgaard & Moller subbarayan04.

In practice, many solvers for configuration problems are able to interact with the user so that, apart from meeting the given specifications, the user's choices of values for certain attributes are also satisfied. In this study we use configuration instances to compare the non-binary representation to binary encodings on structured realistic problems. Although it would be interesting to investigate the applicability of binary encodings in an interactive configurator, such work is outside the scope of this paper.

We run experiments on five problems taken from CLib, a library of
benchmark configuration problems [CLibCLib2005]. The first thing we
noticed after encoding the five problems as binary and non-binary
CSPs is that they are trivially solvable by all algorithms without
backtracking. A closer look at the structure of CLibs's problems
revealed the reason; their constraint graphs consist of various
unconnected components. Each component consists of very few or, in
some cases, a single variable. As a result, the problems are split
into independent subproblems that are trivially solved by all
algorithms. In order to obtain difficult instances for
benchmarking, we made the graphs connected by adding some
randomness. Each of the five problems was extended by adding to it
6 variables and 8-10 constraints so that the graph became
connected^{8}.
Table 5 shows the total number of variables and
constraints in the modified problems. The added constraints were
of arity 2, 3, or 4 (chosen at random) and the variables on which
they were posted were selected at random, making sure that the
resulting graph was connected. The looseness of each added
constraint was also set at random, and finally, the allowed tuples
of each constraint were chosen at random according to its
looseness.

Table 5 gives the average run times and node visits for algorithms MGAC-2001 in the non-binary representation, MAC-PW-AC in the DE, and MAC-PW-ACd in the double encoding. For each of the five benchmarks we repeatedly generated instances using the model described above. Each generated instance was solved by all three algorithms and was stored if the instance was hard for at least one algorithm. Otherwise, it was discarded. An instance was considered hard if at least one algorithm took more than one second to solve it. Table 5 reports averages over the first 50 hard instances generated from each benchmark. That is, we run 250 hard instances in total. Note that in the binary encodings all constraints of the original problem (even binary ones) were encoded as dual variables.

The experimental results of Table 5 show a very significant advantage in favor of the binary encodings compared to the non-binary representation, both in node visits and run times. The DE is clearly the most efficient model. MAC-PW-AC in the DE can be up to three orders of magnitude faster than MGAC-2001 in the non-binary representation. There was not a single instance among the 250 instances where MGAC-2001 was faster than MAC-PW-AC. The double encoding is also much more efficient than the non-binary representation. The main factor contributing to the performance of the encodings is the strong propagation that is achieved through the constraints between dual variables, which is reflected on the numbers of node visits. There is a number of reasons, related to the structure of these configuration problems, that can justify the strong performance of the encodings:

- The constraint graphs are very sparse. This is typical of
configuration problems since, usually, an attribute of the product
specification has dependencies with only a few of the other
attributes.
- The constraints of high arity are very tight. Moreover, each
value of the variables with large domain sizes has very few
(typically one) supporting tuple in the constraints such variables
participate.
- There are intersecting non-binary constraints with more than one original variable in common. As explained, and demonstrated empirically in Section 6.2, this can have a significant impact on the propagation power of AC in the dual and double encodings.

Nikolaos Samaras 2005-11-09