As previously argued in Theorem 4, minimizing $Z$ in the setting of Schapire and Singer (1998) can be done optimally, but at the expense of complex optimization procedures, with large complexities. One can wonder whether such procedures, used to optimize only the computation of $\vec{v}$ (a small part of WIDC), are really worth adapting to our setting, in which more values are authorized. We are now going to show that a much simpler combinatorial procedure, with comparatively very low complexity, brings optimal results in fairly general situations. The simplest way to describe most of these situations is to make the following assumption on the examples:

- (A) Each example used to compute $\vec{v}$ has only one ``1'' in its class vector.

Suppose for now that (A) holds. Our objective is to calculate the output vector $\vec{v} \in \{-1, 0, +1\}^c$ of some monomial $t$. We use the shorthands $W_1, W_2, \ldots, W_c$ to denote the sums of weights of the examples satisfying $t$ and belonging respectively to classes $1, 2, \ldots, c$. We want to minimize $Z$ as proposed in eq. (3). Suppose without loss of generality that

\[ W_1 \geq W_2 \geq \cdots \geq W_c ; \]

otherwise, reorder the classes so that they verify this assertion. Given only three possible values for each component of $\vec{v}$, the testing of all $3^c$ possibilities for $\vec{v}$ is exponential and time-consuming. But we can propose a very fast approach. We have indeed, for any optimal vector $\vec{v}^*$,

\[ v^*_1 \geq v^*_2 \geq \cdots \geq v^*_c . \]

Thus, the optimal $\vec{v}$ does not belong to a set of cardinality $3^c$, but to a set of cardinality $O(c^2)$: the vectors which, in the order of the sorted weights, take value $+1$ on a first block of classes, $0$ on a second, and $-1$ on the remaining ones. Our algorithm is then straightforward: simply explore this set of $O(c^2)$ elements, and keep the vector having the lowest value of $Z$. Note that this combinatorial algorithm has the advantage of being adaptable to more general settings in which $k$ particular values are authorized for the components of $\vec{v}$, for any fixed $k$ not necessarily equal to $3$. In that case, the complexity is larger, but limited to $O(c^{k-1})$.
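To fix ideas, the enumeration above can be sketched as follows. This is a minimal illustration, not the paper's implementation: with the class weights sorted in non-increasing order, only ``staircase'' vectors ($+1$ on a prefix of classes, then $0$, then $-1$) need to be tested. The exponential form of the criterion used below is a Schapire--Singer-style assumption standing in for eq. (3), and all names are hypothetical.

```python
import math

def candidate_vectors(c):
    """Staircase vectors over the sorted classes: +1 on the first a classes,
    0 on the next b - a, -1 on the rest.  There are (c+1)(c+2)/2 = O(c^2)
    such vectors, instead of 3^c arbitrary ones."""
    for a in range(c + 1):
        for b in range(a, c + 1):
            yield [1] * a + [0] * (b - a) + [-1] * (c - b)

def z_value(v, w_plus, w_minus):
    """Assumed exponential criterion: class j contributes
    W_j^+ e^{-v_j} + W_j^- e^{v_j}, where W_j^+ (resp. W_j^-) is the weight
    of examples satisfying the monomial with a +1 (resp. -1) at position j."""
    return sum(wp * math.exp(-vj) + wm * math.exp(vj)
               for vj, wp, wm in zip(v, w_plus, w_minus))

def best_vector(w_plus, w_minus):
    """w_plus must be sorted in non-increasing order (reorder classes first);
    returns the candidate staircase vector with the lowest criterion value."""
    c = len(w_plus)
    return min(candidate_vectors(c), key=lambda v: z_value(v, w_plus, w_minus))
```

Under assumption (A), $W^-_j$ is simply the total weight minus $W_j$, so the whole search costs $O(c^3)$ criterion-term evaluations at worst, against $\Theta(c\,3^c)$ for brute force.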

There are slightly more general settings in which our algorithm remains optimal, in particular when we can certify $W^+_1 \geq W^+_2 \geq \cdots \geq W^+_c$ and $W^-_1 \leq W^-_2 \leq \cdots \leq W^-_c$ (after reordering the classes). Here, $W^+_j$ denotes the sum of weights of the examples belonging at least to class $j$, and $W^-_j$ denotes the sum of weights of the examples not belonging to class $j$, and

\[ W^+_j + W^-_j = W , \quad \forall j \in \{1, 2, \ldots, c\} , \]

where $W$ denotes the total weight of the examples satisfying $t$.
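As a sketch (with assumed names), certifying this condition amounts to checking that sorting the classes by non-increasing $W^+_j$ simultaneously sorts them by non-decreasing $W^-_j$:

```python
def orderings_concordant(w_plus, w_minus):
    """True when some reordering of the classes makes w_plus non-increasing
    and w_minus non-decreasing at the same time (hypothetical helper).
    Sorting by decreasing w_plus, breaking ties by increasing w_minus,
    is the only candidate ordering, so it suffices to test it."""
    order = sorted(range(len(w_plus)), key=lambda j: (-w_plus[j], w_minus[j]))
    wm = [w_minus[j] for j in order]
    return all(wm[i] <= wm[i + 1] for i in range(len(wm) - 1))
```

When the check succeeds, the staircase enumeration of the previous paragraphs can be run on the reordered classes.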
Our approximation algorithm is run in the multilabel case by transforming the examples as follows: each example for which the class vector contains $k > 1$ ``1''s is transformed into $k$ examples, having the same description $x$, and only one ``1'' in their class vector, in such a way that we span the $k$ ``1''s of the original example. Their weight is the one of the original example, divided by $k$. We then run our algorithm on this new set of examples, which satisfies assumption (A).
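The transformation can be sketched in a few lines (representation assumed: each example is a triple of description, 0/1 class vector, and weight):

```python
def split_multilabel(examples):
    """Turn multilabel examples into single-label ones satisfying (A):
    an example whose class vector holds k ``1''s becomes k copies, each
    keeping a single distinct ``1'' and receiving 1/k of the weight.
    Examples with an empty class vector are dropped."""
    out = []
    for x, y, w in examples:
        ones = [j for j, yj in enumerate(y) if yj == 1]
        k = len(ones)
        for j in ones:
            y_new = [0] * len(y)
            y_new[j] = 1           # one "1" per transformed example
            out.append((x, y_new, w / k))
    return out
```

Note that the total weight is preserved by construction, since each original example distributes its weight uniformly over its $k$ copies.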

Now, suppose that for any example, the class vector contains at most $k$ ``1''s, for some fixed $k \geq 1$. There are two interesting vectors we use. The first one is $\vec{v}^*$, the optimal vector (or an optimal vector) minimizing $Z$ over the original set of examples; the second one is $\tilde{\vec{v}}$, the vector we find by minimizing $Z$ over the transformed set of examples. What we want is to estimate the quality of $\tilde{\vec{v}}$ with respect to the optimal value $Z(\vec{v}^*)$ over the original set of examples. The following theorem gives an answer to this problem, by quantifying the convergence of $Z(\tilde{\vec{v}})$ towards $Z(\vec{v}^*)$.

Therefore, in the set of all problems for which the number of ``1''s per class vector is at most $k$ for some fixed $k$, our bound converges to the optimum as the number of classes $c$ increases in this class of problems. In other words, our simple approximation algorithm is quite efficient for problems with a large number of classes. Note that, using a slightly more involved proof, we could have slightly reduced the constant factor of Theorem 6. Now, to fix ideas, the following subsection displays the explicit (and simple) solution when there are only two classes.