Each experiment consisted of three phases, each corresponding to an increase in problem size. Goals were randomly selected for each problem, and, in the logistics domain, the initial state was also varied randomly between problems. In an initial training session at the start of each phase n, 30 n-goal problems were solved from scratch, and each derivation trace was stored in the library. In the subsequent testing session, problems were generated in the same manner but with one additional goal. Each time a new (n + 1)-goal problem was attempted, we tried to retrieve a similar n-goal problem from the library. If a similar case was found that had previously failed, the new problem was solved in learning, static, and from-scratch modes, and it became part of the 30-problem set. This method allowed us to evaluate the improvement provided by failure-based retrieval precisely in those situations where retrieval on the static metric alone was ineffective and failure conditions were available.
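The phase structure above can be sketched as a short simulation. Every name in this sketch (generate_problem, solve, retrieve_similar, the coin-flip failure model, and the toy subset-based similarity test) is a hypothetical stand-in chosen for illustration; the original system's planner, retrieval metric, and failure conditions are not specified here.

```python
import random

def generate_problem(num_goals, rng):
    # Goals are drawn at random for each problem (illustrative encoding).
    return frozenset(rng.sample(range(50), num_goals))

def solve(problem, mode, rng):
    # Stand-in for the planner: records the mode used and whether the
    # derivation ultimately failed (simulated here by a coin flip).
    return {"goals": problem, "mode": mode, "failed": rng.random() < 0.5}

def retrieve_similar(library, problem):
    # Toy similarity test: an n-goal case whose goals are a subset of
    # the new (n + 1)-goal problem's goals.
    for trace in library:
        if trace["goals"] <= problem:
            return trace
    return None

def run_phase(n, library, rng, training_size=30):
    # Training session: solve 30 n-goal problems from scratch and store
    # each derivation trace in the library.
    for _ in range(training_size):
        library.append(solve(generate_problem(n, rng), "from-scratch", rng))
    # Testing session: generate (n + 1)-goal problems; whenever a similar,
    # previously failed case is retrieved, solve the new problem in
    # learning, static, and from-scratch modes.
    comparisons = 0
    for _ in range(200):  # bounded number of test attempts
        problem = generate_problem(n + 1, rng)
        case = retrieve_similar(library, problem)
        if case is not None and case["failed"]:
            for mode in ("learning", "static", "from-scratch"):
                solve(problem, mode, rng)
            comparisons += 1
    return comparisons

rng = random.Random(0)
library = []
for n in (1, 2, 3):  # three phases of increasing problem size
    run_phase(n, library, rng)
```

After the three phases, the library holds the 90 training traces (30 per phase); only problems whose retrieved case had previously failed trigger the three-mode comparison.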