next up previous
Next: Conclusions Up: Improving the Efficiency of Previous: Summary of Experimental Results

Related Work

The re-implementation of TILDE is related to the work by [22] who were the first to describe the ``examples in outer loop'' strategy for decision tree induction. The query pack execution mechanism, here described from the Prolog execution point of view, can also be seen as a first-order counterpart of APRIORI's mechanism for counting item-sets [1].

Other lines of work on efficiency improvements for ILP involves stochastic methods which trade a certain amount of optimality for efficiency by, e.g., evaluating clauses on a sample of the data set instead of the full data set [30], exploring the clause search space in a random fashion [31], or stochastically testing whether a query succeeds on an example [28]. The first of these is entirely orthogonal to query pack execution and can easily be combined with it.

The idea of optimising sets of queries instead of individual queries has existed for a while in the database community. The typical context considered in earlier research on multi-query optimisation [29] was that of a database system that needs to handle disjunctions of conjunctive queries, or of a server that may receive many queries from different clients in a brief time interval. If several of these queries are expected to compute the same intermediary relations, it may be more efficient to materialise these relations instead of having them recomputed for each query. Data mining provides in a sense a new context for multi-query optimisation, in which the multi-query optimisation approach is at the same time easier (the similarities among the queries are more systematic, so one need not look for them) and more promising (given the huge number of queries that may be generated at once).

[33] describe an algorithm for efficient execution of so-called query flocks in this context. Like our query pack execution mechanism, the query flock execution mechanism is inspired to some extent by APRIORI and is set in a deductive database setting. The main difference between our query packs and the query flocks described by [33] is that query packs are more hierarchically structured and the queries in a pack are structurally less similar than the queries in a flock. (A flock is represented by a single query with placeholders for constants, and is equal to the set of all queries that can be obtained by instantiating the placeholders to constants. Flocks could not be used for the applications we consider here.)

[19] describe work on multi-query optimisation in the context of relational databases. They also consider tree-like structures in which multiple queries are combined; the main difference is that their trees are rooted in one single table from which the queries select tuples, whereas our queries correspond to joins of multiple tables. Further, Dekeyser and Paredaens define a cost measure for trees as well as operators that map trees onto semantically equivalent (but less costly) trees, whereas we have considered only the creation of packs and an efficient top-down execution mechanism for them. Combining both approaches seems an interesting topic for further research.

Finally, other optimisation techniques for ILP have been proposed that exploit results from program analysis [27,8] or from propositional data mining technology [6]. These are complementary to our pack execution optimisation. Especially the approach of [6] can easily be combined with our pack mechanism. The techniques discussed by [27] and [8] involve optimisations for single query execution, some of which can to some extent be upgraded to the pack setting. This is future work.


next up previous
Next: Conclusions Up: Improving the Efficiency of Previous: Summary of Experimental Results
Hendrik Blockeel 2002-02-26