Discussion

Next: Comparison with Other Engines Up: Warmr Previous: Results

Discussion

The execution time of WARMR has a large component that is not used to evaluate queries. This is caused by the fact that WARMR needs to do a lot of administrative work. In particular, theta-subsumption tests should be done on the queries to check wether a query is equivalent to another candidate, or if a query is a specialisation of an infrequent one. In the propositional case (the APRIORI algorithm), these tests are very simple, but in the first order case they require exponential time in the size of the queries. Of course, when using larger datasets, the relative contribution of these administrative costs will decrease proportionally. It can be observed that at deeper levels, these costs are less for the setting using packs. One of the causes is the fact that the no-packs version also uses more memory than the packs setting (and hence causes proportionally more memory management).

Here again, the most important numbers are the speedup factors for the execution of queries. Speedup factors of query execution do not always increase with increasing depth of the packs, in contrast to TILDE where larger packs yielded higher speedups. At first sight we found this surprising; however it becomes less so when the following observation is made. When refining a pack into a new pack by adding a level, WARMR prunes away branches that lead only to infrequent queries. There are thus two effects when adding a level to a pack: one is the widening of the pack at the lowest level (at least on the first few levels, a new pack typically has more leaves than the previous one), the second is the narrowing of the pack as a whole (because of pruning). Since the speedup obtained by using packs largely depends on the branching factor of the pack, speedup factors can be expected to decrease when the narrowing effect is stronger than the widening-at-the-bottom effect. This can be seen, e.g, in the small-mutagenesis experiment, where at the deepest levels queries are becoming less frequent. For the mutagenesis experiment with the medium size language, query execution speedup factors are larger as the number of queries increases much faster. For the mutagenesis experiment with the large language, it is the total speedup that is large, as the language generates so many queries that the most time-consuming part becomes the administration and storage in memory. The packs version is much faster as it stores the queries in trees, requiring significantly less memory.

Next: Comparison with Other Engines Up: Warmr Previous: Results

Hendrik Blockeel 2002-02-26