The presented results suggest several directions for future work. Clearly, a deeper exploration of the effects of the parameter settings of our algorithms is needed. In particular, a better understanding of the parameter of the noise-handling heuristics is necessary for practical applications of the algorithm. Alternative solutions to the three problems outlined in section 5.2 should also be investigated; these might eliminate the need for this parameter entirely. As discussed in section 3.4, the applicability of windowing algorithms crucially depends on the presence of some redundancy in the training set, so better methods for characterizing redundant domains are a rewarding topic for further research. The efficiency of the presented algorithms might be further improved by specializing over-general rules instead of removing them entirely from the current theory and relying on the next windowing iteration to find a better rule. Ideas from theory revision could prove useful in this context.
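To make the last idea concrete, the following is a minimal, hypothetical sketch of specializing an over-general rule rather than discarding it. Rules are represented as conjunctions of attribute tests (here, Python dicts); `covers` and `specialize` are illustrative helper names, not functions from the presented algorithms, and the greedy refinement strategy is only one of many possible choices.

```python
def covers(rule, example):
    """A rule covers an example if every attribute test in it matches."""
    return all(example.get(attr) == val for attr, val in rule.items())

def specialize(rule, positives, negatives):
    """Greedily add attribute tests taken from covered positive examples
    until the rule no longer covers any negative example."""
    rule = dict(rule)
    for neg in negatives:
        if not covers(rule, neg):
            continue
        for pos in positives:
            if not covers(rule, pos):
                continue
            # add a test that the positive satisfies but the negative fails
            for attr, val in pos.items():
                if attr not in rule and neg.get(attr) != val:
                    rule[attr] = val
                    break
            if not covers(rule, neg):
                break
    return rule

positives = [{"color": "red", "shape": "circle"}]
negatives = [{"color": "red", "shape": "square"}]
overgeneral = {"color": "red"}           # covers the negative example too
refined = specialize(overgeneral, positives, negatives)
# refined still covers the positive but excludes the negative
```

In a windowing loop, such a refinement step would replace the deletion of the over-general rule, so that the work already invested in finding the rule is not lost.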

While our major concern was to demonstrate that significant run-time savings are possible, it might be promising to further investigate the use of windowing techniques for increasing predictive accuracy or for learning effectively with limited memory resources. For the former problem, techniques for combining the results of multiple runs of windowing with different random seeds, as implemented in C4.5 [53], are quite similar to bagging and boosting and could produce similar results. For learning in the presence of memory limitations, we have found that in noise-free domains the curve for the total number of examples submitted to the learning algorithm flattens at the point where 100% training accuracy is reached. Thus the windowing algorithms need asymptotically less memory than the learning algorithms per se, provided that the testing of learned theories can be performed on disk. However, no guarantee can be given that windowing will allow a given learning problem to be solved without exceeding a given memory bound. A closer investigation of these issues would also be a rewarding topic for further research.
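The seed-combination idea can be sketched as an unweighted vote over the theories learned in several windowing runs, in the spirit of bagging. Everything below is an assumption for illustration: `run_windowing` merely simulates a windowing run by learning a majority-class predictor from a seed-dependent bootstrap window, standing in for a real rule learner.

```python
import random
from collections import Counter

def run_windowing(train, seed):
    """Placeholder for one windowing run: draws a seed-dependent window
    and returns a trivial majority-class theory learned from it."""
    rng = random.Random(seed)
    window = [rng.choice(train) for _ in train]
    majority = Counter(label for _, label in window).most_common(1)[0][0]
    return lambda x: majority

def vote(theories, x):
    """Combine several learned theories by unweighted majority vote."""
    return Counter(theory(x) for theory in theories).most_common(1)[0][0]

train = [(i, "+") for i in range(6)] + [(i, "-") for i in range(4)]
theories = [run_windowing(train, seed) for seed in range(5)]
prediction = vote(theories, 0)
```

Weighted voting, e.g. by the training accuracy of each run, would bring the scheme closer to boosting.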

Currently, the applicability of our algorithms is limited to propositional 2-class problems. A straightforward adaptation to multi-class problems can be performed in several ways [12,1,19]. A different approach might adapt the algorithm to learn ordered rule sets as in CN2 [13]: for example, one could use a separate windowing process for each rule, so that the rules successively added to the final theory can belong to different classes. Orthogonally, extending windowing to first-order learning techniques is another challenging and rewarding task. Note that the sub-sampling problem in inductive logic programming is considerably harder than in propositional learning. For example, first-order learning systems often allow an implicit definition of training examples [47] or learn from positive examples only by making some form of closed-world assumption [52], both of which prevent the straightforward use of sub-sampling. Furthermore, the examples in training sets for ILP algorithms can depend on each other: learning a recursive concept, for instance, often requires that all instantiations of the target predicate used in the derivation of a positive example are present in the training data, a property that is unlikely to hold under sub-sampling.
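One common way to perform the multi-class adaptation mentioned above is one-against-all class binarization: an m-class problem is mapped to m 2-class problems, so that a 2-class windowing learner can be applied to each unchanged. The sketch below is illustrative only; the helper names and the `+`/`-` labels are assumptions, and the cited methods [12,1,19] include other decomposition schemes as well.

```python
def binarize(examples, target_class):
    """Relabel a multi-class training set for one 2-class run: examples
    of the target class become positive, all others negative."""
    return [(x, "+" if y == target_class else "-") for x, y in examples]

def one_vs_rest_problems(examples):
    """Build one 2-class training set per class in the data."""
    classes = sorted({y for _, y in examples})
    return {c: binarize(examples, c) for c in classes}

data = [(1, "a"), (2, "b"), (3, "a"), (4, "c")]
problems = one_vs_rest_problems(data)
# problems["a"] relabels class "a" as "+" and all other classes as "-"
```

Each of the resulting problems could then be windowed independently, at the cost of m learning runs over the same data.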