In Beating the Hold-Out: Bounds for K-fold and Progressive Cross-Validation (1999 Conference on Computational Learning Theory), we give a new analysis establishing broad conditions under which the second method is, in fact, preferable to the first. We also describe a new method, progressive validation, that enjoys several advantages over both approaches.
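The core idea of progressive validation can be sketched in a few lines: each example is predicted by a model trained only on the examples preceding it, and the average of these per-example errors serves as the error estimate. The toy majority-vote "learner" below is purely illustrative; the method applies to any learning algorithm.

```python
# Progressive validation (minimal sketch): test each example against a model
# trained only on the prefix of examples before it, then average the errors.
def progressive_validation_error(examples, train, predict):
    """examples: list of (x, y); train: data -> model; predict: (model, x) -> label."""
    errors = 0
    for i in range(1, len(examples)):    # skip i=0: nothing to train on yet
        model = train(examples[:i])      # train on the prefix
        x, y = examples[i]
        if predict(model, x) != y:       # test on the single held-out next example
            errors += 1
    return errors / (len(examples) - 1)

# Toy learner used only for illustration: ignore x, predict the majority label.
def train_majority(data):
    labels = [y for _, y in data]
    return max(set(labels), key=labels.count)

def predict_majority(model, x):
    return model

data = [(i, 1) for i in range(8)] + [(i, 0) for i in range(2)]
err = progressive_validation_error(data, train_majority, predict_majority)  # 2/9
```

Note that, unlike a hold-out set, every example here ends up being used both for training (in later rounds) and for testing (once).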
In Microchoice Bounds and Self Bounding Learning Algorithms (1999 Conference on Computational Learning Theory), we describe a new analytical approach that can be used within the learning algorithm itself, without the need for a holdout set. This improves on similar previously proposed methods by being much more efficient computationally (in many cases turning an exponential-time computation into a linear- or quadratic-time one) with little if any loss in performance.
In Noise-tolerant Learning, the Parity problem, and the Statistical Query model (32nd Annual Symposium on Theory of Computing), we develop the first known algorithm for learning, in the presence of noise, a class of concepts that provably cannot be learned in the Statistical Query model. The specific class is a subclass of parity functions, and this result also implies a (somewhat small) improvement in the best known running time for learning the general class of parity functions (reducing the time from 2^n to 2^{O(n / log n)}). There are a number of connections between these problems and issues in cryptography and approximation algorithms, and we are hopeful that these methods will lead to further improvements in the future.
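For background on why noise is the crux here: in the noise-free setting, a parity function f(x) = &lt;s, x&gt; mod 2 is easy to learn, since each labeled example is one linear equation in the unknown coefficient vector s, and Gaussian elimination over GF(2) recovers s. The sketch below shows only this easy noiseless case; the noise-tolerant algorithm in the paper is far more involved.

```python
# Noise-free parity learning via Gaussian elimination over GF(2).
# (Background sketch only -- not the paper's noise-tolerant algorithm.)
def solve_parity(examples, n):
    """examples: (x, y) pairs with x a 0/1 list of length n and y = <s, x> mod 2.
    Returns the secret s, assuming the examples span all n dimensions."""
    pivots = {}  # pivot column -> (row bitmask, parity bit)
    for x, y in examples:
        mask = sum(bit << j for j, bit in enumerate(x))
        for col in sorted(pivots):            # eliminate known pivot columns
            if (mask >> col) & 1:
                pm, py = pivots[col]
                mask ^= pm
                y ^= py
        if mask:                              # new pivot at the lowest set bit
            pivots[(mask & -mask).bit_length() - 1] = (mask, y)
    s = [0] * n
    for col in sorted(pivots, reverse=True):  # back-substitution
        pm, py = pivots[col]
        val = py
        rest = pm & ~(1 << col)
        while rest:
            c = (rest & -rest).bit_length() - 1
            val ^= s[c]
            rest &= rest - 1
        s[col] = val
    return s

# Recover the secret s = [1, 0, 1] from four consistent examples.
examples = [([1, 0, 0], 1), ([0, 1, 0], 0), ([0, 0, 1], 1), ([1, 1, 1], 0)]
secret = solve_parity(examples, 3)
```

Once each label is flipped independently with some probability, this linear-algebraic shortcut breaks down, which is what makes the noisy problem hard.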
In FeatureBoost: A Meta Learning Algorithm that Improves Model Robustness (ICML 2000), we develop a formal framework for studying this problem, as well as new algorithms that appear promising in preliminary experiments. These algorithms apply a "boosting-like" wrapper around a standard learning algorithm, training it repeatedly on data in which different subsets of features have been corrupted, and then combining the results in a weighted vote. A number of experiments on synthetic data suggest that this method can provide a substantial amount of robustness even when the original learning algorithm was quite brittle.
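The general wrapper idea can be sketched as follows. This is a minimal illustration of the scheme described above, not the paper's exact FeatureBoost algorithm: corruption is modeled as zeroing a random feature subset, the base learner is a toy 1-nearest-neighbor rule, and votes are weighted by training accuracy.

```python
import random

# Boosting-like robustness wrapper (illustrative sketch): train copies of a
# base learner on data with different random feature subsets corrupted
# (zeroed here), then combine predictions in an accuracy-weighted vote.
def corrupt(X, feats):
    return [[0 if j in feats else x[j] for j in range(len(x))] for x in X]

def ensemble_train(X, y, base_train, rounds=5, frac=0.3, seed=0):
    rng = random.Random(seed)
    n_feats = len(X[0])
    members = []
    for _ in range(rounds):
        feats = set(rng.sample(range(n_feats), max(1, int(frac * n_feats))))
        Xc = corrupt(X, feats)                        # corrupt a feature subset
        model = base_train(Xc, y)
        acc = sum(model(x) == t for x, t in zip(Xc, y)) / len(y)
        members.append((acc, model))                  # weight = training accuracy
    return members

def ensemble_predict(members, x):
    score = sum(w * (1 if m(x) == 1 else -1) for w, m in members)
    return 1 if score >= 0 else 0

# Toy base learner: 1-nearest-neighbor under squared Euclidean distance.
def nn_train(X, y):
    data = list(zip(X, y))
    def predict(x):
        return min(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], x)))[1]
    return predict

X = [[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 0], [1, 1, 1, 1]]
y = [0, 0, 1, 1]
members = ensemble_train(X, y, nn_train)
```

Because each member was forced to cope with a different missing feature subset during training, the vote degrades gracefully when features are corrupted at prediction time.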
In Learning from Labeled and Unlabeled Data using Graph Mincuts (Avrim Blum and Shuchi Chawla, ICML'01) we explore a number of design issues involved with this approach, and demonstrate that it can indeed provide significant benefit when large amounts of unlabeled data are available, as well as when labeled data is noisy.
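The basic construction behind the mincut approach can be sketched as follows (a minimal illustration of the general idea, not the paper's full method): connect similar examples with unit-capacity edges, tie labeled positives to a source and labeled negatives to a sink with infinite capacity, and assign each unlabeled example the label of the side of a minimum s-t cut it lands on. The max-flow routine here is a small Edmonds-Karp implementation.

```python
from collections import defaultdict, deque

# Semi-supervised labeling via graph mincut (illustrative sketch).
def max_flow_source_side(cap, s, t):
    """Edmonds-Karp max flow; returns the source side of a minimum s-t cut."""
    flow = defaultdict(lambda: defaultdict(int))
    def bfs():
        parent = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in cap[u]:
                if v not in parent and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    if v == t:
                        return parent
                    q.append(v)
        return None
    while True:
        parent = bfs()
        if parent is None:
            break
        path, v = [], t                      # recover the augmenting path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(cap[u][v] - flow[u][v] for u, v in path)  # bottleneck capacity
        for u, v in path:
            flow[u][v] += b
            flow[v][u] -= b
    seen, q = {s}, deque([s])                # residual reachability = cut side
    while q:
        u = q.popleft()
        for v in cap[u]:
            if v not in seen and cap[u][v] - flow[u][v] > 0:
                seen.add(v)
                q.append(v)
    return seen

def mincut_label(examples, labels, sim_edges):
    """labels: {node: 0 or 1} for the labeled examples; sim_edges: similar pairs."""
    INF = float('inf')
    cap = defaultdict(lambda: defaultdict(int))
    def add(u, v, c):
        cap[u][v] += c
        cap[v][u] += 0                       # register reverse key for residuals
    for u, v in sim_edges:                   # unit-capacity similarity edges
        add(u, v, 1)
        add(v, u, 1)
    for i, y in labels.items():              # hard-wire the labeled examples
        add('s', i, INF) if y == 1 else add(i, 't', INF)
    pos_side = max_flow_source_side(cap, 's', 't')
    return {i: (1 if i in pos_side else 0) for i in examples}

# Two triangles joined by one edge; one labeled node per cluster.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
preds = mincut_label(range(6), {0: 1, 5: 0}, edges)
```

The cut severs the single bridge edge, so each cluster inherits the label of its seed; unlabeled data shapes the cut even though it carries no labels itself.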
In Admission control to minimize rejections (Avrim Blum, Adam Kalai, and Jon Kleinberg, WADS '01 (LNCS 2125, pp.155-164, 2001)), we revisit this problem from the point of view of approximately minimizing the number of calls rejected. In a number of cases, we show it is possible to reject at most twice as many calls as the best in hindsight. Thus, if the optimal strategy in hindsight accepted 99% of the calls, then this algorithm accepts at least 98%. In contrast, algorithms designed for the "benefit" formulation might accept only a 0.99/log(n) fraction of the calls. Thus, by changing the focus (from the accept ratio to the reject ratio) we can achieve what may be a much better guarantee.