15-859(B) Ideas for Projects

One of the course requirements is to do a small project, which you may do individually or in a group of 2. A project might involve conducting an experiment or thinking about a theoretical problem, or trying to relate two problems. It could even just be reading 2-3 research papers and explaining how they relate. The end result should be a 5ish page report, and a 10 minute presentation. Here are a few ideas for possible topics for projects. You might also want to take a look at recent COLT proceedings.

Topic Ideas

Active learning: The idea of much of the work on this topic is to try to reduce the 1/epsilon term in the number of labeled examples needed to get error epsilon down to log(1/epsilon), by allowing the algorithm to choose which examples in the dataset to have labeled. Sometimes you can do this and sometimes you can't. The theory is still not that fully developed. Some recent papers:

S. Dasgupta, Coarse sample complexity bounds for active learning. NIPS, 2005.
M.-F. Balcan, A. Broder, T. Zhang, Margin Based Active Learning. COLT 2007.
M.-F. Balcan, A. Beygelzimer, J. Langford, Agnostic active learning. ICML 2006.
S. Fine, Y. Mansour, Active Sampling for Multiple Output Identification. COLT 2006.
See also the NIPS 2005 Workshop on Foundations of Active Learning.

Winnow, Maxent, L1 and L2: In class we talked about Maxent and mentioned that Winnow can be viewed as an approximation. You can also go the other way, and there are also connections between Maxent and SVM.

A. Ng. Feature selection, L1 vs. L2 regularization,and rotational invariance, ICML 2004. This shows how maxent with L1 regularization can get bounds similar to Winnow.
M Wainwright, P. Ravikumar J. Lafferty. High-Dimensional Graphical Model Selection Using L1-Regularized Logistic Regression, NIPS'06.
J.Zhang, R.Jin, Y.Yang and A.Hauptmann, Modified Logistic Regression: An Approximation to SVM and its Applications in Large-Scale Text Categorization, ICML 2003.

Efficient agnostic learning: there are not a lot of efficient algorithms for agnostic learning (finding the approximately best h in H, when data is not necessarily consistent), but you can make progress if you make some assumptions on the distribution:

A. Kalai, Y. Mansour, and E. Verbin. Agnostic Boosting and Parity Learning. STOC 2008.
A. Kalai, A. Klivans, Y. Mansour, and R. Servedio. Agnostically Learning Halfspaces. FOCS 2005. (Link points to Adam Kalai's home page. Scroll down for paper & ppt)
W. S. Lee, P. L. Bartlett, and R. C. Williamson. Efficient agnostic learning of neural networks with bounded fan-in. IEEE Trans Info Theory, 1996.

Clustering and related topics:

J. Feldman and R. O'Donnell and R. Servedio. PAC Learning Mixtures of Axis-Aligned Gaussians with No Separation Assumption. COLT 2006, pp. 20--34.
R. Kannan, H. Salmasian, and S. Vempala. "The Spectral Method for General Mixture Models". COLT 2005.
D. Achlioptas and F. McSherry. "On Spectral Learning of Mixtures of Distributions". COLT 2005.
M.F. Balcan, A. Blum, and S. Vempala. A Discriminative Framework for Clustering via Similarity Functions. STOC 2008.

Membership query algorithms:

P. Gopalan, A. Kalai, and A. Klivans. Agnostically Learning Decision Trees. STOC 2008.
J. Jackson, An Efficient Membership-Query Algorithm for Learning DNF with Respect to the Uniform Distribution. Journal of Computer and System Sciences 55(3), 1997. N. Bshouty, J. Jackson, and T. Tamon, More Efficient PAC-learning of DNF with Membership Queries Under the Uniform Distribution. Proceedings of the 12th Annual Workshop on Computational Learning Theory, 1999.
These papers show how to use membership queries to learn DNF formulas respect to the uniform distribution. Combination of fourier analysis with boosting and a clever analysis of the distributions produced.

Connections between learning and crypto:Connections in terms of basic foundations, and also in terms of specific problems and their relation to lattice-based cryptography.

A. Klivans and R. Servedio. Boosting and Hard-Core Sets. Machine Learning 53(3), 2003.
O. Regev. On lattices, learning with errors, random linear codes, and cryptography, STOC 2005.
R. Kumar and D. Sivakumar. On polynomial approximation to the shortest lattice vector length, SODA 2001.

Structured Prediction: This looks at problems where instead of making a binary "positive"/"negative" classification (or even a k-way classification), you instead have a much more complex output space. E.g., say you want to parse an English sentence, or output its translation into a different language. Here are a couple papers:

H. Daume, J. Langford, D. Marcu, Search-based Structured Prediction.
B. Taskar, V. Chatalbashev, D. Koller and C. Guestrin. Learning Structured Prediction Models: A Large Margin Approach. ICML 2005.

Computational hardness results, connections to complexity thoryr: Some recent papers:

L. Fortnow and A. Klivans. Efficient Learning Algorithms Yield Circuit Lower Bounds, COLT 2006.
V. Feldman, P. Gopalan, S. Khot, A. Ponnuswami. New Results for Learning Noisy Parities and Halfspaces, FOCS 2006.
V. Feldman, Optimal Hardness Results for Maximizing Agreements with Monomials, CCC 2006.
V. Guruswami and P. Raghavendra, Hardness of learning halfspaces with noise. FOCS 2006.

Relationship between convex cost functions and discrete loss: These papers look at relationships between different kinds of objective functions for learning problems.

P.L. Bartlett, M.I. Jordan, J.D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138-156, 2006.
T.Zhang, Statistical behavior and consistency of classification methods based on convex risk minimization.
I.Steinwart, How to compare different loss functions and their risks.

Extension of experts setting to "bandit" or other limited-information settings. Extensions to "internal regret". Portfolio selction. See. e.g.,

E. Hazan, A. Kalai, S. Kale, A. Agarwal. Logarithmic Regret Algorithms for Online Convex Optimization. COLT 2006.
G Stoltz and G. Lugosi. Internal Regret in On-Line Portfolio Selection, MLJ 2005.
P. Auer, N. Cesa-Bianchi, Y. Freund, R. Schapire, The Nonstochastic Multiarmed Bandit Problem, SIAM J. Comput. 32(1): 48-77 (2002).
V. Dani and T. Hayes, Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary. SODA 2006.
A. Blum and Y. Mansour, From External to Internal Regret. COLT 2005.

Learning in Markov Decision Processes: See Michael Kearns's home page and Yishay Mansour's home page for a number of good papers. Also Sham Kakade's thesis.

Learning kernel functions: See, e.g., N. Srebro and S. Ben-David, Learning Bounds for Support Vector Machines with Learned Kernels, 19th Annual Conference on Learning Theory (COLT), 2006. Here is an earlier paper on the topic.

PAC-Bayes bounds and other methods of obtaining confidence bounds. Some papers:

Matti Kääriäinen, Relating the Rademacher and VC Bounds, 2004.
D. McAllester, Simplified PAC-BAyesian Margin Bounds. COLT 2003.
A. Ambroladze, E. Parrado-Hernandez, J. Shawe-Taylor, Tighter PAC-Bayes Bounds. NIPS 2006.
J. Langford and D. McAllester. Computable Shell Decomposition Bounds. COLT 2000.

Last modified: Wed Mar 5 13:20:16 EST 2008