CMU 10-806 Foundations of Machine Learning and Data Science, Ideas for Projects

CMU 10-806 Foundations of Machine Learning and Data Science, Fall 2015

Ideas for Projects

One of the course requirements is to do a project, which you may do individually or in a group of 2 to 3.

You could choose to do a small project (if you prefer the homework oriented grading scheme): this might involve conducting a small experiment or reading a couple of papers and presenting the main ideas. The end result should be a 3-5 page report.
Alternatively, you could choose to do a larger project (if you prefer the project oriented grading scheme): this might involve conducting a novel larger experiment, or thinking about a concrete open theoretical question, or thinking about how to formalize an interesting new topic, or trying to relate several problems. The end result should be a 10-15 page report.
We will also have a day for project presentations (format TBD).

Here are a few ideas for possible topics for projects. You might also want to take a look at recent COLT, ICML, or NIPS proceedings. All the recent COLT proceedings contain a few open problems, some with monetary rewards!

Project Ideas

Noise tolerant computationally efficient algorithms:

A. Daniely. A PTAS for Agnostically Learning Halfspaces. COLT 2015.
U. Feige, Y. Mansour, R. Schapire. Learning and inference in the presence of corrupted inputs. COLT 2015.
P. Awasthi, M.F. Balcan and P. Long. The Power of Localization for Efficiently Learning Linear Separators with Noise. STOC 2014.
P. M. Long and R. A. Servedio. Learning large-margin halfspaces with more malicious noise. NIPS 2011.
P. Awasthi, A. Blum, and O. Sheffet. Improved Guarantees for Agnostic Learning of Disjunctions. COLT 2010.
A. Kalai, A. Klivans, Y. Mansour, and R. Servedio. Agnostically Learning Halfspaces. FOCS 2005.
W. S. Lee, P. L. Bartlett, and R. C. Williamson. Efficient agnostic learning of neural networks with bounded fan-in. IEEE Trans Info Theory, 1996.
M. Kearns and M. Li. Learning in the Presence of Malicious Errors. STOC 1988.

Machine learning lenses in other areas:

K. Amin, R. Cummings, L. Dworkin, M. Kearns, and A. Roth. Online Learning and Profit Maximization from Revealed Preferences. AAAI 2015.
J. Morgenstern and T. Roughgarden. The Pseudo-Dimension of Nearly-Optimal Auctions. NIPS 2015.
M.F. Balcan, A. Daniely, R. Mehta, R. Urner, V. Vazirani: Learning Economic Parameters from Revealed Preferences. WINE 2014.
R. Cole, T. Roughgarden: The sample complexity of revenue maximization. STOC 2014.
M.F. Balcan and N. Harvey. Submodular Functions: Learnability, Structure, and Optimization. STOC 2011.
M.F. Balcan, E. Blais, A. Blum, and L. Yang. Active Property Testing. FOCS 2012.
Y. Chen and J. Wortman Vaughan. Connections Between Markets and Learning. ACM SIGecom Exchanges 2010.
M.F. Balcan, A. Blum, J. Hartline, and Y. Mansour. Mechanism Design via Machine Learning. FOCS 2005.
A. Blum and Y. Mansour. Learning, Regret Minimization, and Equilibria. Book chapter in Algorithmic Game Theory, Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay Vazirani, eds.

Distributed machine learning:

Y. Arjevani and O. Shamir. Communication Complexity of Distributed Convex Learning and Optimization. NIPS 2015.
Y. Zhang, M. Wainwright and M. Jordan. Distributed Estimation of Generalized Matrix Rank: Efficient Algorithms and Lower Bounds. ICML 2015.
O. Shamir, N. Srebro and T. Zhang. Communication Efficient Distributed Optimization using an Approximate Newton-type Method. ICML 2014.
J.C. Duchi, M. Wainwright, and Y. Zhang. Communication-Efficient Algorithms for Statistical Optimization . JMLR 2013.
M.F. Balcan, S. Ehrlich, and Y. Liang. Distributed Clustering on Graphs. NIPS 2013.
M.F. Balcan, A. Blum, S. Fine, and Y. Mansour. Distributed Learning, Communication Complexity and Privacy. COLT 2012.

Semi-supervised learning and related topics:

A. Balsubramani and Y. Freund. Optimally Combining Classifiers Using Unlabeled Data. COLT 2015.
M.F. Balcan, A. Blum, and Y. Mansour. Exploiting Ontology Structures and Unlabeled Data for Learning. ICML 2013.
Z. Karnin, E. Liberty, S. Lovett, R. Schwartz and O. Weinstein.Unsupervised SVMs: On the Complexity of the Furthest Hyperplane Problem. COLT, 2012.
M.F. Balcan and A. Blum. A Discriminative Model for Semi-Supervised Learning. Journal of the ACM, 2010.
A. Carlson, J. Betteridge, R. C. Wang, E. R. Hruschka Jr., and T. M. Mitchell. Coupled Semi-Supervised Learning for Information Extraction. International Conference on Web Search and Data Mining (WSDM), 2010.
X. Zhu. Semi-Supervised Learning. Encyclopedia of Machine Learning.
L. Xu, M. White, and D. Schuurmans. Optimal Reverse Prediction. Twenty-Sixth International Conference on Machine Learning (ICML), 2009.
X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. The 20th International Conference on Machine Learning (ICML) 2003.

Interactive learning:

K. Chaudhuri, S. Kakade, P. Netrapalli and S. Sanghavi. Convergence Rates of Active Learning for Maximum Likelihood Estimation. NIPS 2015.
G. Dasarathy, R. Nowak and X. Zhu. S2: An Efficient Graph Based Active Learning Algorithm with Application to Nonparametric Classification. COLT 2015.
T.-K. Huang, A. Agarwal, D. Hsu, J. Langford, and R. Schapire, Efficient and Parsimonious Agnostic Active Learning. NIPS 2015.
S. Hanneke. Theory of Disagreement-Based Active Learning. Foundations and Trends in Machine Learning, Vol. 7 (2-3), pp. 131-309.
M.F. Balcan and P. Long. Active and Passive Learning of Linear Separators under Log-concave Distributions. COLT 2013.
M.F. Balcan and S. Hanneke. Robust Interactive Learning. COLT 2012.
M.F. Balcan, A. Beygelzimer, J. Langford. Agnostic active learning. JCSS 2009 (originally in ICML 2006).
A. Beygelzimer, S. Dasgupta, and J. Langford. Importance-weighted active learning. ICML 2009.
V. Koltchinskii Rademacher Complexities and Bounding the Excess Risk in Active Learning. Journal of Machine Learning Research 2010.
S. Hanneke Rates of Convergence in Active Learning. The Annals of Statistics 2011.
See also the ICML 2015 Workshop on Advances in Active Learning - Bridging Theory and Practice.

VC-dimension and other methods of obtaining sample size bounds:

H. Simon. An Almost Optimal PAC Algorithm. COLT 2015.
H. Simon and S. Zilles. Open Problem: Recursive Teaching Dimension Versus VC Dimension. COLT 2015 (open problems track).
A. Ambroladze, E. Parrado-Hernandez, and J. Shawe-Taylor. Tighter PAC-Bayes Bounds. NIPS 2006.
D. McAllester. Simplified PAC-BAyesian Margin Bounds. COLT 2003.
S. Mendelson and P. Philips. On the Importance of Small Coordinate Projection. Journal of Machine Learning Research 2004.
J. Langford and D. McAllester. Computable Shell Decomposition Bounds. COLT 2000.

Online learning:

O. Dekel, J. Ding, T. Koren, Y. Peres: Bandits with switching costs: T^(2/3) regret. STOC 2014.
U. Feige, T. Koren, M. Tennenholtz: Chasing Ghosts: Competing with Stateful Policies. FOCS 2014.
Y. Mansour, A. Slivkins, V. Syrgkanis: Bayesian Incentive-Compatible Bandit Exploration. EC 2015.
A. Rakhlin, K. Sridharan: Online Learning with Predictable Sequences. COLT 2013.
A. Rakhlin, K. Sridharan: Optimization, Learning, and Games with Predictable Sequences. NIPS 2013.

Clustering and related topics:

J. Eldridge, M. Belkin, Y. Wang. Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering. COLT 2015.
P. Awasthi and M.F. Balcan. Center Based Clustering: A Foundational Perspective. Book Chapter in Handbook of Cluster Analysis, 2015.
Y. Li, K. He, D. Bindel, and J.E. Hopcroft. Uncovering the Small Community Structure in Large Networks: A Local Spectral Approach. WWW 2015.
M. Ackerman and S. Dasgupta. Incremental clustering: the case for extra clusters. NIPS 2014.
M.F. Balcan, C. Borgs, M. Braverman, J. Chayes, and S. Teng. Finding Endogenously Formed Communities. SODA 2013.
D. Hsu and S. M. Kakade. Learning Mixtures of Spherical Gaussians: Moment Methods and Spectral Decompositions. ITCS 2013.
S. Vassilvitskii and S. Venkatasubramanian. New Developments In The Theory Of Clustering. Tutorial at KDD 2010.
M.F. Balcan and P. Gupta. Robust Hierarchical Clustering. COLT 2010.
M.F. Balcan, A. Blum, and S. Vempala. A Discriminative Framework for Clustering via Similarity Functions. STOC 2008. See also full version.

Hardness of learning:

A. Daniely, N. Linial and S. Shalev-Shwartz. From average case complexity to improper learning complexity. STOC 2014. See also related 1, related 2, related 3.
S. Khot and R. Saket. On Hardness of learning intersection of two halfspaces. JCSS 2011. See also related 1

Multiclass classification:

A. Daniely and S. Shalev-Shwartz. Optimal Learners for Multiclass Problems. COLT 2014.
A. Daniely, S. Sabato, and S. Shalev-Shwartz. Multiclass Learning Approaches: A Theoretical Comparison with Implications. NIPS 2012
A. Daniely, S. Sabato, S. Ben-David, and S. Shalev-Shwartz. Multiclass Learnability and the ERM Principle. COLT 2011.

Relationship between convex cost functions and discrete loss: These papers look at relationships between different kinds of objective functions for learning problems.

P. L. Bartlett, M.I. Jordan, J.D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 2006.
T. Zhang, Statistical behavior and consistency of classification methods based on convex risk minimization.
I. Steinwart, How to compare different loss functions and their risks.

Boosting related topics:

A. Beygelzimer, S. Kale and H. Luo. Optimal and Adaptive Algorithms for Online Boosting. ICML 2015.
I. Mukherjee and R. E. Schapire. A theory of multiclass boosting. Journal of Machine Learning Research 2013
P. M. Long and R. A. Servedio. Random classification noise defeats all convex potential boosters. Machine Learning, 2010
V. Feldman. Distribution-Specific Agnostic Boosting. Innovations in Computer Science (ICS), 2010.
A. Tauman Kalai and R. Servedio. Boosting in the Presence of Noise . JCSS 2005.
S. Dasgupta and P. M. Long. Boosting with diverse base classifiers. COLT 2003.

Learning with kernel functions:

M. Warmuth and S. V. N. Vishwanathan. Leaving the Span. COLT 2005.
M. F. Balcan, A. Blum, and N. Srebro. A Theory of Learning with Similarity Functions. Machine Learning Journal, 2008.
N. Srebro and S. Ben-David. Learning Bounds for Support Vector Machines with Learned Kernels. 19th Annual Conference on Learning Theory (COLT), 2006.
G. Lanckriet, N. Cristianini, P. Bartlett, and Laurent El Ghaoui. Learning the Kernel Matrix with Semidefinite Programming, Journal of Machine Learning Research 2004.

Learning in Markov Decision Processes: See M. Kearns's home page and Y. Mansour's home page for a number of good papers. Also S. Kakade's thesis.

Learning in Graphical Models (Bayes Nets)