Course Project

Your class project is an opportunity for you to explore an interesting problem in the context of a real-world data set. Projects should be done in teams of three students. Each project will be assigned a TA as a project consultant/mentor; instructors and TAs will consult with you on your ideas, but of course the final responsibility to define and execute an interesting piece of work is yours. Your project will be worth 40% of your final class grade, and will have 4 deliverables:

  1. Proposal: 2 pages excluding references (10%)
  2. Midway Report: 5 pages excluding references (20%)
  3. Final Report: 8 pages excluding references (40%)
  4. Presentation: 10-minute talk (30%)
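
The percentages above are shares of the project grade, which is itself 40% of the class grade. A minimal Python sketch of that arithmetic with made-up scores (the function and variable names are illustrative only, not part of any course tool):

```python
# Deliverable weights (percent of the project grade), taken from the list above.
DELIVERABLE_WEIGHTS = {"proposal": 10, "midway": 20, "final": 40, "presentation": 30}
# The project as a whole is worth 40% of the final class grade.
PROJECT_SHARE = 40

assert sum(DELIVERABLE_WEIGHTS.values()) == 100  # the four weights cover the whole project


def project_points(scores: dict) -> float:
    """Class-grade points (out of 100) earned through the project.

    `scores` maps each deliverable to a hypothetical score between 0 and 100.
    """
    if set(scores) != set(DELIVERABLE_WEIGHTS):
        raise ValueError("scores must cover exactly the four deliverables")
    # Weighted project score on a 0-100 scale.
    project_score = sum(DELIVERABLE_WEIGHTS[k] * scores[k] for k in scores) / 100
    # Scale by the project's share of the class grade.
    return PROJECT_SHARE * project_score / 100


# Hypothetical scores, for illustration only.
example = {"proposal": 90, "midway": 80, "final": 85, "presentation": 95}
print(project_points(example))  # 35.0
```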

All write-ups should use the ICML style.

Team Formation

You are responsible for forming project teams of 3 people. In some cases, we will also accept teams of 2, but a 3-person group is preferred. Once you have formed your group, please send one email per team to the class instructor list with the names of all team members. If you have trouble forming a group, please send us an email and we will help you find project partners.

Project Proposal

You must turn in a brief project proposal that provides an overview of your idea and also contains a brief survey of related work on the topic. We will provide a list of suggested project ideas for you to choose from, though you may discuss other project ideas with us, whether applied or theoretical. Note that even though you can use datasets you have used before, you cannot use work that you started prior to this class as your project.

Proposals should be approximately two pages long, and should include the following information:

  • Project title and list of group members.
  • Overview of project idea. This should be approximately half a page long.
  • A short literature survey of 4 or more relevant papers. The literature review should take up approximately one page.
  • Description of potential data sets to use for the experiments.
  • Plan of activities, including what you plan to complete by the midway report and how you plan to divide up the work.

The grading breakdown for the proposal is as follows:

  • 40% for a clear and concise description of the proposed method
  • 40% for a literature survey that covers at least 4 relevant papers
  • 10% for the plan of activities
  • 10% for quality of writing

The project proposal will be due by 7 PM on Monday, February 20th, and should be submitted via Gradescope.

Midway Report

The midway report will serve as a checkpoint at the halfway mark of your project. It should be about 5 pages long and formatted like a conference paper, with the following sections: introduction, background & related work, methods, experiments, and conclusion. The introduction and related work sections should be in their final form; the methods section should be almost finished; and the experiments and conclusion sections should contain the results you have obtained so far, perhaps with placeholders for the results you plan or hope to obtain.

The grading breakdown for the midway report is as follows:

  • 20% for introduction and literature survey
  • 40% for proposed method
  • 20% for the design of upcoming experiments and revised plan of activities (in an appendix, please show the old and new activity plans)
  • 10% for data collection and preliminary results
  • 10% for quality of writing

The project midway report will be due on Monday, March 27th, and should be submitted via Gradescope.

Final Report

Your final report is expected to be 8 pages excluding references, in accordance with the length requirements for an ICML paper. It should have roughly the following format:

  • Introduction: problem definition and motivation
  • Background & Related Work: background information and literature survey
  • Methods
    • Overview of your proposed method
    • Intuition on why it should be better than the state of the art
    • Details of models and algorithms that you developed
  • Experiments
    • Description of your testbed and a list of questions your experiments are designed to answer
    • Details of the experiments and results
  • Conclusion: discussion and future work

The grading breakdown for the final report is as follows:

  • 10% for introduction and literature survey
  • 30% for proposed method (soundness and originality)
  • 30% for correctness, completeness, and difficulty of experiments and figures
  • 10% for empirical and theoretical analysis of results and methods
  • 20% for quality of writing (clarity, organization, flow, etc.)

Presentation

All project teams will present their work at the end of the semester. Each team will be given a timeslot during which they will give a slide presentation to the class, similar in style to a conference presentation. If applicable, live demonstrations of your software are highly encouraged.

Project Suggestions

  • If you are interested in a particular project, please contact the respective contact person for further ideas or details.
  • We may add more project suggestions down the road.



(Deep/graphical) modeling of longitudinal medical data

With the availability of electronic health records (EHR) data [1], accurate models of longitudinal processes [2] have become a high priority for precision medicine [3]. Such models should be able to learn from inhomogeneous time-series data in order to confidently predict the evolution of a patient's state, or to generate possible outcomes given the medical history and/or a certain intervention (e.g., a drug). Accurate models of this type can be used to support medical decision-making or within a reinforcement learning framework. Currently, the common approach relies on complex hierarchical latent variable graphical models [4], which, while explicit, might be suboptimal when the goal is to predict or generate the future. The goal of this project may be one of the following:

  • Design a predictive recurrent model that can learn from EHR data. Since the nature of the data is event-based, ideas similar to phased LSTM [5] may work.
  • Design a Variational Autoencoder [6] or Generative Adversarial Network [7] for EHR.
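
For the recurrent direction, the key ingredient of phased LSTM [5] is a time gate whose openness depends on each event's timestamp, so irregularly sampled records can be fed in without resampling. Below is a NumPy sketch of the gate and the gated state update as we read them from [5]. In [5] the period, phase shift, and open ratio are per-unit learned parameters; here they are fixed scalars for simplicity, `alpha` is the small leak applied while the gate is closed, and the function names are our own.

```python
import numpy as np


def phased_time_gate(t, tau, shift, r_on=0.05, alpha=1e-3):
    """Openness k_t of a phased-LSTM-style time gate at timestamps t.

    t: event timestamps (irregular spacing is fine).
    tau, shift, r_on: oscillation period, phase shift, and open ratio.
    alpha: leak so the gate is not exactly zero while closed.
    """
    t = np.asarray(t, dtype=float)
    phi = np.mod(t - shift, tau) / tau          # phase in [0, 1)
    rising = 2.0 * phi / r_on                   # first half of the open phase
    falling = 2.0 - 2.0 * phi / r_on            # second half of the open phase
    closed = alpha * phi                        # mostly closed, small leak
    return np.where(phi < r_on / 2, rising, np.where(phi < r_on, falling, closed))


def gated_cell_update(c_prev, c_candidate, k):
    # Interpolate between the proposed LSTM state and the previous state.
    return k * c_candidate + (1.0 - k) * c_prev


# Example: hypothetical visit times (e.g., in days) with period 1 and open ratio 0.5.
k = phased_time_gate([0.125, 0.25, 0.75, 1.25], tau=1.0, shift=0.0, r_on=0.5)
print(k)
```

When the gate is nearly closed, the cell state is carried over almost unchanged, which is what lets the model skip over long gaps between events.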

Contact person: Maruan Al-Shedivat

References:

[1] Hoerbst, A., and E. Ammenwerth. "Electronic health records." Methods Inf Med 49.4 (2010): 320-336.
[2] https://en.wikipedia.org/wiki/Longitudinal_study
[3] Collins, F. S., and H. Varmus. "A new initiative on precision medicine." New England Journal of Medicine 372.9 (2015): 793-795.
[4] Schulam, P., and S. Saria. "Integrative Analysis using Coupled Latent Variable Models for Individualizing Prognoses." JMLR 2016.
[5] Neil, D., M. Pfeiffer, and S.-C. Liu. "Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences." NIPS 2016.
[6] Kingma, D. P., and M. Welling. "Auto-encoding variational bayes." arXiv:1312.6114 (2013).
[7] Goodfellow, I., et al. "Generative adversarial nets." NIPS 2014.


(Deep/graphical) modeling of spatio-temporal data

The goal of this project is to come up with models of spatio-temporal processes that can be efficiently learned from spatio-temporal time series, with applications such as crime prediction [1] and EEG data analysis [2]. The main challenge is that the state of the art is often as simple as a linear autoregressive moving-average model or a kernel density estimator [3, 4]. Your task is to design specific flavors of graphical models (e.g., HMMs/CRFs) or deep recurrent models (e.g., various RNNs) that can efficiently learn from such data and beat these classical baselines.
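
As a reference point for the classical baselines, here is a minimal NumPy sketch of a least-squares autoregressive AR(p) fit applied independently at each spatial location. It omits the moving-average terms of a full ARMA model and any coupling between locations, and the function names are hypothetical; it only illustrates the kind of baseline a new model should beat.

```python
import numpy as np


def fit_ar(series, p):
    """Least-squares fit of x_t = c + a_1 x_{t-1} + ... + a_p x_{t-p} + noise."""
    x = np.asarray(series, dtype=float)
    # Row for time t holds [1, x_{t-1}, ..., x_{t-p}].
    X = np.array([np.concatenate(([1.0], x[t - p:t][::-1])) for t in range(p, len(x))])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [c, a_1, ..., a_p]


def predict_next(series, coef):
    """One-step-ahead forecast from the last p observations."""
    p = len(coef) - 1
    lags = np.asarray(series, dtype=float)[-p:][::-1]  # [x_{T-1}, ..., x_{T-p}]
    return coef[0] + coef[1:] @ lags


def fit_per_location(grid_series, p):
    """grid_series: array of shape (T, L); one independent AR model per location."""
    return [fit_ar(grid_series[:, j], p) for j in range(grid_series.shape[1])]


# Example: a noiseless AR(1) series x_t = 0.5 + 0.8 x_{t-1}.
x = [1.0]
for _ in range(30):
    x.append(0.5 + 0.8 * x[-1])
print(fit_ar(x, 1))
```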

Contact person: Maruan Al-Shedivat

References:

[1] Gerber, M. S. "Predicting crime using Twitter and kernel density estimation." Decision Support Systems 61 (2014): 115-125.
[2] Williamson, J. R., et al. "Seizure prediction using EEG spatiotemporal correlation structure." Epilepsy & Behavior 25.2 (2012): 230-238.
[3] https://en.wikipedia.org/wiki/Autoregressive_model
[4] Cressie, N., and C. K. Wikle. "Statistics for spatio-temporal data." John Wiley & Sons, 2015.



Bayesian learning of feedforward/convolutional/recurrent neural networks

Plain feedforward neural networks are prone to overfitting. When applied to supervised or reinforcement learning problems, these networks are also often unable to correctly assess the uncertainty in the training data, and so make overly confident decisions about the correct class, prediction, or action. We can address these issues with Bayesian learning, which introduces uncertainty (expressed and measured by probabilities) in the weights of the network [1, 2, 3, 4]. Previous work assumed either an independent Gaussian prior over each weight or used dropout with fixed [5] or learned [6] probabilities. In this project, your goal is to explore and further improve Bayesian learning methods for neural networks, potentially including convolutional and recurrent models. For example, a simple approach for convolutional neural networks (ConvNets) is to introduce structure into the Gaussian prior that is consistent with the ConvNet structure. Stochastic backpropagation with variational inference (VI) or MCMC can be used for training.
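
To make the VI route concrete, below is a NumPy sketch of the kind of objective minimized by weight-uncertainty methods such as [2], for a single Bayesian linear layer: weights are drawn with the reparameterization w = mu + sigma * eps, and the negative ELBO is a Monte Carlo data-fit term plus the closed-form KL to an independent Gaussian prior. It only evaluates the objective (gradients would come from an autodiff framework), and the layer, noise level, and names are assumptions for illustration. A structured ConvNet prior, as suggested above, would replace `kl_to_gaussian_prior`.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps sigma strictly positive.
    return np.logaddexp(0.0, x)


def sample_weights(mu, rho, rng):
    """Reparameterized draw w = mu + sigma * eps, with sigma = softplus(rho)."""
    eps = rng.standard_normal(mu.shape)
    return mu + softplus(rho) * eps


def kl_to_gaussian_prior(mu, rho, prior_std=1.0):
    """Closed-form KL(q || p) for factorized q = N(mu, sigma^2) and p = N(0, prior_std^2)."""
    sigma = softplus(rho)
    return np.sum(np.log(prior_std / sigma)
                  + (sigma**2 + mu**2) / (2.0 * prior_std**2) - 0.5)


def neg_elbo(X, y, mu, rho, rng, n_samples=8, noise_std=0.1, prior_std=1.0):
    """Monte Carlo estimate of -ELBO for y ~ N(X w, noise_std^2) with w ~ q(w)."""
    nll = 0.0
    for _ in range(n_samples):
        w = sample_weights(mu, rho, rng)
        resid = y - X @ w
        # Gaussian negative log-likelihood of all targets under this weight sample.
        nll += (0.5 * np.sum(resid**2) / noise_std**2
                + len(y) * np.log(noise_std * np.sqrt(2.0 * np.pi)))
    return nll / n_samples + kl_to_gaussian_prior(mu, rho, prior_std)


# Toy data: a posterior centred on the true weights should score better than one at zero.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(50)
rho = np.full(3, -5.0)  # sigma = softplus(-5) ≈ 0.0067
print(neg_elbo(X, y, w_true, rho, rng), neg_elbo(X, y, np.zeros(3), rho, rng))
```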

Contact person: Maruan Al-Shedivat

References:

[1] Neal, R. "Bayesian Learning for Neural Networks." Springer, 1996.
[2] Blundell, C., et al. "Weight Uncertainty in Neural Networks." ICML 2015.
[3] Hernández-Lobato, J., and R. Adams. "Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks." ICML 2015.
[4] Hinton, G. E., and D. Van Camp. "Keeping the neural networks simple by minimizing the description length of the weights." COLT 1993.
[5] Gal, Y., and Z. Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." arXiv:1506.02142 (2015).
[6] Kingma, D. P., T. Salimans, and M. Welling. "Variational dropout and the local reparameterization trick." NIPS 2015.



Distributed Gaussian processes on GPUs

Gaussian processes (GPs) are rich distributions over functions that provide a Bayesian nonparametric approach to smoothing and interpolation. GPs have been widely used in regression and classification, Bayesian optimization, reinforcement learning, etc. However, GPs typically cannot scale to large modern datasets, because the training complexity is cubic in the number of data points. One line of research scales up GPs with sophisticated parallelization and GPU acceleration (e.g., [1]). On the other hand, recent work proposed KISS-GP [2], which reduces the time complexity of learning from cubic to near-linear. In this project, your goal is to combine these two lines of work by parallelizing KISS-GP on distributed computer clusters and adapting the computation to use GPUs efficiently.
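
To see where the cubic cost comes from, here is a minimal NumPy sketch of exact GP regression with a squared-exponential kernel. The Cholesky factorization of the n-by-n kernel matrix takes O(n^3) time and O(n^2) memory; this is the step that KISS-GP approximates and that parallel/GPU implementations accelerate. The kernel choice, hyperparameters, and function names are assumptions for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a (n,) and b (m,)."""
    sqdist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)


def gp_posterior_mean(x_train, y_train, x_test, noise=1e-2, **kern):
    """Exact GP posterior mean; cost is dominated by the O(n^3) Cholesky step."""
    K = rbf_kernel(x_train, x_train, **kern) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)  # O(n^3) time, O(n^2) memory
    # alpha = K^{-1} y via two triangular systems (solved with np.linalg.solve for brevity).
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return rbf_kernel(x_test, x_train, **kern) @ alpha


x = np.linspace(0.0, 5.0, 20)
mean = gp_posterior_mean(x, np.sin(x), np.array([1.3, 2.7, 100.0]), lengthscale=1.0)
print(mean)
```

Far from the data (the last test point), the posterior mean falls back to the zero prior mean.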

Contact person: Maruan Al-Shedivat

References:

[1] Dai, Z., et al. "Gaussian Process Models with Parallelization and GPU acceleration." NIPS workshop 2014.
[2] Wilson, A. G., and H. Nickisch. "Kernel interpolation for scalable structured Gaussian processes (KISS-GP)." ICML 2015.



Efficient deep/recurrent kernel learning for Gaussian processes in TensorFlow

Deep [1] and recurrent [2] kernel learning methods combine flexible deep networks with powerful and robust Bayesian nonparametric Gaussian processes, and have established state-of-the-art results in many regression tasks. The scalability of these methods rests on KISS-GP, described in [3]. Unfortunately, the best available implementation of KISS-GP is in MATLAB, which makes combining deep nets and GPs a little tricky: while the neural network part of the model can be implemented with Caffe or TensorFlow (as in [1] and [2]), it must then be combined with MATLAB-based GPs. The goal of this project is to provide a native implementation of KISS-GP in TensorFlow and to measure and demonstrate its performance and scalability. GPflow [4] can be used as an API example.
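
The core approximation behind KISS-GP is structured kernel interpolation, K_XX ≈ W K_UU W^T, where U is a regular grid of inducing points and W holds sparse interpolation weights. The NumPy sketch below shows only the approximation in 1-D, with two simplifications that are our own: it uses linear interpolation (KISS-GP uses local cubic interpolation) and dense matrices, so it does not exhibit the speedups, which come from sparse W and Toeplitz/Kronecker structure in K_UU. Names are hypothetical.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between 1-D arrays a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def linear_interp_weights(x, grid):
    """Dense (n, m) matrix with two nonzeros per row: linear interpolation onto a regular grid."""
    h = grid[1] - grid[0]
    idx = np.clip(np.floor((x - grid[0]) / h).astype(int), 0, len(grid) - 2)
    frac = (x - grid[idx]) / h  # position of x between grid[idx] and grid[idx + 1]
    W = np.zeros((len(x), len(grid)))
    rows = np.arange(len(x))
    W[rows, idx] = 1.0 - frac
    W[rows, idx + 1] = frac
    return W


def ski_kernel(x, grid, lengthscale=1.0):
    """Structured kernel interpolation: K_XX ≈ W K_UU W^T."""
    W = linear_interp_weights(x, grid)
    K_uu = rbf(grid, grid, lengthscale)  # Toeplitz on a regular 1-D grid
    return W @ K_uu @ W.T


x = np.linspace(0.03, 9.97, 40)
grid = np.linspace(0.0, 10.0, 200)
print(np.max(np.abs(ski_kernel(x, grid) - rbf(x, x))))  # small approximation error
```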

Contact person: Maruan Al-Shedivat

References:

[1] Wilson, A. G., et al. "Deep kernel learning." AISTATS 2016.
[2] Al-Shedivat, M., et al. "Learning scalable deep kernels with recurrent structure." arXiv:1610.08936 (2016).
[3] Wilson, A. G., and H. Nickisch. "Kernel interpolation for scalable structured Gaussian processes (KISS-GP)." ICML 2015.
[4] Matthews, A. G. G., et al. "GPflow: A Gaussian process library using TensorFlow." arXiv:1610.08733 (2016).



Hyperparameter Selection of Lasso and Its Variants for Variable Selection Task

While the Lasso has clear advantages over ordinary linear regression in inducing sparsity and resisting overfitting, it requires an extra parameter, governing the strength of the regularizer, that must be tuned manually. When the Lasso is used for prediction, it is fine to tune this parameter by cross-validation. However, when the Lasso is used to select a sparse set of variables, prediction accuracy is not a criterion directly related to variable selection performance, so cross-validation is not helpful. This project encourages students to investigate this problem and to find ways to select the regularization weight of the Lasso (and its variants) for the variable selection task.
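
One simple starting point, in the spirit of [2] and [3], is to scan a decreasing path of regularization weights and score each Lasso support with an information criterion instead of prediction error. The NumPy sketch below uses coordinate descent and the common extended-BIC form n log(RSS/n) + k log n + 2γ k log p for linear regression, with RSS taken from a least-squares refit on each support. This particular criterion, the refit, and all names are our assumptions, not a method prescribed by the references.

```python
import numpy as np


def lasso_cd(X, y, lam, beta=None, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p) if beta is None else beta.copy()
    col_sq = (X**2).sum(axis=0) / n
    r = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]  # partial residual without feature j
            z = X[:, j] @ r / n
            beta[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * beta[j]
    return beta


def ebic_select(X, y, lams, gamma=0.5):
    """Pick lam on a decreasing path by an extended-BIC-style score (gamma=0 gives BIC)."""
    n, p = X.shape
    best = (np.inf, None, None)
    beta = None
    for lam in lams:  # warm start each fit from the previous (larger) lam
        beta = lasso_cd(X, y, lam, beta)
        support = np.flatnonzero(beta)
        if support.size:
            coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
            rss = np.sum((y - X[:, support] @ coef) ** 2)
        else:
            rss = np.sum(y**2)
        k = support.size
        score = n * np.log(rss / n) + k * np.log(n) + 2 * gamma * k * np.log(p)
        if score < best[0]:
            best = (score, lam, support)
    return best[1], best[2]


# Synthetic example: 3 true variables out of 20.
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.5 * rng.standard_normal(n)
y -= y.mean()
lam_max = np.max(np.abs(X.T @ y)) / n  # roughly where the Lasso solution becomes all-zero
lams = np.geomspace(lam_max, 0.01 * lam_max, 30)
lam_hat, support = ebic_select(X, y, lams)
print(lam_hat, support)
```

Larger γ penalizes each selected variable more heavily, so it can only shrink (never grow) the selected support on a fixed path.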

Contact person: Haohan Wang

References:

[1] Fu, F., and Q. Zhou. "Learning sparse causal Gaussian networks with experimental intervention: regularization and coordinate descent." Journal of the American Statistical Association 108.501 (2013): 288-300.
[2] Foygel, R., and M. Drton. "Extended Bayesian information criteria for Gaussian graphical models." NIPS 2010.
[3] Fan, Y., and C. Y. Tang. "Tuning parameter selection in high dimensional penalized likelihood." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75.3 (2013): 531-552.



When PGM meets Deep Learning

As we learned in class, PGMs and deep learning each have their own techniques and advantages. PGMs provide an intuitive interface between researchers and the data to be modeled, while deep learning offers hierarchical representations that can capture more complex structure in data. What if we could integrate the advantages of both? Since the success of the VAE [1] in bringing a probabilistic view and variational inference into deep learning, much work in this style has appeared, and more is to come.
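
For reference, the objective that made the VAE [1] a template for this style of work is the evidence lower bound, optimized jointly over decoder parameters θ and encoder parameters φ, with the latent sample written via the reparameterization trick (a Gaussian encoder is assumed here):

```latex
\log p_\theta(x) \;\ge\; \mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  - \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),
\qquad
z = \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I).
```

The prior p(z) and the factorization of q are natural places where graphical-model structure could be introduced.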

Contact person: Haohan Wang

References:

[1] Kingma, D. P., and M. Welling. "Auto-encoding variational bayes." arXiv:1312.6114 (2013).
[2] Ghosh, A., V. Kulharia, and V. Namboodiri. "Message Passing Multi-Agent GANs." arXiv:1612.01294 (2016).
[3] https://openreview.net/pdf?id=B1ckMDqlg



Confounding Correction and Causal Inference

Recently, there has been a trend toward causal inference models (interpretable models) in the machine learning community, driven by the needs of application areas such as biology, medicine, economics, education, and psychology. Researchers would like to understand what is happening inside a model in order to interpret the world, and customers of a consulting company would also want to know what leads a machine learning algorithm to draw a certain conclusion from big data before making a payment. These needs drive the development of causal inference models. Confounding correction is one of the central tasks of causal inference, yet it has been explored only to a limited extent. For example, most causal inference models are still at a primitive stage and deal only with single response variables, leaving many opportunities open. In addition, most of these models are still linear, leaving room for neural network models.

Contact person: Haohan Wang



High Dimensional Statistics, the Challenges and Solutions for Causal Inference

Believe it or not, high-dimensional statistics has not been developed as much as one would expect. People apply Lasso-type regularizers to high-dimensional data even though the corresponding theory has not been properly established for the high-dimensional case. This may not cause big problems for general prediction tasks. However, for knowledge discovery tasks such as causal inference, the undesirable behavior of sparsity regularizers may lead to significantly worse performance than expected. This project encourages students to look into problems such as:

  • How can the Lasso be made to select variables more stably?
  • How can p-values be calculated when a sparsity regularizer is introduced?
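
For the first question, one existing approach not covered on this page is stability selection (Meinshausen and Bühlmann, 2010): refit the Lasso on many random half-samples and keep only the variables selected in a large fraction of them. A minimal NumPy sketch, assuming centered data and no intercept; the regularization weight, number of subsamples, and 0.8 threshold are illustrative choices, and the function names are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    r = np.array(y, dtype=float)  # residual for beta = 0
    col_sq = (X**2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]  # partial residual without feature j
            z = X[:, j] @ r / n
            beta[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * beta[j]
    return beta


def selection_frequencies(X, y, lam, n_subsamples=50, seed=0):
    """Fraction of half-size subsamples on which each variable gets a nonzero coefficient."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts += lasso_cd(X[idx], y[idx], lam) != 0
    return counts / n_subsamples


def stable_set(freqs, threshold=0.8):
    """Indices of variables selected in at least `threshold` of the subsamples."""
    return np.flatnonzero(freqs >= threshold)


# Synthetic example: 3 true variables out of 20.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.5 * rng.standard_normal(100)
X, y = X - X.mean(axis=0), y - y.mean()
freqs = selection_frequencies(X, y, lam=0.3)
print(stable_set(freqs))
```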

Contact person: Haohan Wang



Implementing Classical Genomic Research Models and Algorithms with Petuum

Genomic research usually deals with human-scale genome data, at which point the capacity of computing resources becomes an issue. To help overcome this, you can help develop some of these tools with Petuum. This may sound boring, but a well-done project of this type may easily get you a job at Petuum or other companies, and may even lead to a publication in Nature.

Contact person: Haohan Wang