Machine Learning Thesis Proposal
- Gates Hillman Centers
- ADAMS WEI YU
- Ph.D. Student
- Machine Learning Department
- Carnegie Mellon University
Towards Effective and Efficient Learning at Scale: Models and Algorithms
How to enable effective and efficient learning at scale has long been a key problem in AI research. In this thesis, we approach it from two perspectives: models and algorithms.
First, we propose a model, LSTM-Jump, that can skip unimportant information in text, mimicking the skimming behavior of human reading. Trained with an efficient reinforcement learning algorithm, this model can be several times faster than a vanilla LSTM at inference time.
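The control flow of skim reading can be illustrated with a minimal, non-neural sketch: read a few tokens, then ask a policy how far to jump. Here `policy` is a hypothetical stand-in for the learned jump distribution (in LSTM-Jump it is trained with REINFORCE over the LSTM state); the names and parameters below are illustrative, not the thesis's actual interface.

```python
def skim_read(tokens, policy, read_size=2, max_jump=3):
    """Process `tokens` left to right: read `read_size` tokens,
    then let `policy` (state -> jump size in 0..max_jump) decide how
    many subsequent tokens to skip. Returns indices actually read."""
    read_indices = []
    state = 0  # stand-in for the recurrent hidden state
    i = 0
    while i < len(tokens):
        end = min(i + read_size, len(tokens))
        for j in range(i, end):
            read_indices.append(j)
            state += 1  # hypothetical state update per token read
        jump = min(max(policy(state), 0), max_jump)
        i = end + jump  # skip `jump` tokens entirely
    return read_indices

# With an always-jump-3 policy, only a fraction of a 20-token
# sequence is ever fed to the recurrent unit.
read = skim_read(list(range(20)), policy=lambda s: 3)
```

A learned policy would instead condition the jump size on the hidden state, skipping far when the upcoming text is predicted to be uninformative.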
Then we introduce a text encoding model that discards recurrence and thus fully supports parallel training and inference. Based on this technique, we propose a new question-answering model, QANet. Combined with a data augmentation approach based on back-translation, this model held the No. 1 place on the competitive Stanford Question Answering Dataset (SQuAD) leaderboard from March to August 2018, while being several times faster than the prevailing models.
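The parallelism argument can be seen in a minimal pure-Python sketch of scaled dot-product attention, the kind of recurrence-free operation such encoders build on (this is a generic illustration, not QANet's actual architecture): every output position depends on the whole input through sums and products, with no sequential state, so all positions can be computed independently.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention. Each query attends to all keys
    at once; there is no hidden state threaded through time, so the
    loop over queries could run fully in parallel."""
    d = len(queries[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        # Softmax over the scores (numerically stabilized).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Output is the attention-weighted average of the values.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

An RNN, by contrast, must compute position t before position t+1, which is exactly the sequential bottleneck the proposed encoder removes.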
To enable fast training of neural networks, we propose a general gradient normalization algorithm for efficient training of deep networks. This method not only alleviates the vanishing-gradient problem but also regularizes the model, leading to better generalization.
When the amount of training data becomes huge, we must appeal to distributed computation. We propose several algorithms to address the challenges posed by this setting. We first show that a delay-adaptive distributed stochastic gradient descent algorithm can achieve a sublinear convergence rate. Then, by further leveraging the structural information of the problem, we propose a doubly stochastic primal-dual method, which has the potential to achieve faster convergence.
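In asynchronous distributed SGD, a worker's gradient may be computed on stale parameters. The sketch below illustrates one common delay-adaptive choice, shrinking the step size like 1/sqrt(1 + delay); this schedule is a generic illustration and is not claimed to be the thesis's exact update rule.

```python
import math

def delay_adaptive_step(x, grad, base_lr, delay):
    """Server-side update for an asynchronous worker gradient.
    `delay` is how many updates the server applied since the worker
    read `x`; staler gradients get proportionally smaller steps."""
    lr = base_lr / math.sqrt(1.0 + delay)  # illustrative schedule
    return x - lr * grad
```

Damping stale gradients this way prevents an out-of-date worker from dragging the iterate backward, which is the intuition behind delay-adaptive convergence guarantees.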
Jaime Carbonell (Co-Chair)
Alexander Smola (Co-Chair) (Amazon)
Quoc Le (Google)
Christopher Manning (Stanford University)