Training deep neural networks is well known to be computationally intensive. Given the proven utility of deep learning, efficiency is therefore an important concern. In this thesis, we review our previous related work on reducing communication overhead in distributed deep learning, speeding up learning by boosting the error gradients, and implementing neural networks efficiently on GPUs. We propose a new, simple method for layer-wise training of deep neural networks that allows layers to be added incrementally, so that the final architecture need not be known in advance. In conjunction, we explore a novel optimization method for non-linear regression problems that uses error deltas instead of gradients and performs well in simulations. We will investigate how this algorithm compares to gradient descent and how it may be applied to training neural networks. Our end goal is to make deep network training faster, simpler, and less reliant on expert knowledge.
Roger B. Dannenberg (Co-Chair)
Bhiksha Raj (Co-Chair)
Douglas Eck (Google Brain/Magenta)