Language Technologies Institute Colloquium

  • Remote Access - Zoom
  • Virtual Presentation - ET (Special Date/Time)
  • Assistant Managing Director
  • Microsoft Research Asia

Bring 10x Speedup to NLP Model Training

Thanks to the adoption of deep learning technologies, great progress has been made in the field of NLP in recent years. Lately, however, innovation in new machine learning models for NLP has been slowing down, while more attention has been paid to using larger datasets to train larger models. For example, GPT-3, the SOTA pre-trained language model, contains 175 billion parameters and cost around 2 million GPU hours and 12 million dollars to train. Such a trend may lead to a very high entry barrier in the field of NLP and prevent the majority of researchers from conducting cutting-edge research. To tackle this problem, it is crucial to invent more efficient ways to train NLP models. In this talk, we will discuss how to achieve this goal through a comprehensive exploration from the perspectives of training data, objective function, model architecture, and optimization strategy. With innovations in all these aspects, we successfully accelerated the training of BERT models by an order of magnitude. More importantly, our approach is not restricted to BERT and has general implications for the acceleration of many other NLP models.

Tie-Yan Liu is an assistant managing director of Microsoft Research Asia, a fellow of the IEEE, and a distinguished scientist of the ACM. He is also an adjunct faculty member of CMU LTI. His research interests include machine learning, data mining, information retrieval, and computational science. He has published 200+ papers in top conferences and journals, with tens of thousands of citations. He has served as general chair, PC chair, local chair, or area chair for a dozen top conferences, including WWW/WebConf, SIGIR, KDD, ICML, NIPS, ICLR, IJCAI, AAAI, and ACL, as well as associate editor of ACM Transactions on the Web and IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). His team released LightGBM in 2017, which has become one of the most widely used machine learning tools in Kaggle and KDD Cup. His team helped Microsoft achieve human parity in machine translation in 2018 and won 8 championships in the WMT machine translation contest in 2019. His team also built the world's best Mahjong AI, named Suphx, which achieved 10 DAN on the famous Tenhou Mahjong platform in 2019.

The LTI Colloquium is generously sponsored by Abridge.

Zoom Participation. See announcement.

For More Information, Please Contact: