8019 GHC,
5000 Forbes Avenue,
Pittsburgh, PA 15213
I’m a Ph.D. student in the Machine Learning Department at CMU, advised by Eric Xing.
I’ve also spent time at OpenAI (2017) and Google Research (2018/20).
My work is generously supported by the CMLH Fellowship (2018/19) and Google PhD Fellowship (2019/21).
Research Interests:
Probabilistic modeling, deep learning, and massively multi-task learning, with a focus on computational frameworks for adaptation, interpretability, and personalization of statistical models learned from data.
Previously: [a more formal bio]
I hold an M.Sc. in Computer Science from KAUST, where I worked with Khaled Salama and Gert Cauwenberghs on neuromorphic approaches to machine learning.
Before that I studied Physics at Lomonosov Moscow State University and Data Analysis at Yandex School of Data Analysis.
Professional:
I’m a co-organizer of the Adaptive & Multitask Learning Workshop, a founding editor of the ML@CMU Blog, and a regular PC/reviewer for: ICLR, ICML, JMLR, NeurIPS, UAI, AAAI, IJCAI, AISTATS, and various ML/AI workshops.
Besides my professional activities, I am a member of the Carnegie Marathon Club and love sports and hiking in beautiful places.
I also grew up in Moscow and speak Russian.
Sep 9, 2019:
Honored to be part of the 2019 class of Google PhD Fellows in Machine Learning.
Huge thank you to all my mentors, colleagues, and collaborators!
And thank you, Google!
May 15, 2019:
It was a lot of work (and fun!) to help teach the PGM 2019 class this past spring.
Check out the excellent set of lecture notes written by students in a Distill-like style.
Recordings of all lectures are now available on YouTube.
Apr 5, 2019:
How do we make zero-shot NMT consistent?
Our NAACL 2019 paper on Consistency by Agreement shows how to do that!
Joint work with Ankur Parikh at Google NYC last year.
Update (more resources): arXiv, NAACL19 slides, AI Science Seminar virtual talk.
Mar 29, 2019:
Excited to be co-organizing a workshop on Adaptive & Multitask Learning this year at ICML.
Please consider submitting your latest work!
Jan 25, 2019:
Grateful to be awarded $12,000 in Cloud Credits for Research from AWS.
Time to burn some compute!
(recent) selected papers [full list]
-
Federated Learning via Posterior Averaging: A New Perspective and Practical Algorithms
arXiv preprint (in submission),
2020
Federated learning is typically approached as an optimization problem, where the goal is to minimize a global loss function by distributing computation across client devices that possess local data and specify different parts of the global objective. We present an alternative perspective and formulate federated learning as a posterior inference problem, where the goal is to infer a global posterior distribution by having client devices each infer the posterior of their local data. While exact inference is often intractable, this perspective provides a principled way to search for global optima in federated settings. Further, starting with the analysis of federated quadratic objectives, we develop a computation- and communication-efficient approximate posterior inference algorithm – federated posterior averaging (FedPA). Our algorithm uses MCMC for approximate inference of local posteriors on the clients and efficiently communicates their statistics to the server, where the latter uses them to refine a global estimate of the posterior mode. Finally, we show that FedPA generalizes federated averaging (FedAvg), can similarly benefit from adaptive optimizers, and yields state-of-the-art results on four realistic and challenging benchmarks, converging faster, to better optima.
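The Gaussian intuition behind posterior averaging is easy to illustrate numerically. Below is a toy NumPy sketch (the sampler, dimensions, and data are made up for illustration; the actual FedPA algorithm uses MCMC on the clients and a more communication-efficient way of combining the statistics): each client summarizes its local posterior with a mean and covariance, and the server combines them as a product of Gaussians to estimate the global posterior mode.

```python
import numpy as np

def local_posterior_stats(theta_samples):
    """Summarize a client's approximate local posterior by its mean and covariance."""
    mu = theta_samples.mean(axis=0)
    sigma = np.cov(theta_samples, rowvar=False) + 1e-6 * np.eye(theta_samples.shape[1])
    return mu, sigma

def posterior_average(client_stats):
    """Combine local Gaussian approximations into a global mode estimate.

    Under a Gaussian approximation, the global posterior is proportional to the
    product of the local posteriors, so precisions add and means are precision-weighted.
    """
    precisions = [np.linalg.inv(sigma) for _, sigma in client_stats]
    global_precision = sum(precisions)
    weighted_mean_sum = sum(p @ mu for (mu, _), p in zip(client_stats, precisions))
    return np.linalg.solve(global_precision, weighted_mean_sum)

# Toy example: synthetic "local posterior samples" for 3 clients.
rng = np.random.default_rng(0)
clients = [rng.normal(loc=c, scale=1.0, size=(500, 2)) for c in (0.0, 1.0, 2.0)]
stats = [local_posterior_stats(samples) for samples in clients]
print(posterior_average(stats))  # ~ precision-weighted average of the client means
```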
-
Contextual Explanation Networks
Journal of Machine Learning Research (JMLR),
2020
Modern learning algorithms excel at producing accurate but complex models of the data. However, deploying such models in the real world requires extra care: we must ensure their reliability, robustness, and absence of undesired biases. This motivates the development of models that are equally accurate but can also be easily inspected and assessed beyond their predictive performance. To this end, we introduce contextual explanation networks (CENs)—a class of architectures that learn to predict by generating and utilizing intermediate, simplified probabilistic models. Specifically, CENs generate parameters for intermediate graphical models which are further used for prediction and play the role of explanations. In contrast to existing post-hoc model-explanation tools, CENs learn to predict and to explain jointly. Our approach offers two major advantages: (i) for each prediction, valid, instance-specific explanations are generated with no computational overhead and (ii) prediction via explanation acts as a regularizer and boosts performance in low-resource settings. We analyze the proposed framework theoretically and experimentally. Our results on image and text classification and survival analysis tasks demonstrate that CENs are not only competitive with the state-of-the-art methods but also offer additional insights behind each prediction that are valuable for decision support. We also show that while post-hoc methods may produce misleading explanations in certain cases, CENs are always consistent and make it possible to detect such cases systematically.
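A minimal PyTorch sketch of the idea (the architecture and names are hypothetical simplifications, not the exact models from the paper): a context encoder maps the raw input to the parameters of an instance-specific linear model over interpretable attributes, and that linear model both makes the prediction and serves as the explanation.

```python
import torch
import torch.nn as nn

class ContextualExplanationNet(nn.Module):
    """Encoder generates per-instance linear models over interpretable attributes."""

    def __init__(self, context_dim, attr_dim, num_classes, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, attr_dim * num_classes),
        )
        self.attr_dim, self.num_classes = attr_dim, num_classes

    def forward(self, context, attributes):
        # The generated weights are the explanation: a simple linear model per instance.
        w = self.encoder(context).view(-1, self.num_classes, self.attr_dim)
        logits = torch.einsum("bca,ba->bc", w, attributes)
        return logits, w

# Hypothetical usage: flattened images as context, hand-crafted attributes as features.
model = ContextualExplanationNet(context_dim=784, attr_dim=10, num_classes=3)
logits, explanation = model(torch.randn(4, 784), torch.randn(4, 10))
```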
-
Consistency by Agreement in Zero-shot Neural Machine Translation
In Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL),
2019
The generalization and reliability of multilingual translation often depend heavily on the amount of available parallel data for each language pair of interest. In this paper, we focus on zero-shot generalization—a challenging setup that tests models on translation directions they have not been optimized for at training time. To solve the problem, we (i) reformulate multilingual translation as probabilistic inference, (ii) define the notion of zero-shot consistency and show why standard training often results in models unsuitable for zero-shot tasks, and (iii) introduce a consistent agreement-based training method that encourages the model to produce equivalent translations of parallel sentences in auxiliary languages. We test our multilingual NMT models on multiple public zero-shot translation benchmarks (IWSLT17, UN corpus, Europarl) and show that agreement-based learning often results in a 2-3 BLEU zero-shot improvement over strong baselines without any loss in performance on supervised translation directions.
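To give a flavor of the training signal, here is a simplified PyTorch sketch of an agreement term (the paper's objective operates on sampled translations and their likelihoods; this toy version just matches the token distributions produced when translating two parallel source sentences into the same auxiliary language):

```python
import torch
import torch.nn.functional as F

def agreement_loss(logits_src1_to_aux, logits_src2_to_aux):
    """Encourage translations of parallel sentences (e.g., En and Es) into an
    auxiliary language (e.g., Fr) to agree, here via a symmetric KL divergence."""
    log_p = F.log_softmax(logits_src1_to_aux, dim=-1)
    log_q = F.log_softmax(logits_src2_to_aux, dim=-1)
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)

# Hypothetical decoder outputs over a shared vocabulary for a batch of parallel sentences.
loss = agreement_loss(torch.randn(8, 32000), torch.randn(8, 32000))
```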
-
Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments
Al-Shedivat, M., Bansal, T., Burda, Y., Sutskever, I., Mordatch, I., Abbeel, P.
In International Conference on Learning Representations (ICLR),
2018
The ability to continuously learn and adapt from limited experience in nonstationary environments is an important milestone on the path towards general intelligence. In this paper, we cast the problem of continuous adaptation into the learning-to-learn framework. We develop a simple gradient-based meta-learning algorithm suitable for adaptation in dynamically changing and adversarial scenarios. Additionally, we design a new multi-agent competitive environment, RoboSumo, and define iterated adaptation games for testing various aspects of continuous adaptation strategies. We demonstrate that meta-learning enables significantly more efficient adaptation than reactive baselines in the few-shot regime. Our experiments with a population of agents that learn and compete suggest that meta-learners are the fittest.
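The adaptation mechanism is MAML-style: take a gradient step on recent experience and meta-train so that this step is maximally useful on what comes next. A minimal PyTorch sketch on a toy nonstationary regression problem (not the RL setup or the RoboSumo environment from the paper):

```python
import torch

def adapted_params(params, loss_fn, batch, inner_lr=0.1):
    """One inner-loop adaptation step; keep the graph so the meta-loss can
    backpropagate through the adaptation."""
    grads = torch.autograd.grad(loss_fn(params, batch), params, create_graph=True)
    return [p - inner_lr * g for p, g in zip(params, grads)]

def loss_fn(params, batch):
    x, y = batch
    return ((x * params[0] - y) ** 2).mean()

theta = [torch.zeros(1, requires_grad=True)]       # meta-learned initialization
meta_opt = torch.optim.SGD(theta, lr=0.01)
for step in range(100):
    a = 1.0 + 0.01 * step                          # the task drifts over time (nonstationarity)
    xs, xq = torch.randn(16), torch.randn(16)
    phi = adapted_params(theta, loss_fn, (xs, a * xs))  # adapt on current experience
    meta_loss = loss_fn(phi, (xq, a * xq))              # evaluate after adaptation
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
```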