Why did you say that? Explaining neural models of vision and language
Recent advances in neural models of Vision & Language have produced exciting new results in image and video captioning, visual question answering, and other related tasks. For example, our previous work has shown that sequence-to-sequence networks can generate fluent captions for input video clips that describe the main activity happening in the video. However, our understanding of how a neural network arrives at a particular caption or answer to a question remains very limited. Despite their excellent performance, neural nets are inherently much less explainable than some of the previously popular machine learning models for these tasks. In my talk, I will first present some of the neural models we have developed in my lab for Vision & Language problems and then discuss our recent work that tries to "look under the hood" of these networks and explain their decisions.
Kate Saenko is an Associate Professor of Computer Science at Boston University, director of the Computer Vision and Learning Group, and co-director of the AI Research initiative at BU. Her past academic positions include Assistant Professor in the Computer Science Department at UMass Lowell, Postdoctoral Researcher at ICSI, Visiting Scholar at UC Berkeley EECS, and Visiting Postdoctoral Fellow in the School of Engineering and Applied Science at Harvard University. Her research interests are in the broad area of Artificial Intelligence, with a focus on Adaptive Machine Learning, Learning for Vision and Language Understanding, and Deep Learning.