I'm Klas, a PhD student in the Accountable Systems Lab at Carnegie Mellon University, advised by Matt Fredrikson.

GHC 7004 | kleino@cs.cmu.edu | klasleino

Research Interests

My research area is AI; in particular, my work concentrates on understanding and demystifying deep learning. I work to improve the generality, transparency, and interpretability of deep neural networks, with a focus on applications in computer vision and machine learning security. Currently, I am most interested in three major themes: explaining black-box neural network behavior, developing a theory of network generalization, and designing generative models for high-level feature interpretation and counterfactual reasoning.

Explaining Black-box Neural Network Behavior

In recent years, deep neural networks have become increasingly powerful at tasks previously mastered only by humans. Deep learning is now widely used, but while it has many practitioners, its inner workings are far from well understood. As the application of ML has grown, so has the need for algorithmic transparency: the ability to understand why algorithms deployed in the real world make the decisions they do. Much of my work addresses the problem of determining which aspects of a network influence particular decisions, as well as interpreting the influential components. Influence can be used to increase trust in a model, to uncover insights discovered by ML models, and as a building block for debugging arbitrary network behavior.

A Theory of Network Generalization

Despite having the capacity to significantly overfit, or even memorize, their training data, deep neural networks generalize reasonably well in practice. Existing hypotheses have failed to explain why this is the case; in fact, it is not well understood how exactly overfitting manifests in a model. One aspect of my work aims to understand what phenomena give rise to misclassifications, overfitting, and bias in DNNs. Understanding the causes of these problems also sheds light on what leads models to generalize, and may suggest ways of improving generalization. I develop explanations for these problems that have direct applications to membership inference, misclassification prediction, and bias amplification.

Generative Models

It is often difficult to interpret the high-level concepts learned by neural networks. I am interested in using generative models to explore the semantic space of deep networks. This would enable the interpretation of features that cannot be easily understood by examining a single instance. Furthermore, it may allow automated, interpretable counterfactual reasoning for DNNs.

Papers

Influence-directed Explanations for Convolutional Neural Networks

We study the problem of explaining a rich class of behavioral properties of deep neural networks. Distinctively, our influence-directed explanations approach this problem by peering inside the network to identify neurons with high influence on a quantity and distribution of interest, using an axiomatically justified influence measure, and then providing an interpretation for the concepts these neurons represent. We evaluate our approach by demonstrating a number of its unique capabilities on convolutional neural networks trained on ImageNet. Our evaluation demonstrates that influence-directed explanations (1) identify influential concepts that generalize across instances, (2) can be used to extract the "essence" of what the network learned about a class, and (3) isolate individual features the network uses to make decisions and distinguish related classes.
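To make the core computation concrete, below is a minimal sketch, not the implementation from the paper, of how internal influence can be estimated in PyTorch. The assumptions are mine: the model is a VGG16, the quantity of interest is the score for a single class, the distribution of interest is a batch of inputs, influence is estimated as the gradient of the class score with respect to an internal layer's activations averaged over that batch, and the specific layer index and class index are purely illustrative.

```python
import torch
import torchvision.models as models

# Sketch setup: a VGG16 (weights omitted here; in practice, load pretrained
# weights) and a late convolutional layer whose channels we want to rank.
model = models.vgg16(weights=None).eval()
layer_of_interest = model.features[28]  # last conv layer in VGG16 (illustrative choice)

activations = {}

def save_activation(module, inputs, output):
    # Keep a handle on the internal activation so we can read its gradient.
    activations["z"] = output
    output.retain_grad()

hook = layer_of_interest.register_forward_hook(save_activation)

def channel_influence(batch, class_idx):
    """Estimate per-channel influence on the score for `class_idx`,
    averaged over `batch` (the distribution of interest)."""
    scores = model(batch)[:, class_idx]   # quantity of interest
    scores.sum().backward()               # d(score) / d(internal activation)
    grads = activations["z"].grad         # shape: (N, C, H, W)
    # Average over the batch and spatial positions: one value per channel.
    return grads.mean(dim=(0, 2, 3))

# Usage: rank channels by absolute influence on an arbitrary class.
images = torch.randn(8, 3, 224, 224)      # stand-in for real preprocessed data
influence = channel_influence(images, class_idx=243)
top_channels = influence.abs().argsort(descending=True)[:5]
print(top_channels)

hook.remove()
```

The highest-ranked channels would then be interpreted, for example by visualizing the inputs or input regions that most strongly activate them, which is the step the abstract describes as providing an interpretation for the concepts these neurons represent.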
