Title: Providing Case-based Explanations for Non-Case-based Methods Such as Artificial Neural Nets or Complex Decision Trees

Speaker: Rich Caruana

Abstract: When we use machine learning in medicine, we often run into a nasty problem: machine learning outperforms standard statistical methods such as logistic regression, but doctors are reluctant to use it in clinical practice because they can't understand the machine learning models. This unwillingness to trust opaque models is justified. But why bother to use machine learning in medical applications if the resulting models are never going to get used!?

There's no sign that machine learning is going to start yielding more intelligible models soon. The opposite seems to be true: methods such as boosting, bagging, and mixtures of experts make intelligible models opaque. Leo Breiman conjectures that there is a Heisenberg-uncertainty-like rule for model complexity:

    MODEL_ERROR * MODEL_COMPLEXITY > K

In other words, low-error models are going to be complex. So what do we do?

In this talk I'll present a way of making machine learning models more intelligible by showing how they can justify their predictions in a way that doctors like. Doctors like case-based methods such as k-nearest neighbor (kNN): medical practice emphasizes rapid case evaluation. One reason doctors like kNN is that it can justify a prediction by showing them the k cases used to make it. Unfortunately, kNN rarely performs as well as artificial neural nets or boosted/bagged decision trees.

We present a way of providing case-based explanations for models like neural nets and decision trees that aren't case-based. The trick is to use the learned model as a distance metric for kNN explanation. This lets doctors feel they understand the prediction because they see the other cases that the model considers most similar to the test case. If the model being explained is an artificial neural net, we generate case-based explanations for it by finding the cases in the training set whose hidden unit activations are most similar to the hidden unit activations the net produces on the test case. This lets us find the cases the neural net thinks are most similar to the test case (a short sketch at the end of this announcement illustrates the idea).

This method is applicable to most machine learning methods; all that is needed is a way of treating the learned model as a distance metric. There are subtleties about how this distance metric should and should not be constructed, but so far these do not pose insurmountable obstacles. We're excited about this because it looks like we might now be able to use some of our high-performing machine learning models clinically.

NOTE: I'm involved in several projects that are developing new machine learning methods for medical decision making. We have funding to support a graduate student during the summer who is interested in working on one of these projects. If you want to learn more about this, let me know.
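
To make the hidden-activation distance concrete, here is a minimal sketch in Python/NumPy. It is an illustration, not the implementation from the talk: it assumes a hypothetical one-hidden-layer sigmoid network whose weights W and b would come from training on the actual data, and it uses plain Euclidean distance between activation vectors (other choices of layer and distance are possible).

    import numpy as np

    # Sketch of the idea: treat a trained net's hidden-layer activations
    # as the representation in which kNN "explanation cases" are found.
    # W, b stand in for the hidden-layer weights of a trained model.

    def hidden_activations(X, W, b):
        """Hidden-unit activations of a one-hidden-layer MLP (sigmoid units)."""
        return 1.0 / (1.0 + np.exp(-(X @ W + b)))

    def explanation_cases(x_test, X_train, W, b, k=3):
        """Indices of the k training cases whose hidden activations are
        closest (Euclidean distance) to those of the test case."""
        H_train = hidden_activations(X_train, W, b)         # (n_train, n_hidden)
        h_test = hidden_activations(x_test[None, :], W, b)  # (1, n_hidden)
        dists = np.linalg.norm(H_train - h_test, axis=1)
        return np.argsort(dists)[:k]

    # Toy usage with random data and weights (stand-ins for a trained model):
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(100, 5))  # 100 training cases, 5 features
    W = rng.normal(size=(5, 8))          # weights into 8 hidden units
    b = rng.normal(size=8)
    x_test = rng.normal(size=5)

    print(explanation_cases(x_test, X_train, W, b, k=3))

The point of the sketch is the design choice: distance is measured in the network's hidden representation rather than in raw input space, so "similar" means similar according to the learned model, which is exactly what the case-based explanation needs to reflect.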