Title: Providing Case-based Explanations for Non-Case-based Methods Such as Artificial Neural Nets or Complex Decision Trees

Speaker: Rich Caruana

Abstract: When we use machine learning in medicine, we often run into a nasty problem: machine learning outperforms standard statistical methods such as logistic regression, but doctors are reluctant to use it in clinical practice because they can't understand the machine learning models. This unwillingness to trust opaque models is justified. But why bother to use machine learning in medical applications if the resulting models are never going to get used!?

There's no sign that machine learning is going to start yielding more intelligible models soon. The opposite seems to be true: methods such as boosting, bagging, and mixtures of experts make intelligible models opaque. Leo Breiman conjectures that there is a Heisenberg-uncertainty-like rule for model complexity:

    MODEL_ERROR * MODEL_COMPLEXITY > K

In other words, low-error models are going to be complex. So what do we do?

In this talk I'll present a way of making machine learning models more intelligible by showing how they can justify their predictions in a way that doctors like. Doctors like case-based methods such as k-nearest neighbor (kNN): medical practice emphasizes rapid case evaluation. One reason doctors like kNN is that it can justify a prediction by showing them the k cases used to make it. Unfortunately, kNN rarely performs as well as artificial neural nets or boosted/bagged decision trees.

We present a way of providing case-based explanations for models like neural nets and decision trees that aren't case-based. The trick is to use the learned model as a distance metric for kNN explanation. This lets doctors feel they understand the prediction because they see the other cases that the model considers most similar to the test case. If the model being explained is an artificial neural net, we generate case-based explanations for it by finding the cases in the training set whose hidden unit activations are most similar to the hidden unit activations the net produces on the test case. This lets us find the cases the neural net thinks are most similar to the test case (a short sketch at the end of this announcement illustrates the idea).

This method is applicable to most machine learning methods; all that is needed is a way of treating the learned model as a distance metric. There are subtleties about how this distance metric should and should not be constructed, but so far these do not pose insurmountable obstacles. We're excited about this because it looks like we might now be able to use some of our high-performing machine learning models clinically.

NOTE: I'm involved in several projects that are developing new machine learning methods for medical decision making. We have funding to support a graduate student during the summer who is interested in working on one of these projects. If you want to learn more about this, let me know.
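
To make the hidden-activation distance concrete, here is a minimal sketch in Python/NumPy. It is an illustration, not the implementation from the talk: it assumes a hypothetical one-hidden-layer sigmoid network whose weights W and b would come from training on the actual data, and it uses plain Euclidean distance between activation vectors (other choices of layer and distance are possible).

    import numpy as np

    # Sketch of the idea: treat a trained net's hidden-layer activations
    # as the representation in which kNN "explanation cases" are found.
    # W, b stand in for the hidden-layer weights of a trained model.

    def hidden_activations(X, W, b):
        """Hidden-unit activations of a one-hidden-layer MLP (sigmoid units)."""
        return 1.0 / (1.0 + np.exp(-(X @ W + b)))

    def explanation_cases(x_test, X_train, W, b, k=3):
        """Indices of the k training cases whose hidden activations are
        closest (Euclidean distance) to those of the test case."""
        H_train = hidden_activations(X_train, W, b)         # (n_train, n_hidden)
        h_test = hidden_activations(x_test[None, :], W, b)  # (1, n_hidden)
        dists = np.linalg.norm(H_train - h_test, axis=1)
        return np.argsort(dists)[:k]

    # Toy usage with random data and weights (stand-ins for a trained model):
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(100, 5))  # 100 training cases, 5 features
    W = rng.normal(size=(5, 8))          # weights into 8 hidden units
    b = rng.normal(size=8)
    x_test = rng.normal(size=5)

    print(explanation_cases(x_test, X_train, W, b, k=3))

The point of the sketch is the design choice: distance is measured in the network's hidden representation rather than in raw input space, so "similar" means similar according to the learned model, which is exactly what the case-based explanation needs to reflect.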