If you are like me and have made the hard transition from statistical learning straight to deep learning, you know how confusing it can be. Unlike before, when you carefully designed and selected features from your data, you now visit Webhallen for the latest GPU (thanks to Bitcoin, you may need deeper pockets), format your dataset a bit, stack some layers and let your electricity bill do all the work. You then meet your friends, who have been thinking long and hard about how amazingly their Non-Negative Matrix Factorization (NMF) has done the job, and tell them that you surpassed their Kaggle rankings. “What are your superpowers again?” your friends ask. Unfortunately, you are no Bruce Wayne, and your GPU will not tell you what it did. You tell them today is your anniversary and leave straight for the library. You realize that you need to know how it works under the hood. Furthermore, your investors need to be able to trust you with their investments. This brings us to this week’s topic: Interpretable AI/ML. Interpretable AI/ML has been gaining a lot of attention recently, and I will cover it in this post and the next. In this post I will talk about the main concepts and practices in general machine learning, and in the next post I will cover recent practices that address interpretability specifically in deep learning.
What is Interpretable AI/ML?
There is no universal definition of Interpretable AI/ML. If I may, I would define it as the ability to understand what decisions machine learning models make, how they make them, and why. To be more specific, the ‘what’ concerns the output of the machine learning system: whether our model predicts the likelihood of each label, or produces an embedding that approximates the similarity between our inputs. The ‘how’ tells us how the model arrives at its predictions, i.e., the mathematical intuition behind the choice of network and optimization method. The devil lies in the ‘why’: why the model makes a given prediction, which part of the input the model is focusing on, and so on. To answer these questions, let’s first take a deeper dive into the taxonomy of interpretable AI/ML.
Taxonomy of Interpretable AI/ML
AI/ML model interpretability can be classified as intrinsic or post hoc. Intrinsic interpretability refers to models that are considered interpretable due to their simple structure, for example, rule-based models, linear regression and decision trees. Post hoc interpretability refers to the application of interpretation methods after a model is trained. Interpretation methods can also be classified as model-specific or model-agnostic. Model-specific methods are limited to a certain class of models, for example, visualization of CNNs, whereas model-agnostic methods can be applied to any AI/ML model and are mostly post hoc. Researchers also classify interpretation methods as local or global, depending on whether the interpretation is made for an individual prediction or for the model’s behavior as a whole. In the following sections, we will focus on post hoc, model-agnostic interpretation approaches, since they ‘generalize’ well across model architectures. In the next post, we will dive into interpretability in different types of deep neural networks.
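To make ‘intrinsic interpretability’ concrete, here is a minimal sketch in Python with NumPy (my choice of language; the original post contains no code). It fits a linear regression on synthetic, noise-free data generated from an assumed relationship y = 3*x0 - 2*x1 + 1, so the fitted weights themselves serve as the explanation, with no separate interpretation method needed.

```python
import numpy as np

# Synthetic data with a known (assumed) relationship: y = 3*x0 - 2*x1 + 1.
# No noise is added, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0

# Ordinary least squares with an explicit intercept column.
X_design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, weights = coef[0], coef[1:]

# The weights ARE the explanation: each is the change in the prediction
# per unit change in that feature, holding the other feature fixed.
for name, w in zip(["x0", "x1"], weights):
    print(f"{name}: {w:+.3f}")
```

With real, noisy data the weights would only approximate the underlying effects, and strongly correlated features make them harder to read, but the model remains its own explanation.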
Examples of Interpretable Methods

Partial Dependence Plot
The Partial Dependence Plot (PDP) shows the marginal effect of features on the model’s predictions: the features of interest are fixed at a given value, and the model output is averaged over the values of the remaining features observed across the training data. Usually only one or two features are examined at a time. In practice this average is estimated with the Monte Carlo method, i.e., by averaging over the training samples. Note that the underlying assumption is that the features of interest are independent of the remaining features. For tabular data this is fairly straightforward: we can either sample from the training data or, when the training set is small, make our own hypothesis about the feature distribution, and then average the model output. PDPs are less commonly applied to other types of data, but I personally think the intuition extends quite easily. For example, the recent work ‘Order Word Matters Pre-training for Little’ studies the effect of word order in masked language model pre-training. There, the word order of the training corpus is randomly shuffled while the local distributional information is preserved. The output of interest can be either the sentence embedding or a performance measure on downstream tasks. Do correct me if I’m wrong!
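The Monte Carlo estimate described above fits in a few lines of NumPy. This is a minimal sketch, assuming the model is exposed as a function `predict` that maps an (n, d) array to n predictions; `partial_dependence` and `black_box` are hypothetical names for illustration, not a library API. For each grid value v, the feature of interest is overwritten with v in every training row and the predictions are averaged, i.e. PD(v) = (1/n) * sum_i f(v, x_C^(i)), where x_C^(i) are the other features of row i.

```python
import numpy as np


def partial_dependence(predict, X, feature, grid):
    """Monte Carlo estimate of a one-feature partial dependence function.

    For each value v in `grid`, set column `feature` to v in every row of X,
    keep all other columns at their observed values, and average the predictions.
    """
    pd_values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v
        pd_values.append(predict(X_mod).mean())
    return np.array(pd_values)


# Hypothetical black-box model, for illustration only.
def black_box(X):
    return X[:, 0] ** 2 + 0.5 * X[:, 1]


rng = np.random.default_rng(42)
X_train = rng.normal(size=(1000, 2))
grid = np.linspace(-2.0, 2.0, 5)
pd_x0 = partial_dependence(black_box, X_train, feature=0, grid=grid)
print(np.round(pd_x0, 3))
```

Because the toy model is additive, the estimated curve is exactly v**2 plus a constant (0.5 times the sample mean of x1); with correlated features, overwriting one column this way creates unrealistic rows, which is why the independence assumption matters. scikit-learn offers a maintained implementation in `sklearn.inspection.partial_dependence`.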

Global Surrogate Models
A global surrogate model is an intrinsically interpretable model that is trained to approximate the predictions of the original model. Our attempt to interpret the original model now transfers to the surrogate model, so the surrogate must itself be understandable. In short, you train the surrogate model on the original model’s outputs (rather than on the ground-truth labels) and use metrics like R-squared to measure how good the approximation is. For example, we can train a decision tree as a surrogate for an SVM. However, I did not find much theoretical justification for why surrogate models should work, and I can’t say I’m sold on the idea of using a simpler model to approximate a complex one and drawing conclusions based on that. Boom! Let me know what you think.
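The surrogate recipe can be sketched as follows. To keep the example dependency-free, it swaps in ordinary least squares (another intrinsically interpretable model) for the decision tree mentioned above, and `black_box` is a hypothetical nonlinear function standing in for a trained model such as an SVM. The key detail is that the surrogate is fit to the black box’s predictions, and R-squared is computed against those predictions, so it measures fidelity to the model rather than accuracy on the task.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical black-box model standing in for, e.g., a trained SVM.
def black_box(X):
    return np.tanh(2.0 * X[:, 0]) + 0.3 * X[:, 1]


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))

# Step 1: label the data with the black box's predictions, not the ground truth.
y_bb = black_box(X)

# Step 2: fit an interpretable surrogate (OLS with intercept) to those predictions.
X_design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X_design, y_bb, rcond=None)
y_surrogate = X_design @ coef

# Step 3: measure how faithfully the surrogate mimics the black box.
fidelity = float(r_squared(y_bb, y_surrogate))
print("surrogate weights:", np.round(coef[1:], 3))
print("R^2 fidelity:", round(fidelity, 3))
```

A high R-squared only says the surrogate mimics the black box on this data; it says nothing about whether either model is right, which is exactly the caveat raised above.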

SHAP
SHAP (SHapley Additive exPlanations) is another model-agnostic interpretability approach, based on cooperative game theory. It was inspired by the Shapley value, which assigns each player a payout based on their contribution in a cooperative game. In machine learning, features act as the players, working cooperatively to make a prediction. Unlike the previously mentioned methods, SHAP focuses on feature contributions to a single prediction. For example: would a user still be interested in an item if the item’s color changed? The Shapley value of a feature is calculated by considering all orderings (permutations) of the features, computing the difference between the predictions with and without that feature, and taking the average. In short, it measures how much each feature contributes to pushing the prediction away from the expected value.
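The permutation definition above can be implemented directly when there are only a few features. This is a minimal sketch, not the `shap` library: `exact_shapley` is a hypothetical helper, and it assumes a feature is ‘absent’ when it is replaced by a single baseline value (for example, the training mean), which is only one of several ways SHAP can define absence. It enumerates all M! orderings, so it is only practical for a handful of features.

```python
import itertools
import math

import numpy as np


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one instance x, by enumerating every feature ordering.

    Assumption: a feature that is "absent" from a coalition takes its value
    from `baseline`. Cost grows as M! orderings, so keep M small.
    """
    M = len(x)
    phi = np.zeros(M)
    for order in itertools.permutations(range(M)):
        z = np.array(baseline, dtype=float)  # start with every feature "absent"
        prev = predict(z[None, :])[0]
        for j in order:
            z[j] = x[j]                       # add feature j to the coalition
            cur = predict(z[None, :])[0]
            phi[j] += cur - prev              # marginal contribution of j in this ordering
            prev = cur
    return phi / math.factorial(M)


# Hypothetical model with one additive term and one interaction term.
def model(X):
    return 2.0 * X[:, 0] + X[:, 1] * X[:, 2]


x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = exact_shapley(model, x, baseline)
print(phi)  # [2. 3. 3.]
```

The values sum to f(x) - f(baseline) = 8 (the efficiency property), and the interaction term x1*x2 is split equally between x1 and x2. For real models, the `shap` package provides much faster estimators such as `KernelExplainer` and `TreeExplainer`.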
Some Last Words
There are many other general interpretability methods, such as the LIME family, adversarial examples, or even generative, optimization-based approaches. This post is only an introduction, so that first-time readers can get familiar with the concepts in interpretable AI/ML. I recommend Christoph Molnar’s book Interpretable Machine Learning if you want to take a deeper dive into the field. In the next post, we will cover specific interpretability methods designed for deep neural networks. Looking forward to seeing you there! As always, if you would like to get in touch with us, you can send me an email or drop us a message on Facebook. We would love to hear your amazing ideas!