The Essence of Explainable AI: Interpretability

Ciarán Daly

June 4, 2019

6 Min Read

by Jelani Harper

SAN FRANCISCO - Applications of Artificial Intelligence, machine learning, and deep learning are relatively useless without a lucid understanding of how the outputs of their predictive models are derived. Explainable AI hinges on explainability: a clear articulation of how the various weights and parameters of machine learning models generate their outputs.

Those explanations, in turn, are determined by interpretability: the statistical or mathematical understanding of the numerical outputs of decisions made by predictive models.

Interpretability is foundational to unraveling some of the more consistent issues plaguing AI today. It’s necessary not only for transparent models, but also for fair models free of bias. Facilitating interpretability, and using it as the impetus for refining machine learning models and the data on which they’re trained, is indispensable for overcoming the threat of biased models once and for all.

“It usually ends up the problem is in your data, not in your models,” revealed Ilknur Kabul, SAS Senior Manager of AI and Machine Learning Research and Development. “Interpretability is a good diagnostic mechanism to show you what you missed.”

Uncovering bias in training data, not models

Interpretability illustrates an immutable fact of machine learning: there are no inherently biased models. Biased models are only the result of biased training data, or rather of training data that doesn’t capture all the aspects of a particular model’s use case. It doesn’t matter whether models are complicated, non-linear deep neural networks or straightforward, transparent approaches like decision trees.

“People are scared of black box models but usually, is the problem in the black box model itself, or is it in the training data?” Kabul asked. “What all these [interpretability] techniques are finding out in the end is you missed [something] in your dataset.” Interpretability methods can pinpoint which specific model features contributed to the approval or denial of loans, or why different groups are treated inequitably by models. In most cases, interpretability identifies gaps in training data where modelers “didn’t get data from a specific distribution,” Kabul explained.


Model-specific interpretability

Although there are model-agnostic approaches to interpretability, model-specific approaches can be remarkably precise, especially for black-box deep neural networks. One particularly effective model-specific interpretability technique involves a visual means of “going into the layers of deep learning to see what it learned,” Kabul said. This capability is significant because one of the longstanding difficulties of deploying deep neural networks is the inability to open them up and view how they function.

However, contemporary developments in this space involve “generating a topological representation of a neural network layer: a graph, basically,” Kabul divulged. “We colored those graphs and…found out in these certain regions, the models aren’t good for those regions.” In these instances, because of the training data used and the way the model functions, prediction accuracy and learning capacity are lower in the layers Kabul referenced than in others, compromising model performance.
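The topological-graph technique Kabul describes is SAS’s own; the sketch below is not that method, just a loose, generic way of inspecting what a layer has learned. It assumes a trained PyTorch classifier whose forward pass returns logits (scikit-learn is used for clustering, and the function name and arguments are hypothetical): it extracts one hidden layer’s activations and reports accuracy per cluster of that activation space, a crude way to surface regions the model handles poorly.

    # A generic layer-inspection sketch, not the SAS topological method described
    # above. Assumes a trained PyTorch classifier whose forward pass returns logits.
    import numpy as np
    import torch
    from sklearn.cluster import KMeans

    def region_accuracy(model, layer, X, y, n_regions=10):
        """Cluster one layer's activations and report accuracy per cluster,
        a rough way to find regions of the representation the model gets wrong."""
        activations = []
        hook = layer.register_forward_hook(
            lambda mod, inp, out: activations.append(out.detach().cpu().numpy())
        )
        with torch.no_grad():
            logits = model(torch.as_tensor(X, dtype=torch.float32))
        hook.remove()

        preds = logits.argmax(dim=1).cpu().numpy()
        acts = np.concatenate(activations, axis=0).reshape(len(X), -1)
        regions = KMeans(n_clusters=n_regions, n_init=10).fit_predict(acts)

        for r in range(n_regions):
            mask = regions == r
            acc = (preds[mask] == np.asarray(y)[mask]).mean()
            print(f"region {r}: {mask.sum()} samples, accuracy {acc:.2f}")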

Data profiling

In such situations, data profiling reveals exactly which aspects of the data and the model are degrading performance. Data profiling means that modelers “look at different statistics about features that we used,” Kabul said. “We look at the statistics of how they are related. Like, how the model learned for certain features, and how the predictions are distributed for those.” It’s best to profile the data after using the model-specific visual interpretability method for deep neural networks, because then profiling is needed for “not the whole data, but for the ones that are affecting those regions” of certain model results, Kabul mentioned.

Data profiling can profoundly improve the overall interpretability of deep neural networks while substantially decreasing their propensity for bias. In cases where these techniques indicate that “for certain groups our model didn’t learn well, we separate our data into two groups and train different models for it,” Kabul remarked. Each model focuses on the features of its respective group to induce fairness for that group.
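As a rough illustration of this kind of profiling (not SAS’s tooling; pandas and scikit-learn are assumed, and the group, feature, and label columns are placeholders supplied by the caller), the sketch below compares feature statistics per group and then trains one model per group, mirroring the approach Kabul describes.

    # A minimal profiling sketch using pandas and scikit-learn (assumed tooling;
    # the column names passed in are hypothetical and supplied by the caller).
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    def profile_and_split(df: pd.DataFrame, feature_cols, label_col, group_col):
        # Compare feature statistics and label distributions across the groups
        # of interest to spot under-represented or oddly distributed segments.
        print(df.groupby(group_col)[feature_cols + [label_col]].describe())

        # If the profile shows the model can't serve every group well, train a
        # separate model per group, as described above.
        models = {}
        for group, part in df.groupby(group_col):
            X_train, X_test, y_train, y_test = train_test_split(
                part[feature_cols], part[label_col], test_size=0.2, random_state=0
            )
            models[group] = GradientBoostingClassifier().fit(X_train, y_train)
            print(f"group {group}: test accuracy {models[group].score(X_test, y_test):.2f}")
        return models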


Model-agnostic interpretability

Still, some of the most cogent interpretability measures are model-agnostic. Typically, data scientists benefit from deploying multiple interpretability techniques to maximize their understanding of how models work, then use that knowledge to remove bias. Some of the more widely used interpretability mechanisms include the following (a brief code sketch after the list illustrates several of them):

  • The SHAP Method: Shapley values identify which features determined the results of a model’s decision, and whether their impact was positive or negative. They reveal this information at the local level; for instance, Shapley values identify which factors (such as income, length of employment, etc.) determined whether a specific person received a loan. Kabul mentioned the SHAP method is “computationally expensive”. However, there are ongoing developments to reduce the time required to compute Shapley values “from millions of minutes to just minutes,” Kabul said.

  • Partial Dependence Plots: Unlike the SHAP method, PDP delivers insight into how machine learning models function in general, as opposed to at the local or individual level. This graphic means of facilitating interpretability shows “what your model learned, what are the important features that drove [the model’s] decisions and affected it for the whole model and with the whole data,” Kabul said.

  • Local Interpretable Model-Agnostic Explanations: Known as LIME, this technique is a form of surrogate modeling in which modelers approximate the behavior of a complex machine learning model with a simpler, more straightforward one, producing clearer interpretability. As its name implies, LIME operates at the local level for individual model decisions.

  • Individual Conditional Expectations: Also called ICE, this interpretability approach relies on a graphic means to explain how models function at the local level for “individual decisions,” Kabul stated. “You look at for each individual decision how they affected it, because different groups may be treated differently. The predictions may be different for different groups, so you may find out this information for different segments.”
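
For readers who want to experiment with these methods, the sketch below shows how they are commonly exercised with open-source Python packages: shap for Shapley values, scikit-learn’s PartialDependenceDisplay for PDP and ICE curves, and lime for local surrogate explanations. The packages, the gradient-boosted model, and the loan-style “income” feature name are assumptions for illustration, not the tooling Kabul describes.

    # A sketch of the model-agnostic techniques above using the open-source shap,
    # lime, and scikit-learn packages (assumed here; the "income" column and the
    # loan-style dataset are hypothetical, and this is not SAS's implementation).
    import pandas as pd
    import matplotlib.pyplot as plt
    import shap
    from lime.lime_tabular import LimeTabularExplainer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.inspection import PartialDependenceDisplay

    def explain_model(X: pd.DataFrame, y):
        model = GradientBoostingClassifier().fit(X, y)

        # SHAP: local, per-decision attributions showing which features pushed a
        # prediction up or down, e.g. for a single loan applicant.
        shap_values = shap.TreeExplainer(model).shap_values(X)
        print("SHAP values, first applicant:", dict(zip(X.columns, shap_values[0])))

        # PDP + ICE: global partial dependence plus per-individual curves for a
        # single feature ("income" is a hypothetical column name).
        PartialDependenceDisplay.from_estimator(model, X, features=["income"], kind="both")
        plt.show()

        # LIME: fit a simple local surrogate around one individual prediction.
        lime_explainer = LimeTabularExplainer(
            X.values, feature_names=list(X.columns), mode="classification"
        )
        explanation = lime_explainer.explain_instance(
            X.values[0], model.predict_proba, num_features=5
        )
        print("LIME explanation:", explanation.as_list())

As Kabul cautions below, no single one of these views is sufficient on its own; the local and global diagnostics are meant to be read together.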


Eradicating bias

Whether modelers deploy model-specific or model-agnostic methods, and whether those methods function at the local or the overall level, these various interpretability approaches are useful for finding bias based on “what you missed in your data,” Kabul said.

Furthermore, they provide this benefit at a mathematical or statistical level applicable to individuals, groups of people, and machine learning models as a whole. Facilitating interpretability and actively using this information as a means to remove bias not only makes AI more fair, but also more dependable for organizations and society as a whole.

The aforementioned techniques make these technologies much more socially responsible—and, by extension, acceptable. However, as Kabul cautioned, “One is not enough to get a really good diagnosis. You need to use all of it and look at it altogether.”  

Jelani Harper is an editorial consultant servicing the information technology market, specializing in data-driven applications focused on semantic technologies, data governance and analytics. 
