Understanding Local Interpretable Model-Agnostic Explanations

Interest in machine learning and artificial intelligence has grown rapidly over the years, driven by technologists, entrepreneurs, and academic researchers. Techniques such as supervised and unsupervised learning, deep learning, and dynamic programming have enabled advances in robotics, automated decision-making systems, and vision systems. While these technologies have been tremendously successful, it remains difficult for users to understand the rationale behind the decisions made by machine learning models.

This has motivated the development of explainable artificial intelligence (XAI), whose goal is to interpret and explain the behavior of machine learning models to the end user. One approach to XAI is Local Interpretable Model-Agnostic Explanations (LIME), a technique that explains the behavior of any machine learning model by approximating it locally with a simpler, interpretable model.

LIME expresses a complex model's behavior around a particular prediction as a weighted linear approximation, which makes non-linear models such as neural networks or random forests interpretable. Because the explanation is local, LIME explains individual predictions rather than the model as a whole. It can also identify which input features contribute most to a prediction and quantify their relative importance, as the sketch below illustrates.
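To make this concrete, here is a minimal sketch using the open-source `lime` Python package on tabular data. The names `model`, `X_train`, `feature_names`, and `class_names` are placeholders assumed to exist in your own code (a classifier exposing `predict_proba`, a NumPy feature matrix, and the corresponding name lists); they are not part of the original example.

```python
# Minimal sketch using the open-source `lime` package (pip install lime).
# `model`, `X_train`, `feature_names`, and `class_names` are assumed to be
# defined elsewhere: a scikit-learn-style classifier with predict_proba,
# a NumPy feature matrix, and the corresponding name lists.
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train,                      # data LIME samples from when perturbing
    feature_names=feature_names,
    class_names=class_names,
    mode="classification",
)

# Explain a single prediction: LIME perturbs this row, queries the model,
# and fits a sparse linear model that is faithful in the row's neighborhood.
explanation = explainer.explain_instance(
    X_train[0],                   # the instance to explain
    model.predict_proba,          # the black-box prediction function
    num_features=3,               # keep only the top contributing features
)

# Each entry is a (feature condition, signed local weight) pair.
print(explanation.as_list())
```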

In terms of implementation, consider a supervised learning use case with a model that operates on a single input tensor. Assume we have a classification model, a simple Sequential Neural Network (SNN), that takes in a 3-dimensional input tensor and produces a single class prediction. To explain its predictions, we first train the model as usual. After training, a perturbation algorithm generates many variants of the input we want to explain, and each variant is evaluated by the SNN to obtain the corresponding model prediction. These perturbations and their predictions become the raw material for a simpler, interpretable approximation of the network around that input.
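The perturbation step might be sketched as follows. This is not a definitive implementation: the trained Keras model `snn`, the Gaussian-noise perturbation strategy, the sample count, and the example feature values are all illustrative assumptions.

```python
# Sketch of the perturbation step, assuming a trained Keras Sequential model
# `snn` whose predict() accepts a batch of three-feature inputs.
import numpy as np

def perturb(x, num_samples=1000, scale=0.1, seed=0):
    """Generate random variants of a single input by adding Gaussian noise."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=scale, size=(num_samples, x.shape[0]))
    return x[np.newaxis, :] + noise

x = np.array([0.9, 0.7, 0.3])              # e.g. color, shape, texture scores
perturbations = perturb(x)                  # shape: (1000, 3)
predictions = snn.predict(perturbations)    # black-box predictions per variant
```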

We then compute a similarity score for each perturbation, measuring how close the perturbed input is to the original input. These similarity weights, together with the model's predictions on the perturbations, are used to fit another, interpretable model, which serves as the explanation model. By inspecting the weights of this surrogate, we can identify which features of the input tensor were most important in determining the model's decision.
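Continuing that sketch, here is one way the similarity weighting and the surrogate fit might look. The exponential kernel, its width, and the choice of ridge regression are assumptions made for illustration, not a prescribed implementation.

```python
# Weight each perturbation by its proximity to the original input, then fit a
# weighted linear surrogate whose coefficients serve as the explanation.
# Reuses `x`, `perturbations`, and `predictions` from the sketch above.
import numpy as np
from sklearn.linear_model import Ridge

distances = np.linalg.norm(perturbations - x, axis=1)
kernel_width = 0.25
weights = np.exp(-(distances ** 2) / kernel_width ** 2)  # nearer variants count more

# Explain the predicted probability of one class (assumed to be column 0).
surrogate = Ridge(alpha=1.0)
surrogate.fit(perturbations, predictions[:, 0], sample_weight=weights)

# Signed coefficients show how each feature pushes the prediction up or down
# in the neighborhood of the original input.
for name, coef in zip(["color", "shape", "texture"], surrogate.coef_):
    print(f"{name}: {coef:+.3f}")
```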

Once we have identified the important features, we can express the model's prediction as a natural-language explanation. For instance, say the model's prediction for the input tensor was “this input belongs to the fruit class”. After applying LIME, we would get an explanation such as “the model predicted that this input belongs to the fruit class because its color is red, its shape is round, and its texture is rough”.
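As a rough illustration, the surrogate weights from the previous sketch could be turned into such a sentence; the feature descriptions and wording here are invented for the example.

```python
# Rank the hypothetical feature descriptions by the magnitude of their weights
# and mention the ones that pushed the prediction toward the fruit class.
contributions = sorted(
    zip(["its color is red", "its shape is round", "its texture is rough"],
        surrogate.coef_),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)
reasons = ", ".join(desc for desc, coef in contributions if coef > 0)
print(f"The model predicted the fruit class because {reasons}.")
```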

In addition to producing explanations in natural language, LIME can be used to debug and improve the performance of your model. By examining the features that drive its predictions, you can identify biased data points and work on balancing your dataset. You can also use LIME to find hard-to-explain examples and use them as training cases for further retraining and improving the model.

Overall, LIME makes it easier to audit and understand the predictions made by machine learning models, and even to gain insight into model performance. It is a powerful technique for interpreting model predictions and for understanding and improving data pipelines.