<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Additive Feature Attribution Methods on XAI Today</title>
    <link>https://xai.today/categories/additive-feature-attribution-methods/</link>
    <description>Recent content in Additive Feature Attribution Methods on XAI Today</description>
    <generator>Hugo</generator>
    <language>en-US</language>
    <lastBuildDate>Tue, 09 Jul 2024 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://xai.today/categories/additive-feature-attribution-methods/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Explainable AI for Improved Heart Disease Prediction</title>
      <link>https://xai.today/posts/optimized-ensemble-heart-disease-prediction/</link>
      <pubDate>Tue, 09 Jul 2024 00:00:00 +0000</pubDate>
      <guid>https://xai.today/posts/optimized-ensemble-heart-disease-prediction/</guid>
      <description>&lt;p&gt;The paper &amp;ldquo;&lt;a href=&#34;https://www.mdpi.com/2078-2489/15/7/394&#34;&gt;Optimized Ensemble Learning Approach with Explainable AI for Improved Heart Disease Prediction&lt;/a&gt;&amp;rdquo; focuses on explaining machine learning models in healthcare, similar to my original work in &amp;ldquo;&lt;a href=&#34;https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-020-01201-2&#34;&gt;Ada-WHIPS: explaining AdaBoost classification with applications in the health sciences&lt;/a&gt;&amp;rdquo;. The newer paper uses a novel Bayesian method to optimally tune the hyper-parameters of ensemble models such as AdaBoost, XGBoost and Random Forest, and then applies the now well-established SHAP method to assign Shapley values to each feature. The authors use their method to analyse three heart disease prediction datasets, including the well-known Cleveland dataset, used as a benchmark in many ML research papers.&lt;/p&gt;&#xA;&lt;p&gt;SHAP (&lt;a href=&#34;https://arxiv.org/abs/1705.07874&#34;&gt;Lundberg and Lee&lt;/a&gt;) came hot on the heels of the revolutionary LIME method (&lt;a href=&#34;https://arxiv.org/abs/1602.04938&#34;&gt;Ribeiro, Singh and Guestrin&lt;/a&gt;), which together delivered a paradigm shift in the usefulness and feasibility of eXplainable Artificial Intelligence (XAI). In fact, LIME was published at exactly the time I was becoming interested in the topic of XAI and served as inspiration for my own Ph.D. journey. Both methods fall into the category of Additive Feature Attribution Methods (AFAM) and work by assigning a unitless value to each of the input features. The main benefits of AFAM become clear when viewing a beeswarm plot of their responses across a larger dataset, such as the whole training data. Patterns emerge showing which input variables affect the response variable most strongly, and in which direction.
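&lt;/p&gt;&#xA;&lt;p&gt;To make the additive-attribution idea concrete, here is a minimal, self-contained sketch (my own toy illustration, not the SHAP library) that computes exact Shapley values for a hypothetical three-feature model by enumerating feature coalitions against a baseline instance:&lt;/p&gt;

```python
from itertools import combinations
from math import factorial, isclose

def shapley_values(f, x, baseline):
    """Exact Shapley values for one instance x, relative to a baseline.

    The value of a coalition S is the model output with features in S
    taken from x and the rest from the baseline vector (a common
    simplification of the conditional expectation used by SHAP)."""
    n = len(x)
    players = list(range(n))

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in players]
        return f(z)

    phi = [0.0] * n
    for i in players:
        others = [j for j in players if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(set(subset) | {i}) - value(set(subset)))
    return phi

# Toy model with an interaction term, so attributions are instance-specific.
f = lambda z: 3.0 * z[0] + 2.0 * z[1] + z[0] * z[2]
x, base = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = shapley_values(f, x, base)
# Additivity: the attributions sum exactly to f(x) minus f(baseline).
assert isclose(sum(phi), f(x) - f(base))
```

&lt;p&gt;Plotting such per-instance values for every row of a training set is precisely what a beeswarm plot summarises.&lt;/p&gt;&#xA;&lt;p&gt;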
This usage is much more sophisticated than classic variable importance plots, which lack the direction and the mathematical guarantees offered by SHAP.&lt;/p&gt;&#xA;&lt;p&gt;In the clinical setting, these mathematical guarantees mean that the resulting variable sensitivity information could be used to create a broader diagnostic tool. However, while this approach can provide a general understanding of which variables drive a model&amp;rsquo;s predictions, it lacks the fine-grained, instance-specific clarity offered by perfect-fidelity, decompositional methods.&lt;/p&gt;&#xA;&lt;p&gt;On the other hand, my original method Ada-WHIPS (firmly within the decompositional methods category) enhances interpretability in clinical settings by providing direct, case-specific explanations, making it a powerful tool for clinicians needing detailed transparency for patient-specific decision-making. Given the choice of an AdaBoost model (or a Gradient Boosted Model, or a Random Forest), it makes sense to use an XAI method that is highly targeted to these decomposable ensembles. Ada-WHIPS digs deep into the internal structure of AdaBoost models, redistributing the adaptive classifier weights generated during model training (and therefore a function of the training data distribution) to extract interpretable rules at the decision node level.&lt;/p&gt;&#xA;&lt;p&gt;One area where Ada-WHIPS could benefit from the techniques in the new paper is the use of Bayesian methods to tune hyperparameters. Their approach potentially leads to improved model accuracy, a crucial factor in high-stakes environments like healthcare, and to &amp;ldquo;juicing up&amp;rdquo; the model internals for greater accuracy in the generated decision nodes. However, the paper appears to omit any detail about how this approach is deployed.
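&lt;/p&gt;&#xA;&lt;p&gt;In the absence of that detail, a generic sequential, surrogate-guided search in the spirit of Bayesian optimisation might look something like the following sketch. This is purely illustrative: a kernel-smoothed surrogate stands in for a full Gaussian-process posterior, and a made-up objective stands in for cross-validated model accuracy over one hyper-parameter:&lt;/p&gt;

```python
import math, random

# Hypothetical stand-in for cross-validated accuracy of, say, an AdaBoost
# model as a function of a single hyper-parameter scaled to [0, 1].
def objective(lr):
    return 1.0 - (lr - 0.3) ** 2

def surrogate(trials, c, bandwidth=0.25):
    """Kernel-smoothed mean of observed scores, plus the distance to the
    nearest observation as a crude uncertainty proxy (a toy stand-in for
    a Gaussian-process posterior)."""
    w = [math.exp(-((c - x) / bandwidth) ** 2) for x, _ in trials]
    mean = sum(wi * s for wi, (_, s) in zip(w, trials)) / sum(w)
    uncertainty = min(abs(c - x) for x, _ in trials)
    return mean, uncertainty

def tune(n_trials=15, beta=0.5, seed=1):
    rng = random.Random(seed)
    trials = [(0.5, objective(0.5))]  # one initial (expensive) evaluation
    for _ in range(n_trials):
        candidates = [rng.random() for _ in range(200)]
        # Upper-confidence-bound acquisition: exploit high predicted score,
        # explore regions far from previous trials.
        def ucb(c):
            m, u = surrogate(trials, c)
            return m + beta * u
        nxt = max(candidates, key=ucb)
        trials.append((nxt, objective(nxt)))  # the expensive evaluation
    return max(trials, key=lambda t: t[1])

best_lr, best_score = tune()
```

&lt;p&gt;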
This omission is indeed a great pity because, from what I understood, the Bayesian parameter selection was actually the authors&amp;rsquo; novel contribution (the use of ensembles and SHAP on these particular datasets being nothing particularly new).&lt;/p&gt;&#xA;&lt;p&gt;In conclusion, the SHAP-based approach offers valuable insights at a macro level, the new paper boasts improvements in model accuracy through Bayesian tuning, and my Ada-WHIPS method&amp;rsquo;s per-instance clarity and actionable insights should prove practical in scenarios where clinicians require detailed explanations of specific cases. I would be delighted to see some confluence of the three ideas, so that the benefits from each can combine and reinforce the use of highly targeted explainability in clinical applications.&lt;/p&gt;&#xA;</description>
    </item>
    <item>
      <title>How Subsets of the Training Data Affect a Prediction</title>
      <link>https://xai.today/posts/training-subsets-affect-prediction/</link>
      <pubDate>Sun, 20 Dec 2020 00:00:00 +0000</pubDate>
      <guid>https://xai.today/posts/training-subsets-affect-prediction/</guid>
      <description>&lt;p&gt;I was quite excited by the title of a new paper, in pre-publication this month. &lt;a href=&#34;https://www.academia.edu/84191713/Explainable_Artificial_Intelligence_How_Subsets_of_the_Training_Data_Affect_a_Prediction&#34;&gt;&amp;ldquo;Explainable Artificial Intelligence: How Subsets of the Training Data Affect a Prediction&amp;rdquo;&lt;/a&gt; by Andreas Brandsæter and Ingrid K. Glad, at first glance, appeared to have some close alignment to my own work &lt;a href=&#34;https://link.springer.com/article/10.1007/s10462-020-09833-6&#34;&gt;CHIRPS: Explaining random forest classification&lt;/a&gt;, published earlier this year in June. It&amp;rsquo;s generally highly desirable to connect with other researchers with whom you share common ground, working contemporaneously. Often, fruitful collaborations are born.&lt;/p&gt;&#xA;&lt;p&gt;As it turns out, the authors have taken a fairly different approach to mine. The CHIRPS method discovers a large, high-precision subset of neighbours in the training data that share the same classification from the model, using a minimal number of constraints, and returns robust statistics that proxy for precision and coverage. Brandsæter and Glad&amp;rsquo;s method is a novel approach that works with regression and time series problems, and pre-supposes that there are subsets in the data (that may or may not be adjacent) that can be set up &lt;em&gt;in advance&lt;/em&gt; to reveal regions of influence on the final prediction of a given data point. We share a recognition of the importance of interpretability in AI and machine learning, especially in critical applications.&lt;/p&gt;&#xA;&lt;p&gt;The authors propose a methodology that uses Shapley values to measure the importance of different training data subsets in shaping model predictions.
Shapley values, originating from coalitional game theory, are adapted here to quantify the contribution of each subset of training data as if each subset were a “player” influencing the outcome of the model&amp;rsquo;s prediction. This approach offers a fresh perspective by directly associating predictions with specific training data subsets, which can reveal patterns or biases that feature-based explanations might miss.&lt;/p&gt;&#xA;&lt;p&gt;The paper delves into the theoretical framework of Shapley values in a coalitional game context and extends this to analyze subset importance. The authors describe how their methodology can pinpoint the impact of specific subsets on predictions, facilitating insights into model behavior, training data errors, and potential biases. By using subsets rather than individual data points or features, this approach is particularly well-suited to models that rely on large, high-dimensional datasets where feature importance alone may not fully capture influential patterns. This method is demonstrated to be useful in understanding how similar predictions may stem from different subsets of data, emphasizing the complex interactions within training data that influence predictions.&lt;/p&gt;&#xA;&lt;p&gt;Through several case studies, the paper demonstrates how Shapley values for subset importance can be applied in real-world scenarios. For example, in time series data and autonomous vehicle predictions, subsets of training data based on chronological segmentation reveal how specific periods contribute to model outputs. This approach is shown to be valuable for identifying anomalies or segment-specific patterns that could affect model accuracy or introduce biases. 
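&lt;/p&gt;&#xA;&lt;p&gt;The subsets-as-players idea can be sketched in a few lines (my own toy reconstruction, not the authors&amp;rsquo; code): each chronological slice of training data is a Shapley player, and a coalition&amp;rsquo;s value is the prediction of a deliberately trivial model retrained on the pooled slices:&lt;/p&gt;

```python
from itertools import combinations
from math import factorial, isclose

# Toy training data split into three chronological slices (the "players").
subsets = {
    "early":  [(0.0, 1.0), (0.1, 1.2)],
    "middle": [(0.5, 3.0), (0.6, 3.1)],
    "late":   [(1.0, 5.0), (1.1, 5.2)],
}

def retrain_and_predict(coalition, x_test, default=0.0):
    """'Retrain' a deliberately trivial model (1-nearest neighbour) on the
    pooled (x, y) pairs of the coalition and predict at x_test; the empty
    coalition falls back to a fixed default prediction."""
    data = [pt for name in coalition for pt in subsets[name]]
    if not data:
        return default
    return min(data, key=lambda p: abs(p[0] - x_test))[1]

def subset_shapley(x_test):
    """Shapley value of each training slice for the prediction at x_test."""
    names = list(subsets)
    n = len(names)
    phi = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for size in range(n):
            for coal in combinations(others, size):
                w = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_slice = retrain_and_predict(set(coal) | {name}, x_test)
                without_slice = retrain_and_predict(set(coal), x_test)
                total += w * (with_slice - without_slice)
        phi[name] = total
    return phi

phi = subset_shapley(0.52)
# Efficiency: slice contributions sum to the full-data prediction
# minus the empty-coalition default.
assert isclose(sum(phi.values()), retrain_and_predict(set(subsets), 0.52))
```

&lt;p&gt;The real methodology of course retrains the actual model rather than a nearest-neighbour stand-in, which is where the computational cost the authors discuss comes from.&lt;/p&gt;&#xA;&lt;p&gt;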
Additionally, by explaining the squared error for predictions, the authors illustrate how this methodology can also diagnose errors in training data, which could improve overall model reliability.&lt;/p&gt;&#xA;&lt;p&gt;The authors discuss limitations and challenges, particularly around the computational complexity of retraining models on multiple subsets to calculate Shapley values. They suggest that, while computationally intensive, this process can be optimized with parallel processing and may not need to be repeated for each new test instance. They also propose potential applications of this methodology in tailoring training data acquisition strategies, such as for cases where predictions are most critical, which can improve model performance by selectively sampling from influential subsets.&lt;/p&gt;&#xA;&lt;p&gt;In conclusion, Brandsæter and Glad’s paper represents a significant advancement in explainable AI by emphasizing the training data’s impact on model predictions. By shifting focus to data-centric explanations, their approach highlights how subsets within the data contribute directly to individual predictions, expanding the interpretative toolkit beyond traditional feature importance. This approach aligns with my own work on CHIRPS, underscoring the notion that providing contextual information from training data strengthens model transparency and interpretability. Using training data as a reference framework enables explainable AI methods to draw on established statistical theory, which ultimately lends robustness to explanations, even in black-box models. Together, these methods suggest a promising direction for explainable AI, wherein training data subsets serve as crucial elements to understand and elucidate model behavior effectively.&lt;/p&gt;&#xA;</description>
    </item>
  </channel>
</rss>
