
Why AI interpretations might be lying to you


In this Responsible AI column, Dr Fei Huang explains why AI interpretations might be misleading and why that matters for actuarial decision-making.

As AI systems are increasingly deployed in high-stakes decision-making, such as insurance pricing and mortgage approvals, interpretability is often presented as the solution to concerns about trust and accountability. But there is an uncomfortable question that needs to be answered. Can we actually trust AI interpretations?

Imagine being told that your insurance premium has increased by 30% at renewal or that your mortgage application has been declined. Naturally, you ask why.

The response is polite, confident and technical:

“According to the SHAP interpretation tool, this customer’s age increased the predicted premium by $20 relative to the average predicted premium over the chosen reference population.”

But is this interpretation meaningful to a customer? Can it inform risk-reduction behaviour or help someone understand the pricing decision? And, most importantly, should we trust it?
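To make the quoted statement concrete, here is a minimal sketch of how such a figure is typically produced with the open-source shap library. The model, features and data below are hypothetical and purely illustrative, not the insurer's actual pricing model; the point is that the "contribution of age" is defined relative to the average prediction over whatever reference data the explainer is given.

```python
# A minimal, illustrative sketch (hypothetical model and data) of how a figure
# like "age increased the predicted premium by $20" is typically produced.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 5_000
X = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "vehicle_value": rng.uniform(5_000, 80_000, n),
    "prior_claims": rng.poisson(0.3, n),
})
# Synthetic premium that depends on age and claims history, plus noise.
y = 400 + 6 * X["age"] + 150 * X["prior_claims"] + rng.normal(0, 50, n)

model = GradientBoostingRegressor().fit(X, y)

# The "reference population" is the background data given to the explainer;
# SHAP contributions are measured relative to the average prediction over it.
background = X.sample(500, random_state=0)
explainer = shap.TreeExplainer(model, data=background)

customer = X.iloc[[0]]
contributions = explainer.shap_values(customer)[0]
print("Average predicted premium over the reference data:", explainer.expected_value)
print("Contribution attributed to age for this customer:",
      contributions[X.columns.get_loc("age")])
```

Nothing in that output tells the customer how the reference population was chosen, whether the attribution is causal, or what they could do differently, which is precisely the gap this article is about.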

Long before modern machine learning, banks and insurers relied on statistical models that were largely invisible to customers. Yet there was a crucial difference. Traditional statistical models typically had simple, structured forms – often additive or multiplicative. Even if customers never saw the equations, an actuary or credit officer could plausibly explain how a single factor, such as claims history, driving record or age, was associated with the decision.

In the age of AI, this link between decision, interpretation and accountability has become fragile.

Today’s AI interpretation tools describe how models assign importance or contributions to features, but those outputs reflect the chosen method’s assumptions and goals, and may not directly answer the questions most relevant to end users.

As emphasised by Molnar (2023), interpretability methods answer specific, method-defined questions and should not be confused with causal explanations or policy guarantees. Without understanding the underlying assumptions and limitations of these methods, practitioners can misinterpret or overclaim what the interpretations mean — creating governance and accountability risks.

As AI systems increasingly influence access to financial services, housing and insurance, this gap between what interpretation tools provide and what users expect is becoming more visible.

Can AI interpretations be manipulated to hide discrimination?

AI interpretation tools, such as partial dependence plots, SHAP or feature importance measures, are widely used to make complex models appear more transparent and explainable. Like any tool, they come with strengths, limitations and underlying assumptions. However, these assumptions are not always well understood when such tools are used to demonstrate transparency or to reassure stakeholders.

Take partial dependence plots (PDPs) as an example. PDPs are commonly used to show how a model’s predictions change with respect to a single variable, such as age. A key assumption behind PDPs is that the variable being examined can be varied independently of all other features. In real-world data, especially in areas like insurance or credit, this assumption is often violated, because many features are associated with one another.
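To see exactly where that assumption enters, the following minimal sketch computes a PDP by hand for a hypothetical model in which age and years licensed are strongly correlated. Note how the focal feature is forced to each grid value on every row of the data, including rows where the resulting combination (an 18-year-old with 40 years of driving experience, say) could never occur.

```python
# A minimal, illustrative sketch of how partial dependence is computed.
# The data and model are hypothetical; "years_licensed" is deliberately
# correlated with "age" so that some grid combinations are unrealistic.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 5_000
age = rng.integers(18, 80, n)
years_licensed = np.clip(age - 17 - rng.poisson(2, n), 0, None)
X = pd.DataFrame({"age": age, "years_licensed": years_licensed})
y = 400 + 6 * age - 2 * years_licensed + rng.normal(0, 30, n)

model = GradientBoostingRegressor().fit(X, y)

def partial_dependence_curve(model, X, feature, grid):
    """Average prediction when `feature` is forced to each grid value on every
    row, regardless of whether that combination occurs in the real data."""
    return np.array([model.predict(X.assign(**{feature: v})).mean() for v in grid])

grid = np.linspace(X["age"].min(), X["age"].max(), 25)
print(partial_dependence_curve(model, X, "age", grid).round(1))
```

Because the model's behaviour on those impossible rows still enters the average, it can dominate what the plot shows, which is the weakness exploited below.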

Recent research by Xin, Hooker and Huang (2025) shows that this weakness can be deliberately exploited. The authors demonstrate that it is possible to modify a black-box model so that it behaves in two very different ways at the same time. For real individuals, the model can continue to produce highly discriminatory outcomes, for example, decisions that vary systematically with a particular variable. But when the model is examined using an interpretation tool such as a partial dependence plot, it can appear neutral and non-discriminatory.

The key idea is to exploit regions of the feature space that contain very few ‘real’ observations (because of correlations between variables) but that the PDP still averages over, and to tailor the model’s behaviour in those regions so that any undesirable effect is cancelled out in the average.
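The following deliberately simplified sketch illustrates that idea; it is not the authors' construction, and it reuses the hypothetical model, data and helper function from the previous sketch. A wrapper leaves predictions unchanged for plausible, real-world combinations of age and years licensed, and adds an offset only on implausible combinations, sized so that the PDP averages out to a flat line.

```python
# A simplified sketch of the manipulation idea (not the authors' construction).
# It reuses `model`, `X`, `grid` and `partial_dependence_curve` from the sketch
# above. Predictions change only on implausible age/experience combinations,
# which real customers never exhibit but which the PDP still averages over.
import numpy as np

def implausible(age, years_licensed):
    # Combinations that essentially never occur in the real data.
    return (years_licensed > age - 17) | (years_licensed < age - 32)

class ManipulatedModel:
    def __init__(self, base_model, X_ref):
        self.base = base_model
        self.X_ref = X_ref
        # Flatten the PDP towards the average prediction on the reference data.
        self.target = base_model.predict(X_ref).mean()

    def predict(self, X):
        preds = self.base.predict(X).astype(float)
        mask = implausible(X["age"].to_numpy(), X["years_licensed"].to_numpy())
        for a in np.unique(X.loc[mask, "age"]):
            # What the original PDP would report at this age, and how much of
            # the reference data falls into the implausible region at this age.
            ref_at_a = self.X_ref.assign(age=a)
            pdp_a = self.base.predict(ref_at_a).mean()
            frac = implausible(ref_at_a["age"].to_numpy(),
                               ref_at_a["years_licensed"].to_numpy()).mean()
            if frac > 0:
                rows = mask & (X["age"].to_numpy() == a)
                # Spread an offset across the implausible rows so the PDP at
                # this age averages to `self.target`; plausible rows untouched.
                preds[rows] += (self.target - pdp_a) / frac
        return preds

attacked = ManipulatedModel(model, X)
print("Largest change in prediction for real customers:",
      np.abs(attacked.predict(X) - model.predict(X)).max())
print("PDP of the original model:",
      partial_dependence_curve(model, X, "age", grid).round(1))
print("PDP of the manipulated model:",
      partial_dependence_curve(attacked, X, "age", grid).round(1))
```

Because the offset lives entirely in regions that real customers never occupy, predictive checks on held-out real data look identical for the two models, while the flat PDP tells a very different story from the model's behaviour in practice.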


Figure 1: Partial dependence plots for a focal feature before (left) and after (right) manipulation. Although the underlying model’s predictions for real observations remain largely unchanged, the manipulated plot appears flat and non-discriminatory. Source: Xin, Hooker and Huang (2025).

The result is not an absence of transparency, but something more subtle and concerning: a sense of transparency that gives a false impression.

As shown in Figure 1, a model can appear interpretable, pass standard audits and satisfy formal documentation requirements, while continuing to generate systematically biased outcomes in real-world decision settings. The interpretation looks reassuring, but it no longer faithfully reflects how the model behaves on the data that actually matters.

For actuaries responsible for pricing governance or fairness reviews, this means that a model can appear compliant under standard interpretability checks while still producing discriminatory outcomes in practice.

This does not mean that interpretation tools are useless. Rather, it highlights the importance of understanding what these tools are designed to show, what assumptions they rely on, and what they cannot guarantee. Without that understanding, interpretations that look rigorous and precise may unintentionally obscure, rather than reveal, how AI systems truly behave.

For a non-technical overview of this work, watch the author's All Actuaries Summit presentation at the end of this article, which discusses the findings and their implications for practitioners and regulators.

A pathway to responsible AI interpretation

Below are several guiding principles that I recommend for developing responsible AI interpretations in practice.

Be explicit about requirements: Before choosing a model or an interpretation tool, we must first be clear about the purpose of interpretation. Is the goal internal governance, model monitoring or consumer contestability? An AI system used to improve operational efficiency may not require deep or user-facing interpretations. By contrast, an AI system that determines the price of a person’s financial security clearly does. Interpretability requirements should be defined before model development, not retrofitted afterwards.

Match models to the stakes: Not all use cases justify the same level of model complexity. In some settings, highly flexible black-box models may be appropriate. In others, simpler and structurally interpretable models are preferable, especially when their performance remains comparable to more complex alternatives. In responsible AI practice, interpretability requirements should guide model choice as part of professional judgement, rather than being treated as an add-on applied after the fact.

Sense-check interpretations where risks exist: Where interpretation tools are applied in settings with potential adversarial or strategic incentives, standardised interpretability outputs should not be treated as sufficient evidence of faithful or fair model behaviour. Practitioners should actively test whether interpretations reflect model behaviour on realistic data rather than extrapolation or averaging artefacts, for example by inspecting individual-level responses, examining selected real-world cohorts, and triangulating results across alternative interpretation methods; a minimal sketch of one such check follows these principles.

Interpretation for social good: Interpretation should not be limited to justifying prices or decisions after they are made. In insurance, its deeper value lies in informing risk mitigation, helping individuals and communities understand how risks arise and how they might be reduced. This perspective reframes insurance from being solely about risk transfer to supporting better risk outcomes, where interpretations enable safer behaviour, prevention and resilience.

Understand the fundamentals: Perhaps the most responsible action we can take today is to return to the fundamentals, to genuinely understand the assumptions, mechanics and limitations of the AI methods we choose to use. As AI reshapes professional practice and the skills it demands, this foundational understanding becomes more — not less — important. Accountability in the age of AI does not come from adopting the latest tools, but from using them with judgment, clarity, and a deep appreciation of what they can — and cannot — tell us.
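As a concrete illustration of the sense-checking principle above, the following sketch (continuing the hypothetical example from earlier) compares what a PDP reports for age with the model's average prediction for real customers in each age band. A material gap between the two curves is a warning sign that the interpretation is driven by extrapolation or averaging artefacts rather than by behaviour on realistic data.

```python
# A minimal sketch of one sense-check: compare the PDP for age against the
# model's average prediction for real customers in each age band. Reuses
# `attacked`, `X`, `grid` and `partial_dependence_curve` from the sketches above.
import pandas as pd

pdp_reported = partial_dependence_curve(attacked, X, "age", grid)

# Average prediction on *real* observations, grouped into age bands.
real_predictions = pd.Series(attacked.predict(X), index=X.index)
age_band = pd.cut(X["age"], bins=grid, include_lowest=True)
observed_by_age = real_predictions.groupby(age_band, observed=True).mean()

print("PDP (flat, reassuring):")
print(pdp_reported.round(1))
print("Average prediction for real customers by age band (still varies with age):")
print(observed_by_age.round(1))
```

Checks in the same spirit include plotting individual conditional expectation (ICE) curves, overlaying the data density along the focal feature, and repeating the analysis with an alternative interpretation method such as SHAP.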

References

Molnar, C. (2023). Interpretable Machine Learning (2nd ed.). https://christophm.github.io/interpretable-ml-book/

Xin, X., Hooker, G., & Huang, F. (2025). Pitfalls in machine learning interpretability: Manipulating partial dependence plots to hide discrimination. Insurance: Mathematics and Economics, 103135. https://doi.org/10.1016/j.insmatheco.2025.103135

About the author
Dr Fei Huang
Dr Fei Huang is an Associate Professor of Risk and Actuarial Studies at UNSW Business School. Her research sits at the intersection of responsible AI, insurance, and data-driven decision-making, with a focus on fairness, sustainability, and accountability in insurance and retirement income systems. She works with industry and regulators on responsible AI in insurance, bridging research, policy and practice. For more information, visit www.feihuang.org.
