Insurance pricing is an area that is full of potential use cases for large language models (LLMs). AI tools will change how we design, build and validate pricing processes.
One area of particular interest is the application of LLMs in feature engineering. It’s a relatively nascent space despite the low barrier to entry.
Feature engineering, in its common form, is simply adding columns (features) to a dataset to further describe each row (observation). If useful, the features can be incorporated into the pricing model, improving performance. These new features can be derived from existing features, from new information pulled from some external data source, or some combination of the two. LLMs give us many new ways of generating such features, both by incorporating information held within the parameters of the LLM itself and by leveraging unstructured data from external sources.
Given the risk of LLM hallucination, it may initially sound surprising to pursue incorporating LLM outputs directly in modelling. However, the natural validation processes built into model fitting offer significant protection from low-quality outputs.
By offering additional data and improved predictive power to insurance pricing, LLMs can shift risk away from data scarcity and anti-selection towards governance, bias and operational risk.
In this section, I walk through four different ways LLMs can be used to generate new features for an insurance pricing model.
What can an LLM see that a traditional pricing model can't? A property image like this one contains a wealth of observable risk signals, from roof condition to what's parked in the driveway. Not only could we estimate the number of bedrooms from such an image, we could also ask about many other risk-relevant attributes.
To be clear, I am not advocating the use of Google Street View for such purposes, but conceptually it represents an important advance in drawing on new data types.
LLM features can be used across any form of modelling work. In insurance pricing, this usually spans customer behaviour models (conversion and retention) and loss cost models (frequency, severity, burn cost). Incorporating LLMs still takes design and judgment; a natural first step is to brainstorm the true underlying risk factors. In car insurance, for example, true risk factors are things like 'driving ability', 'propensity to take risks' and 'likelihood to speed'. These will help guide the prompting and ensure there are intuitive connections between new LLM features and our knowledge of risk.
So how do these ideas translate into something you’d be able to incorporate into a pricing model?
To create a simple LLM feature, you give the levels of a factor to an LLM and ask a pointed question, ideally with the desired response levels prescribed in your prompt.
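As a minimal sketch of what such a pointed prompt might look like, the snippet below builds one from a list of factor levels and a prescribed set of response levels. The car models, question and allowed responses are illustrative assumptions, and the prompt would be sent to whichever LLM tool or API you use.

```python
# Sketch: building a pointed prompt for LLM feature generation.
# Factor levels, question and allowed responses are all illustrative.

def build_feature_prompt(factor_levels, question, allowed_responses):
    """Ask one pointed question per factor level, prescribing the answers."""
    lines = [
        question,
        f"Answer with exactly one of: {', '.join(allowed_responses)}.",
        "Return one line per item in the format '<item>: <answer>'.",
        "Items:",
    ]
    lines += [f"- {level}" for level in factor_levels]
    return "\n".join(lines)

prompt = build_feature_prompt(
    factor_levels=["Ford Fiesta", "BMW M3", "Volvo V60"],
    question="Rate the insurance risk of each car model.",
    allowed_responses=["Low", "Medium", "High"],
)
print(prompt)
```

Prescribing the response format up front makes the output far easier to parse into a mapping table, and reduces (though does not eliminate) off-schema answers.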
As a small example of feature engineering, see the table of car models below. I asked Co-Pilot to attribute a Risk Score (Low to High), a Boy Racer Likelihood (Very Low to Very High) and a Coolness score (1–10).
I now have, in effect, a mapping table for these LLM-derived features, which I can join onto my modelling data. Note the 'Medium-High' level, which was not an option I gave to the LLM, so some of my own cleaning (or better prompting) is likely beneficial here.
In most cases, you should be creating some form of mapping table. An API is useful where you have thousands of data points, but you should still use the results to build a static mapping table, as this will be far more cost-effective than repeatedly querying the LLM for new observations.
There are some cases where this is not practical, such as where a high proportion of levels are unseen (e.g. address) in a production environment. The impacts on algorithm speed and cost need to be considered here and have the potential to offset any gains in pure predictiveness.
There is a high probability that the LLM is wrong, and features are 'incorrect' in the narrow sense. You can, however, validate these new features’ predictiveness using traditional statistical methods, just as you would validate incorporating any new feature into a model. If the new feature does not improve performance, then it can be deprecated from the model through standard processes.
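As a toy illustration of that validation step, the sketch below checks whether segmenting claim frequency by an LLM-derived risk score reduces holdout error versus a single overall mean. The data is entirely made up for illustration; in practice you would use your standard model-comparison workflow.

```python
# Sketch: validating an LLM-derived feature like any other candidate.
# Toy data: (llm_risk_score, observed claim frequency) per policy.
from statistics import mean

train = [("High", 0.30), ("High", 0.25), ("Low", 0.05), ("Low", 0.10)]
holdout = [("High", 0.28), ("Low", 0.07)]

overall = mean(freq for _, freq in train)          # model without the feature
by_level = {lvl: mean(f for l, f in train if l == lvl)
            for lvl in {l for l, _ in train}}       # model with the feature

mae_without = mean(abs(freq - overall) for _, freq in holdout)
mae_with = mean(abs(freq - by_level[lvl]) for lvl, freq in holdout)

print(f"MAE without feature: {mae_without:.3f}")
print(f"MAE with feature:    {mae_with:.3f}")
```

If the feature does not improve the holdout metric, it is dropped, exactly as with any other candidate feature; whether the LLM's labels are 'factually' right never needs to be settled directly.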
With this in mind, you do not need to spend too much time 'fact-checking' the features themselves. One key thing to check is whether the LLM has abided by your prompt restrictions (see the table above, where the invented category 'Medium-High' reduces the ability to model). We do recognise that even if a new feature is beneficial overall, it can still misclassify an individual row, meaning individual policy prices could become more volatile.
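That schema check is easy to automate. The sketch below, with illustrative data, splits an LLM-derived mapping into entries that match the prescribed levels and off-schema entries (like 'Medium-High') that need cleaning or re-prompting.

```python
# Sketch: checking that LLM outputs abide by the prescribed response levels.
# The allowed set and the sample output are illustrative.

ALLOWED = {"Low", "Medium", "High"}

def validate_levels(mapping, allowed=ALLOWED):
    """Split an LLM-derived mapping into on-schema and off-schema entries."""
    valid = {k: v for k, v in mapping.items() if v in allowed}
    invalid = {k: v for k, v in mapping.items() if v not in allowed}
    return valid, invalid

llm_output = {"Ford Fiesta": "Low", "BMW M3": "High",
              "Audi RS6": "Medium-High"}  # off-schema answer
valid, invalid = validate_levels(llm_output)
print("Needs cleaning or re-prompting:", invalid)
```

Running this check before the mapping table touches any modelling data catches schema drift early, when the fix is a cheap re-prompt rather than a model rebuild.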
While finding a predictive feature may prove relatively straightforward, you need to be extremely careful to avoid discrimination when taking this approach. For example, letting an LLM derive features based on someone's name or other personal information is a recipe for disaster.
Considerations of fairness are further complicated by biases inherent in an LLM. Take our example of car models again and consider generating features that depend on stereotypes (e.g. perceived propensity to speed). One might be tempted to create a feature inferring the 'type of person likely to drive such a car', or to capture a true risk factor like 'propensity to speed' or simply 'driving ability'. It is to be expected that unhelpful stereotypes, or known differences between protected classes, have significantly biased any LLM we use to derive a feature.
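One simple diagnostic for such investigation, sketched below with entirely made-up data, is to compare how often an LLM-derived feature assigns its adverse level across a protected class (assuming appropriately governed access to protected-attribute data for testing). Large gaps suggest the feature may be acting as a proxy and warrant deeper scrutiny; this is a starting point, not a sufficient fairness test.

```python
# Sketch: a simple proxy-bias diagnostic for an LLM-derived feature.
# Toy data: (protected_group, llm_risk_score) per policy - illustrative only.
from collections import defaultdict

rows = [
    ("A", "High"), ("A", "High"), ("A", "Low"),
    ("B", "Low"), ("B", "Low"), ("B", "High"),
]

counts = defaultdict(lambda: {"high": 0, "total": 0})
for group, score in rows:
    counts[group]["total"] += 1
    counts[group]["high"] += (score == "High")

rates = {g: c["high"] / c["total"] for g, c in counts.items()}
print(rates)  # markedly different rates across groups merit investigation
```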
Thorough consideration and investigation of any LLM-derived factors is imperative when determining the ethical and legal implications of using such factors in rating.
These types of advances will have important long-term effects on pricing.
The opportunities flagged in this article suggest that the unstoppable trend of increased pricing sophistication will continue. Pricing segmentation and accuracy will improve and insurance for some segments of the population will become less affordable. Calls for further insurance pricing regulation to address the affordability of higher-risk policies will likely grow louder.
The reliance on bespoke data providers will potentially reduce as internal teams, with the aid of their LLM sidekick, become empowered to access a new wealth of insights previously unavailable.
The collision of LLM outputs with fair pricing considerations and indirect discrimination will become a serious operational risk. The opaqueness of LLM-derived features and their complex inherent biases compound such risks.
And, I hope, GLMs have been extended a lifeline, at least for now.