Lost cause: Getting at causation in our datasets

Hugh Miller

We live in a world where we often have incredibly good data, but limited ability to use it to directly answer the questions that we most care about. Can we find the impact of legal representation on claim outcomes if more severe claims are more likely to be represented? What is the impact of an operational change if it is only applied to a certain subset of the population? What fraction of an improvement in outcomes is attributable to management versus external factors? Answering such questions requires a deeper understanding of the data than simple descriptive statistics can provide.

Traditional predictive modelling will assign effects to factors, but many of these will be correlational rather than causal. However, significant progress has been made in causal modelling methods that aim to estimate actual effects. While formal experiments such as randomised controlled trials remain the gold standard for many applications, causal estimates can also be made from quasi-experimental designs such as regression discontinuity, instrumental variables, comparative interrupted time series or propensity score methods. This paper introduces quasi-experimental designs and provides some examples of how we have applied them in actuarial contexts. The examples draw on quasi-experimental evidence in injury schemes, welfare policy and housing.

Being able to estimate the actual impacts of underlying factors significantly improves actuaries’ ability to provide quality advice.