In this edition of Normal Deviance, the story of the UK algorithm used to assign high school grades after exam cancellations teaches an important lesson for everyone building models where questions of individual fairness arise.
There have been many consequences of the pandemic. While health and employment concerns are rightly prominent, education is another domain that has seen significant disruption. One recent story intersecting with modelling and analytics is the case of school grade assignment in the UK. With final year exams cancelled due to the pandemic, the Office of Qualifications and Examinations Regulation (Ofqual) was presented with the challenge of assigning student grades, including the A-level grades that determine eligibility for university entrance.
Part of the challenge is that centre-assessed grades (grades issued by schools based on internal assessment) are always optimistic overall compared to actual exam grades, so the process required choosing the best way to move grades closer to historical patterns. An algorithm was created to produce predicted grades across the whole student cohort.
However, when results were released, students were outraged at the perceived unfairness to those who received lower grades than they expected. The pressure led all governments across the UK to backflip and announce that centre-assessed grades would be recognised instead of the algorithmic grades. While this was a win for many students who felt they deserved higher grades, it raises significant further questions and represents a poke in the eye for those who stood by the robustness of the algorithmic grades.
In many ways the Ofqual algorithm for adjusting grades ticked all the right boxes:
However, with the benefit of hindsight, it is clear that this effort was not enough. The main factors contributing to the government backdown:
Unsurprisingly, the final solution (adopting the centre-assessed grades) will create its own problems. Teacher 'optimism bias' is unlikely to be uniform across schools, so students with more realistic teacher grading will be relatively disadvantaged. Teacher grades may be subject to higher levels of gender or ethnic bias. The supply of university places will not grow with the increased demand implied by higher grades; in some cases, this may be handled through deferrals, which may have knock-on effects on availability for 2021 school finishers. And overall confidence in Ofqual has taken a substantial hit.
I think there are some important lessons here for data analytics more generally. First, models cannot achieve the impossible; in this case, it is impossible to know which students would have achieved a higher or lower mark. In a high-stakes situation, such limitations can break the implementation of a model. Second, it raises the point that something that appears 'fair' in aggregate can look very unfair at the individual level.
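To make the aggregate-versus-individual point concrete, here is a small toy simulation. It is not a reconstruction of Ofqual's method: the grade shares, the noise level and the function names (assign_by_rank, simulate) are all assumptions chosen for illustration. It ranks students on a noisy view of their ability, forces the grades to follow a fixed distribution, and compares the result with the grades the same students would have received if ranked on their (unobservable) exam performance.

```python
import random
from collections import Counter


def assign_by_rank(scores, grade_shares):
    """Give the top share of scores the top grade, and so on down the ranking."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    grades = [None] * n
    items = list(grade_shares.items())
    start, cumulative = 0, 0.0
    for k, (grade, share) in enumerate(items):
        cumulative += share
        # The last grade takes everyone left, so every student gets a grade.
        end = n if k == len(items) - 1 else round(cumulative * n)
        for i in order[start:end]:
            grades[i] = grade
        start = end
    return grades


def simulate(n_students=1000, noise_sd=0.5, seed=0):
    """Return (distributions_match, share_of_students_with_a_different_grade)."""
    rng = random.Random(seed)
    # Hypothetical "historical" grade distribution (illustrative numbers only).
    shares = {"A": 0.25, "B": 0.35, "C": 0.25, "D": 0.15}
    # Counterfactual exam performance, which nobody can observe in practice.
    exam = [rng.gauss(0, 1) for _ in range(n_students)]
    # What the model sees: true performance blurred by noise.
    observed = [x + rng.gauss(0, noise_sd) for x in exam]
    exam_grades = assign_by_rank(exam, shares)
    model_grades = assign_by_rank(observed, shares)
    distributions_match = Counter(exam_grades) == Counter(model_grades)
    mismatches = sum(e != m for e, m in zip(exam_grades, model_grades))
    return distributions_match, mismatches / n_students


if __name__ == "__main__":
    match, mismatch_share = simulate()
    print("Aggregate grade distributions identical:", match)
    print("Share of students with a different grade:", mismatch_share)
```

Under these assumptions the two sets of grades have exactly the same overall distribution, yet a share of individual students ends up with a different grade from the one their exam would have produced. Raising noise_sd increases that share without changing the aggregate picture at all.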
In situations where individual-level predictions have a significant impact, we should spend time understanding how results will look at that granular level, and who the potential 'losers' of a model are. Finally, an algorithm will often become an easy target. As we've also seen in COMPAS and robodebt coverage, a faceless decision-making tool carries a high burden of proof to establish its credibility; this requirement applies from initial model design through to results and communication. Appropriate use of modelling is something we will need to continue to strive for in our work.