News

Watchdog fears public bodies will avoid statistical models following exam grading backlash


Public bodies may be “less willing” to use statistical models to support decisions in future after the system to award exam grades last year failed to “command public confidence”, the statistics watchdog has warned.

A review by the Office for Statistics Regulation, the regulatory arm of the UK Statistics Authority, found the grading system prompted “widespread public dissatisfaction”, and that “limitations” of the statistical models used were not fully communicated.

Although it found qualification regulators and exam boards “worked with integrity” to develop the system, the watchdog warned that “guidance and support” from government should be improved.

Exams were replaced by a system of centre-assessment grades standardised by computer algorithm last year, but the approach was abandoned after a fierce backlash over the downgrading of results, with pupils instead issued with the grades provided by their schools.

This week, in a report on lessons learned from the fiasco, the OSR said it feared public bodies will now be “less willing to use statistical models to support decisions in the future for fear of a public acceptability backlash, potentially hindering innovation and development of statistics and reducing the public good they can deliver”.

The watchdog said this was illustrated by the emphasis placed on an algorithm-free approach in 2021, with education secretary Gavin Williamson promising in January to “put our trust in teachers rather than algorithms”.

Regulators and boards ‘worked with integrity’

In the wake of last year’s fiasco, ministers were widely reported to have attempted to lay the blame on exams regulator Ofqual. Chief regulator Sally Collier resigned shortly afterwards.

The prime minister Boris Johnson also sought to blame the “mutant algorithm” for the problems when he addressed students in the summer.


But the OSR found that teams in regulators and exam boards in all four UK nations “worked with integrity to try to develop the best method in the time available to them”.

“In each country there were aspects of the model development that were done well, and aspects where a different choice may have led to a different outcome.

“However, none of the models were able to command public confidence and there was widespread public dissatisfaction of how the grades had been calculated and the impact on students’ lives.”

The OSR’s main conclusion is that achieving public confidence in statistical models is “not just about the technical design of the model – taking the right decisions and actions with regards transparency, communication and understanding public acceptability throughout the end to end process is just as important”.

It also concluded that guidance and support for public bodies developing models “should be improved”.

Government has a “central role to play in ensuring that models developed by public bodies command public confidence”, the OSR said.

This “includes directing the development of guidance and support, ensuring that the rights of individuals are fully recognised and that accountabilities are clear”.

‘Limitations’ not fully communicated

The OSR said regulators and exam boards faced “numerous challenges” in developing the system last year, which meant it was “always going to be difficult for a statistical algorithm to command public confidence”.

However, the “limitations of statistical models, and uncertainty in the results of them, were not fully communicated”.

“More public discussion of these limitations and the mechanisms being used to overcome them, such as the appeals process, may have helped to support public confidence in the results.”

And while regulators undertook activities to communicate information about the models to those affected by them and published technical documentation on results day, full details around the methodology to be used “were not published in advance”.

“This was due [to] a variety of reasons, including short timescales for model development, a desire not to cause anxiety amongst students and concerns of the impact on the centre assessed grades had the information been released sooner.

“The need to communicate about the model, whilst also developing it, inevitably made transparency difficult.”

‘Limited professional statistical consensus’

Although regulators drew on expertise in the qualifications and education sector, there was “limited professional statistical consensus on the proposed method”.

The methods were “not exposed to the widest possible audience of analytical and subject matter experts, though we acknowledge that time constraints were a limiting factor in this case”.

There was also “limited public discussion ahead of the release of results about the likely historical patterns in the underlying data and how they might impact on the results from the model”.

Regulators carried out equality impact analyses, which were “based on the premise that attainment gaps should not widen, and their analyses showed that gaps did not in fact widen”.

Despite this analytical assurance, there was a “perception when results were released that students in lower socio-economic groups were disadvantaged by the way grades were awarded”.

“In our view, this perception was a key cause of the public dissatisfaction.”

‘Key lessons’ for government

The OSR said there were “key lessons to be learned for government and public bodies looking to develop statistical models to support decisions”.

It said that for statistical models used to support decisions in the public sector to command confidence, the bodies developing them need guidance and support “to be available, accessible and coherent”.

“Our review has found that there is a fast-emerging community that can provide support and guidance in statistical models, algorithms, AI and machine learning.

“However, it is not always clear what is relevant and where public bodies can turn for support – the landscape is confusing, particularly for those new to model development and implementation.”

Ofqual said it welcomed the OSR’s work to “build public confidence in statistical approaches”, and said the report “recognises the challenging task Ofqual – and our counterparts in Wales, Northern Ireland and Scotland – faced in awarding grades in the absence of exams last summer”.

“We have learned lessons from last summer. We continue to work with other government departments to make data available for wider scrutiny and we recently set out, jointly with the DfE, our approach to awarding grades in 2021, after our largest-ever public consultation.”

The DfE was approached for comment.

Your thoughts



One comment

  1. Huy Duong

    It was obvious that it was not going to work. I repeatedly tried to communicate with Ofqual, including stating,

    “Data from Matthew Arnold School in Oxford (MAS) suggests that for A-levels the Exceptional Arrangements as published so far has virtually no chance of providing grades to the students in a way that satisfies the double criteria of being fair to the individuals and controlling grade inflation nationally. This problem affects every A-level subject at MAS, and is likely to affect most A-level subjects at hundreds of comparable schools across the country. The risk to the students is that fairness to the individuals might be sacrificed.”

    (https://committees.parliament.uk/writtenevidence/8239/html/)

    But Ofqual charged on, advertising to the public that the calculated grades would be consistent between schools and between years (keeping grade inflation down to 2%). By keeping grade inflation down to 2%, Ofqual had to go far beyond what is statistically valid, entailing high risks of giving the wrong grades to the wrong students, which is unjust.

    The lesson for statistical models is not to go beyond what is statistically valid when getting it right or wrong matters. The public was right not to have confidence in the calculated grades.

    Granted, the problem was difficult or impossible to solve, but the principle of Do No Harm should have been respected.