Roger Taylor, the former chair of Ofqual, has spent much of his life thinking and writing about transparency and data. Here he says why no algorithm would have worked – and why Ofqual didn’t share it
“An unsolvable problem.” That is how Roger Taylor, former chair of Ofqual, describes what the regulator was tasked with in 2020: devising a statistical model for awarding grades, when nobody was sitting exams, that was accurate, did not cause grade inflation and was acceptable to the public. “An unsolvable problem,” he repeats.
Since resigning at the end of last year, Taylor had kept quiet. But last week he started talking. Because he was not an employee, he is not forbidden by contract from speaking out – a rare privilege among former top decision-makers.
It’s a power Taylor has wielded before. He publicly demanded that the education secretary Gavin Williamson stop taking credit for Ofqual’s decision to switch to teacher-moderated grades (which worked). Taylor also published the non-disclosure agreement Ofqual was asking algorithm experts to sign, after being criticised for its contents. In contrast to ex-chief regulator Sally Collier, who has been almost silent, Taylor has been free to speak up, and even call people’s bluff.
But why speak now? It’s six months since Taylor resigned. No one has been blaming him much for last year’s grading fiasco: despite trying to wriggle out of it, Williamson has taken a lot of the blame.
Yet last week Taylor took the rather unusual step of publishing an essay with the Centre for Progressive Policy, titled “Is the algorithm working for us?”. Chapter one looks at “The 2020 exam debacle: how did it happen?”. Of course, by producing a defence, Taylor risks bringing criticism back on to his own head. It’s a bold move.
When we meet virtually this week, Taylor is sitting at a desk behind which hangs a tasteful impressionist painting and with two shelves packed with vinyl records. He has a lively, intelligent face and unpacks his ideas rapidly. He’s a PPE graduate from Oxford, with an MSc in economics, and is a former Financial Times journalist who reported on tech before moving into data technology businesses.
“It was very, very intense,” he says of the period when Ofqual started to design a grading model. “There was an incredible effort made by everyone to try to make something that was workable.”
But here is Taylor’s point. He believes it was never workable. “The point in my paper was, the constraints set at the beginning […] necessarily involve telling people who would have passed their exams, that they haven’t. And that was the issue that was not adequately considered.”
He points out that Ofqual “is constitutionally obliged under law to prevent grades from inflating”. It’s also widely known that education ministers, particularly Nick Gibb, were adamant: no grade inflation.
But Taylor writes: “From the point of view of the individual citizen, the problem looks different. They see that the government has denied them the chance to demonstrate that they deserve a university place […] It has put their future at risk.”
Policymakers assumed they should offer the same number of university places as normal, and fill them as accurately as possible. Instead, Taylor argues that inflation (inaccuracy) should have been allowed, and more university places offered.
His argument is about the difference between accuracy and legitimacy. “People are not willing to accept their lives being affected by a decision-making process driven by predictive algorithms,” says Taylor. “We risk missing this very basic lesson, if we comfort ourselves with the idea that the algorithm malfunctioned.”
In a way, Taylor is saying the mistake was basically a PR one; a failure to understand human psychology. “In terms of what you might call outcomes-based fairness, the algorithm is the appropriate approach. But the point is, that was never going to fly.”
“Teacher-assessed grades are in many ways more biased than the moderated grades,” he continues. “Their advantage is not that they are less biased, the advantage is that they allow for a significant amount of inflation.”
So why didn’t Ofqual spot the PR problem sooner?
“I do feel it was possible to work out earlier on that this wasn’t going to work,” Taylor says. “That is something everyone involved needs to reflect on.” He points out Ofqual’s consultation showed a degree of consensus. When asked about “the relative weight that the model should place on historical evidence of centre performance” (a bone of contention for many improving schools) 54 per cent agreed, with fewer (33 per cent) against.
Yet even if Ofqual didn’t spot the problem earlier, they were told about it later. After Ofqual’s consultation response in April, dissident voices became more insistent: the education select committee published a strongly worded warning in July and school leaders echoed it. Still Ofqual persisted. Why not drop the model?
Taylor has a curious answer to this. “My view on that is you very quickly risk the regulator getting involved in what are properly political decisions. My own stance on that is quite conservative: politics is for politicians.” The answer is a tricky one, as Ofqual is an independent body, accountable to parliament – not a blind executioner of DfE will.
Just before the decision to axe the algorithm last year, Williamson also forced Ofqual to pull its guidance on the proposed triple-lock appeals policy. So is Ofqual really independent? Taylor points me to meeting minutes that show both he and Collier “advised against the board changing its position”, declining to answer whether the DfE’s intervention was a problem.
The question of independence continues, as just this week Williamson appointed his own policy adviser to become chief regulator. Meanwhile, the government’s go-to person to lead expert reviews, Ian Bauckham, has taken over from Taylor as chair.
Another big criticism levelled at Ofqual during this whole process was a lack of transparency. If Taylor draws a line on how “political” Ofqual should have been, he also draws a line on how transparent.
If you tell everyone about it, there is a risk of it leading to gaming,
He is himself the author of a book on transparency, of which he self-deprecatingly says “about three people have read it”. Published in 2016, it is called Transparency and the Open Society. It makes the case, says Taylor, for transparency within certain limits. In a sense, it’s the same approach Taylor took with the algorithm itself.
Why didn’t Ofqual share the algorithm model?
“If you tell everyone about it, there is a risk of it leading to gaming,” responds Taylor. But surely sharing it with a group of expert statisticians is not the same as sharing it with “everyone”. Yet Taylor is unconcerned by this decision, because he holds that no algorithm would have worked. They are simply too unpalatable to the individual.
He is frank, meanwhile, about the focus on GCSEs and A-levels rather than vocational qualifications, such as BTECs. He notes that general qualifications were Ofqual’s remit when the regulator was set up in 2010, although he says since then, expertise in vocational qualifications has improved. However, he adds that “primarily because of the consequences around university admissions” there was “a lot more focus on general qualifications […] at a political level.”
Overall, Taylor deserves real credit for trying to make us think about the possibilities and limitations of algorithms, and the difference between accuracy and legitimacy. He cares about digital technology in public services. He previously founded a company, Dr Foster, which drew data together about hospitals, and he has worked for the Careers and Enterprise Company. He becomes passionately frustrated as he explains the DfE should urgently ensure every student has a “digitised record” of themselves, their achievements and qualifications.
It would allow students to keep all their qualifications in one place (the biggest request from students to Ofqual is for copies of exam certificates, he says) while others who see the record “don’t just look at grades, but look at them in context”. It could particularly help disadvantaged pupils, who often have a “thinner file”.
If people had richer individual education records … it might be a less pressurised situation
Such records might even help avert future repeats of the 2020 situation, says Taylor. “In a great many years’ time, if people had richer individual education records and realised their fate didn’t hang on a single grade, but a more nuanced judgment, it might be a less pressurised situation.”
His belief in the smart use of data in public services made the grading fiasco “quite painful”, Taylor reflects.
“I’ve spent most of my life looking at […] how do we use data that is fair to people and particularly in ways that empower individuals.” Instead the debacle was “an example of the government using data in a way that was deeply and massively insensitive to individuals. It was quite painful to me personally to have been involved.” The son of a philosophy academic, it seems Taylor has been genuinely mulling the philosophical problems – and opportunities – of statistical modelling in education since he departed.
There are some holes in his answers. There is also a hole in his solution: just this week, schools are warning that students with top grades have received no university offers, because universities awarded too many places last year. In a way it goes to show, algorithm or no algorithm, every solution was deeply flawed.
I ask Taylor why he stepped down.
“Whatever you think about 2020, my view is that Ofqual is a world-class organisation. There’s not many organisations that understand assessment.” He laughs. “2021 is going to be a difficult year. It wasn’t going to help Ofqual’s case to have the same grey, old bloke in place.”
Perhaps, however, Ofqual has lost one of its most open communicators.
“Policymakers assumed they should offer the same number of university places as normal, and fill them as accurately as possible.”
There is an implication here that the “gold standard” exam system achieves a desirable level of “accuracy”, whereas last year’s process did not.
But in her evidence to the Select Committee on 2 September 2020, Ofqual’s then Chief Regulator, Dame Glenys Stacey, acknowledged that exam grades “are reliable to one grade either way”. Which, in practice, is saying that an A level certificate showing ABB really means “any set of grades from A*AA to BCC, but no one knows which”.
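To illustrate what “reliable to one grade either way” means in practice (my own sketch, not drawn from either Ofqual or Mr Taylor): treating A-level grades as the ordered scale A*, A, B, C, D, E and applying the uncertainty to each subject independently, a certificate showing ABB is consistent with every combination from A*AA down to BCC:

```python
# Illustrative sketch (my assumption, not Ofqual's published method):
# enumerate the certificates consistent with "reliable to one grade
# either way" applied independently to each subject.
from itertools import product

GRADES = ["A*", "A", "B", "C", "D", "E"]  # ordered best to worst

def neighbours(grade):
    """The grade itself plus one grade either way on the scale."""
    i = GRADES.index(grade)
    return GRADES[max(i - 1, 0):i + 2]

def plausible_results(awarded):
    """All certificates within one grade per subject of the one awarded."""
    return list(product(*(neighbours(g) for g in awarded)))

combos = plausible_results(["A", "B", "B"])
print(len(combos))            # 27 combinations
print(combos[0], combos[-1])  # ('A*', 'A', 'A') ... ('B', 'C', 'C')
```

Twenty-seven distinct certificates, spanning A*AA to BCC, all “really mean” the same awarded ABB on this reading.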
Is this level of “accuracy” (Mr Taylor’s word) or “reliability” (Dame Glenys’s) acceptable? How does a student, awarded ABB but denied entrance for not having been awarded AAB, feel about that?
The process in summer 2020 was a mess, and the jury is still out on 2021. But what is the jury’s verdict on “gold standard” exams which deliver grades “reliable to one grade either way”?
If your boss or client asks you to solve an unsolvable problem, you have a professional duty to tell them that it’s unsolvable. They might not know that. If they want you to solve that problem by sacrificing fairness to individuals, you have an ethical duty to tell them that it’s not right.
In June 2020, I sent Ofqual a version of the document below, in which I stated,
“Data from Matthew Arnold School in Oxford (MAS) suggests that for A-levels the Exceptional Arrangements as published so far has virtually no chance of providing grades to the students in a way that satisfies the double criteria of being fair to the individuals and controlling grade inflation nationally. This problem affects every A-level subject at MAS, and is likely to affect most A-level subjects at hundreds of comparable schools across the country. The risk to the students is that fairness to the individuals might be sacrificed.”
https://committees.parliament.uk/writtenevidence/8239/html/
When Ofqual not only went along with a solution that sacrificed fairness to individuals, but presented a rosy picture of it, that was not a PR problem but an ethical one. The issue was unethical use of algorithms.
Regarding transparency, it was right for Ofqual not to disclose the algorithm to the public, for the reason that Mr Taylor gave. However, it should have disclosed that in testing, compared with grades awarded by exams, its best model got A-level Biology grades wrong around 35% of the time and French grades wrong around 50% of the time, while for GCSEs, it awarded around 25% wrong Maths grades and around 45% wrong History grades, etc. Did it also hide its test results from ministers?
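A rough back-of-envelope on those quoted error rates (my own sketch, assuming errors are independent across subjects, which Ofqual’s testing may not bear out) shows how quickly per-subject error compounds over a whole three-subject A-level certificate:

```python
# Rough sketch under an independence assumption (mine, not Ofqual's):
# probability that an entire three-subject certificate is error-free,
# given a per-subject chance of the model awarding the right grade.
def certificate_correct(p_subject_right, n_subjects=3):
    return p_subject_right ** n_subjects

# Using the figures quoted above as illustrative per-subject accuracies:
print(round(certificate_correct(0.65), 2))  # ~35% wrong per subject -> 0.27
print(round(certificate_correct(0.50), 2))  # ~50% wrong per subject -> 0.12
```

On that (admittedly crude) reading, a student taking three subjects with Biology-like accuracy would have barely a one-in-four chance of every grade on their certificate matching what the exams would have given.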