Ofsted’s Lee Owston has said that they will “think again” if opposition to their new report cards is sector-wide. I hope he means it, because the plans are badly flawed.
Examining is a science. It even has its own scientific discipline: psychometrics, whose insights I want to apply to Ofsted today.
Psychometrics tells us that longer exams are more reliable than shorter exams. A five-minute driving test would be hopeless.
It also tells us that increasing the number of grades reduces the chance that the assigned grade is correct. If GCSEs had 26 grades, no one would expect an examiner to tell a J in history from a K.
Finally, reliability and validity are in tension. The driving theory test is very reliable, but it does not prove you are safe on the road. Hence, we have the practical test – less reliable, but more valid.
Assessing school performance has the same issues. Aggregate exam results are relatively reliable, but exam results alone are not sufficient for a valid judgement. We also need to know if a school is sacrificing everything for that extra GCSE grade. We need to know about safeguarding, wellbeing and British values.
Hence Ofsted. It examines schools. But unless everything is wonderful, or everything is terrible, two inspectors may legitimately come to different conclusions about the quality of education. In other words, its inspections can never be fully reliable.
The inspection framework mitigates that risk, which is why we see complaints of stale reports constructed from sentences plucked from a databank. Soulless, perhaps, but more reliable.
The proposed new framework makes life much harder. Currently, inspectors have to choose one of four grades across four areas of judgement. This is relatively reliable and, in most cases, uncontentious. Indeed, Sir Jon Coles reports that 98 per cent of inspections of United Learning schools either agreed with the school, or the school accepted that the result was borderline.
Now, however, Ofsted proposes to rate as many as 11 aspects of a school, using five grades. It is a matter of psychometric certainty that this will be less reliable. There simply won’t be time to assess a school accurately in so many aspects, or to the required degree of precision.
The results will inevitably be more arbitrary, to the point of endangering validity. A school might be confident they are ‘strong’ on teaching quality, but if the inspector judges they “check pupils’ understanding systematically”, they will only be ‘secure’.
How that differs from being “expert at checking pupils’ understanding” is beyond me. According to Schools Week’s investigation, it is beyond most of my readers too. And since Ofsted draws most of its inspectors from the sector, I doubt they will be any more capable of accurate discernment.
The one clear improvement in the proposals is the plan to make safeguarding a simple yes/no, separate from other school judgements. Beyond that, though, the new framework cannot lead to consistently reliable judgements, so it is no surprise that school leaders are up in arms about it.
The chief inspector made his name with the Big Listen, and the sector is speaking with one voice. Lee Owston says they will listen again, and that is great news for anyone who cares about children or teachers or justice.
The problem Ofsted faces is that Labour are committed to report cards, premised on the very idea of giving a broader picture of school performance. In other words, more data points.
And so the inspectorate is caught between the political rock of a manifesto commitment and the psychometric hard place of inspection validity.
Further consultation is surely now inevitable. Hopefully, it will gather enough evidence for Sir Martyn to go to the secretary of state and confront her with the reality that her plans are flawed.
And if she will not listen, then he will need to say that she has made his position untenable. You cannot serve as a chief inspector if you know that the system you preside over does not work.
Politics is less predictable than psychometrics, but I’m reliably certain that Ofsted isn’t out of the woods yet.