The pledge to ‘go further’ to ensure the consistency of inspection, made in a Schools Week opinion piece this week, was a welcome step forward.
It recognises the complexity of the issue and outlines practical measures to strengthen and test reliability.
But if inspection is a form of assessment – and it is – then Ofsted must go further. They need to instigate a serious, open debate about what consistency really means – and be honest about what would be lost if consistency becomes the ultimate goal.
When thinking of reliability in examinations, one central question we ask is about inter-rater reliability: would two examiners award the same mark to the same script?
The inspection equivalent is whether two inspectors – or two inspection teams – would reach the same judgments based on the same evidence. Ideally, yes. But what would that take?
Just as we could turn a history GCSE into a series of unambiguously right or wrong multiple-choice questions, we could turn inspection into a checklist of binary indicators: Is Progress 8 above zero? Is attendance above the national average? Are suspensions declining?
All unambiguously yes, or no. But in doing so, we’d strip away the very nuance and professional insight the system is meant to capture.
Variation is not a flaw to be fixed
Variation in judgment isn’t a flaw to be fixed – it’s an inherent feature of complex evaluation. Even with rigorous training and calibration, some degree of difference will always exist.
So the real challenge is not to eliminate variation by ever narrowing the field of view, but to find the sweet spot – the level of inconsistency we can tolerate while preserving validity.
Too much variation undermines trust. Too little judgment turns inspection into a formulaic exercise.
If we’re serious about consistency, we have to be honest about that trade-off.
In any assessment, where the sweet spot lies depends on how the assessment outcome will be used. The lower the stakes, the more variability we will naturally accept.
It’s easy to see why people call for absolute inspection consistency. In a high-stakes system, consistency feels like fairness.
But perfect uniformity comes at a cost. The only way to achieve absolute consistency in inspection outcomes is to remove professional judgment altogether, a move that would deliver predictability but sacrifice validity.
In chasing perfect consistency, we risk turning inspection into a tick list, rewarding what’s easily measured and sidelining much that truly matters.
The result? Distorted school behaviours, narrowed priorities, and inspections that miss their purpose: providing a rounded picture of school effectiveness to complement published data.
We need clarity from Ofsted
We need clarity about the threshold Ofsted will apply when reviewing judgments. In the past, students could request a re-mark of their exam script: a senior examiner would re-mark the paper, and their view would replace the original.
Today, students can only request a review, where the question asked is not ‘what mark would I give this work?’ but ‘is this mark justifiable?’. If yes, the original mark stands. That is a much lower threshold for consistency.
So which threshold will Ofsted use?
Does the senior inspector’s view override all others, with any deviation from it labelled as inconsistency? Or will a judgment be deemed consistent if it is defensible, even if it differs from the senior inspector’s opinion?
The difference matters. If Ofsted is using the latter, it should say so. If it’s using the former, it should explain why.
More importantly, we need an open, professional debate about where the sweet spot lies – the point at which consistency is strong enough to reassure, yet the process is flexible enough to preserve validity.
That conversation must include inspectors, school leaders, researchers and policymakers. It’s not a technical detail. It’s a question of trust, fairness and educational integrity.
We need a shared understanding of what consistency should mean. Otherwise, we risk chasing uniformity at the expense of insight – and losing sight of what inspection is for.