Ofsted

Ofsted reform should focus on inspection reliability first

Our new research casts doubt on the reliability of 'Inadequate' Ofsted judgments, explains Sam Sims, but any attempt at reform should aim to understand this better

2 Feb 2023, 17:30

School inspections add value to data-driven school performance metrics by sending an experienced educator to collect first-hand evidence from inside a school. The human element of an Ofsted visit is a feature, not a bug.

But each inspector comes with their own unique set of experiences and priorities. This can lead to inconsistency. Two inspectors might reach different conclusions about the same school.

Given that perfect reliability is not desirable, how much reliability should we expect?

The American Educational Research Association argues that the higher the stakes of any assessment, the more reliable it should be. Big decisions require reliable judgements.

It is well known that Ofsted ‘Inadequate’ judgements can lead to school closures or heads losing their jobs. So when it comes to the lowest Ofsted judgements, we should expect good reliability.

Christian Bokhove, John Jerrim and I have just released new Nuffield-funded research comparing the judgements reached by 1,376 different inspectors across 35,751 schools between 2012 and 2019.

We found that primary schools assigned a female lead inspector are around one-third more likely to receive an ‘Inadequate’ judgement. Just under 6 per cent of judgements reached by female inspectors were inadequate versus 4.5 per cent by male inspectors.
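The "one-third more likely" figure follows from the two percentages quoted above. Here is a minimal sketch of that arithmetic, assuming a female rate of exactly 6 per cent (the text says "just under", so the true ratio is slightly lower). The function name `relative_increase` is made up for illustration and is not taken from the research.

```python
# Rates quoted in the article. 0.06 is a rounded stand-in for
# "just under 6 per cent", so the result is a slight overestimate.
female_rate = 0.06
male_rate = 0.045


def relative_increase(rate_a: float, rate_b: float) -> float:
    """Return how much more likely an outcome is under rate_a than rate_b,
    expressed as a fraction of rate_b."""
    return rate_a / rate_b - 1


increase = relative_increase(female_rate, male_rate)
print(f"Relative increase: {increase:.0%}")
```

Note that this is a relative difference. In absolute terms the gap is about 1.5 percentage points.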

Maybe female inspectors tend to get sent to weaker schools? But we found that this pattern held even when we compared male and female inspectors sent to inspect schools with the same prior Ofsted inspection rating, exam results, levels of pupil absences, pupil intake, and in the same region of the country.

‘Inadequate’ judgments may not be reliable

Of course, we can’t definitively establish that there were no differences between the schools to which male and female lead inspectors were assigned. Maybe there were subtle differences – visible to the inspectors, but not in our data.

The only way to definitively establish the reliability of Ofsted inspections is to send two Ofsted inspectors to the same school, and check whether they agree. Indeed, you may remember that Ofsted did just such a study back in 2016 and found that the two inspectors tended to agree.

But this research had some important limitations. Crucially, the inspected schools were all previously rated ‘Good’, meaning they were subject to a short inspection in which the presumption was that they remained ‘Good’ unless proven otherwise. The inspections were also conducted by more senior inspectors, known as HMIs.

At the time, Amanda Spielman described this study as a “first step” and said that Ofsted should “routinely be looking at issues of consistency and reliability”. Ofsted has conducted a range of research since then, but none of it has used the gold-standard two-inspector-one-school design.

Crucially, there has been no research on the critical ‘Inadequate’ judgements. These are big decisions, but we do not have any evidence to suggest that they are reliable. Indeed, our new research provides some evidence to suggest they may not be.

Spielman’s term as Chief Inspector comes to an end in January 2024. And current polling suggests the government may lose power in the general election soon after. This creates a window of opportunity for modernising Ofsted. But what should be done?

Labour has recently dropped its Corbyn-era policy of abolishing Ofsted, promising instead to reform the inspectorate and focus it more directly on school improvement. Retaining Ofsted will likely be popular with parents. But Bridget Phillipson was heckled by teachers when she announced the plan at a union conference this week.

I would advise the shadow secretary of state to announce a series of new Ofsted reliability studies. These should use the gold-standard two-inspector-one-school methodology. And there should be four studies, one for schools in each of the four Ofsted grades: Outstanding, Good, Requires Improvement and Inadequate.

This would likely be popular with teachers, who demand to know whether the methods by which they are held to account are reliable. It should also be popular with parents, who would learn how much weight to place on judgements.

Importantly, the results would also provide the information policymakers need to make an informed decision about whether we have struck the right balance between the consequences of inspections and their reliability.
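For readers wondering how the results of a two-inspector-one-school study might be summarised, one common option is Cohen's kappa, which measures how often two raters agree beyond what chance alone would produce. The proposal above does not name a statistic, so the choice of kappa is an assumption here. The judgements in the sketch below are invented purely for illustration and are not real inspection data.

```python
from collections import Counter


def cohens_kappa(first: list[str], second: list[str]) -> float:
    """Cohen's kappa for two raters who each graded the same schools.

    1.0 means perfect agreement; 0.0 means agreement no better than chance.
    """
    if len(first) != len(second) or not first:
        raise ValueError("Both raters must grade the same non-empty set of schools")

    n = len(first)
    # Share of schools where the two inspectors gave the same grade.
    observed = sum(a == b for a, b in zip(first, second)) / n

    # Agreement expected by chance, from how often each inspector uses each grade.
    counts_first = Counter(first)
    counts_second = Counter(second)
    grades = set(counts_first) | set(counts_second)
    expected = sum(counts_first[g] * counts_second[g] for g in grades) / (n * n)

    if expected == 1:
        # Both inspectors gave every school the same single grade.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical judgements for six schools (invented for illustration only).
inspector_a = ["Inadequate", "Inadequate", "Requires Improvement",
               "Inadequate", "Good", "Inadequate"]
inspector_b = ["Inadequate", "Requires Improvement", "Requires Improvement",
               "Inadequate", "Good", "Requires Improvement"]

kappa = cohens_kappa(inspector_a, inspector_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this invented example the inspectors agree on four of six schools, yet kappa is well below that raw agreement rate because some of the agreement would be expected by chance. A real study would also need far more schools than six and would report the uncertainty around the estimate.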

Your thoughts

One comment

  1. “Big decisions require reliable judgements.”

    The same is true for GCSE, AS and A level grades too, for which reliability is even more important: being awarded a wrong grade can be life-changing. That happened in August 2022 to about 23,000 students, who received certificates showing grade 3, a fail, when they would have been awarded grade 4, a pass, had a senior examiner marked their scripts.

    Unreliable grades do great damage, as discussed in FE Week a few days ago https://feweek.co.uk/gcse-re-sits-wrong-grades-drain-students-and-resources/