Opinion: Accountability

Ofsted reform should focus on inspection reliability first

Our new research casts doubt on the reliability of 'Inadequate' Ofsted judgements, explains Sam Sims, but any attempt at reform should aim to understand this better

2 Feb 2023, 17:30

School inspections add value to data-driven school performance metrics by sending an experienced educator to collect first-hand evidence from inside a school. The human element of an Ofsted visit is a feature, not a bug.

But each inspector comes with their own unique set of experiences and priorities. This can lead to inconsistency. Two inspectors might reach different conclusions about the same school.

Given that perfect reliability would mean removing that human element, and so is not desirable, how much reliability should we expect?

The American Educational Research Association argues that the higher the stakes of any assessment, the more reliable it should be. Big decisions require reliable judgements.

It is well known that Ofsted ‘Inadequate’ judgements can lead to school closures or heads losing their jobs. So for the lowest Ofsted grade, we should expect a high degree of reliability.

Christian Bokhove, John Jerrim and I have just released new Nuffield-funded research comparing the judgements reached by 1,376 different inspectors across 35,751 schools between 2012 and 2019.

We found that primary schools assigned a female lead inspector are around one-third more likely to receive an ‘Inadequate’ judgement. Just under 6 per cent of judgements reached by female inspectors were inadequate versus 4.5 per cent by male inspectors.
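To see where "around one-third more likely" comes from, divide one rate by the other. This is an illustrative sketch only: it uses 6.0 per cent as a rounded stand-in for "just under 6 per cent", so the ratio it prints is a slight upper bound. It also prints the absolute gap, which is a useful reminder that a one-third relative difference here amounts to about 1.5 percentage points.

```python
def relative_risk(rate_a: float, rate_b: float) -> float:
    """Ratio of two rates, e.g. the share of Inadequate judgements under two groups of inspectors."""
    if rate_b <= 0:
        raise ValueError("reference rate must be positive")
    return rate_a / rate_b


# Rates from the text, in per cent. "Just under 6" is rounded up to 6.0 here
# (an assumption), so the ratio below slightly overstates the true figure.
female_rate = 6.0  # Inadequate judgements by female lead inspectors (upper bound)
male_rate = 4.5    # Inadequate judgements by male lead inspectors

rr = relative_risk(female_rate, male_rate)
print(f"relative risk: {rr:.2f}")                              # relative risk: 1.33
print(f"percentage-point gap: {female_rate - male_rate:.1f}")  # percentage-point gap: 1.5
```

A ratio of 1.33 means female-led inspections produced roughly one-third more Inadequate judgements per inspection than male-led ones, before any adjustment for school characteristics.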

Maybe female inspectors tend to get sent to weaker schools? But we found that this pattern held even when we compared male and female inspectors sent to inspect schools with the same prior Ofsted inspection rating, exam results, levels of pupil absences, pupil intake, and in the same region of the country.
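Comparing inspectors "sent to inspect schools with the same" characteristics is a like-with-like comparison. One simple way to do this is stratification: group inspections by the characteristics being held constant, compare Inadequate rates for female- and male-led inspections within each group, then average those within-group gaps. The sketch below is a simplified stand-in, not the authors' method; the study's own statistical models are not described here and may differ. The records and the function name `stratified_gap` are hypothetical, and only prior rating and region are used as strata.

```python
from collections import defaultdict

# Hypothetical records: (lead_inspector_sex, prior_rating, region, judged_inadequate).
# Invented for illustration only; these are NOT data from the study.
inspections = [
    ("F", "RI", "North", True), ("M", "RI", "North", False),
    ("F", "RI", "North", False), ("M", "RI", "North", True),
    ("F", "Good", "South", False), ("M", "Good", "South", False),
    ("F", "Good", "South", True), ("M", "Good", "South", False),
    ("F", "Good", "North", False),  # no male-led comparison in this cell, so it is skipped
]


def stratified_gap(records):
    """Weighted average of within-stratum differences in Inadequate rates (female minus male).

    Inspections are grouped by observed school characteristics (here prior rating and
    region), so each comparison is between schools that look alike on those measures.
    Strata lacking either a female- or a male-led inspection are skipped.
    """
    cells = defaultdict(lambda: {"F": [], "M": []})
    for sex, prior, region, inadequate in records:
        cells[(prior, region)][sex].append(inadequate)

    total_weight, weighted_sum = 0, 0.0
    for groups in cells.values():
        female, male = groups["F"], groups["M"]
        if not female or not male:
            continue
        # True counts as 1 and False as 0, so sum/len gives the Inadequate rate.
        gap = sum(female) / len(female) - sum(male) / len(male)
        weight = len(female) + len(male)  # weight each stratum by its number of inspections
        weighted_sum += weight * gap
        total_weight += weight
    return weighted_sum / total_weight if total_weight else float("nan")


print(f"adjusted gap: {stratified_gap(inspections):+.3f}")  # adjusted gap: +0.250
```

In this toy data the gap is zero among previously RI schools in the North and 50 points among Good schools in the South, giving a weighted average of 25 percentage points. Stratifying can only hold constant what is recorded in the data, which is exactly the caveat about subtle, unobserved differences between schools.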

‘Inadequate’ judgements may not be reliable

Of course, we can’t definitively establish that there were no differences between the schools to which male and female lead inspectors were assigned. Maybe there were subtle differences – visible to the inspectors, but not in our data.

The only way to definitively establish the reliability of Ofsted inspections is to send two Ofsted inspectors to the same school, and check whether they agree. Indeed, you may remember that Ofsted did just such a study back in 2016 and found that the two inspectors tended to agree.

But this research had some important limitations. Crucially, the inspected schools were all previously rated ‘Good’, meaning they were subject to a short inspection in which the presumption was that they remained ‘Good’ unless proven otherwise. The inspections were also conducted by more senior inspectors, known as HMIs.

At the time, Amanda Spielman described this study as a “first step” and said that Ofsted should “routinely be looking at issues of consistency and reliability”. Ofsted has conducted a range of research since, but none of it has repeated this gold-standard two-inspector-one-school design.

In particular, no such study has examined the high-stakes ‘Inadequate’ judgements. These are big decisions, yet we have no evidence that they are reliable. Indeed, our new research provides some evidence that they may not be.
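A two-inspector-one-school study produces pairs of grades for the same schools, and how those pairs are summarised matters. The article does not say which statistic Ofsted's 2016 study used; a common choice for two raters is raw percent agreement alongside Cohen's kappa, which discounts the agreement you would expect by chance given how often each inspector uses each grade. The sketch below uses invented grades for ten hypothetical schools; the function names are illustrative, not from any Ofsted tool.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of schools where both inspectors gave the same grade."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty lists of grades")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: agreement beyond what the inspectors' grade mixes would produce by chance."""
    p_observed = percent_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both inspectors independently give grade g, summed over g.
    p_chance = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)
    if p_chance == 1:
        return float("nan")  # kappa is undefined when both used one identical grade throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical paired grades for ten schools (invented for illustration).
inspector_a = ["Good", "Good", "Good", "RI", "Good", "Outstanding", "Good", "RI", "Inadequate", "Good"]
inspector_b = ["Good", "Good", "RI", "RI", "Good", "Good", "Good", "RI", "RI", "Good"]

print(f"agreement {percent_agreement(inspector_a, inspector_b):.2f}, "
      f"kappa {cohens_kappa(inspector_a, inspector_b):.2f}")  # agreement 0.70, kappa 0.46
```

Here 70 per cent raw agreement shrinks to a kappa of about 0.46 once chance is accounted for, and the single Inadequate grade is not matched by the second inspector. Because Inadequate is a rare grade, a study drawing mostly on Good schools says little about it, which is why a dedicated study of that category matters.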

Spielman’s term as Chief Inspector comes to an end in January 2024. And current polling suggests the government may lose power in the general election soon after. This creates a window of opportunity for modernising Ofsted. But what should be done?

Labour has recently dropped its Corbyn-era policy of abolishing Ofsted, promising instead to reform the inspectorate and focus it more directly on school improvement. Retaining Ofsted will likely be popular with parents. But Bridget Phillipson was heckled by teachers when she announced the plan at a union conference this week.

I would advise the shadow education secretary to announce a series of new Ofsted reliability studies. These should use the gold-standard two-inspector-one-school methodology. And there should be four studies, one for schools in each of the four Ofsted grades: Outstanding, Good, Requires Improvement and Inadequate.

This would likely be popular with teachers, who want to know whether the methods by which they are held to account are reliable. It should also be popular with parents, who would learn how much weight to place on inspection judgements.

Importantly, the results would also provide the information policymakers need to make an informed decision about whether we have struck the right balance between the consequences of inspections and their reliability.

One comment

  1. “Big decisions require reliable judgements.”

    The same is true for GCSE, AS and A level grades, for which reliability is even more important: being awarded a wrong grade can be life-changing. This happened in August 2022 to about 23,000 students who received certificates showing grade 3 (a fail) when, had a senior examiner marked their scripts, they would have been awarded grade 4 (a pass).

    Unreliable grades do great damage, as discussed in FE Week a few days ago https://feweek.co.uk/gcse-re-sits-wrong-grades-drain-students-and-resources/