Report card reforms run critical reliability risks

Ofsted’s planned changes to school inspections are wide-ranging. In the face of such change, there is a danger of losing sight of inspection’s core purpose: to provide judgments of school quality. If those judgments are not sufficiently reliable, they undermine the very granularity the changes are aiming to provide.

One clear positive from a reliability and consistency perspective is that ungraded inspections will no longer take place from next year. Relatively little resource was devoted to them, and research suggested that ‘Good’ judgments based on graded and ungraded inspections were not comparable.

That said, reliability is a single point of failure for school inspections, and it remains a risk under the proposed new system. Here are four ways it could become a critical issue, and what Ofsted can do to mitigate some of these risks.

The sum and the parts

It is well known that scores awarded to sub-domains are less reliable than subject grades overall. For example, we can be less confident in our measurement of pupils’ skills in individual aspects of mathematics (geometry, statistics, algebra) than in our measurement of their mathematics ability overall.

The same almost certainly holds true for school inspection. So, by moving the focus away from an overall effectiveness grade to (at least) eight evaluation areas, Ofsted may be inadvertently worsening the reliability of the information they are providing.

If the individual evaluation areas are not measured with sufficient reliability, parents (and those responsible for school improvement) could be led to focus their attention on the wrong things.
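To illustrate the point, here is a minimal simulation sketch. It rests on assumptions of our own, not on anything in Ofsted’s proposals: each school has a single underlying quality, each of eight area scores is that quality plus independent inspector noise of equal size, and reliability is taken to be the correlation between two independent inspections of the same schools. The function names and parameter values are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_brown(single_reliability, n_parts):
    """Predicted reliability of a sum of n_parts equally reliable parts."""
    r = single_reliability
    return n_parts * r / (1 + (n_parts - 1) * r)


def simulate(n_schools=5000, n_areas=8, noise_sd=1.0, seed=0):
    """Return (single-area reliability, overall-score reliability).

    Assumption: every area score = true school quality + independent noise.
    """
    rng = random.Random(seed)
    quality = [rng.gauss(0, 1) for _ in range(n_schools)]

    def inspect():
        # One inspection: a list of n_areas noisy scores per school.
        return [[q + rng.gauss(0, noise_sd) for _ in range(n_areas)]
                for q in quality]

    first, second = inspect(), inspect()
    area_rel = pearson([s[0] for s in first], [s[0] for s in second])
    overall_rel = pearson([sum(s) for s in first], [sum(s) for s in second])
    return area_rel, overall_rel


area_rel, overall_rel = simulate()
print(f"single area reliability: {area_rel:.2f}")
print(f"overall score reliability: {overall_rel:.2f}")
print(f"Spearman-Brown prediction for 8 areas: {spearman_brown(0.5, 8):.2f}")
```

Under these assumptions, any single area is noticeably less reliable than the sum of all eight, in line with the Spearman-Brown prediction. Real evaluation areas will not all measure one underlying quality, so the size of the gap in practice is an empirical question.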

Correlation matters

One of the problems with the four sub-judgments in Ofsted’s current model is that they are so highly correlated that they tend to be of little use. Ofsted should therefore make clear how correlated they believe the scores across their new evaluation areas should be, and then see how this stacks up when their changes are being tested.

Our view, for example, is that personal development scores should be more strongly correlated with behaviour and attitudes than with curriculum.

Were this not the case, we think it might bring into question how inspectors are awarding scores across the different evaluation areas, and whether the proposed changes can reliably offer the nuanced detail they are meant to bring.

Either way, now is the time for Ofsted to set out exactly what patterns they expect to see.
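A check of this kind could be as simple as the sketch below. It states one expected pattern in advance (personal development tracking behaviour and attitudes more closely than curriculum) and then tests it against pilot scores. The grades shown are made up purely for illustration, and the function and variable names are hypothetical; nothing here reflects real inspection data or Ofsted’s own methods.

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def check_expected_ordering(scores, anchor, closer, further):
    """Test whether `anchor` correlates more with `closer` than `further`.

    Returns (r_closer, r_further, expectation_met).
    """
    r_closer = pearson(scores[anchor], scores[closer])
    r_further = pearson(scores[anchor], scores[further])
    return r_closer, r_further, r_closer > r_further


# Made-up grades on a 1-5 scale for six imaginary schools (illustration only).
toy_scores = {
    "personal_development": [1, 2, 3, 4, 5, 3],
    "behaviour_and_attitudes": [1, 2, 3, 5, 4, 3],
    "curriculum": [3, 1, 4, 2, 5, 3],
}

r_closer, r_further, expectation_met = check_expected_ordering(
    toy_scores,
    anchor="personal_development",
    closer="behaviour_and_attitudes",
    further="curriculum",
)
print(f"{r_closer:.2f} vs {r_further:.2f}; expectation met: {expectation_met}")
```

The value of writing the expectation down first is that the test data can then contradict it. A pattern declared only after the results are in cannot tell us much about how inspectors are awarding scores.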

Inconsistent reliability across areas

It seems likely that some evaluation areas will be measured with greater reliability than others. For instance, if achievement and attendance draw more heavily on quantitative data, then we anticipate they will be rated more consistently than personal development.

Ofsted need to explore and clearly document such matters. If parents are meant to use this information to inform their choices, they need to know which judgments they can have greater (and lesser) confidence in.

Reliable enough for the job

Let’s be clear. No school inspection regime will be 100 per cent reliable. What matters is whether it is sufficiently reliable for the purpose for which it is being used.

For that reason, it is important for Ofsted to make clear what level of reliability they would deem to be acceptable versus what would cause them concern.

This is likely to vary according to the consequences attached. As there are still sanctions associated with the lowest grades (particularly for leadership), the level of reliability here needs to be higher than, for instance, for judgments at the margin between ‘Secure’ and ‘Strong’.

In sum, it is unclear if the proposed new grading system will be reliable enough to be useful.

Ofsted should therefore do three things:

set out what they consider to be an adequate level of reliability for each of the criteria in their framework, particularly for those that are high-stakes;
commit to an independent and transparent programme of research to test the reliability of inspection;
be prepared to make changes if the inspection grades are found not to meet Ofsted’s standards for reliability.

We can spend months discussing the detail of the new framework. But unless the judgments inspectors produce are sufficiently reliable, it will all be for nought.

This article was co-authored with Dr Sam Sims, Associate professor, UCL Centre for Education Policy and Equalising Opportunities
