Opinion

Question level analysis is a waste of your time… so stop

When used correctly data can help inform important school decisions, but there is always a danger one reads too much into the numbers

18 Nov 2025, 5:00

Our education system is awash with data. When used correctly, this can help inform important school decisions, but there is always a danger of reading too much into the numbers, spotting trends and patterns that are not really there.

The use of assessment results falls into this category. Given their role in accountability, school stakeholders obviously take a keen interest in the results (particularly when it comes to key stage 2 SATs and GCSEs).

But some might take their analyses too far, making inferences and basing decisions on somewhat shaky grounds.

Last year, half of teachers in a Teacher Tapp poll reported they were entering data for Question Level Analysis (QLA) – which analyses performance on individual questions, instead of overall scores.

Similarly, assessment organisations are regularly asked to provide their clients with “sub-domain” scores, so they can better understand pupils’ relative strengths and weaknesses.

For instance, perhaps little Jonny is great at fractions but terrible at geometry, meaning he should spend more time working on his understanding of shapes and angles.

Or perhaps the whole class is awesome at algebra but doesn’t really get statistics, potentially guiding teaching plans.

Fraught with difficulties

The trouble is that QLA is fraught with difficulties. Individual questions differ in several ways, making it difficult to know what exactly to take from a single correct or incorrect response.

Such information is also very noisy, given one is looking at single points of data, one at a time.

Sub-domain scores in many ways attempt to bridge this gap, providing schools with more granular information than overall test scores, but with greater reliability than looking at individual question responses.

Sounds great, right? But is such additional information really that useful to schools?

In a recent project funded by the Nuffield Foundation, we investigated this issue with respect to the key stage 2 mathematics test.

The central aim of our project was to investigate the reliability of key stage 2 SATs sub-domain scores, and to provide useful information back to schools (e.g. to inform their teaching and curriculum development).

First, the good news

We believe that producing sub-domain scores that are reliable enough for school-level reporting is indeed possible. We have managed to produce reasonably reliable school-level SATs scores for the eight areas of the key stage 2 mathematics curriculum.

But this must be done by pooling data across years and requires the use of fairly sophisticated statistical techniques (you can’t just add up the number of geometry questions pupils get right and expect to produce a reliable geometry score – which is what the Department for Education currently do).
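To make that contrast concrete, here is a minimal simulation sketch in Python. It is not the project's actual method (the article does not name the statistical techniques used); the one-parameter logistic response model, the item counts and the three-year pooling are illustrative assumptions only. The point it shows is simply that a raw total over a handful of geometry questions tracks pupils' underlying ability far less closely than the full-paper score does, and that single-cohort averages wobble until they are pooled.

    # Minimal, illustrative simulation only - not the project's actual method.
    # Assumptions: one latent maths ability per pupil, a simple one-parameter
    # logistic item-response model, and made-up item counts.
    import numpy as np

    rng = np.random.default_rng(0)
    n_pupils, n_items, n_geometry = 30, 40, 6  # one class, a 40-item paper, 6 geometry items

    def simulate_cohort():
        ability = rng.normal(0, 1, n_pupils)              # latent maths ability per pupil
        difficulty = rng.normal(0, 1, n_items)            # difficulty per question
        p_correct = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
        responses = rng.random((n_pupils, n_items)) < p_correct
        total = responses.sum(axis=1)                     # overall raw score
        geometry = responses[:, :n_geometry].sum(axis=1)  # naive "add up the geometry questions" sub-score
        return ability, total, geometry

    ability, total, geometry = simulate_cohort()
    print("corr(ability, total score):   ", round(np.corrcoef(ability, total)[0, 1], 2))
    print("corr(ability, geometry score):", round(np.corrcoef(ability, geometry)[0, 1], 2))

    # A class-level geometry average jumps around from cohort to cohort;
    # pooling several years' cohorts smooths out some of that noise.
    yearly_means = [simulate_cohort()[2].mean() for _ in range(3)]
    print("single-year geometry means:", [round(m, 2) for m in yearly_means])
    print("pooled three-year mean:    ", round(np.mean(yearly_means), 2))

Even in this toy version, a six-question sub-score is a very short test, so anything built on it needs substantial pooling before it is worth reporting, which is essentially the point about combining data across years.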

Now for the bad news

These scores turn out to be pretty useless, in terms of the additional information they provide.

In essence, they give schools very little extra insight beyond what can be inferred from overall mathematics scores. This is reflected in just how similarly schools perform across the eight national curriculum domains – the vast majority of the correlations sit above 0.99.

One may of course question whether this is something specific to the key stage 2 mathematics test. We have, however, also experimented with the reading data, where essentially the same result was found.

Our initial plan – once we had produced our scores – was to deliver school-level results back to schools. But – based on our findings – we no longer believe this is the right thing to do.

With more than enough data to be getting on with, all this information would do is give schools some extra distracting noise.

While this may at first seem a bit of a depressing result (at least for us), the findings do have real value for schools.

We all know the workload pressures staff are under. Our results show that any school currently undertaking QLA or any kind of sub-domain analysis of the key stage 2 tests should stop. This practice is at best a waste of time and – at worst – counterproductive.

We believe the same is likely true for many other assessments schools use, including those from commercial providers and QLA of GCSEs.

In life, sometimes less is more. This is also true in terms of reporting results from assessments back to schools.

This research was conducted by John Jerrim, Dave Thomson and Natasha Plaister

Your thoughts

3 Comments

  1. NoahGiraffe

    So we just stop, thanks to a short, anecdotal article with no clarity on exactly what your analyses showed? QLA can highlight trends for a class or year group, to be further investigated. It is not a magic wand, but it can generate curiosity in teachers in the right areas. Perhaps low scores are due to a problem-solving element of the question rather than simple knowledge. So a ‘red’ might not mean the students don’t know the content. The QLA instigates this investigation. A relatively short piece of data entry (vs the marking time) can help to highlight areas that you might not pick up during marking.
    More detail needed on your study for me. With MATs ever expanding, pooling data and using your unnamed ‘statistical techniques’ is possible, if you would care to share which techniques and what insights you gained over simple averages.
    “We believe the same is likely true” based on gut instinct? Why would it be true of GCSE vs KS2? Very different levels of knowledge and skills.
    Give us some depth please, as this article just feels like you don’t like entering QLA data. If so, utilise admin staff at your school who are there to support data entry; this is what many schools do! You can get on and plan (using that data).

    • Have to disagree with this article. It's very light on detail, and it's dangerous in that leaders will just see the headline and stop. QLA can provide essential detail on areas of the curriculum that need more work, or types of questions, content from certain NC year delivery, etc. Just being able to say our disadvantaged girls didn't do so well in maths, for example, is not enough! The delay in publication of year 6 QLA by the DfE this year meant school leaders started the year without this detail… and they missed it!

  2. It sounds like Question Level Analysis (QLA) may be useful for learning, as the article explains.

    The article then conveys that the author has apparently discovered that more detailed school-level mathematics scores are perhaps less helpful, although in unclear circumstances – presumably in the context of school-level accountability, though this is not really clear from the article.

    Using information for refining learning and using information for school-level accountability are different.

    Even if information is less helpful in the context of school-level accountability, it doesn’t mean that it’s less helpful for learning.

    I realise that it’s tempting to have ‘impactful’ research findings or messages, but I’m not sure that it’s especially helpful if or when this risks sensationalisation, such as sweeping claims about whether something is helpful or not – readers risk inferring that a general claim is being made, but the research results are presumably much more focused and limited.