The Teacher Assessment Framework used to mark writing in primary school takes a long time to apply – and isn’t even that reliable, writes Daisy Christodoulou

Teachers frequently report that one of the most time-consuming aspects of their job is marking. Given that time at the start of a frantic new year is in especially short supply, it’s more important than ever that teachers don’t spend unnecessarily long on it.

Extended pieces of writing are particularly laborious to mark. At primary, the statutory national rubric for writing, the Teacher Assessment Framework (TAF), is very specific about the features of good writing, and so can take a long time to apply to each pupil’s work.

Yet despite the time it takes, there’s plenty of evidence to suggest that this type of rubric-based assessment doesn’t deliver very consistent results.

Comparative judgement (CJ) is an alternative method of assessing writing that does not rely on a rubric. Instead, teachers read two pieces of writing and make a holistic judgement about which is the better script. Many different teachers make a series of such decisions, and those decisions are combined to provide a grade for every piece.
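To make the idea of combining decisions concrete, here is a minimal sketch of one common way to turn pairwise judgements into a score for each script: a Bradley-Terry model, fitted with a simple iterative update. This is an illustration only. The article does not say which statistical model is used, and the function name, the script labels and the sample judgements are all hypothetical.

```python
from collections import defaultdict


def fit_bradley_terry(judgements, iterations=200):
    """Estimate a relative quality score per script.

    `judgements` is a list of (winner, loser) pairs, one per decision.
    Assumption: a Bradley-Terry model, which may differ from the model
    used in the study described in this article.
    """
    scripts = sorted({script for pair in judgements for script in pair})
    wins = defaultdict(float)
    pair_counts = defaultdict(float)
    for winner, loser in judgements:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    # Start every script at the same strength, then refine.
    strength = {script: 1.0 for script in scripts}
    for _ in range(iterations):
        updated = {}
        for script in scripts:
            denominator = 0.0
            for other in scripts:
                if other == script:
                    continue
                n = pair_counts.get(frozenset((script, other)), 0)
                if n:
                    denominator += n / (strength[script] + strength[other])
            updated[script] = wins[script] / denominator if denominator else strength[script]
        # Rescale so the scores keep an average of 1.
        total = sum(updated.values())
        strength = {s: v * len(scripts) / total for s, v in updated.items()}
    return strength


# Hypothetical data: three scripts, each pair judged three times.
sample = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("B", "C"), ("B", "C"), ("C", "B"),
    ("A", "C"), ("A", "C"), ("C", "A"),
]
scores = fit_bradley_terry(sample)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Turning scores like these into grades would still need cut-off points between the grade bands, which this sketch does not attempt to set.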


The organisation I work for, No More Marking, carried out a study in 2019-20 funded by the DfE and the charity NESTA that compared the consistency of grading when using either comparative judgement or the TAF. We worked with an independent researcher from Oxford University, Anne Pinot de Moira.

In early March we ran a year 6 comparative judgement assessment in which more than 7,000 teachers judged over 33,000 pieces of writing. Our guidance for teachers is simple: we ask them to make a professional judgement about which is the better piece of writing.

Meanwhile, we recruited a senior local authority moderator and 30 experienced primary teachers to work with us on grading a smaller sample of 349 of these scripts using the TAF.

The moderator trained the teachers in using the TAF, and then each marked a pack of scripts. Each script was graded by at least eight different teachers.

For both CJ and the TAF, there were three possible grades: Working Towards (WTS), Expected Standard (EXS) and Greater Depth (GDS).

Our findings were pretty striking. First, we found that CJ was more consistent than the TAF.

For the TAF, the chance that a particular piece of writing would be given the same grade by two different markers was 64 per cent.

But for CJ, the chance that a particular piece of writing would be given the same grade by two different uses of CJ was 86 per cent – much higher.
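For readers who want to see how a figure like "the chance that two markers give the same grade" can be worked out, here is a minimal sketch. It assumes one plausible definition: across all scripts, the share of marker pairs who agree. The article does not state the exact calculation used in the study, and the function name and sample grades are hypothetical.

```python
from collections import Counter


def pairwise_agreement(grades_by_script):
    """Share of marker pairs, pooled across scripts, who gave the same grade.

    `grades_by_script` maps a script id to the list of grades it received.
    Assumption: every pair of markers on a script counts equally.
    """
    agreeing_pairs = 0
    total_pairs = 0
    for grades in grades_by_script.values():
        n = len(grades)
        total_pairs += n * (n - 1) // 2
        # Markers who chose the same grade form c * (c - 1) / 2 agreeing pairs.
        agreeing_pairs += sum(c * (c - 1) // 2 for c in Counter(grades).values())
    if total_pairs == 0:
        raise ValueError("need at least one script with two or more grades")
    return agreeing_pairs / total_pairs


# Hypothetical example: two scripts, three markers each.
example = {
    "script_1": ["EXS", "EXS", "WTS"],
    "script_2": ["GDS", "GDS", "GDS"],
}
rate = pairwise_agreement(example)  # 4 agreeing pairs out of 6
```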

Second, we found that if each script was marked by four different teachers and those grades averaged out, then the reliability of the TAF reached the same standard as CJ. However, this obviously takes a lot longer: we found it took eight minutes to mark a script this way with the TAF, but just four with CJ.

Third, comparative judgement actually matches up pretty well to the average TAF grade. Let me explain. Lots of teachers worry about whether they’ll get the same grade using CJ as using the TAF.

To look at this, we averaged out the grades our teachers gave each script using the TAF. For example, if four teachers gave a script EXS, two gave it WTS, and two gave it GDS, then the average would be EXS. This process gave us a more reliable “average-TAF” grade for each of the 349 scripts.
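The averaging in that example can be sketched in a few lines. This assumes the three grades are treated as equally spaced points (WTS = 1, EXS = 2, GDS = 3) and that the mean is rounded to the nearest grade, with exact halves rounded up. Both choices are assumptions made for illustration, since the article does not spell out the rule, and the names are hypothetical.

```python
from statistics import mean

# Assumption: grades sit one point apart on a simple scale.
GRADE_POINTS = {"WTS": 1, "EXS": 2, "GDS": 3}
POINT_GRADES = {points: grade for grade, points in GRADE_POINTS.items()}


def average_grade(grades):
    """Return the grade nearest to the mean of the markers' grades."""
    average = mean(GRADE_POINTS[grade] for grade in grades)
    # Assumption: an exact half rounds up to the higher grade.
    return POINT_GRADES[int(average + 0.5)]


# The example from the text: four EXS, two WTS and two GDS.
marks = ["EXS"] * 4 + ["WTS"] * 2 + ["GDS"] * 2
result = average_grade(marks)  # mean is 16 / 8 = 2, which is EXS
```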

Then we compared that average TAF grade to the CJ grade for each script. We found that for 79 per cent of scripts, the CJ grade and the TAF grade matched.

Furthermore, at 79 per cent, CJ and the average TAF grade agree more often than any two individual markers using the TAF, who, as noted above, agree only 64 per cent of the time.

We’ve published the scripts and grades so you can take a look yourself.

Teachers’ time has never been more precious. With comparative evidence showing a better marking policy is available, what better support could we offer them?