Nearly half of pupils in English Literature are not awarded the “correct” grade on a particular exam paper because of marking inconsistencies and the design of the tests, according to the first major study into marking accuracy by Ofqual.
The study, published today and called ‘marking consistency metrics’, shows the probability of a pupil being awarded the “definitive” grade for their exam.
Ofqual used “seeded” items – that is, exam questions given a definitive mark by an experienced moderator and which are also marked by every examiner during their marking period – to ascertain how often examiners veered away from awarding the correct score.
Delegates at Ofqual’s autumn symposium today, titled ‘how consistent can marking be?’, heard how some subjects had greater agreement of scores than others.
Examiners in physics agreed with the definitive mark on 95 per cent of questions, compared with just 50 per cent in English.
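The seeded-item approach can be pictured as a simple agreement calculation. The sketch below is purely illustrative — the function name and mark data are hypothetical, not drawn from Ofqual's study.

```python
# Sketch of a seeded-item agreement calculation (hypothetical data;
# not Ofqual's actual method or figures).

def agreement_rate(examiner_marks, definitive_marks):
    """Fraction of seeded items where the examiner's mark matches
    the definitive mark set by the senior examiner."""
    matches = sum(1 for e, d in zip(examiner_marks, definitive_marks) if e == d)
    return matches / len(definitive_marks)

# Illustrative seeded items: definitive marks vs one examiner's marks.
definitive = [4, 7, 2, 9, 5, 6, 3, 8]
examiner   = [4, 6, 2, 9, 5, 6, 4, 8]

print(f"Agreement: {agreement_rate(examiner, definitive):.0%}")  # 75% here
```

Aggregating this rate over every examiner and every seeded question gives a per-subject agreement figure of the kind quoted above.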
At the exam paper level, these variations affected how likely a pupil was to be given a “definitive” or “true” mark for the paper.
In some subjects, only just over half of pupils achieved the “true” score on a single exam paper.
In English Literature, the probability of a pupil getting the definitive grade was just over 50 per cent. In history it appeared to be just over 62 per cent.
That compared to probability rates of more than 85 per cent in physics, Spanish and French.
However, the watchdog told Schools Week this does not mean GCSE and A-level grades overall were inaccurate, as the data only relates to individual exam papers.
Different levels of marking accuracy across subjects are inevitable
The large inconsistencies in some subjects are likely to draw criticism from the education community, especially as the government’s move towards linear exams means a pupil’s grade depends solely on their exams, rather than additional coursework.
Ofqual said the differences in marking accuracy across subjects were “inevitable” – highlighting that the consistency of marking in physics is higher than in the “more subjective” English language or history.
The report also shows the probability of receiving a definitive grade is “significantly influenced” by the location of grade boundaries – which are drawn up by exam boards.
Where grade boundaries are close together, marking consistency “will have a more profound impact” on the definitive grade probability. Therefore, the wider the grade boundaries, the greater the probability of candidates receiving the definitive grade.
The report states that “this is a very important point”, as the design of tests “might be as important as marking consistency in securing the ‘true’ grade for candidates”.
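The boundary-width effect can be illustrated with a short simulation. This is a hedged sketch under assumed numbers (a true mark placed mid-band, a normally distributed marking error, and two made-up boundary spacings), not the regulator's model:

```python
# Illustration of the grade-boundary effect (assumed numbers, not
# Ofqual's model): the same marking error is less likely to push a
# script over a boundary when grade bands are wider.
import random

random.seed(0)

def p_definitive_grade(band_width, error_sd, trials=100_000):
    """Probability a script keeps its 'true' grade when a random
    marking error is added. Boundaries sit every `band_width` marks;
    the true mark is assumed to lie mid-band."""
    true_mark = band_width / 2          # centre of a grade band
    correct = 0
    for _ in range(trials):
        observed = true_mark + random.gauss(0, error_sd)
        if 0 <= observed < band_width:  # still inside the same band
            correct += 1
    return correct / trials

narrow = p_definitive_grade(band_width=4, error_sd=3)
wide = p_definitive_grade(band_width=10, error_sd=3)
print(f"narrow bands: {narrow:.2f}, wide bands: {wide:.2f}")
```

With the same marking error, the wider bands leave a markedly higher share of scripts on their “true” grade — which is the report's point that test design can matter as much as marking consistency.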
Is double-marking a solution?
During the symposium, Ofqual also revealed the results of a landmark study into the benefits of having two or more examiners mark each exam paper, in order to increase accuracy.
Using scores given on identical exam answers by all examiners during this year’s GCSE examinations, the watchdog was able to show there is an overall benefit of double-marking – but it is small and not universal.
In English, if all exams were marked by two people, around one more pupil in every 100 would receive the “correct” grade. This could be lower or higher than the grade the pupil would otherwise have incorrectly received.
If an exam was given two very different scores, and a third person was brought in, that increases to around four pupils in every 100 getting the “correct” grade.
However, Ofqual predicts a triplicate system would require at least 2.5 times more markers than are currently available, which would be expensive and could “dilute the quality of markers”.
Double-marking also does not always improve accuracy. In business studies, having two examiners score a 16-mark question actually reduced the likelihood of a pupil gaining the “correct” grade by around 2 percentage points.
Beth Black, associate director of research and analysis at Ofqual, said that while the data showed a small benefit of double-marking, the case was “not compelling”.
The future of exam marking research
The data was published by the exam watchdog as part of its drive to examine the quality of marking.
Further research is now planned to discover the impact of examiner consistency on GCSE and A-level grades, to benchmark variation in different subjects, and consider if changes to question types might reduce marking differences.
Michelle Meadows, executive director for strategy, risk and research, said the organisation wants “to have a wide-ranging discussion about the current state of marking in England, and if and how it might be improved.”
She urged heads, teachers, academics and exam boards to provide their thoughts, adding: “The opinions and research we discuss will be vital in terms of contextualising evidence of the quality of marking within our own system compared with other systems used internationally and domestically within higher education.”
FIGURE: Boxplot of the probability of a candidate being awarded the definitive grade. The mean probability for each subject is denoted by the white triangle.