Multiple-choice exams plus portfolios – proposal for a new assessment system

Marking can never be 100 per cent reliable. So perhaps it is time, says one-time examiner Debra Kidd, to remove open-ended tasks from the exam system altogether

I was once an examiner. It was a mind-numbing, cheerless experience that paid a pittance, but I did it, year on year, because it gave me an insight that helped me to prepare my students for their own exams. I’d start off full of enthusiasm. Within a couple of days, I’d find myself cheering out loud when I opened an empty paper, or one with only a couple of sentences. No empathy, just relief that I had earned £2.35 for a few minutes’ work (you’ve still got admin to do on it, even if it’s blank). I’d mark papers fresh after breakfast and a shower. And I’d mark papers after my eyes felt like sandpaper and I was yawning so much that I almost swallowed my red pen. Can I honestly say that I marked every paper to the same standard? No. I made sure my sample went in right, checked and double-checked. Then I hoped for the best on the rest. I wasn’t negligent. Just human.

As long ago as 1996, Wiley et al. showed that marking, even by the same examiner, was unreliable. And this was at a point when there were fewer exams and all examiners were expected to have at least three years’ teaching experience. The system is so overloaded now that boards last year were recruiting undergraduates and bombarding trainee teachers with requests to examine. It’s no wonder that requests for re-marks have rocketed, as has the number of successful appeals. The system is straining.

This is a difficult enough problem to solve, but even if money were invested to ensure that every examiner was experienced and thoroughly checked, there are still problems with trying to apply criteria to open tasks written under pressure. We really need to consider what exams can tell us about the performance and progress of our children. And to this end, I am moving towards the belief that we would be better off removing open-ended tasks from the examination system altogether.

We must not confuse testing with assessment

There is now significant evidence that it is possible to design multiple-choice tests that reliably tell us whether or not a student understands key concepts and information and is able to apply them. Dylan Wiliam’s hinge questions show that these can be quite sophisticated, and Daisy Christodoulou has written widely on how multiple-choice questions can expose misconceptions and demonstrate secure levels of knowledge. If we accept that examining open tasks is unreliable, and that we have a significant and damaging shortage of people who are willing or even capable of examining, then it makes sense to test what is testable. Hell, even a computer could mark it. BUT…

The sum of an education system should not be limited to what can reliably be tested. We need to take care not to confuse testing with assessment. No child should leave school defined by test scores. They should leave feeling that they have been assessed for a whole range of capabilities. Their test scores may show a level of competency in a set of measurable concepts and knowledge; they will not show imagination, or the capacity to argue, assimilate, précis, connect and create. They will not show empathy, compassion or reasoning. They will not equip a student to make a momentous, informed decision such as how and why to vote in a referendum. I don’t believe that any of those skills can properly be assessed in an examined situation, and attempting to do so has led us down a blind alley that is failing young people.

We need to combine that which is testable with portfolio-based, project-based assessment. And to do this, we need to start trusting and training teachers to assess pupils’ work. While I accept that there is some evidence of bias in teacher assessments, there is no evidence that they are less reliable than the vagaries of examining, particularly in the arts and humanities. Indeed, as a year 7 teacher I found assessments of writing from year 6 colleagues to be far more reliable than SATs scores. With careful moderation and training, it is perfectly possible to have the best of both worlds. And in the process we can create a fairer and far more exciting and relevant educational experience for young people.

Your thoughts

One comment

  1. This is spot on, Debra, and hits on what I have felt for many years.

    There is really only one binary subject, Mathematics, where you can positively ascertain whether someone is ‘right’ or ‘wrong’ about a question. Just look at the disastrous SPAG test for Year 6. Even the linguists, authors, ministers and most certainly the teachers couldn’t agree on what the correct answers and definitions were. When you throw essay-based subjects, which require the testing of opinion, or creative subjects into the mix, reliable marking becomes unattainable.

    You are right: we need a varied approach to assessing learning, and that in turn will give more reliable and constructive feedback to pupils and students on how they are engaging with the tasks.