Has Ofsted given up on school case studies?

Ofsted’s case studies have not always had their intended consequences. As Sean Harford said at the recent ‘Radical ideas to transform Ofsted’ conference at the UCL Institute of Education: “When Ofsted tries to nudge something, it often becomes a shove.”

A famous example is triple marking. When the inspectorate published the case study of a school using an innovative method of pupil feedback, it caused a nationwide mania for marking with different coloured pens, notoriously piling additional work onto already hard-pressed teachers.

So Ofsted pulled back on case studies. Then last December came ‘Bold beginnings’ – a study of practice in the reception classes of 41 primary schools, which irked many and led to accusations that Ofsted was picking and choosing “good practice” case studies according to its own bias.

After the conference, we caught up with Harford, who spends much of his time engaging with the schools community, trying to repair the damage caused by the triple marking case study and others, and asked whether Ofsted really has ditched the practice for good.

CM: Is there a case for publishing examples of good practice in schools?

SH: Case studies have got us into problems in the past. The marking case study – that’s where triple marking came from, that and Paul Black and Dylan Wiliam’s Inside the Black Box. Coupled together, they were like blue touchpaper – they got all the schools doing triple marking. So case studies are good in some respects, but schools and colleges have got to realise that what works in one place might not work in another.

I always go back to the marking example. The inspector went into that school and saw a great school, which identified that one of the key things they’d done since the previous inspection was to change their marking system. It was about dialogue between teachers and pupils – and people writing in different coloured pens, literally.

They identified it as something that was really working for them. We sent an inspector along, who spent half a day there, discussed it, wrote it all up, published it… and then we know what happened.

The problem is of course that the school was probably managing other parts of the teachers’ workload in such a way as to carve out time, because in their context it worked: the teachers weren’t dying from exhaustion – they were doing great teaching and planning; they were doing everything really well and this was part of it.

But drop it on 2,000 to 3,000 schools that haven’t got those other things in place, and it’s a disaster. So we can’t just assume that we can take these things and plonk them in another place and they will work as well, or even that they won’t be detrimental.

CM: Does that mean you don’t use case studies anymore?

SH: We haven’t published one for a fair time now, and we took down the ones that we had on the website, because we wanted to sit back and think about how they’re used and whether it’s the right thing to do. We haven’t resolved that, as it’s a resource issue as well, to go and do the extra visits. So we haven’t done them for a while, and we haven’t published them for a while.

CM: So you haven’t yet come to a policy on it?

SH: It’s something we will consider as we do our new framework.

CM: As for last December’s ‘Bold beginnings’ report on the reception year, some critics felt you’d taken too small a sample, making it basically an enlarged case study.

SH: I’m sure people told you about this, but the reality was, we had a data trawl that yielded about 150 schools – after a first trawl that yielded very small numbers, about six schools, so the criteria we were setting were really too stringent. We widened those criteria, including where those schools were in terms of IDACI [the income deprivation affecting children index], and we ended up with about 150.

Then we looked at the schools and said: “Well, that number have just had an inspection in the last year; it’s unfair to go back to them. These ones, we know we’re going to in the next year or 18 months; it’d be unfair to go to them.” This left us with about 50, and we dropped some of them for different reasons. That’s why we ended up with 41.

The key to this argument is the way they were identified. We hadn’t gone into these schools; we didn’t know what they were like. We just knew that they met our criteria. And the main criterion was: are they doing really well by the disadvantaged children?

CM: How did you measure that?

SH: It was over time: what had they done with their key stage 1 and key stage 2 results, their phonics screening check scores, and so on.

CM: So from entry to key stage 2 results.

SH: Not entry. We did it from KS1 results, progress from KS1 to 2, KS2 results, and phonics screening checks. Then we looked at the difference between disadvantaged children and their more advantaged peers, and we went to schools where there was very little difference, where the disadvantaged children were on a par, or where in some places they did better.

And so we said: “Let’s try and get underneath what these schools are doing that means their disadvantaged children are basically doing as well. It matters not, when you go to the school, whether they’re disadvantaged or not.”

CM: If you are looking at how children have performed throughout primary school, can you say that it’s good practice in the reception year that makes the difference?

SH: You’re right, there’s no absolute causal effect. There is correlation, clearly. But we said: “Okay, let’s go and see what they’re doing. Over the time that the kids are at the school, they seem to be doing really well by the disadvantaged kids. Let’s have a look at what they do.”

CM: In that first year?

SH: Yes. A lot of the pushback against ‘Bold beginnings’ was actually from people who don’t teach in reception, but who teach preschool, and we didn’t go into preschools.

CM: But even from reception experts, I hear that if you start kids on phonics and writing too early, research says that by 11, they’re actually disadvantaged.

SH: But it’s contested both ways. What we wanted to see is what these schools are doing from the start.

CM: But by making the recommendations, are you not attributing causality that doesn’t exist?

SH: Well, we’re saying that with a professional eye and experience: we put early-years experts from our inspectorate into those schools, and they looked at certain things and reported back. We really need to look at the report’s recommendations, because they’re really not the recommendations that a lot of people say they are.

We were saying “there is something in this, this is what these places are doing”. And they were making sure that reading was at the heart of reception.

CM: But did you know other schools weren’t doing that?

SH: No, we don’t know. And actually, if a school says “yes, we do that”, well, what’s the problem?

CM: But what if all the schools that are getting bad results are doing that too? Then there’s no correlation.

SH: That could be true.

CM: It could easily be true.

SH: Not easily. It could be true. It’s unlikely that all the bad schools are doing the same things as these places. But, I mean, there’s a chance.

CM: So, what were you doing? Pulling out all the threads that these schools all have in common?

SH: Yes, exactly. So putting reading at the heart of reception. Rhymes, reading to children really frequently, kids learning rhymes by heart, and getting phonics in.

CM: But maybe all those kids are all drinking a certain water in their local area, or they’re all having a banana for breakfast.

SH: Welcome to educational research!

CM: If you’re recommending something, you have a responsibility to make sure it’s a rigorous process.

SH: It was a rigorous process. Some people said that 41 isn’t a very large sample. You go and talk to these researchers in this building [the UCL Institute of Education]. Going to 41 schools, actually, for first-hand evidence, is a pretty large sample. Daniel Muijs, who’s with us at Ofsted now [as head of research], was professor of education at Southampton. He asked “what is the question here?”

It was 41 schools, first-hand experience. Doctorates are written on talking to five teachers.

CM: That’s true.

SH: So where does this come from? Piaget did his work with three children, and it influenced education for 100 years.

CM: People’s worry is that you find evidence that supports what you already believe. That’s the danger with case studies, isn’t it?

SH: That isn’t what we did, but that could be levelled at every single piece of research. I guarantee you can go and look at the research journals from this place, and their positionality statement will tell you exactly where this research is going. So I don’t think it’s any different.

CM: So in your mind, ‘Bold beginnings’ is different from a case study. That would be a one-school report.

SH: A one-off, exactly. And where I think we’ve made our mistake in the past is to say “look at what that single school is doing with this method”, in that case marking. With hindsight, we should not have done it. But then, with hindsight, Paul Black and Dylan Wiliam might not have written Inside the Black Box if they had known that a government was going to pick it up, brand it as assessment for learning and stick it in the National Strategies.

When I spoke to Paul at the time, he said it was being implemented right. Well, 20 years on, has it been implemented right? Does Dylan think it’s been implemented right? I think he’s rowing back and saying “oh well, what we really meant was…”