Humans are not naturally good at understanding abstract statistics. As teachers, sure, we can easily understand the idea that 65 per cent of our students answered the hinge question correctly or that the average mark on an essay was a 72.4/100. But when we read reports on education interventions, it can be harder to get a grip on the meaning behind things such as correlation coefficients, p-values, and effect sizes.

Statisticians and researchers have tried to put qualitative labels on some of these quantitative measures. Take effect size, for example – a quantification of the extent to which an intervention had an impact. When Jacob Cohen introduced Cohen’s d, a measure of effect size, in the mid-20th century, he also suggested a set of labels for small, medium, and large effects (roughly d = 0.2, 0.5, and 0.8).
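Cohen’s d is simply the difference between two group means divided by their pooled standard deviation. A minimal sketch in Python, using made-up tutoring scores purely for illustration:

```python
import math

def cohens_d(group1, group2):
    """Cohen's d: the standardised difference between two group means."""
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    # Sample variances (with Bessel's correction)
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    # Pooled standard deviation across both groups
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical test scores: a tutored group vs an untutored group
tutored = [78, 82, 75, 88, 80]
control = [70, 74, 72, 76, 68]
d = cohens_d(tutored, control)  # d ≈ 2.09 – "very large" by Cohen's labels
```

The same difference in raw marks yields a bigger d when the scores are tightly clustered – a point that matters for several of the considerations below.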

It’s a handy rule of thumb. If I were told that giving my students extra tutoring had “a medium effect” on increasing their scores, that would mean a lot more to me than “a Cohen’s d of 0.58”.

But is this helpful to educators? How could I decide whether an intervention is worth trying? Is something with a “small effect” worth the effort? Or is something with a “very large effect size” worth the time, energy and financial costs?

Matthew Kraft (2018) at Brown University has proposed five considerations to interpret effect sizes in education – a way to go beyond “medium” in favour of a more meaningful understanding. These questions are useful for examining any research, but are also a great way to unpack effect size. While he goes on to propose new bands of labels for Cohen’s d levels (interesting for those with a keen interest in statistics), the guidelines are useful just in themselves.

1. An effect size is not necessarily a causal effect
It’s easy to assume that the word effect means one thing is causing another. But the familiar mantra remains just as true here: correlation does not mean causation. Effect sizes reported in correlational studies tend to be large, so don’t be lured by the number alone.

2. How an outcome is measured can impact effect size
Short-term, quick interventions tend to show larger effect sizes; the same is true of outcomes that are measured immediately after an intervention. This phenomenon will seem familiar to any teacher who has seen their students try to cram before an assessment – you may get an immediate bump in a score, but the long-term learning may be lacking…

3. Choices a researcher makes can impact effect size
Even the most empirical studies share a very human, subjective element: the researchers themselves. Choices about who constitutes the sample group, how large it is, and what treatment the participants receive can all affect the findings. Is the intervention aimed at one particular group of students? Could the intervention “bleed” from the treatment group into the control group?

4. Cost is key
Research sometimes comes with a rough estimate of the costs and potential returns. A small impact with a small cost (monetary as well as time and effort) can be very worthwhile.

5. Scalability is key
In education, smaller studies of homogeneous groups are likely to find a larger effect size than when the intervention is applied at a larger scale. There are practical difficulties in reproducing many aspects of an intervention in a larger, different context.

It’s essential to consider the context and the meaning behind the numbers reported in research. It can be tempting to see “large effect size” and rush to implement an intervention – or to shy away from a costly one with a small effect. Instead, as educators, we need to translate the story behind the statistics into something we can more naturally understand.

Sullivan, G. M., & Feinn, R. (2012). Using Effect Size – or Why the P Value Is Not Enough. Journal of Graduate Medical Education, 4(3), 279-282. doi:10.4300/jgme-d-12-00156.1
Hill, C. J., Bloom, H. S., Black, A. R., & Lipsey, M. W. (2008). Empirical Benchmarks for Interpreting Effect Sizes in Research. Child Development Perspectives, 2(3), 172-177. doi:10.1111/j.1750-8606.2008.00061.x

Kraft, M.A. (2018). Interpreting Effect Sizes of Education Interventions. Brown University Working Paper.