In a recently published paper, Simpson (2017) argues that ranking educational interventions by combining effect sizes from meta-analyses and meta-meta-analyses is fundamentally flawed. He shows that the assumption that statistical summaries of effect sizes provide an estimate of the impact of an educational intervention is false. Furthermore, the use of effect sizes is open to researcher manipulation. As such, league tables of the effectiveness of interventions (Hattie, 2008) are potentially hierarchies of how open each intervention's outcomes are to manipulation through research design. Consequently, league tables of the effectiveness of educational interventions provide little guidance for educators at national, school or classroom level.
The rest of this post will consist of the
following:
- A brief introduction to effect sizes
- A brief summary of Simpson (2017)
- Other considerations vis-à-vis meta-analyses and meta-meta-analyses (Wiliam, 2016)
- The implications for school leaders and teachers of using hierarchies of the effect sizes of interventions (Hattie, 2008)
Effect sizes: A brief introduction
Put quite simply, an effect size is a way of estimating the strength or magnitude of a phenomenon. An effect size can describe the result of an intervention identified through the comparison of two groups – one group who received the intervention and another group, the control group, who did not. Alternatively, it can be used to measure the strength of the relationship between variables. However, for our purposes we will focus on the use of effect sizes when comparing the differences between two groups, where the effect size is estimated using the following calculation.
Effect Size = (Mean of experimental group − Mean of control group) / Standard Deviation
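To make that concrete, here is a minimal Python sketch of the calculation, using made-up scores and the pooled standard deviation as the denominator (a common choice; some studies use the control group's standard deviation instead):

```python
import numpy as np

# Made-up post-test scores for two small groups, purely for illustration
experimental = np.array([68, 72, 75, 80, 71, 69, 77, 74])
control = np.array([65, 70, 66, 72, 68, 64, 71, 69])

def effect_size(exp, ctrl):
    """Mean difference divided by the pooled standard deviation (Cohen's d style)."""
    n1, n2 = len(exp), len(ctrl)
    pooled_sd = np.sqrt(((n1 - 1) * exp.var(ddof=1) + (n2 - 1) * ctrl.var(ddof=1))
                        / (n1 + n2 - 2))
    return (exp.mean() - ctrl.mean()) / pooled_sd

print(round(effect_size(experimental, control), 2))
```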
Assumptions underpinning meta-analysis and meta-meta-analysis
Simpson (2017) argues there are two key assumptions associated with meta-analysis and meta-meta-analysis. First, that a larger effect size indicates greater educational significance. Second, that two or more different studies on the same intervention can have their effect sizes combined to give a meaningful estimate of the intervention's educational importance. However, Simpson identifies three reasons – different comparator groups, range restriction, and measure design – why these assumptions do not hold.
Why these assumptions do not hold
Unequal comparator groups
Say we are looking at combining the effect sizes of two studies on the impact of written feedback. In one study, the results of a group of pupils who receive written feedback are compared with the results of pupils who receive verbal feedback. Let's say that gives us an effect size of 0.6. In a second study, the results of pupils who receive written feedback are compared with those of pupils who receive only group feedback, giving an effect size of 0.4. Now we may be tempted to average the two effect sizes to find the overall effect size of written feedback, in this case 0.5. However, that would not give us an accurate estimate of the effect size of providing written feedback. That would require a study where the results of pupils who receive written feedback are compared with those of pupils who receive no feedback whatsoever. As such, it is simply not possible to accurately combine studies which have used different types of comparator groups.
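To see why, here is a rough simulation under entirely invented assumptions about how many raw marks each type of feedback is worth (written = 8, verbal = 2, group = 4, none = 0, with a score spread of 10). It reproduces the 0.6 and 0.4 effect sizes above, yet their average of 0.5 falls well short of the effect size of written feedback against no feedback at all:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
sd = 10.0  # assumed spread of pupil scores

# Hypothetical raw-score gains for each type of feedback (illustrative numbers only)
gain = {"none": 0.0, "group": 4.0, "verbal": 2.0, "written": 8.0}

def study_effect_size(treatment, comparator):
    """Simulate one study comparing a treatment group with a comparator group."""
    t = rng.normal(gain[treatment], sd, n)
    c = rng.normal(gain[comparator], sd, n)
    pooled_sd = np.sqrt((t.var(ddof=1) + c.var(ddof=1)) / 2)
    return (t.mean() - c.mean()) / pooled_sd

d_vs_verbal = study_effect_size("written", "verbal")  # ~0.6, like the first study
d_vs_group = study_effect_size("written", "group")    # ~0.4, like the second study
d_vs_none = study_effect_size("written", "none")      # ~0.8, what we actually want to know

print(round(d_vs_verbal, 2), round(d_vs_group, 2), round(d_vs_none, 2))
print("average of the two studies:", round((d_vs_verbal + d_vs_group) / 2, 2))  # ~0.5
```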
Range restriction
This time we are going to look at the same two interventions, but in this example we are going to restrict the range of pupils used in the studies. In the first study only highly attaining pupils are included, whereas in the second study pupils are drawn from the whole ability range. For at least two reasons, this may change the effect size of receiving written feedback. First, restricting the range removes from the study pupils who may not know how to respond to the feedback. Second, it may well be that highly attaining pupils have less 'head-room' to demonstrate the impact of either type of feedback. As a result, the effect size is highly likely to change. The consequence is that the range of pupils used in an intervention will influence its measured impact and therefore the effect size. As such, unless a meta-analysis combines studies which use the same range of pupils, the combined effect size is unlikely to be an accurate estimate of the 'true' effect size of the intervention.
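Here is a deliberately simple simulation of that idea. It assumes written feedback is worth the same 5 raw marks for every pupil and ignores ceiling effects, yet the effect size still changes when the study is restricted to high prior attainers, simply because narrowing the ability range shrinks the spread of scores that sits in the denominator:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
gain = 5.0  # assumed raw-score benefit of written feedback (hypothetical)

def cohens_d(treated, control):
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd

def run_study(high_attainers_only):
    """Post-test score = prior attainment + (gain if treated) + noise."""
    prior_t = rng.normal(50, 10, n)
    prior_c = rng.normal(50, 10, n)
    if high_attainers_only:  # keep roughly the top 20% of prior attainment
        prior_t = prior_t[prior_t > 58.4]
        prior_c = prior_c[prior_c > 58.4]
    treated = prior_t + gain + rng.normal(0, 5, len(prior_t))
    control = prior_c + rng.normal(0, 5, len(prior_c))
    return cohens_d(treated, control)

print("whole ability range: d =", round(run_study(False), 2))  # ~0.45
print("high attainers only: d =", round(run_study(True), 2))   # noticeably larger here
```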
Measure design
Finally, we are going to look at the impact of measure design on effect sizes. Simpson (2017) argues that researchers can directly influence the effect size through the choices they make about how to measure the effect. First, if researchers design an intervention and the measure used is specifically focussed on measuring the effect of that intervention, this will lead to an increase in the effect size. For example, you could be undertaking an intervention looking to improve algebra scores. You could choose a measure specifically designed to 'measure' algebra, or a measure of general mathematical competence which includes an element of algebra. In this situation, the effect size of the former will be greater than that of the latter, due to the precision of the measure used. Second, the researcher could increase the number of test items. Simpson states that, for a relatively well-designed test, having two questions instead of one increases the effect size by around 20%, and having twenty questions can lead to a doubling of the effect size. Simulations suggest that increasing the number of questions used to measure the effectiveness of an intervention may lead to effect size inflation of 400%.
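The sketch below uses a toy measurement model to show the direction of this effect (it will not reproduce Simpson's exact figures, which depend on the test design). Each pupil has a true ability; the observed score is the average of a number of noisy questions, so adding questions makes the measure more reliable and the measured effect size larger, even though the intervention itself is unchanged:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
true_gain = 0.3      # assumed effect of the intervention on underlying ability (SD units)
item_error_sd = 1.5  # assumed noisiness of a single test question

def cohens_d(treated, control):
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd

def measured_effect_size(n_items):
    """Observed score = true ability + measurement error averaged over n_items questions."""
    true_t = rng.normal(true_gain, 1.0, n)  # treated pupils' underlying ability
    true_c = rng.normal(0.0, 1.0, n)        # control pupils' underlying ability
    obs_t = true_t + rng.normal(0, item_error_sd / np.sqrt(n_items), n)
    obs_c = true_c + rng.normal(0, item_error_sd / np.sqrt(n_items), n)
    return cohens_d(obs_t, obs_c)

for k in (1, 2, 20):
    print(f"{k:2d} question(s): measured d = {measured_effect_size(k):.2f}")
```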
Other considerations
It is important to note that there are other considerations regarding the limitations of effect sizes and meta-analysis. Wiliam (2016) identifies four further limitations of effect sizes. First, the intensity and duration of the intervention will have an impact on the resulting effect size. Second, there is the file drawer problem: we don't know how many similar interventions have been carried out which did not generate statistically significant results and, as a result, have not been published. Polanin et al. (2016), reviewing 383 meta-analyses, found that published research yielded larger effects than unpublished studies, providing evidence of publication bias, i.e. a phenomenon where studies with large and/or statistically significant effects, relative to studies with small or null effects, have a greater chance of being published. Third, there is the age dependence of effect size. All other things being equal, the older the pupils the smaller the effect size, which is a result of the greater diversity in the population of older pupils compared to younger pupils. Finally, Wiliam raises the issue of the generalisability of the studies. One of the problems of trying to calculate the overall effect size of an intervention is that much of the published research is undertaken by psychology professors in laboratories on their own undergraduate students. As such, these students will have little in common with, say, Key Stage 2 or Key Stage 3 pupils, which will have a substantial impact on the generalisability of the findings.
So what are the implications for teachers and school leaders who wish to use Hattie's hierarchy of the educational significance of interventions?
For a start, as Simpson (2017) argues, league tables of effect sizes may reflect openness to the manipulation of outcomes through research design. In other words, Hattie's hierarchy may not reflect the educational significance of interventions but rather the sensitivity of the interventions to measurement. As such, if teachers or school leaders use Hattie's league table of intervention effectiveness to choose which interventions to prioritise, they are probably looking at the wrong hierarchy.
Second, if teachers and school leaders wish to use effect sizes generated by research to help prioritise interventions, then it is necessary to look at the original research. And when aggregating studies, make sure you are looking at studies which use the same type of comparator group, range of pupils, and measure design.
Third, it requires teachers and school leaders to commit to ongoing professional development and engagement with research output. With that in mind, the recent announcement by the Chartered College of Teachers that members will be able to access research which is currently behind paywalls could not be more timely.
*In this section I'm pushing both the boundaries of my understanding of the impact of measure design on effect sizes and my ability to communicate the core message. I hope I have made my explanations as simple as possible, but not simpler.
References
Hattie, J. (2008). Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement. Routledge.
Polanin, J. R., Tanner-Smith, E. E. & Hennessy, E. A. (2016). Estimating the difference between published and unpublished effect sizes: A meta-review. Review of Educational Research, 86, 207–236.
Simpson, A. (2017). The misdirection of public policy: Comparing and combining standardised effect sizes. Journal of Education Policy.
Wiliam, D. (2016). Leadership for Teacher Learning. West Palm Beach: Learning Sciences International.
Comments

Thanks for drawing attention to this critique of Hattie's effect sizes. There is an excellent article, "Invisible Learnings? A commentary on John Hattie's book...", by researchers at Massey University in NZ. You can download it here: http://www.scoop.it/doc/download/6UarJDb6k6gzp_rZxY6UJGQ

Thank you for the article. I did look at the paper and I wondered why you say this is a nail in the coffin for Hattie, when the article only mentions Hattie briefly. It is mostly about the EEF toolkit. Why not a nail in the coffin of the EEF toolkit?

What are the Wiliam 2016 reference details?

Thanks Gary for an excellent article. I have started doing what you suggest and am analysing the research used by Hattie in detail here, and am very surprised at what I've found so far: http://visablelearning.blogspot.com.au/ I'm looking for others to help too.