Friday 27 January 2017

The school research lead and another nail in the coffin of Hattie's Visible Learning

In a recently published paper, Simpson (2017) argues that ranking educational interventions by combining effect sizes from meta-analyses and meta-meta-analyses is fundamentally flawed. Assumptions that statistical summaries of effect size provide an estimate of the impact of an educational intervention are shown to be false. Furthermore, the use of effect size is open to researcher manipulation. As such, league tables of the effectiveness of interventions (Hattie, 2008) are potentially hierarchies of openness to the manipulation of research design. Consequently, league tables of the effectiveness of educational interventions provide little guidance for educators at national, school or classroom level.

The rest of this post will consist of the following:
  •  A brief introduction to effect sizes
  •  A brief summary of Simpson (2017)
  •  Other considerations vis-à-vis meta-analyses and meta-meta-analyses (Wiliam, 2016)
  •  The implications for school leaders and teachers of using hierarchies of the effect sizes of interventions (Hattie, 2008)

Effect sizes: A brief introduction

Put quite simply, an effect size is a way of estimating the strength or magnitude of a phenomenon. An effect size can be the result of an intervention, identified by comparing two groups: one group that received the intervention and another, the control group, that did not. Alternatively, it can be used to measure the strength of the relationship between variables. For our purposes, we will focus on the use of effect sizes when comparing the differences between two groups, which is estimated using the following calculation.

Effect Size = (Mean of experimental group - Mean of control group) / Standard Deviation
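
For readers who like to see the arithmetic, here is a minimal sketch of that calculation in Python. The formula above does not say which standard deviation to use, so I have assumed a pooled standard deviation across the two groups, which is one common choice. The function name and the test scores are invented purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def effect_size(experimental, control):
    """Standardised mean difference between two groups.

    Assumption: the denominator is the pooled standard deviation of the
    two groups; other choices (e.g. the control group's SD) are possible.
    """
    n_e, n_c = len(experimental), len(control)
    s_e, s_c = stdev(experimental), stdev(control)
    pooled_sd = sqrt(((n_e - 1) * s_e**2 + (n_c - 1) * s_c**2) / (n_e + n_c - 2))
    return (mean(experimental) - mean(control)) / pooled_sd


# Hypothetical test scores, not taken from any study.
experimental_scores = [62, 58, 71, 66, 60, 69]
control_scores = [55, 61, 52, 64, 57, 59]
print(round(effect_size(experimental_scores, control_scores), 2))
```

Note that the result depends just as much on the spread of scores in the denominator as on the difference in means, which matters for the arguments that follow.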

Assumptions underpinning meta-analysis and meta-meta-analysis

Simpson (2017) argues there are two key assumptions associated with meta-analysis and meta-meta-analysis. First, a larger effect size is associated with greater educational significance. Second, two or more different studies of the same intervention can have their effect sizes combined to give a meaningful estimate of the intervention’s educational importance. However, Simpson identifies three reasons why these assumptions do not hold: different comparator groups, range restrictions, and measure design.

Why these assumptions do not hold.

Unequal Comparator Groups

Say we are looking at combining the effect sizes of a couple of studies on the impact of written feedback. In one study, the results of a group of pupils who receive written feedback are compared with the results of pupils who receive verbal feedback. Let’s say that gives us an effect size of 0.6. In a second study, the results of pupils who receive written feedback are compared with those of pupils who receive only group feedback, giving an effect size of 0.4. Now we may be tempted to average the two effect sizes to find the average effect size of written feedback, in this case 0.5. However, that would not allow us to make an accurate estimate of the effect size of providing written feedback. This would require a study where the results of pupils who receive written feedback are compared with those of pupils who receive no feedback whatsoever. As such, it is simply not possible to accurately combine studies which have used different types of comparator groups.
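
To make the point concrete, here is a small sketch using the two effect sizes from the example above (0.6 and 0.4). Everything else is an assumption made for illustration: a common standard deviation of 1, a no-feedback group with a mean of 0, and three guesses at how much better verbal feedback is than no feedback at all. The function name is my own.

```python
def written_feedback_effects(verbal_vs_none):
    """Effect sizes implied by one assumed value of verbal vs no feedback.

    Assumptions: every group shares a standard deviation of 1 and the
    no-feedback group has a mean of 0, so differences in means are
    already effect sizes.
    """
    none = 0.0
    verbal = none + verbal_vs_none
    written = verbal + 0.6  # study 1 in the text: written vs verbal = 0.6
    group = written - 0.4   # study 2 in the text: written vs group = 0.4
    return {
        "written_vs_verbal": round(written - verbal, 2),
        "written_vs_group": round(written - group, 2),
        "written_vs_none": round(written - none, 2),
    }


naive_average = round((0.6 + 0.4) / 2, 2)

# Three hypothetical values for the effect of verbal feedback vs none.
for assumed in (0.0, 0.3, 0.6):
    print(assumed, written_feedback_effects(assumed), "naive average:", naive_average)
```

In each scenario the two published comparisons stay at 0.6 and 0.4, yet the effect of written feedback against no feedback is different every time. The averaged figure of 0.5 tells us nothing about which scenario we are in.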

Range restriction

This time we are going to undertake the same two interventions, but in this example we are going to restrict the range of pupils used in the studies. In the first study, only highly attaining pupils are included. In the second study, pupils involved in the intervention are drawn from the whole ability range. For at least two reasons, this may lead to a change in the effect size of receiving written feedback. First, it will remove from the study pupils who may not know how to respond to the feedback. Second, it may well be that highly attaining pupils have less ‘head-room’ to demonstrate the impact of either type of feedback. As a result, the effect size is highly likely to change. The consequence is that the different ranges of pupils used in studies will influence the measured impact of an intervention and therefore the effect size. As such, unless a meta-analysis combines studies which use the same range of pupils, the combined effect size is unlikely to be an accurate estimate of the ‘true’ effect size of the intervention.
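
One further mechanism can be sketched in a few lines of Python: restricting the range of pupils shrinks the standard deviation, which is the denominator of the effect size. The simulation below is deliberately idealised and every number in it is an assumption: normally distributed scores, an intervention that adds exactly the same raw gain for every pupil (so no head-room effect), and a "highly attaining" group defined as the top quarter.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical prior-attainment scores for 10,000 pupils.
population = [rng.gauss(50, 10) for _ in range(10_000)]
RAW_GAIN = 5.0  # assumed identical raw improvement for every pupil


def effect_size_for(sample, gain=RAW_GAIN):
    """Effect size if each pupil in `sample` gained `gain` raw marks.

    Treated and untreated scores have the same spread here, so the
    sample's own standard deviation is used as the denominator.
    """
    treated = [score + gain for score in sample]
    return (mean(treated) - mean(sample)) / stdev(sample)


cutoff = sorted(population)[int(0.75 * len(population))]
high_attainers = [score for score in population if score >= cutoff]

d_whole = effect_size_for(population)
d_restricted = effect_size_for(high_attainers)
print(round(d_whole, 2), round(d_restricted, 2))
```

The raw gain is identical in both cases, but the restricted sample produces the larger effect size simply because its scores are less spread out. In real studies this sits alongside the head-room and responsiveness effects described above, so the direction of the overall change is not guaranteed.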

Measure design

Finally, we are going to look at the impact of measure design on effect sizes. Simpson (2017) argues that researchers can directly influence effect size through the choices they make about how they measure the effect. First, if researchers design an intervention and the measure used is specifically focussed on measuring the effect of that intervention, this will lead to an increase in effect size. For example, you could be undertaking an intervention looking to improve algebra scores. You could choose to use a measure which is specifically designed to ‘measure’ algebra, or you could choose to use a measure of general mathematical competence which includes an element of algebra. In this situation, the effect size of the former will be greater than the latter, due to the precision of the measure used. Second, the researcher could increase the number of test items. Simpson states that, for a relatively well-designed test, having two questions instead of one increases the effect size by 20%, and having twenty questions can lead to a doubling of the effect size. Simulations suggest that increasing the number of questions used to measure the effectiveness of an intervention may lead to effect size inflation of 400%.
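
Why should the number of questions matter? One way to see it is through a textbook model of test reliability, sketched below. This is not Simpson's simulation, and it is not meant to reproduce his figures. It assumes classical test theory, parallel test items, the Spearman-Brown formula for how reliability grows with test length, and invented values for the single-item reliability and the 'true' effect size.

```python
from math import sqrt


def reliability(n_items, single_item_reliability):
    """Spearman-Brown reliability of a test built from parallel items."""
    r = single_item_reliability
    return n_items * r / (1 + (n_items - 1) * r)


def observed_effect_size(true_d, n_items, single_item_reliability):
    """Effect size measured on a noisy test.

    Measurement noise inflates the observed standard deviation, so the
    observed effect size is the true one times sqrt(reliability).
    """
    return true_d * sqrt(reliability(n_items, single_item_reliability))


TRUE_D = 0.5        # hypothetical effect size on a perfectly reliable measure
ITEM_RELIABILITY = 0.3  # hypothetical reliability of a one-question test

for n_items in (1, 2, 20):
    print(n_items, round(observed_effect_size(TRUE_D, n_items, ITEM_RELIABILITY), 3))
```

Under these assumptions, the intervention and the pupils are unchanged, yet the reported effect size rises as questions are added, because a longer test is a less noisy one. How large the rise is depends entirely on the assumed reliability.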

Other considerations

It is important to note that there are other considerations regarding the limitations of effect sizes and meta-analysis. Wiliam (2016) identifies four other limitations of effect sizes. First, the intensity and duration of the intervention will have an impact on the resulting effect size. Second, there is the file drawer problem: we don’t know how many similar interventions have been carried out which did not generate statistically significant results and, as a result, have not been published. Polanin et al. (2016), reviewing 383 meta-analyses, found that published research yielded larger effect sizes than unpublished studies, which provides evidence to support the notion of publication bias, i.e. a phenomenon where studies with large and/or statistically significant effects, relative to studies with small or null effects, have a greater chance of being published. Third, there is the age dependence of effect size. All other things being equal, the older the pupils the smaller the effect size, which is a result of a greater diversity in the population of older pupils compared to younger pupils. Finally, Wiliam raises the issue of the generalisability of the studies. One of the problems of trying to calculate the overall effect size of an intervention is that much of the published research is undertaken by psychology professors in laboratories on their own undergraduate students. As such, these students will have little in common with, say, Key Stage 2 or Key Stage 3 pupils, which will have a substantial impact on the generalisability of the findings.
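
The file drawer problem can be illustrated with a short simulation. Every number below is an assumption chosen for illustration: a modest true effect size of 0.2, 2,000 small studies of 20 pupils per group, and a crude publication rule under which only studies with a statistically significant positive result see the light of day.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(1)  # fixed seed so the sketch is repeatable

TRUE_D = 0.2        # hypothetical true effect size
N_PER_GROUP = 20    # hypothetical pupils per group in each study
N_STUDIES = 2000    # hypothetical number of studies carried out
T_CRITICAL = 2.02   # approx. two-sided 5% critical value for 38 degrees of freedom


def run_study():
    """Simulate one small two-group study; return (effect size, t statistic)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
    treated = [rng.gauss(TRUE_D, 1.0) for _ in range(N_PER_GROUP)]
    pooled_sd = sqrt((stdev(control) ** 2 + stdev(treated) ** 2) / 2)
    d = (mean(treated) - mean(control)) / pooled_sd
    t = d * sqrt(N_PER_GROUP / 2)  # t statistic for two equal-sized groups
    return d, t


studies = [run_study() for _ in range(N_STUDIES)]
all_d = [d for d, _ in studies]
published_d = [d for d, t in studies if t > T_CRITICAL]

print("studies run:", len(all_d), "mean effect size:", round(mean(all_d), 2))
print("studies 'published':", len(published_d), "mean effect size:", round(mean(published_d), 2))
```

The average across all the simulated studies sits close to the true value, while the average of the 'published' subset is far larger. A meta-analysis that can only see the published studies inherits that inflation.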

So what are the implications for teachers and school leaders who wish to use Hattie’s hierarchy of the educational significance of interventions?

For a start, as Simpson (2017) argues, league tables of effect sizes may reflect openness to the manipulation of outcomes through research design. In other words, Hattie’s hierarchy may not reflect the educational significance of interventions but rather the sensitivity of the intervention to measurement. As such, if teachers or school leaders use Hattie’s league table of intervention effectiveness to choose which interventions to prioritise, they are probably looking at the wrong hierarchy.

Second, if teachers and school leaders wish to use effect sizes generated by research to help prioritise interventions, then it is necessary to look at the original research.  And when aggregating studies, make sure you are looking at studies which use the same type of comparator groups, range of pupils, and measurement design.

Third, it requires teachers and school leaders to commit to on-going professional development and engagement with research output. With that in mind, the recent announcement by the Chartered College of Teaching that members will be able to access research which is currently behind paywalls could not be more timely.

*In this section I’m pushing the boundaries of both my understanding of the impact of measure design on effect size and my ability to communicate the core message. I hope I have made my explanations as simple as possible, but not simpler.


HATTIE, J. 2008. Visible learning: A synthesis of over 800 meta-analyses relating to achievement, Routledge.
POLANIN, J. R., TANNER-SMITH, E. E. & HENNESSY, E. A. 2016. Estimating the difference between published and unpublished effect sizes: A meta-review. Review of Educational Research, 86, 207-236.
SIMPSON, A. 2017. The misdirection of public policy: Comparing and combining standardised effect sizes. Journal of Education Policy.
WILIAM, D. 2016. Leadership for teacher learning, West Palm Beach, Learning Sciences International.

Saturday 21 January 2017

The school research lead and the test of a first-rate intelligence

One of the challenges facing an evidence-based teacher or school leader is the need to keep two potentially conflicting ideas in some form of constructive tension. First, teaching involves increasingly complex work that is highly cognitive and intellectual, where evidence provides a source for improving student learning through enhanced teacher learning about the effects of their teaching; the strengths and needs of their students; and alternative strategies that have an externally validated record of success. On the other hand, teachers’ understandings of their problems run deeper than those offered by theorists, with teachers being able to provide common-sense insight into their problems of practice. Evidence provides a legitimate but imperfect basis for professional judgement and knowledge. Practical experience is as important as research-driven knowledge. The validity of teacher knowledge depends upon the conditions in which it is produced as well as the processes by which it is validated. Teachers need to become adaptive experts who actively seek to check existing practices and have a disposition towards career-long professional experiential learning (Hargreaves and Stone-Johnson, 2009).

So given this tension between theory and experience, how do the evidence-based teacher and leader go about managing it? One way forward could be provided by Martin (2009) who, in part influenced by the F Scott Fitzgerald quote at the top of this blog, has developed the notion of integrative thinking, which is defined as:

The ability to face constructively the tension of opposing ideas and, instead of choosing one at the expense of the other, generate a creative resolution of the tension in the form of a new idea that contains elements of the opposing ideas but is superior to each (p15).

Martin notes the work of the 19th century scholar Thomas C Chamberlin, who argued that when seeking to explain phenomena it is necessary to have in place a number of potentially conflicting hypotheses:

The use of the method leads to certain peculiar habits of mind which deserve passing notice, since as a factor of education its disciplinary value is one of importance.  When faithfully pursued for a period of years, it develops a habit of thought analogous to the method itself, which may be designated a habit or parallel of complex thoughts.  Instead of a simple succession of thoughts in linear order, the procedure is complex, and the mind appears to be possessed of the power of simultaneous visions from different standpoints.  Phenomena appear to become capable of being viewed analytically and synthetically at once. (Martin, 2009, p22-23)

Martin goes on to raise the question as to whether integrative thinkers are born, not made, and subsequently whether the skill of integrative thinking can actually be taught. Martin is of the view that integrative thinking is untaught and that it is mainly a tacit skill which resides in the heads of individuals who have somehow developed an opposable mind. However, he also takes the view that the thinking processes of those individuals who undertake integrative thinking can be captured, described and analysed by others, making it possible to teach integrative thinking.

So what does this mean for you as an evidence-based school teacher and school leader? To me, three implications immediately spring to mind.
  1. Integrative thinking is a necessary requirement for the evidence-based educator, balancing multiple sources of evidence – experience matters but so does research.
  2. Creative answers to pressing problems of practice are unlikely to be found from just one source of evidence - be it research or expert knowledge of the school. New and creative solutions are likely to be found by melding together research, school data, stakeholder views and practitioner expertise.
  3. Developing your skills as an integrative thinker requires support, to help articulate your tacit thinking and make it explicit, and to help you understand your thinking process.

And finally

In future posts we will look at both a model of the thinking processes of integrative thinkers and a framework for building integrative thinking capacity, and how they might benefit the development of evidence-based teachers and school leaders.


HARGREAVES, A. & STONE-JOHNSON, C. 2009. Evidence-informed change and the practice of teaching. The role of research in educational improvement, 89-110.

MARTIN, R. L. 2009. The opposable mind: Winning through integrative thinking, Harvard Business Press.

Friday 13 January 2017

The school research lead - do we need more foxes and fewer hedgehogs?

Does your school have too many 'hedgehogs' and not enough 'foxes'?  Is your school's teaching staff full of individuals - hedgehogs - who are all committed to their own one big idea, and will stick with it even when it's been shown to fail?  Or is your school full of foxes: individuals who know lots about everything, though not everything about anything; individuals who are pragmatic, prone to self-doubt and willing to admit when they get things wrong?  In this post, we will look at the role of specialists and expertise and how it can often get in the way of the evidence-based school.  I will then make some suggestions as to how not to become imprisoned by your own expertise.

One of the problems with past experience and expertise is that it can get in the way of an individual assessing their data objectively. Tetlock and Gardner (2016), describing Tetlock’s book Expert Political Judgment: How Good Is It? How Can We Know?, report that being a subject expert, more often than not, got in the way of making an accurate forecast or prediction. These experts were classified as ‘hedgehogs’, who know one big thing. Furthermore, ‘hedgehogs’ were totally committed to their conclusions. This resulted in ‘hedgehogs’ being extremely reluctant to alter their opinions even if their forecasts had gone ‘horribly wrong’. And, for want of a better phrase, the ‘hedgehogs’’ predictions were not as accurate as random guesses, which could have been produced by the so-called dart-throwing chimp.

On the other hand, there was another group of experts called ‘foxes’ who were more accurate in their predictions – though they only just beat the so-called dart-throwing chimpanzee. ‘Foxes’ know many things; they don’t just know one thing. They sought out information and evidence from many different sources and used a number of different techniques to analyse the data. ‘Foxes’ tended to be much less confident about their predictions and forecasts, and were willing to change their minds and admit when they had made mistakes and were wrong.

So how come the specialists' forecasts were less accurate than the generalists'?  Tetlock and Gardner (2016) argue that the hedgehog has one big idea, or a ‘set of spectacles’, which dominates how they see situations and informs their subsequent forecasts and predictions.  Unfortunately, these spectacles are tinged with a particular colour, which distorts the ‘hedgehogs’’ predictions and forecasts.  This leads to hedgehogs trying to squeeze what they see into a narrow frame of reference, even though it may not fit.  However, because they are wearing ‘glasses’, the hedgehogs believe that they are seeing things with more accuracy than others, which increases their confidence and belief in what they are seeing.

So what can you do to make yourself more fox-like and less like a hedgehog?  Tetlock and Gardner (2016) suggest a range of strategies, which I have adapted for the use of evidence-based school leaders:
  • Strike the right balance between over- and under-reacting to evidence - in other words, don't over-react when new evidence is made available, as it may be random noise; on the other hand, don't ignore it.
  • Get the views of outsiders - just because you know your school extremely well does not mean that others from outside the school cannot provide an insight into the workings of your school.  They may spot something you have missed or have taken for granted.
  • Break problems into their component parts - some of which you'll know more about than others.  Recognise that although you may be an expert in one area relevant to the problem, you may not be an expert in everything.
  • Look for clashing causal forces - things that are pushing and pulling in opposite directions.  There may well be factors which can have a positive impact on staff engagement, such as how individual staff are treated; on the other hand, external factors, such as poorly planned external curriculum change, may have a detrimental impact.
  • Strike the right balance between under- and over-confidence - bottom line, your decisions involve making judgements; you may be right, you may be wrong.  All you can do is ensure that whatever decision you make is made with positive intent.
  • Allow yourself some degree of doubt, but not so much that you become paralysed with indecision.
  • Look for the errors behind your mistakes - but avoid fundamental attribution error - when things go wrong it's not always about what others have done or not done, or circumstances beyond your control - sometimes you just got it wrong through thinking which was prone to biases.
  • Bring out the best in others and let others bring out the best in you - it's not all about you, it's about us - and creating an environment for making decisions which brings out the best in you and the best in your colleagues.

And finally, if you want to be a better evidence-based school leader, give it a go. You'll make mistakes, but you will get better at it.


TETLOCK, P. & GARDNER, D. 2016. Superforecasting: The art and science of prediction, Random House.