Sunday 29 April 2018

The school research lead - understanding p-values, statistical significance and avoiding misconceptions

A major challenge for aspiring evidence-informed teachers is knowing when to trust the experts. It would be easy to assume that just because you have come across a particular interpretation of a concept or idea in a number of different places - a book, a peer-reviewed article or a blog - it must be correct. Unfortunately, if you did this, you could well be making a mistake. For example, in recent weeks I have come across three examples – Churches and Dommett (2016), Firth (2018) and Ashman (2018) - where the meaning of p-values and statistical significance would appear to have been misinterpreted. Furthermore, as Gorard et al (2017) state, these mistakes are not uncommon. So to help aspiring school research leads and evidence-informed teachers spot where p-values and statistical significance have been misinterpreted, I will:
  • Explain what is meant by the terms  p-values and statistical significance
  • Identify a number of common misconceptions about p-values and statistical significance
  • Show how the work of Churches and Dommett, Firth and Ashman falls foul of some of these misconceptions and misinterpretations
  • Examine some of the implications for evidence-informed teachers.
And to help me do this I’m going to draw upon the work of Greenland, Senn, et al. (2016) and the American Statistical Association’s statement on p-values (Wasserstein and Lazar, 2016).

P values and statistical significance

When seeking to understand these terms there are a number of major problems, as Greenland, et al. (2016) state: ‘There are no interpretations of these concepts, which are at once simple, intuitive, correct, and foolproof’ (p. 337). Greenland et al go on to illustrate their point by providing twenty-five examples of common misconceptions and misinterpretations of these terms, to which even professional academics are prone. Nevertheless, the American Statistical Association informally defines a p-value as: the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.

The smaller the p-value, the more surprising our results are if the null hypothesis (and the test assumptions) hold true; the larger the p-value, the less surprising our results are, given the null hypothesis (and test assumptions) hold true. In other words, as Greenland et al state: ‘The P value simply indicates the degree to which the data conform to the pattern predicted by the test hypothesis and all the other assumptions used in the test (the underlying statistical model). Thus P =  0.01 would indicate that the data are not very close to what the statistical model (including the null hypothesis) predicted they should be, while P =  0.40 would indicate that the data are much closer to the model prediction, allowing for chance variation’ (p. 340).
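The ASA's informal definition can be made concrete with a short simulation. The sketch below uses made-up scores for two hypothetical groups (not data from any of the studies discussed here) and computes a permutation p-value: the proportion of label-shufflings, under the null hypothesis that the labels are arbitrary, producing a mean difference at least as extreme as the one observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scores for two small groups (illustrative only, not data
# from any of the studies discussed in this post)
group_a = np.array([12.0, 14.5, 11.0, 13.5, 15.0, 12.5])
group_b = np.array([10.0, 11.5, 9.5, 12.0, 10.5, 11.0])

observed_diff = group_a.mean() - group_b.mean()

# Under the null hypothesis the group labels are arbitrary, so shuffle the
# labels many times and count how often the mean difference is at least as
# extreme as the one observed: that proportion is a two-sided permutation
# p-value.
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)
n_perms = 10_000
count = 0
for _ in range(n_perms):
    rng.shuffle(pooled)
    diff = pooled[:n_a].mean() - pooled[n_a:].mean()
    if abs(diff) >= abs(observed_diff):
        count += 1

p_value = count / n_perms
print(f"observed difference: {observed_diff:.2f}, permutation p-value: {p_value:.3f}")
```

Note that the p-value here is a statement about the data under the null model and the permutation assumptions; it says nothing about the size or importance of the difference.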

Statistical Significance

Put very simply, a result is often deemed to be statistically significant if the p-value is less than or equal to 0.05, although the level of statistical significance can be set at lower levels, for example, p less than or equal to 0.01.

Interpreting p values and statistical significance – guidance from the American Statistical Association

Given the difficulties in interpreting p-values and statistical significance, the American Statistical Association – through Wasserstein and Lazar (2016) – has provided some guidance on how to avoid common mistakes. This guidance is summarised in six principles:
  • P-values can indicate how incompatible the data are with a specified statistical model.
  • P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone. 
  • Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold. 
  • Proper inference requires full reporting and transparency.
  • A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
  • By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis. 
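The fifth principle is worth illustrating: because a p-value depends on sample size as well as effect size, the same difference between groups can be 'significant' or 'non-significant' depending only on how many pupils were tested. Below is a minimal sketch, assuming a hypothetical difference of 2 marks between two groups with a known standard deviation of 8 marks (a z-test simplification; a t-test would be more usual for small samples).

```python
import math

def two_sided_p(mean_diff, sd, n):
    """Two-sided p-value for a difference between two group means of size n,
    treating the standard deviation as known (a z-test sketch)."""
    se = sd * math.sqrt(2.0 / n)               # standard error of the difference
    z = mean_diff / se
    return math.erfc(abs(z) / math.sqrt(2.0))  # normal tail probability, both sides

# The same hypothetical effect (2 marks, sd = 8) at different sample sizes:
for n in (10, 50, 200):
    print(f"n = {n:3d} per group: p = {two_sided_p(2.0, 8.0, n):.4f}")
```

The effect is identical in every case, yet only the largest sample crosses the conventional 0.05 threshold; the p-value has measured the sample size as much as the effect.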

Some common misinterpretations – Churches, Dommett, Firth and Ashman

I will now look at how Churches and Dommett, Firth and Ashman have - in my view - all misinterpreted either p-values or statistical significance.

Richard Churches and Eleanor Dommett - In their book Teacher-Led Research: Designing and implementing randomised controlled trials and other forms of experimental research – they include the following definitions within their glossary of terms:

p-value – Probability value – that is the probability that the result may have occurred by chance (e.g p = 0.001 – a 1 in 1000 probability that the result may have happened by chance) Also known as the significance level.

Significance – The probability that a change in score may have occurred by chance. A threshold for significance (alpha) is set at the start of piece of research.  This is never less stringent than 0.05 ….

Unfortunately, according to the ASA, both of these statements are incorrect. First, the p-value is a measure of the consistency of the results with a particular statistical model, with all the assumptions behind the model being maintained. Second, the p-value is not the probability that the data were produced by random chance alone, as it also depends on the accuracy of the assumptions underpinning the statistical model. Third, the definition of significance conflates scientific significance with statistical significance.
Jonathan Firth - Firth, J. (2018). The Application of Spacing and Interleaving Approaches in the Classroom. Impact. 1. 2.

In a recent edition of Impact, Jonathan Firth applies p-values and statistical significance to the use of spacing and interleaving in the classroom, where an opportunity sample of 31 school pupils aged between 16 and 17 years was used.

The mean percentages of correct answers on the end-of-task test for the interleaved and blocked conditions are shown in Figure 4. A between-subjects ANOVA was carried out. This analysis revealed a significant main effect of spacing (performance in the spaced condition being worse than the massed condition, with mean scores of 12.25 vs 9.45, p = .002), while interleaving did not have a significant main effect.  Importantly, there was also a significant (p = .009) interaction between the two variables (spacing vs interleaving), indicating that interleaving had a mediating or protective effect against the difficulties caused by spacing (see Figure 5). 

The findings demonstrated that spacing had a harmful effect on the immediate test, while the main effect of interleaving was neutral. The results fit with the idea that these are ‘desirable difficulties’, with the potential to impede learning in the short term.

Again, according to the ASA, there are errors in both paragraphs. Statistical significance does not demonstrate that a scientifically or substantively important relation has been detected. Nor is statistical significance a property of the phenomenon being studied; it is a product of the consistency between the data and what would have been expected under the specified statistical model. In other words, the map is not the territory.

Greg Ashman - Ashman (2018) The Article That England’s Chartered College Will Not Print. Filling the Pail.

In a blogpost which criticises the EEF’s approach to both meta-cognition and meta-analysis, Greg also falls foul of the problems of interpreting p-values and statistical significance.

If we focus only on the randomised controlled trials conducted by the EEF, the case for meta-cognition and self-regulation seems weak at best. Of the seven studies, only two appear to have statistically significant results. In three of the other studies, the results are not significant and in two more, significance was not even calculated. This matters because a test of statistical significance tells us how likely we would be to collect this particular set of data if there really was no effect from the intervention. If results are not statistically significant then they could well have arisen by chance.

Again, using the ASA’s guidance, there are a number of errors in this statement. First, statistical significance – or rather the lack of it – does not tell us that there was no effect from the intervention; it only tells us how consistent the data are with the statistical model (which includes the null hypothesis). Second, whether or not the results are statistically significant does not tell us whether they arose by chance. A p-value is a statement about data in relation to a specified hypothetical explanation, not a statement about the explanation itself. In other words, it is a statement about the results of the study relative to a particular statistical model.
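The danger of reading 'not significant' as 'no effect' can be shown with a simulation. The sketch below uses hypothetical numbers (a true effect of 0.3 standard deviations, 20 pupils per arm - not the EEF's actual data): even though every simulated trial has a real effect, most come out 'not significant' simply because the trials are small.

```python
from math import erfc, sqrt
import numpy as np

rng = np.random.default_rng(42)

# Simulate many small trials in which the intervention has a REAL effect
# (hypothetical numbers: a true gain of 0.3 sd, 20 pupils per arm).
true_effect, n, trials = 0.3, 20, 2000
nonsig = 0
for _ in range(trials):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_effect, 1.0, n)
    diff = treated.mean() - control.mean()
    se = sqrt(control.var(ddof=1) / n + treated.var(ddof=1) / n)
    p = erfc(abs(diff / se) / sqrt(2))   # two-sided p, normal approximation
    if p > 0.05:
        nonsig += 1

print(f"{100 * nonsig / trials:.0f}% of trials 'not significant' despite a real effect")
```

Under these assumptions the majority of trials fail to reach p < 0.05, which is why a non-significant result in a small trial cannot be read as evidence that the intervention did nothing.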

Where does this leave us?

First, p-values and statistical significance are slippery concepts, which take time and effort even to begin to understand, never mind master. Indeed, you may need to unlearn what you were taught at university on undergraduate or postgraduate courses.

Second, misuse of p-values and statistical significance is not uncommon, so it is something to watch out for when reading quantitative research reports. Keep the ASA principles to hand to check whether they are being misapplied. You don’t have to understand fully how something works (though it helps) to be able to spot its misuse.

Third, just because you come across something in a variety of formats – book, peer-reviewed article or blog – and from a variety of authors – university researchers or school teachers – does not mean it is correct.

Fourth, I am not making comments about the personal integrity of any of the authors I have criticised. These comments should be seen as ‘business not personal’ and are a genuine attempt to increase the research literacy of teachers and school leaders. Being an evidence-informed teacher or school leader is hard enough when you are using the right tools, never mind the wrong ones.

And finally,  it’s worth remembering the words of Greenland, et al. (2016) who state: ‘In closing, we note that no statistical method is immune to misinterpretation and misuse, but prudent users of statistics will avoid approaches especially prone to serious abuse. In this regard, we join others in singling out the degradation of P values into ‘‘significant’’ and ‘‘nonsignificant’’ as an especially pernicious statistical practice.’ p348.


Ashman, G. (2018). The Article That England’s Chartered College Will Not Print. Filling the Pail. 21 April, 2018.
Churches, R. and Dommett, E. (2016). Teacher-Led Research: Designing and Implementing Randomised Controlled Trials and Other Forms of Experimental Research. London. Crown House Publishing.
Firth, J. (2018). The Application of Spacing and Interleaving Approaches in the Classroom. Impact. 1. 2.
Gorard, S., See, B. and Siddiqui, N. (2017). The Trials of Evidence-Based Education. London. Routledge
Greenland, S., Senn, S., Rothman, K., Carlin, J., Poole, C., Goodman, S. and Altman, D. (2016). Statistical Tests, P Values, Confidence Intervals, and Power: A Guide to Misinterpretations. European journal of epidemiology. 31. 4. 337-350.

Wasserstein, R. and Lazar, N. (2016). The ASA's Statement on P-Values: Context, Process, and Purpose. The American Statistician. 70. 2. 129-133.

Sunday 22 April 2018

Trust in Schools

Anyone with a passing acquaintance with Twitter and an interest in education will know that the notion of 'trust' in schools - or the lack of it - is constantly being commented upon. Governments should trust schools more to get on with the job of educating pupils. CEOs of multi-academy trusts should trust the senior leaders of their schools more to come up with local solutions to school problems. Senior leaders should trust teachers more to know what is best for their pupils and let them get on with the job of teaching in the classroom. Teachers should trust senior leaders to know what is right for the school. Parents should trust teachers to do their best for their children. Teachers should trust pupils to take responsibility for their own learning. In other words, trust is a good thing and there should be more of it. However, high levels of trust are not easy to create, develop and maintain, and can be very easily lost. So in this post, I will use the work of Romero and Mitchell (2018) to explore:

  • The importance of trust in schools.
  • The nature of trust.
  • Implications for the leadership and management of schools in creating, maintaining and developing trust.

The importance of trust in schools

Romero and Mitchell provide a range of evidence to support the following claims:

  • Trust is important in high-functioning modern institutions.
  • Trust is a defining characteristic of professional work.
  • Trust between teachers and school leaders plays an important role in collaboration, openness to new ideas, mentoring and professionalism.
  • Student trust of teachers is associated, for example, with academic achievement and good behaviour.
  • Trust is essential for effective partnerships between schools and parents.
  • Trust is important between the different levels of an educational organisation, system or institution.

However, whether these claims are fully warranted would depend upon a careful analysis of the supporting evidence for each claim (Wallace and Wray, 2016). Nevertheless, for the purposes of this post, I am going to assume that each claim stands up to critical scrutiny.

The nature of trust

The components of trust are subject to some debate, with Adams and Miskell (2016) hypothesising that trust consists of five components - benevolence, competence, honesty, openness, and reliability. Bryk and Schneider (2002) argue that relational trust consists of four components - respect, personal regard, personal integrity and competence in core responsibilities. However, Romero and Mitchell state that trust has three key facets:

  • Benevolence - the sense that the trusted party has your best interests at heart.
  • Competence - the belief that the trusted party has the needed skills and abilities.
  • Integrity - the belief that the trusted party will behave fairly and ethically.

As such, Romero and Mitchell argue that trust is effectively a second-order factor: it is a function of the levels of all three facets, with each present to varying degrees. This has a number of consequences, both for attempts to measure trust (i.e. the need to measure all three facets) and for how to develop, maintain or repair trust in schools. For example, there may be low levels of trust in a school even where individuals act with high levels of benevolence and integrity, if they do so with low levels of competence. In other words, trust requires the presence of high levels of benevolence, competence and integrity.

What are the implications for trust in schools?

It seems to me that this analysis has a number of implications for schools and school leaders.

  1. Given the interrelationship between trust and each of benevolence, competence and integrity, low levels of trust within schools may well be the norm. This does not mean low levels of trust should be deemed acceptable; rather, it is a recognition of the challenge of creating high-trust environments.
  2. If school leaders wish to develop levels of trust within a school, it will involve spinning 'multiple plates' - just being deemed to be a good person or good at your job will not be enough to generate trust.
  3. The actions necessary to develop trust in schools will depend very much on the situation in each school. If there are perceived low levels of benevolence, competence and integrity, this will require sustained action across all three factors. Whereas if the concerns are about leader competence, it may require a school leader to focus on the basics of school leadership - managing pupil behaviour, recruiting staff on time, and keeping the books balanced.
  4. If you accept that what it means to be competent changes over time, with increasing levels of performance required to remain competent, then schools have no choice but to invest constantly in the professional learning and development of ALL staff.
  5. At whatever level of the school system you operate - be it CEO of a MAT, school leader, head of department, teacher or teaching assistant - do not take trust for granted, as it can so easily slip through your fingers and disappear.
  6. Probably the simplest thing to do when trying to develop a high-trust environment is to adopt Bob Sutton's No Asshole Rule (Sutton, 2007).


Adams, C. M. and Miskell, R. C. (2016). Teacher Trust in District Administration:A Promising Line of Inquiry. Educational Administration Quarterly. 52. 4. 675-706.
Bryk, A. and Schneider, B. (2002). Trust in Schools: A Core Resource for Improvement. New York. Russell Sage Foundation.
Sutton, R. I. (2007). The No Asshole Rule: Building a Civilized Workplace and Surviving One That Isn't. London. Hachette UK.
Wallace, M. and Wray, A. (2016). Critical Reading and Writing for Postgraduates (Third Edition). London. Sage.

Friday 13 April 2018

School Leadership and Civility

In a recent post I argued that both procedural and interactional justice within schools are  essential components of promoting teacher and organisational well-being.  Indeed, thanks to a retweet by Jill Berry @jillberry102 this post generated some traffic on Twitter, the vast majority of which was supportive.  However, the post was interpreted by some as 'SLT bashing,' which was never the intent.  Ironically, the post was designed to be supportive of SLTs by identifying evidence-based strategies which could be adopted and which may reduce both staff turnover and teachers leaving the profession.

In this post I'm going to continue to look at strategies which can support interactional justice within schools.  In doing so, I'm going to look at the work of Christine Porath (Porath, 2018) on how promoting civility can have an important role in developing interactional justice.   Porath argues that if you want colleagues to be 'civil' to one another it is important that leaders engage in conversation with team members to establish precisely what civility means.  By doing this, Porath argues that it then becomes much easier to generate support for 'civility' as a way of doing things, and at the same time empowers colleagues to hold each other to account.

Porath then goes on to describe a law firm's (Bryan Cave) code of civility, which has 10 elements.

Bryan Cave's Code of Civility

1 We greet and acknowledge each other.

2 We say please and thank you.

3 We treat each other equally and with respect, no matter the conditions.

4 We acknowledge the impact of our behaviour on others.

5 We welcome feedback from each other.

6 We are approachable.

7 We are direct, sensitive, and honest.

8 We acknowledge the contributions of others.

9 We respect each other's time commitments.

10 We address incivility.

However, Porath then argues that it is not enough to define cultural norms of civility; colleagues also need to receive specific training which examines:

" What civility looks like
" Situations where colleagues may act with a lack of civility
" Techniques to maintain civility when under pressure
" Opportunities to practise being civil

So what are the implications for school leaders?

If you accept the notion that how you behave has an impact on others, and that  school leader civility may be an important part of a school's strategy for retaining staff, the following may be worth considering.

1. Keep a daily civility diary, recording where you may have behaved in a way which lacked civility - and reflect on what might have triggered that behaviour.
2. Ask a colleague to observe how you behave in meetings and other settings, and whether they can identify occasions where you have acted in a manner which could be described as disrespectful to others.
3. See if you can spot when colleagues have acted with a lack of civility towards one another and ask the following:
a. Did you intervene?
b. Is this behaviour new ?
c. What are you going to do about it?

And finally 

Am I holding myself up as a paragon of virtue when it comes to civility? Absolutely not. What I do know is that as a senior leader I could have done a better job of being civil, and I should have been more proactive when colleagues displayed less than 'civil' behaviour towards one another. In future posts I will begin to explore the role of trust within schools.


Porath, C. (2018). Make Civility the Norm on Your Team. Harvard Business Review.

Saturday 7 April 2018

Research shows that human resources management practices in schools matter.

Friday 6 April 2018 saw the TES publish an article with the headline Performance-related pay is 'ineffective in schools', based on a study by Bryson, Stokes, et al. (2018). What’s particularly disappointing about the article – apart from the reference cited being a blog post about the research rather than the research itself – is that the research shows that human resource management practices in schools do matter. The focus of the TES article, by contrast, is on what we already have a pretty good idea does not work, i.e. performance-related pay. So given increasing concerns about a future shortage of teachers, and the impact of poor leadership and management on teacher retention, it would have been far more helpful if the article had focused on ‘what works’ rather than recycling a tired old headline.

Can HRM improve schools performance?

Bryson, et al. (2018) compared schools to observationally equivalent workplaces in the rest of the British economy, using measures of workplace performance that are common across all workplaces, and found that intensive use of HRM practices is correlated with substantial improvement in workplace performance, both in schools and in other workplaces. Yet the types of practices that improve school performance are different from those that improve performance elsewhere. Moreover, there would appear to be linear returns to HRM intensity in most workplaces, whereas in schools returns are an increasing function of the intensity of HRM use.

In their blog post summarising the research Bryson, Stokes, et al. (2018b) state:

Schools benefit from increased use of rigorous hiring practices when selecting new recruits, employee participation mechanisms (such as team briefings), total quality management (TQM) and careful record-keeping, none of which seem to improve workplace performance elsewhere in the economy. By contrast, increased use of performance-related pay and performance monitoring, which do improve workplace performance elsewhere in the economy, are ineffective in schools. The only HRM practice that benefits both schools and other workplaces is more intensive provision of training.

Nevertheless, it is important to note that all research has limitations. In this instance, although the research headlines are accessible to all, it would require a postgraduate level of statistics to appraise the accuracy of the research. Second, the research drew upon workplace survey data from 2004 and 2011, which raises questions about whether HRM practices have changed since the data were collected. Third, the research is not readily actionable, as there is insufficient detail about how the various human resources techniques were carried out - and as everyone involved in evidence-based education knows, it ain’t what you do, but the way that you do it.

What are the implications for senior school leaders?

  • Always, always read the original research cited in newspaper articles.  The odds are that the research will take a different stance to that reported and will also have a number of unreported limitations.
  • Do not confuse correlation with cause and effect – extensive use of HRM practices appears to be correlated with improvement in work-place performance.  That does not mean that extensive use of HRM has caused that improvement, as other factors may well be at work.
  • Evidence-based practice within schools should not be limited to matters relating to teaching and learning. It is also applicable to all aspects of the work of the school, including human resource management, operations and finances.
  • If you are thinking about introducing performance related pay for teachers within your school – think again – the evidence suggests that it is not appropriate for complex tasks such as teaching.


Bryson, A., Stokes, L. and Wilkinson, D. (2018a). Can HRM Improve Schools’ Performance? Bonn. Institute of Labor Economics.
Bryson, A., Stokes, L. and Wilkinson, D. (2018b). Which Modern Management Techniques Work Best for Schools? IOE London Blog.