Tuesday 28 October 2014

Teaching, learning and assessment and the grade for overall effectiveness - a case of mistaken identity

This post highlights the flaws of using the aspect grade for teaching, learning and assessment as a limiting factor for the grade awarded for overall effectiveness, and the implications for the reliability and validity of inspection outcomes for general further education colleges.
      The OfSTED Handbook for the Inspection of Further Education and Skills (http://www.ofsted.gov.uk/resources/handbook-for-inspection-of-further-education-and-skills-september-2012) makes it clear that the aspect grade for the quality of teaching, learning and assessment is a limiting factor for the overall grade for effectiveness. For example, to be awarded an outstanding grade for overall effectiveness, the grade for the quality of teaching, learning and assessment must also be outstanding.
     Using an approach similar to that undertaken by Waldegrave and Simons (2014) to analyse the relationship between grades awarded in school inspections, the following table summarises the relationship between the different inspection grades awarded during 125 general further education college inspections which took place between January 2013 and June 2014.
[Table: for each aspect grade, the level of agreement between the overall grade for effectiveness and the grades awarded for outcomes for learners, the quality of teaching, learning and assessment, and the effectiveness of leadership and management. *Only 4 colleges were awarded a grade 4, which has an impact on the associated correlations.]

     It can be clearly seen from the data that the teaching, learning and assessment aspect grade corresponds most strongly with the overall grade for effectiveness, which is not surprising given the guidance in the handbook. Out of 125 GFE college inspections undertaken in the specified 18-month period, there was only one occasion when the two grades differed, and on that occasion the overall grade for effectiveness was lower than the grade for teaching, learning and assessment.
     However, the direct relationship between the grade for overall effectiveness and the quality of teaching, learning and assessment is not without its problems. In the further education sector, unlike schools, individual lesson grades are still being used by OfSTED inspectors to summarise judgments about the quality of teaching, learning and assessment within a lesson. Both Matt O'Leary and Rob Coe identify serious challenges with the use of observations in the grading of teaching and learning. Waldegrave and Simons (2014) cite Coe's synthesis of a number of research studies, which raises serious questions about the validity and reliability of lesson observation grades. When comparing the value-added progress made by students with a lesson observation grade (validity), Coe states that in the best case there will be only 49% agreement between the two, and in the worst case only 37%. As for the reliability of grades, Coe's synthesis suggests that in the best case there will be 61% agreement between two observers, and in the worst case only 45%.
     As such, it would seem that using the teaching, learning and assessment grade as the driver for the grade for overall effectiveness is not consistent with the current best available evidence, and indicates that systems of accountability within the further education sector have yet to be fully informed by the evidence-based practice movement. Accordingly, in order to bring about more effective and more meaningful accountability processes, the following are worthy of consideration.
  • the direct relationship between the teaching, learning and assessment aspect grade and the overall grade for effectiveness should be replaced by a holistic judgment.
  • the design of 'inspection regimes' to be subject to open and transparent college effectiveness/college improvement 'state of the art' reviews to ensure processes are likely to generate valid and reliable judgments.
  • as part of the process of self-assessment, colleges reduce, if not eradicate, their over-reliance on lesson observation grade profiles in making judgments about teaching, learning and assessment.

Coe, R. (2014). Lesson Observation: It's harder than you think. TeachFirst TDT Meeting, 13 January 2014.
O'Leary, M. (forthcoming, 2015). 'Measurement as an obstacle to improvement: moving beyond the limitations of graded lesson observations', in Gregson, M. and Hillier, Y. (eds) Reflective Teaching in Further, Adult and Vocational Education. London: Bloomsbury.
O'Leary, M. (2014). Lesson observation in England's Further Education colleges: why isn't it working and what needs to change? Paper presented at the Research in Post-Compulsory Education Inaugural International Conference, 11-13 July 2014, Harris Manchester College, Oxford.
Waldegrave, H. and Simons, J. (2014). Watching the Watchmen: The future of school inspections in England. London: Policy Exchange.

Tuesday 21 October 2014

The Skills Commissioner and learning from others - it's not as easy as you think

Last Friday saw the publication of the FE Skills Commissioner's latest letter to the sector.  In this letter Dr David Collins addresses the issue of quality improvement and the variations in quality of leadership and management seen across the sector and provides some quite straightforward advice.   

If a college is having a problem my advice is simple - find someone who is performing well in that area and learn from them. 

However, things are not quite that simple. As Jeffrey Pfeffer and Robert I. Sutton argue in their 2006 book Hard Facts, Dangerous Half-Truths and Total Nonsense: Profiting from Evidence-Based Management, there are three 'dangerous' decision practices which can often cause both organisations and individuals harm, one of which is casual benchmarking. Pfeffer and Sutton argue there is nothing inherently wrong with trying to learn from others. However, for this type of benchmarking to be of use, the underlying logic of what worked and why it worked needs to be unpacked and understood. Pfeffer and Sutton identify three questions which must be answered if learning from others is to be beneficial.
  • Is the success of a particular college down to the approach which you may seek to copy, or is it merely a coincidence? Has a particular leadership style made no difference to student outcomes, even though student outcomes appear to have improved? Do other factors, independent of leadership style, explain the improvement in student outcomes?
  • Why is a particular approach to, for example, lesson observation linked to performance improvement? How has this approach led to sustained improvements in the level of teacher performance and subsequently improved outcomes for students?
  • Are there negative unintended consequences of the high levels of compliance found in well-performing colleges? How are these consequences being managed, and are these consequences and mitigating strategies evident in any benchmarking activity? (Adapted from Pfeffer and Sutton, p. 8)
Furthermore, not only is the FE Commissioner's advice not quite as simple as it first appears, it may also be wrong. Rosenzweig (2006) identifies a range of errors of logic and flawed thinking which distort our understanding of company (or, here, college) performance, and which are implicit within the Skills Commissioner's letter and his 10 Cs. Rosenzweig identifies the most common delusion as the Halo Effect: when an observer's overall impression of a person, company, brand or product, in this case a college, influences the observer's views and thoughts about that entity's character or properties. When a college's retention, achievement, success rates and operating surplus improve, people (inspectors or other significant stakeholders) may conclude that these arise from brilliant leadership, a coherent strategy, or a strong college culture with high levels of compliance. If and when performance deteriorates and success rates or position in league tables fall, observers conclude it is the result of weak leadership and management (the Principal), and that the college was complacent or coasting. The reality, on the other hand, may be that there has been little or no substantive change, and that the college's headline performance creates a halo that shapes the way judgements are made about outcomes for learners, teaching, learning and assessment, and leadership and management.

To conclude, I have argued that learning from others is not quite as simple as going to visit another college. I have also argued that learning from others requires an awareness of possibly flawed thinking, logic and cognitive biases, without which the wrong things may be learnt. In future posts, I will argue the case for evidence-based educational leadership which is relevant to the further education sector.

Pfeffer, J. and Sutton, R. (2006). Hard Facts, Dangerous Half-Truths and Total Nonsense: Profiting from Evidence-Based Management. Harvard Business Review Press.

Rosenzweig, P. (2006). The Halo Effect ... and the Eight Other Business Delusions that Deceive Managers. London: Free Press.

Wednesday 15 October 2014

It seemed a good idea at the time BUT we really should have known better!

This academic year will have seen the introduction of a wide range of innovations in schools and colleges, many of which will be showing the first fruits of success. On the other hand, there will be innovations which are not working and have little prospect of success, and which were introduced because the originator(s) had fallen in love with the idea. These unfortunate failures highlight one of the major challenges of evidence-based leadership and management: to develop processes which reduce the errors generated by the cognitive and information-processing limits that make decision-makers prone to biases, and which subsequently lead to negative outcomes.

In this post I will be drawing upon the work of Kahneman, Lovallo and Sibony (2011) on how to find dangerous biases before they lead to poor decision-making. Developing the skills to appraise and critically judge the trustworthiness and relevance of multiple sources of evidence is a critical element of evidence-based practice. Kahneman et al identify a number of specific biases, questions and actions which could be used to improve the rigour of decision-making. These have been summarised and adapted in the following table.

Avoiding Biases and Making Better Decisions - A Checklist (summarised and adapted from Kahneman, Lovallo and Sibony, 2011)

Preliminary questions
  • Self-interested biases. Is there any reason to suspect that the team or individuals making the recommendation are making errors motivated by self-interest? If so, review the proposal with particular care.
  • Affect heuristic. Has the team fallen in love with its proposal? If so, apply the checklist rigorously.
  • Groupthink. Were there dissenting opinions, and were these opinions fully explored? If not, discreetly obtain dissenting views.

Challenge questions
  • Saliency bias. Could the diagnosis be overly influenced by an analogy to a memorable success? Ask for other analogies, and consider how similar this and other analogies are to the current situation.
  • Confirmation bias. Are credible alternatives included with the recommendation? Request that additional options be provided.
  • Availability bias. If this decision were to be made again in a year's time, what information would you want, and can you get more of it now? Develop checklists of the information available for different types of decision.
  • Anchoring bias. Do you know where the numbers came from? Are there unsubstantiated numbers, or figures extrapolated from historical data? Check the figures against other models and consider whether alternative benchmarks can be used for the analysis.
  • Halo effect. Is the team assuming that a person, organisation or innovation which is successful in one area will be just as successful in another? Eliminate false inferences and seek alternative examples.

Ask about the proposal
  • Overconfidence, planning fallacy, optimistic biases, competition neglect. Is the base case overly optimistic? Check that outside views have been taken into account.
  • Disaster neglect. Is the worst case bad enough? Conduct a pre-mortem to work out what could go wrong.
  • Loss aversion. Is the recommending team overly cautious? Realign incentives to share responsibility for the risk, or remove the risk.
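One way to guard against cherry-picking is to treat the checklist above as data that a review panel must work through in full. The sketch below is purely illustrative: the structure and function names are my own assumptions, not part of any published tool, and the question wording is abbreviated from the table above.

```python
# A minimal sketch of the Kahneman, Lovallo and Sibony checklist as a data
# structure, so a review panel can see which items remain unexamined rather
# than cherry-picking the convenient ones. Names here are illustrative.

CHECKLIST = {
    "Preliminary questions": [
        ("Self-interested biases",
         "Any reason to suspect errors motivated by self-interest?"),
        ("Affect heuristic", "Has the team fallen in love with its proposal?"),
        ("Groupthink", "Were dissenting opinions fully explored?"),
    ],
    "Challenge questions": [
        ("Saliency bias", "Is the diagnosis overly influenced by a memorable success?"),
        ("Confirmation bias", "Are credible alternatives included?"),
        ("Availability bias", "What information would you want in a year's time?"),
        ("Anchoring bias", "Where did the numbers come from?"),
        ("Halo effect", "Is success in one area assumed to transfer to another?"),
    ],
    "Ask about the proposal": [
        ("Overconfidence", "Is the base case overly optimistic?"),
        ("Disaster neglect", "Is the worst case bad enough?"),
        ("Loss aversion", "Is the recommending team overly cautious?"),
    ],
}

def unanswered(responses):
    """Return every checklist item not yet marked as considered, so a
    decision cannot proceed on a partial review."""
    return [bias
            for section in CHECKLIST.values()
            for bias, _question in section
            if not responses.get(bias, False)]

# Example: a panel that has so far considered only two of the items.
remaining = unanswered({"Groupthink": True, "Halo effect": True})
```

The point of the sketch is the third recommendation below: the checklist is applied whole, and anything in `remaining` blocks the decision until it has been addressed.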

How could this check-list be used to improve decision-making within educational settings?

  • Ensuring the check-list is applied before the action is taken which commits the school or college to the action being proposed.
  • Ensuring the decision checklist is applied by a member or members of staff who are sufficiently senior within the school or college, whilst not being part of the group making the recommendation. Separation between recommenders and decision-makers is desirable, and this has implications for governance and leadership.
  • Ensuring the check-list is used in whole and not in parts and is not 'cherry-picked' to legitimate a decision.
If colleagues are able to adopt the above process then it is likely to increase the chances of success in the use of evidence-based practice. As we approach the end of the first half-term of the academic year, I wonder how many schools, colleges, teachers, lecturers and, most importantly, students could have avoided the negative impacts of poorly thought-out decisions if the above checklist had been applied early on as part of the decision-making process.


Kahneman, D., Lovallo, D. and Sibony, O. (2011). Before you make that big decision ... Harvard Business Review, June 2011.

Thursday 9 October 2014

Asking better questions

In my recent posts I raised the importance, for evidence-based practice, of being able to translate a practical problem into an answerable question, and used the PICOT framework to do so.

An alternative, though similar, approach is identified by Briner, Denyer and Rousseau (2009), who cite the work of Denyer and Tranfield (2009), who argue that well-crafted review questions need to take into account both the organisational context and the relationship between an intervention and an outcome. Adapting the work of Pawson (2006), Denyer and Tranfield have developed a structured and contextual approach to developing an answerable question (CIMO), which provides a better focus on both the context and the mechanism(s) by which change is brought about.

What is CIMO?
CIMO is an acronym for the components of a well-formulated question for use in a social science or organisational context.

C — Context. Which individuals, relationships, institutional settings, or wider systems are being studied?

I — Intervention. The effects of what event, action, or activity are being studied?

M — Mechanisms. What are the mechanisms that explain the relationship between interventions and outcomes? Under what circumstances are these mechanisms activated or not activated?

O — Outcomes. What are the effects of the intervention? How will the outcomes be measured? What are the intended and unintended effects?
Denyer and Tranfield provide a worked example of a question framed with these components: 

“Under what conditions (C) does leadership style (I) influence the performance of project teams (O), and what mechanisms operate in the influence of leadership style (I) on project team performance (O)?” (Denyer and Tranfield, 2009, p. 682)
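The four CIMO components can also be captured as a small structure that assembles the review question, which makes it easy to check that none of the components has been left implicit. This is a sketch under my own assumptions: the class name, field names and question template are illustrative, not part of Denyer and Tranfield's work.

```python
# A minimal, illustrative sketch of capturing CIMO components and
# assembling them into a review question. All names are assumptions.
from dataclasses import dataclass

@dataclass
class CIMOQuestion:
    context: str       # C: individuals, relationships, settings, systems
    intervention: str  # I: the event, action or activity being studied
    mechanism: str     # M: what explains the intervention-outcome link
    outcome: str       # O: the effects, intended and unintended

    def as_question(self) -> str:
        """Render the components in the style of Denyer and Tranfield's
        worked example (a template of my own devising)."""
        return (f"Under what conditions ({self.context}) does "
                f"{self.intervention} influence {self.outcome}, and do "
                f"{self.mechanism} explain that influence?")

# Example drawn from the educational questions discussed below.
q = CIMOQuestion(
    context="in general further education colleges",
    intervention="graded lesson observation",
    mechanism="feedback and accountability mechanisms",
    outcome="lecturers' teaching performance",
)
question = q.as_question()
```

Forcing each field to be filled in is, in miniature, the discipline the framework provides: a question with an empty mechanism or context is visibly incomplete.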

Educational Examples
Using these elements it is now possible to frame answerable questions, several examples of which can be found below.

Under what circumstances does a further education college middle manager's leadership style influence the academic performance of students, and what are the mechanisms of middle management leadership style which affect student performance? (Adapted from Denyer & Tranfield, 2009.)

Under what conditions does re-taking GCSE English provide an effective mechanism for developing the English skills of 16-year-old full-time further education students who previously achieved a grade D, and what are the processes associated with re-sitting GCSE English which affect those skills?

Under what conditions is flipped learning effective in reducing the risk of non-attendance among full-time 16-year-old level one further education college students with a previous history of non-attendance at school, and what are the mechanisms of flipped learning which affect student attendance?

Under what circumstances are graded lesson observations effective in improving the teaching of lecturers who have previously been judged to be inadequate or requiring improvement, and what are the mechanisms of graded lesson observation which affect teacher performance?

So what are the benefits of using CIMOs and phrasing questions in this manner?
A number of benefits spring immediately to mind:
  1. CIMO provides a framework for formulating problems in a structured manner, and the very process of developing the question promotes understanding of the issue at hand.
  2. Formulating questions in this manner subsequently provides the basis for undertaking a systematic review, and in particular provides guidance as to what literature to review and what data to consider.
  3. CIMOs provide a basis for researchers, bloggers and tweeters to attempt to agree the question to which they are trying to contribute.
As we approach half-term, no doubt there are a number of new practices introduced this term of which colleagues are already saying 'we should have known better' or 'why did we allow ourselves to be swept along with this idea?'. In order to help prevent these issues happening again, we will look at the work of Daniel Kahneman and others, and a useful checklist of pre-implementation questions.

Briner, R. B. and Denyer, D. (2012). Systematic Review and Evidence Synthesis as a Practice. In D. M. Rousseau (ed.), The Oxford Handbook of Evidence-Based Management. Oxford: Oxford University Press.
Briner, R. B., Denyer, D. and Rousseau, D. M. (2009). Evidence-Based Management: Concept Cleanup Time? Academy of Management Perspectives, 23(4): 19-32.
Denyer, D. and Tranfield, D. (2009). Producing a systematic review. In D. Buchanan and A. Bryman (eds), The SAGE Handbook of Organizational Research Methods: 671-689. London: SAGE Publications Ltd.
Pawson, R. (2006). Evidence-Based Policy: A Realist Perspective. London: Sage Publications.

Wednesday 1 October 2014

Can nurse education make you a better teacher by asking better questions - Part Two

In my previous post I adapted the work of Stillwell, Melnyk and Williamson (2010) to help evidence-based educational practitioners go about the task of devising well-formulated questions. In this post I intend to further adapt Stillwell et al's work and look at the differing types of questions that can be asked. But first, a quick recap.

PICOT is an acronym for the components of a clinical question, as follows:
P — Patient or Problem. How would you describe the group of patients or the problem?
I — Intervention. What are you planning to do with your patients?
C — Comparison. What is the alternative to the intervention (e.g. a different intervention)?
O — Outcomes. What are the effects of the intervention?
T — Time. How long does it take for the intervention to achieve the outcomes?

Question types

Having created a well-formulated question, it is worth reflecting on the type of question which has been created, as illustrated in the following table, adapted from Stillwell, Melnyk and Williamson (2010).

Question types and example PICOT questions:
  • Pedagogical intervention: to determine which pedagogical intervention leads to the best outcomes for students. Example: for students requiring additional learning support, how does the provision of one-to-one support, compared with group support, affect retention rates in the first term?
  • Success/risk factors: to determine the greatest success or risk factors. Example: are level 3 BTEC Extended Diploma students who have grade C or above in GCSE Mathematics, compared with those students who do not, more likely to successfully complete their two-year programme of study?
  • Diagnosis: to determine which test is more accurate in diagnosing learning needs. Example: for students requiring support in the development of English skills, are GCSE grades a better indicator of needs compared to a specific online screening tool (e.g. BKSB)?
  • Prognosis or prediction: to determine the course over time and likely complications of a particular condition or pedagogical intervention. Example: do weekly tutorials for students with poor records of attendance improve timely completion of coursework within three months of the initiation of the weekly tutorials?
  • Meaning: to understand the meaning of an experience for a particular group of students. Example: how do further education students with grade D or below in GCSE English perceive re-sitting GCSE English during the first year of post-16 education?

Straus et al. (2010) have suggested a series of filters which could be used to identify the most appropriate question to ask in a particular situation. I have adapted the suggested filters so they can be easily transferred to an educational setting.
  • Which question, if answered, will be most useful for our learners' well being - academic or personal?
  • Which question will be most useful for subject leaders, heads of department in gaining a better understanding of the issues at hand?
  • Which question will be most useful in helping to improve the department, school or college?
  • Which question is most likely to re-occur and will need to be revisited in the future?
  • Which question is most interesting to you as an evidence-based practitioner and will contribute most to your personal professional development?

I'm sure that some colleagues are saying that they do not have sufficient time to formulate questions in such a structured manner. On the other hand, as Straus et al. (2010) so clearly articulate, again amended for evidence-based educational practitioners, there are a number of clear benefits to such an approach:

  • Focussing our scarce professional development time on the needs of our learners.
  • Making it easier to communicate with colleagues.
  • As our knowledge grows, we role-model lifelong learning to our colleagues.

I hope you agree with me that by asking better questions we can expand our skills as evidence-based practitioners.  In future posts  I will continue to explore the challenge of asking better questions and will be drawing upon a number of differing perspectives.


Straus, S. E., Glasziou, P., Richardson, W. S. and Haynes, B. R. (2010). Evidence-Based Medicine: How to Practice and Teach It (4th edition). Churchill Livingstone.
Stillwell, S. B., Melnyk, B. M. and Williamson, K. M. (2010). Asking the Clinical Question: A key step in evidence-based practice. American Journal of Nursing, 110(3).