Pearson’s ‘Efficacy’ initiative is a series of ‘commitments designed to measure and increase the company’s impact on learning outcomes around the world’. The company’s dedicated website offers two glossy brochures with a wide range of interesting articles, a good questionnaire tool that can be used by anyone to measure the efficacy of their own educational products or services, as well as an excellent selection of links to other articles, some of which are critical of the initiative. These include Michael Feldstein’s long blog post ‘Can Pearson Solve the Rubric’s Cube?’, which should be a first port of call for anyone wanting to understand better what is going on.

What does it all boil down to? The preface to Pearson’s ‘Asking More: the Path to Efficacy’ by CEO John Fallon provides a succinct introduction. Efficacy in education, says Fallon, is ‘making a measurable impact on someone’s life through learning’. ‘Measurable’ is the key word, because, as Fallon continues, ‘it is increasingly possible to determine what works and what doesn’t in education, just as in healthcare.’ We need ‘a relentless focus’ on ‘the learning outcomes we deliver’ because it is these outcomes that can be measured in ‘a systematic, evidence-based fashion’. Measurement, of course, is all the easier when education is delivered online, ‘real-time learner data’ can be captured, and the power of analytics can be deployed.

Pearson are very clearly aligning themselves with recent moves towards a more evidence-based education. In the US, Obama’s Race to the Top is one manifestation of this shift. Britain (with, for example, the Education Endowment Foundation) and France (with its Fonds d’Expérimentation pour la Jeunesse) are both going in the same direction. Efficacy is all about evidence-based practice.

Both the terms ‘efficacy’ and ‘evidence-based practice’ come originally from healthcare. Fallon references this connection in the quote two paragraphs above. In the UK last year, Ben Goldacre (medical doctor, author of ‘Bad Science’ and a relentless campaigner against pseudo-science) was commissioned by the UK government to write a paper entitled ‘Building Evidence into Education’. In this, he argued for the need to introduce randomized controlled trials into education in a similar way to their use in medicine.

As Fallon observed in the preface to the Pearson ‘Efficacy’ brochure, this all sounds like ‘common sense’. But, as Ben Goldacre discovered, things are not so straightforward in education. An excellent article in The Guardian outlined some of the problems in Goldacre’s paper.

With regard to ELT, Pearson’s ‘Efficacy’ initiative will stand or fall with the validity of their Global Scale of English, discussed in my March post ‘Knowledge Graphs’. However, there are a number of other considerations that make the whole evidence-based / efficacy business rather less common-sensical than might appear at first glance.

  • The purpose of English language teaching and learning (at least, in compulsory education) is rather more than simply the mastery of grammatical and lexical systems, or the development of particular language skills. Some of these other purposes (e.g. the development of intercultural competence or the acquisition of certain 21st century skills, such as creativity) continue to be debated. There is very little consensus about the details of what these purposes (or outcomes) might be, or how they can be defined. Without consensus about these purposes / outcomes, it is not possible to measure them.
  • Even if we were able to reach a clear consensus, many of these outcomes do not easily lend themselves to measurement, and even less to low-cost measurement.
  • Although we clearly need to know what ‘works’ and what ‘doesn’t work’ in language teaching, there is a problem in assigning numerical values. As the EduThink blog observes, ‘the assignation of numerical values is contestable, problematic and complex. As teachers and researchers we should be engaging with the complexity [of education] rather than the reductive simplicities of [assigning numerical values]’.
  • Evidence-based medicine has resulted in unquestionable progress, but it is not without its fierce critics. A short summary of the criticisms can be found here. It would be extremely risky to assume that a contested research procedure from one discipline can be uncritically applied to another.
  • Kathleen Graves, in her plenary at IATEFL 2014, ‘The Efficiency of Inefficiency’, explicitly linked health care and language teaching. She described a hospital where patient care was as much about human relationships as it was about medical treatment, an aspect of the hospital that went unnoticed by efficiency experts, since this could not be measured. See this blog for a summary of her talk.

These issues need to be discussed much further before we get carried away by the evidence-based bandwagon. If they are not, the real danger is that, as John Fallon cautions, we will end up counting things that don’t really count and failing to count the things that really do. Somehow, I doubt that an instrument like the Global Scale of English will do the trick.

Comments
  1. “Although we clearly need to know what ‘works’ and what ‘doesn’t work’ in language teaching, there is a problem in assigning numerical values”.

    There is indeed. But, short of assigning numerical values (e.g. by measuring the effects of a teaching intervention on a treatment group and comparing it with a control group), how do you counter the claims of those who advocate applying techniques derived from, say, neuro-linguistic programming, or brain gym, to the classroom? If, as you yourself have argued, these approaches are fake science, they are fake science because – among other things – they have not been tested empirically. Is it reasonable, then, to demand empirical evidence for NLP or learning styles, but to dismiss the need for empirical evidence for the efficacy of the kind of teaching we find more plausible?

    (I don’t have a ready answer to this question so I’m hoping you do!)
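
    To make concrete what such a treatment / control comparison involves, here is a minimal sketch. The test scores are entirely hypothetical, and the choice of Welch’s two-sample t-test plus Cohen’s d as an effect size is only one of several ways data like these could be analysed:

    ```python
    # Purely hypothetical end-of-course test scores (0–100) for two groups of
    # learners; the numbers are invented for illustration, not from any real study.
    import numpy as np
    from scipy import stats

    treatment = np.array([68, 74, 71, 80, 65, 77, 73, 70, 82, 69])  # new technique
    control = np.array([66, 70, 64, 72, 61, 75, 68, 63, 74, 67])    # business as usual

    # Welch's two-sample t-test: is the difference in means plausibly more than chance?
    t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

    # Cohen's d: how large is the difference, in pooled standard-deviation units?
    pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
    cohens_d = (treatment.mean() - control.mean()) / pooled_sd

    print(f"mean difference = {treatment.mean() - control.mean():.1f} points")
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}, d = {cohens_d:.2f}")
    ```

    Even in this toy case, the numbers say nothing about why one group scored higher, or whether the test captured the outcomes that actually matter.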

  2. Russ says:

    Very interesting post. I just wanted to add a minor correction to Scott’s reply. I think it’s not that NLP and LS have not been tested; it’s that they HAVE in fact been tested and have failed to provide positive results in those tests. I think Pashler et al. list some of the studies on LS, and Hattie notes that the only evidence of LS working is from Dunn and Dunn (promoters of LS), and that not only were their calculations wrong, their effect size was so large as to be bigger than that of any other teaching intervention available to us. Hattie views this with suspicion. The same is true of NLP, which has been tested and has failed to produce positive results time and time again. See for example http://psycnet.apa.org/journals/cou/32/4/622/

    • Thanks Russ… you’re right: I just checked Hattie (2009: 197), including his observation that ‘it is hard not to be skeptical about these learning preference claims’ and that the existing LS studies are characterised by ‘too much overstatement, poor items and assessments; low validity and negligible impact on practice’.

      So, maybe I should rephrase my question: ‘Is it reasonable to dismiss the claims of advocates of NLP, LS, MI etc because they fail empirical tests, while querying the need for doing empirical studies on the kind of teaching we find more plausible?’

  3. philipjkerr says:

    Aye, Scott, there’s the rub, but I do have an answer of sorts.
    First of all, I do not think it is an either / or question. I did not intend to give the impression that I was opposed to all forms of quantitative research. My concern is with the relative weightings that are given to quantitative and qualitative analysis, and the kinds of insights that these different kinds of research can give us.
    Evidence-based medicine, from which evidence-based education derives, takes a number of forms. In its extreme form, there are those who would practically do away with doctors altogether, but a more balanced view sees research evidence as one factor, rather than the only factor, in the process of clinical decision making. My fear is that we may be moving in language education towards an exclusive reliance on measurable data when it comes to making decisions about educational interventions. The reason why, I think, this is a reasonable worry is that, with the move towards e-learning, it is so easy to capture and analyse data. Furthermore, the research tools we use will impact on the things we are trying to measure. The combination of evidence-based education and (much) e-learning is a very potent one. Both conceptualise the educational process in terms of the delivery of learning outcomes and assume a relatively simple, causal relationship between intervention and outcome. As Biesta (Biesta, G. J. J. 2010. Good Education in an Age of Measurement) observes, ‘evidence-based practice entails a technological model of professional action’. Learning affordances don’t come into the picture because they do not lend themselves to the kind of granular measurement that is possible with narrowly-defined learning outcomes.
    In your comment, you refer to NLP, learning styles and Brain Gym. Thanks, Russ, for referring to some of the research that has been done in these areas. Of the three, learning styles has received the most research attention. The reason for this, I think, is that (of the three) it is (was?) the most plausible hypothesis. Brain Gym has been much less researched – for the same reason that the use of crystals in the language classroom has not been much investigated. Before undertaking complex quantitative analysis, there must be a plausible hypothesis that we want to test. In the absence of this, we don’t really need new empirical evidence.
    If the only (or primary) justification for a classroom technique / activity is its conformity to an implausible theory (such as Brain Gym or NLP), we can rubbish the theory, but we may still wish to investigate the technique / activity further. Teachers’ claims (based on their experience) that the technique / activity ‘works’ are interesting, and are worth further study because this may lead to the formulation of a more plausible theory. For example, what exactly does this technique / activity work for? And for whom? The investigation of a particular educational intervention is very different from the investigation of a broader theory, although the former may shed light on the latter. The point of such research would be to refine the hypothesis more than to prove or disprove anything.
    Quantitative analysis is better at disproving than proving: falsifiability is taken by many to be central to scientific research. I doubt if we will ever be able to prove numerically that one educational approach is better than another (e.g. e-learning versus F2F, or Dogme versus PPP). Specific interventions (e.g. one online gapfill compared to another) lend themselves much better to data analysis, but they also lend themselves more easily to misinterpretation (e.g. to the conclusion that we should be administering gapfill B rather than gapfill A). Whether or not our students should be doing online gapfills at all should be determined by moral, political and philosophical considerations, even if our thinking is informed by hard scientific data.

  4. Rob Hickling says:

    Ben Goldacre has consistently argued for a mixed methods approach, with qualitative research sitting alongside robust quantitative research. The qualitative aspect is a crucial element in understanding how a particular intervention works in practice. However, his drive towards the use of more RCTs in education is, I believe, absolutely the right way to go if we are to learn from the mistakes made in medicine (e.g. the use of steroid injections for head injuries, which made sense in theory but actually ended up killing people) and, as Scott points out, from our own flirtations with utter nonsense such as Brain Gym, NLP and Learning Styles. Ben Goldacre is also one of the authors of a paper by the Cabinet Office Behavioural Insights Team called “Test, Learn, Adapt: Developing Public Policy with Randomised Controlled Trials”. I highly recommend this to anyone interested in finding out more about RCTs. It can be downloaded here.


    My own concern with the Pearson efficacy project is that it does not go far enough in insisting on rigorous and robust research methodology. The pressure is clearly on marketing teams to provide the “efficacy studies”, and, given a choice, it is far easier to produce a case study or an observational study than to carry out a rigorous RCT.

    So, at IATEFL a couple of weeks ago I picked up a brochure from the Pearson stand called “Efficacy Results”. It gives the results of 12 “studies” for the MyLab English products. Eleven of the twelve are either simple case studies or observational studies – mostly teachers saying how great they are. One goes so far as to say that “…whilst we do not have statistical evidence to illustrate overall results…” before making a spurious claim. And they call this evidence. Only one of the twelve is a comparative study, but they don’t give any information about how the students were assigned or about the statistical significance of the results, potentially making it worthless. My conclusion from this is that, whilst the intentions may be good, when it filters down to those responsible and under pressure to provide reports, the whole thing becomes a marketing exercise dressed up as science.
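
    By way of illustration only, the random allocation step that the brochure leaves undescribed is simple enough to make explicit and report. A minimal sketch, with an invented class list (a real RCT would also need to pre-specify the outcome measure and report group sizes and attrition):

    ```python
    # Minimal sketch of random allocation to conditions, the step a comparative
    # study needs to report. The student IDs are invented for illustration.
    import random

    students = ["S01", "S02", "S03", "S04", "S05", "S06", "S07", "S08"]

    random.seed(42)            # fixed seed so the allocation is reproducible / auditable
    random.shuffle(students)   # shuffle in place

    midpoint = len(students) // 2
    intervention_group = students[:midpoint]  # e.g. using the online product
    control_group = students[midpoint:]       # existing course only

    print("Intervention group:", sorted(intervention_group))
    print("Control group:     ", sorted(control_group))
    ```

    Reporting how this allocation was done (and the resulting group sizes) is what would allow a reader to judge whether the single comparative study in the brochure means anything.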

  5. Stephanie says:

    Scott asks: “Is it reasonable, then, to demand empirical evidence for NLP or learning styles, but to dismiss the need for empirical evidence for the efficacy of the kind of teaching we find more plausible?”

    I don’t see why any teacher would NOT want evidence that what they do is effective. When we talk about empirical evidence in this context, presumably we mean information. Non-empirical evidence would be something like faith or woo-woo. (Correct me if I’m wrong. Please!)

    It’s worth looking at the website of the Center for Evidence Based Management (CEBM), which also has articles on evidence-based management in teaching and evidence-based management education. Rob Briner’s article, “Does coaching work and does anyone really care” (2012), is germane to this discussion.

    http://tinyurl.com/mlebw2s

    As you read it, try replacing the word ‘coaching’ with Brain Gym, NLP, Learning Styles, Homeopathy, or any ELT fad of your choosing.

    The author notes that while there’s a lot of talk about evidence-based practice, there’s far less action. One of these actions would be to “ask and answer more precisely and much more frequently a series of difficult but essential questions”.

    These questions are listed with reference to occupational psychologists (OP), but could apply to any self-styled practitioner:

    – What is it exactly that OPs do? What range of products, services and interventions?
    – What claims are made about each of these interventions and activities?
    – What is the evidence for each of these interventions and activities?
    – What does a critical appraisal of that evidence tell us about a particular intervention?
    – Are OPs aware of this evidence?
    – Do OPs use this evidence in their work?
    – What facilitates and inhibits the use of evidence?
    – What are the ethical and other implications of this evidence for the professional behaviour of OPs?

    • philipjkerr says:

      Hi Stephanie
      Thanks for your comment and, especially, the link to the article about coaches by Rob Briner, which is well worth a read.
      I take your point that it is hard to imagine why any teacher would NOT want evidence about the outcomes of their work. But evidence can only be meaningfully and usefully collected if (1) there is a very clear idea of what those outcomes are supposed to be, and (2) we can assume a causal connection between the teacher’s work and the outcomes that are being measured.
      Both of these are problematic.
      In some English language teaching contexts, desired outcomes are very clear. This would be the case, for example, in a preparatory course for a particular exam or in an ESP context, where the ‘S’ is very specific. But most English language learning / teaching takes place in contexts where the outcomes are less clearly defined and where there is little or no shared vision of what the outcomes should be between the various stakeholders in the process (students, teachers, school administrations, educational authorities, politicians, parents, etc). One example might help to make this clear. What outcomes might we want to measure after a semester’s work by a seven-year-old child doing two hours of English a week in primary school? Even if we could reach agreement about the desirable outcomes for the class as a whole, we would then need to reach agreement about the desirable outcomes for the individual child (not least because, however differentiated the instruction, we would have to take into account differing developmental stages). This cannot be measured empirically.
      I can see some parallels between English language teachers (especially in public education) and some kinds of coach and therapist. Whatever outcomes are stipulated before a course of intervention begins, these are likely to shift during that intervention in order to address individual needs, which are themselves shifting.
      Putting all that to one side for the moment, there is also a problem in assuming causal connections between teacher interventions and outcomes. We know, from very extensive research, that teacher intervention can only account for a relatively small proportion of the learning gains (and these are usually defined very narrowly) that are measured in standardised tests.
      Like Gert Biesta, I am not arguing against RCTs or an evidence-based approach, but I do think we need to be very aware of what this can and cannot tell us. We need to spend a lot more time and energy framing our difficult, but essential, questions, and be in less of a hurry to collect data / evidence about questions that are inadequately framed.
      Thanks again,
      Philip

  6. Russ says:

    I love the use of woo-woo 🙂

  7. […] unless the innovations improve something, but for us to know this, we need a way to measure it. In a previous post, I looked at Pearson’s ‘Asking More: the Path to Efficacy’ by CEO John Fallon (who will be […]
