Posts Tagged ‘analytics’

In December last year, I posted a wish list for vocabulary (flashcard) apps. At the time, I hadn’t read a couple of key research texts on the subject. It’s time for an update.

First off, there’s an article called ‘Intentional Vocabulary Learning Using Digital Flashcards’ by Hsiu-Ting Hung. It’s available online here. Given the lack of empirical research into the use of digital flashcards, it’s an important article and well worth a read. Its basic conclusion is that digital flashcards are more effective as a learning tool than printed word lists. No great surprises there, but of more interest, perhaps, are the recommendations that (1) ‘students should be educated about the effective use of flashcards (e.g. the amount and timing of practice), and this can be implemented through explicit strategy instruction in regular language courses or additional study skills workshops’ (Hung, 2015: 111), and (2) that digital flashcards can be usefully ‘repurposed for collaborative learning tasks’ (Hung, ibid.).

However, what really grabbed my attention was an article by Tatsuya Nakata. Nakata’s research is of particular interest to anyone interested in vocabulary learning, but especially so to those with an interest in digital possibilities. A number of his research articles can be freely accessed via his page at ResearchGate, but the one I am interested in is called ‘Computer-assisted second language vocabulary learning in a paired-associate paradigm: a critical investigation of flashcard software’. Don’t let the title put you off. It’s a review of a pile of web-based flashcard programs: since the article is already five years old, many of the programs have either changed or disappeared, but the critical approach he takes is more or less as valid now as it was then (whether we’re talking about web-based stuff or apps).

Nakata divides his criteria for evaluation into two broad groups.

Flashcard creation and editing

(1) Flashcard creation: Can learners create their own flashcards?

(2) Multilingual support: Can the target words and their translations be created in any language?

(3) Multi-word units: Can flashcards be created for multi-word units as well as single words?

(4) Types of information: Can various kinds of information be added to flashcards besides the word meanings (e.g. parts of speech, contexts, or audios)?

(5) Support for data entry: Does the software support data entry by automatically supplying information about lexical items such as meaning, parts of speech, contexts, or frequency information from an internal database or external resources?

(6) Flashcard set: Does the software allow learners to create their own sets of flashcards?

Learning


(1) Presentation mode: Does the software have a presentation mode, where new items are introduced and learners familiarise themselves with them?

(2) Retrieval mode: Does the software have a retrieval mode, which asks learners to recall or choose the L2 word form or its meaning?

(3) Receptive recall: Does the software ask learners to produce the meanings of target words?

(4) Receptive recognition: Does the software ask learners to choose the meanings of target words?

(5) Productive recall: Does the software ask learners to produce the target word forms corresponding to the meanings provided?

(6) Productive recognition: Does the software ask learners to choose the target word forms corresponding to the meanings provided?

(7) Increasing retrieval effort: For a given item, does the software arrange exercises in the order of increasing difficulty?

(8) Generative use: Does the software encourage generative use of words, where learners encounter or use previously met words in novel contexts?

(9) Block size: Can the number of words studied in one learning session be controlled and altered?

(10) Adaptive sequencing: Does the software change the sequencing of items based on learners’ previous performance on individual items?

(11) Expanded rehearsal: Does the software help implement expanded rehearsal, where the intervals between study trials are gradually increased as learning proceeds? (Nakata, T. (2011): ‘Computer-assisted second language vocabulary learning in a paired-associate paradigm: a critical investigation of flashcard software’ Computer Assisted Language Learning, 24:1, 17-38)

It’s a rather different list from my own (there’s nothing I would disagree with here), because mine is more general and his is exclusively oriented towards learning principles. Nakata makes the point towards the end of the article that it would ‘be useful to investigate learners’ reactions to computer-based flashcards to examine whether they accept flashcard programs developed according to learning principles’ (p. 34). It’s far from clear, he points out, that conformity to learning principles is at the top of learners’ agendas. More than just users’ feelings about computer-based flashcards in general, a key concern will be the fact that there are ‘large individual differences in learners’ perceptions of [any flashcard] program’ (Nakata, T. 2008. ‘English vocabulary learning with word lists, word cards and computers: implications from cognitive psychology research for optimal spaced learning’ ReCALL 20(1), p. 18).

I was trying to make a similar point in another post about motivation and vocabulary apps. In the end, as with any language learning material, research-driven language learning principles can only take us so far. User experience is a far more difficult creature to pin down or to make generalisations about. A user’s reactions to graphics, gamification, uploading time and so on are so powerful and so subjective that learning principles will inevitably play second fiddle. That’s not to say, of course, that Nakata’s questions are not important: it’s merely to wonder whether the bigger question is truly answerable.

Nakata’s research identifies plenty of room for improvement in digital flashcards, and although the article is now quite old, not a lot has changed. Key areas to work on are (1) the provision of generative use of target words, (2) the need to increase retrieval effort, (3) the automatic provision of information about meaning, parts of speech, or contexts (in order to facilitate flashcard creation), and (4) the automatic generation of multiple-choice distractors.

In the conclusion of his study, he identifies one flashcard program which is better than all the others. Unsurprisingly, five years down the line, the software he identifies is no longer free, others have changed more rapidly in the intervening period, and who knows which will be out in front next week?


Having spent a lot of time recently looking at vocabulary apps, I decided to put together a Christmas wish list of the features of my ideal vocabulary app. The list is not exhaustive and I’ve given more attention to some features than others. What (apart from testing) have I missed out?

1 Spaced repetition

Since the point of a vocabulary app is to help learners memorise vocabulary items, it is hard to imagine a decent system that does not incorporate spaced repetition. Spaced repetition algorithms offer one well-researched way of improving the brain’s ‘forgetting curve’. These algorithms come in different shapes and sizes, and I am not technically competent to judge which is the most efficient. However, as Peter Ellis Jones, the developer of a flashcard system called CardFlash, points out, efficiency is only one half of the rote memorisation problem. If you are not motivated to learn, the cleverness of the algorithm is moot. Fundamentally, learning software needs to be fun, rewarding, and give a solid sense of progression.
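For readers who have never looked inside one of these systems, here is a minimal sketch of the simplest kind of spaced repetition, a Leitner-style box scheme. It is not the algorithm used by CardFlash or any other app mentioned here: the `Card` class, the `review` and `due_cards` functions and the interval values are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical review intervals in days, one per box (illustrative values only).
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    prompt: str
    answer: str
    box: int = 0       # index into INTERVALS
    due_day: int = 0   # day number on which the card is next due


def review(card: Card, correct: bool, today: int) -> Card:
    """Promote the card one box after a correct answer; send it back to box 0 after a miss."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    card.due_day = today + INTERVALS[card.box]
    return card


def due_cards(deck: list[Card], today: int) -> list[Card]:
    """Return the cards whose next review falls on or before today."""
    return [card for card in deck if card.due_day <= today]
```

Each correct answer pushes the next review further away, and a wrong answer resets the interval. Real systems tune the intervals per item and per learner, which is exactly where the efficiency debates begin.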

2 Quantity, balance and timing of new and ‘old’ items

A spaced repetition algorithm determines the optimum interval between repetitions, but further algorithms will be needed to determine when and with what frequency new items will be added to the deck. Once a system knows how many items a learner needs to learn and the time in which they have to do it, it is possible to determine the timing and frequency of the presentation of new items. But the system cannot know in advance how well an individual learner will learn the items (for any individual, some items will be more readily learnable than others) nor the extent to which learners will live up to their own positive expectations of time spent on-app. As most users of flashcard systems know, it is easy to fall behind, feel swamped and, ultimately, give up. An intelligent system needs to be able to respond to individual variables in order to ensure that the learning load is realistic.
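As a rough illustration of what such a pacing rule might look like, the sketch below spreads the remaining items evenly over the remaining days, then throttles the number of new items when the learner already has a heavy review load. The function name, its parameters and the idea of a fixed daily cap are all assumptions of mine, not a description of any existing app.

```python
import math


def new_items_today(items_left: int, days_left: int,
                    reviews_due: int, daily_cap: int) -> int:
    """Decide how many new items to introduce today.

    items_left  -- items not yet introduced
    days_left   -- study days remaining, today included
    reviews_due -- old items already scheduled for review today
    daily_cap   -- hypothetical maximum number of cards the learner can face per day
    """
    if items_left <= 0 or days_left <= 0:
        return 0
    baseline = math.ceil(items_left / days_left)   # even spread over the time left
    room = max(daily_cap - reviews_due, 0)         # space left once reviews are counted
    return min(baseline, room, items_left)
```

A learner who is keeping up gets the even spread, for example `new_items_today(100, 20, 10, 30)` gives 5, while one who has fallen behind gets fewer new items until the backlog clears. A genuinely intelligent system would also adjust the cap itself in response to how the learner actually behaves.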

3 Task variety

A standard flashcard system which simply asks learners to indicate whether they ‘know’ a target item before they flip over the card rapidly becomes extremely boring. A system which tests this knowledge soon becomes equally dull. There needs to be a variety of ways in which learners interact with an app, both for reasons of motivation and learning efficiency. It may be the case that, for an individual user, certain task types lead to more rapid gains in learning. An intelligent, adaptive system should be able to capture this information and modify the selection of task types.

Most younger learners and some adult learners will respond well to the inclusion of games within the range of task types. Examples of such games include the puzzles developed by Oliver Rose in his Phrase Maze app to accompany Quizlet practice.

4 Generative use

Memory researchers have long known about the ‘Generation Effect’ (see for example this piece of research from the Journal of Verbal Learning and Verbal Behavior, 1978). Items are better learnt when the learner has to generate, in some (even small) way, the target item, rather than simply reading it. In vocabulary learning, this could be, for example, typing in the target word or, more simply, inserting some missing letters. Systems which incorporate task types that require generative use are likely to result in greater learning gains than simple, static flashcards with target items on one side and definitions or translations on the other.
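The missing-letters task is about the simplest generative exercise to build. The sketch below is only an illustration: the function names and the rule for choosing which letters to blank out are my own assumptions.

```python
def gap_word(word: str, step: int = 2) -> str:
    """Blank out the letters at index 1, 1 + step, 1 + 2 * step, and so on.

    Use step >= 2. Non-letters (spaces, hyphens) are always left visible.
    """
    return "".join(
        "_" if ch.isalpha() and i % step == 1 else ch
        for i, ch in enumerate(word)
    )


def check_answer(attempt: str, target: str) -> bool:
    """Accept the learner's typed answer, ignoring case and surrounding spaces."""
    return attempt.strip().lower() == target.lower()


prompt = gap_word("remember")   # 'r_m_m_e_'
```

A larger `step` leaves more letters visible, so the same function could serve to make an item easier or harder for a given learner.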

5 Receptive and productive practice

The most basic digital flashcard systems require learners to understand a target item, or to generate it from a definition or translation prompt. Valuable as this may be, it won’t help learners much to use these items productively, since these systems focus exclusively on meaning. In order to do this, information must be provided about collocation, colligation, register, etc and these aspects of word knowledge will need to be focused on within the range of task types. At the same time, most vocabulary apps that I have seen focus primarily on the written word. Although any good system will offer an audio recording of the target item, and many will offer the learner the option of recording themselves, learners are invariably asked to type in their answers, rather than say them. For the latter, speech recognition technology will be needed. Ideally, too, an intelligent system will compare learner recordings with the audio models and provide feedback in such a way that the learner is guided towards a closer reproduction of the model.

6 Scaffolding and feedback

Most flashcard systems are basically low-stakes practice self-testing. Research (see, for example, Dunlosky et al’s metastudy ‘Improving Students’ Learning With Effective Learning Techniques: Promising Directions From Cognitive and Educational Psychology’) suggests that, as a learning strategy, practice testing has high utility – indeed, higher utility than other strategies like keyword mnemonics or highlighting. However, an element of tutoring is likely to enhance practice testing, and, for this, scaffolding and feedback will be needed. If, for example, a learner is unable to produce a correct answer, they will probably benefit from being guided towards it through hints, in the same way as a teacher would elicit in a classroom. Likewise, feedback on why an answer is wrong (as opposed to simply being told that you are wrong), followed by encouragement to try again, is likely to enhance learning. Such feedback might, for example, point out that there is perhaps a spelling problem in the learner’s attempted answer, that the attempted answer is in the wrong part of speech, or that it is semantically close to the correct answer but does not collocate with other words in the text. The incorporation of intelligent feedback of this kind will require a number of NLP tools, since it will never be possible for a human item-writer to anticipate all the possible incorrect answers. A current example of intelligent feedback of this kind can be found in the Oxford English Vocabulary Trainer app.
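To give a flavour of the first two kinds of feedback (spelling and part of speech), here is a deliberately crude sketch. It is not how the Oxford English Vocabulary Trainer works: the three-word `LEXICON`, the 0.8 similarity threshold and the `feedback` function are assumptions for illustration, and the string comparison comes from Python’s standard `difflib` module rather than from any real NLP toolkit.

```python
from difflib import SequenceMatcher

# Hypothetical mini-lexicon of word -> part of speech; a real app would license this data.
LEXICON = {"decide": "verb", "decision": "noun", "decisive": "adjective"}


def feedback(attempt: str, target: str, threshold: float = 0.8) -> str:
    """Return a short hint explaining why an answer is wrong, where we can tell."""
    attempt, target = attempt.strip().lower(), target.lower()
    if attempt == target:
        return "correct"
    # A real word from the same family, but the wrong part of speech.
    if attempt in LEXICON and target in LEXICON and LEXICON[attempt] != LEXICON[target]:
        return f"wrong part of speech (expected: {LEXICON[target]})"
    # Very similar strings are treated as a probable spelling slip.
    if SequenceMatcher(None, attempt, target).ratio() >= threshold:
        return "check your spelling"
    return "try again"
```

Even this toy version shows why item writers cannot do the job by hand: the hints are computed from the learner’s answer, not looked up in a list of anticipated mistakes. The collocation case is much harder and is left out entirely.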

7 Content

At the very least, a decent vocabulary app will need good definitions and translations (how many different languages?), and these will need to be tagged to the senses of the target items. These will need to be supplemented with all the other information that you find in a good learner’s dictionary: syntactic patterns, collocations, cognates, an indication of frequency, etc. The only way of getting this kind of high-quality content is by paying to license it from a company with expertise in lexicography. It doesn’t come cheap.

There will also need to be example sentences, both to illustrate meaning / use and for deployment in tasks. Dictionary databases can provide some of these, but they cannot be relied on as a source. This is because the example sentences in dictionaries have been selected and edited to accompany the other information provided in the dictionary, and not as items in practice exercises, which have rather different requirements. Once more, the solution doesn’t come cheap: experienced item writers will be needed.

Dictionaries describe and illustrate how words are typically used. But examples of typical usage tend to be as dull as they are forgettable. Learning is likely to be enhanced if examples are cognitively salient: weird examples with odd collocations, for example. Another thing for the item writers to think about.

A further challenge for an app which is not level-specific is that both the definitions and example sentences need to be level-specific. An A1 / A2 learner will need the kind of content that is found in, say, the Oxford Essential dictionary; B2 learners and above will need content from, say, the OALD.

8 Artwork and design

It’s easy enough to find artwork or photos of concrete nouns, but try to find or commission a pair of pictures that differentiate, for example, the adjectives ‘wild’ and ‘dangerous’ … What kind of pictures might illustrate simple verbs like ‘learn’ or ‘remember’? Will such illustrations be clear enough when squeezed into a part of a phone screen? Animations or very short video clips might provide a solution in some cases, but these are more expensive to produce and video files are much heavier.

With a few notable exceptions, such as the British Council’s MyWordBook 2, design in vocabulary apps has been largely forgotten.

9 Importable and personalisable lists

Many learners will want to use a vocabulary app in association with other course material (e.g. coursebooks). Teachers, however, will inevitably want to edit these lists, deleting some items, adding others. Learners will want to do the same. This is a huge headache for app designers. If new items are going to be added to word lists, how will the definitions, example sentences and illustrations be generated? Will the database contain audio recordings of these words? How will these items be added to the practice tasks (if these include task types that go beyond simple double-sided flashcards)? NLP tools are not yet good enough to trawl a large corpus in order to select (and possibly edit) sentences that illustrate the right meaning and which are appropriate for interactive practice exercises. We can personalise the speed of learning and even the types of learning tasks, so long as the target language is predetermined. But as soon as we allow for personalisation of content, we run into difficulties.

10 Gamification

Maintaining motivation to use a vocabulary app is not easy. Gamification may help. Measuring progress against objectives will be a start. Stars and badges and leaderboards may help some users. Rewards may help others. But gamification features need to be built into the heart of the system, into the design and selection of tasks, rather than simply tacked on as an afterthought. They need to be trialled and tweaked, so analytics will be needed.

11 Teacher support

Although the use of vocabulary flashcards is beginning to catch on with English language teachers, teachers need help with ways to incorporate them in the work they do with their students. What can teachers do in class to encourage use of the app? In what ways does app use require teachers to change their approach to vocabulary work in the classroom? Reporting functions can help teachers know about the progress their students are making and provide very detailed information about words that are causing problems. But, as anyone involved in platform-based course materials knows, teachers need a lot of help.

12 And, of course, …

Apps need to be usable with different operating systems. Ideally, they should be (partially) usable offline. Loading times need to be short. They need to be easy and intuitive to use.

It’s unlikely that I’ll be seeing a vocabulary app with all of these features any time soon. Or, possibly, ever. The cost of developing something that could do all this would be extremely high, and there is no indication that there is a market that would be ready to pay the sort of prices that would be needed to cover the costs of development and turn a profit. We need to bear in mind, too, the fact that vocabulary apps can only ever assist in the initial acquisition of vocabulary: apps alone can’t solve the vocabulary learning problem (despite the silly claims of some app developers). The need for meaningful communicative use, extensive reading and listening, will not go away because a learner has been using an app. So, how far can we go in developing better and better vocabulary apps before users decide that a cheap / free app, with all its shortcomings, is actually good enough?

I posted a follow-up to this post in October 2016.

Decent research into adaptive learning remains very thin on the ground. Disappointingly, the Journal of Learning Analytics has only managed one issue so far in 2015, compared to three in 2014. But I recently came across an article in Vol. 18 (pp. 111 – 125) of Informing Science: the International Journal of an Emerging Transdiscipline entitled Informing and performing: A study comparing adaptive learning to traditional learning by Murray, M. C., & Pérez, J. of Kennesaw State University.

The article is worth reading, not least because of the authors’ digestible review of adaptive learning theory and their discussion of levels of adaptation, including a handy diagram (see below) which they have reproduced from a white paper by Tyton Partners ‘Learning to Adapt: Understanding the Adaptive Learning Supplier Landscape’. Murray and Pérez make clear that adaptive learning theory is closely connected to the belief that learning is improved when instruction is personalized – adapted to individual learning styles, but their approach is surprisingly uncritical. They write, for example, that the general acceptance of learning styles is evidenced in recommended teaching strategies in nearly every discipline, and learning styles continue to inform the evolution of adaptive learning systems, and quote from the much-quoted Pashler, H., McDaniel, M., Rohrer, D., & Bjork, R. (2008) Learning styles: concepts and evidence, Psychological Science in the Public Interest, 9, 105–119. But Pashler et al concluded that the current evidence supporting the use of learning style-matched approaches is virtually non-existent (see here for a review of Pashler et al). And, in the world of ELT, an article in the latest edition of ELTJ by Carol Lethaby and Patricia Harries disses learning styles and other neuromyths. Given the close connection between adaptive learning theory and learning styles, one might reasonably predict that a comparative study of adaptive learning and traditional learning would not come out with much evidence in support of the former.

Murray and Pérez set out, anyway, to explore the hypothesis that adapting instruction to an individual’s learning style results in better learning outcomes. Their study compared adaptive and traditional methods in a university-level digital literacy course. Their conclusion? This study and a few others like it indicate that today’s adaptive learning systems have negligible impact on learning outcomes.

I was, however, more interested in the comments which followed this general conclusion. They point out that learning outcomes are only one measure of quality. Others, such as student persistence and engagement, they claim, can be positively affected by the employment of adaptive systems. I am not convinced. I think it’s simply far too soon to be able to judge this, and we need to wait quite some time for novelty effects to wear off. Murray and Pérez provide two references in support of their claim. One is an article by Josh Jarrett, Bigfoot, Goldilocks, and Moonshots: A Report from the Frontiers of Personalized Learning in Educause. Jarrett is Deputy Director for Postsecondary Success at the Bill & Melinda Gates Foundation and Educause is significantly funded by the Gates Foundation. Not, therefore, an entirely unbiased and trustworthy source. The other is a journalistic piece in Forbes. It’s by Tim Zimmer, entitled Rethinking higher ed: A case for adaptive learning and it reads like an advert. Zimmer is a ‘CCAP contributor’. CCAP is the Centre for College Affordability and Productivity, a libertarian, conservative foundation with a strong privatization agenda. Not, therefore, a particularly reliable source, either.

Despite their own findings, Murray and Pérez follow up their claim about student persistence and engagement with what they describe as a more compelling still argument for adaptive learning. This, they say, is the intuitively appealing case for adaptive learning systems as engines with which institutions can increase access and reduce costs. Ah, now we’re getting to the point!













In ELT circles, ‘behaviourism’ is a boo word. In the standard history of approaches to language teaching (characterised as a ‘procession of methods’ by Hunter & Smith 2012: 432[1]), there were the bad old days of behaviourism until Chomsky came along, savaged the theory in his review of Skinner’s ‘Verbal Behavior’, and we were all able to see the light. In reality, of course, things weren’t quite like that. The debate between Chomsky and the behaviourists is far from over, behaviourism was not the driving force behind the development of audiolingual approaches to language teaching, and audiolingualism is far from dead. For an entertaining and eye-opening account of something much closer to reality, I would thoroughly recommend a post on Russ Mayne’s Evidence Based ELT blog, along with the discussion which follows it. For anyone who would like to understand what behaviourism is, was, and is not (before they throw the term around as an insult), I’d recommend John A. Mills’ ‘Control: A History of Behavioral Psychology’ (New York University Press, 1998) and John Staddon’s ‘The New Behaviorism 2nd edition’ (Psychology Press, 2014).

There is a close connection between behaviourism and adaptive learning. Audrey Watters, no fan of adaptive technology, suggests that ‘any company touting adaptive learning software’ has been influenced by Skinner. In a more extended piece, ‘Education Technology and Skinner’s Box’, Watters explores further her problems with Skinner and the educational technology that has been inspired by behaviourism. But writers much more sympathetic to adaptive learning also see close connections to behaviourism. ‘The development of adaptive learning systems can be considered as a transformation of teaching machines,’ write Kara & Sevim[2] (2013: 114 – 117), although they go on to point out the differences between the two. Vendors of adaptive learning products, like DreamBox Learning©, are not shy of associating themselves with behaviourism: ‘Adaptive learning has been with us for a while, with its history of adaptive learning rooted in cognitive psychology, beginning with the work of behaviorist B.F. Skinner in the 1950s, and continuing through the artificial intelligence movement of the 1970s.’

That there is a strong connection between adaptive learning and behaviourism is indisputable, but I am not interested in attempting to establish the strength of that connection. This would, in any case, be an impossible task without some reductionist definition of both terms. Instead, my interest here is to explore some of the parallels between the two, and, in the spirit of the topic, I’d like to do this by comparing the behaviours of behaviourists and adaptive learning scientists.

Data and theory

Both behaviourism and adaptive learning (in its big data form) are centrally concerned with behaviour – capturing and measuring it in an objective manner. In both, experimental observation and the collection of ‘facts’ (physical, measurable, behavioural occurrences) precede any formulation of theory. John Mills’ description of behaviourists could apply equally well to adaptive learning scientists: theory construction was a seesaw process whereby one began with crude outgrowths from observations and slowly created one’s theory in such a way that one could make more and more precise observations, building those observations into the theory at each stage. No behaviourist ever considered the possibility of taking existing comprehensive theories of mind and testing or refining them.[3]

Positivism and the panopticon

Both behaviourism and adaptive learning are pragmatically positivist, believing that truth can be established by the study of facts. J. B. Watson, the founding father of behaviourism whose article ‘Psychology as the Behaviorist Views It’ set the behaviourist ball rolling, believed that experimental observation could ‘reveal everything that can be known about human beings’[4]. Jose Ferreira of Knewton has made similar claims: We get five orders of magnitude more data per user than Google does. We get more data about people than any other data company gets about people, about anything – and it’s not even close. We’re looking at what you know, what you don’t know, how you learn best. […] We know everything about what you know and how you learn best because we get so much data. Digital data analytics offer something that Watson couldn’t have imagined in his wildest dreams, but he would have approved.

The revolutionary science

Big data (and the adaptive learning which is a part of it) is presented as a game-changer: The era of big data challenges the way we live and interact with the world. […] Society will need to shed some of its obsession for causality in exchange for simple correlations: not knowing why but only what. This overturns centuries of established practices and challenges our most basic understanding of how to make decisions and comprehend reality[5]. But the reverence for technology and the ability to reach understandings of human beings by capturing huge amounts of behavioural data was adumbrated by Watson a century before big data became a widely used term. Watson’s 1913 lecture at Columbia University was ‘a clear pitch’[6] for the supremacy of behaviourism, and its potential as a revolutionary science.

Prediction and control

The fundamental point of both behaviourism and adaptive learning is the same. The research practices and the theorizing of American behaviourists until the mid-1950s, writes Mills[7], were driven by the intellectual imperative to create theories that could be used to make socially useful predictions. Predictions are only useful to the extent that they can be used to manipulate behaviour. Watson states this very baldly: the theoretical goal of psychology is the prediction and control of behaviour[8]. Contemporary iterations of behaviourism, such as behavioural economics or nudge theory (see, for example, Thaler & Sunstein’s best-selling ‘Nudge’, Penguin Books, 2008), or the British government’s Behavioural Insights Unit, share the same desire to divert individual activity towards goals (selected by those with power), ‘without either naked coercion or democratic deliberation’[9]. Jose Ferreira of Knewton has an identical approach: We can predict failure in advance, which means we can pre-remediate it in advance. We can say, “Oh, she’ll struggle with this, let’s go find the concept from last year’s materials that will help her not struggle with it.” Like the behaviourists, Ferreira makes grand claims about the social usefulness of his predict-and-control technology: The end is a really simple mission. Only 22% of the world finishes high school, and only 55% finish sixth grade. Those are just appalling numbers. As a species, we’re wasting almost four-fifths of the talent we produce. […] I want to solve the access problem for the human race once and for all.


Because they rely on capturing large amounts of personal data, both behaviourism and adaptive learning quickly run into ethical problems. Even where informed consent is used, the subjects must remain partly ignorant of exactly what is being tested, or else there is the fear that they might adjust their behaviour accordingly. The goal is to minimise conscious understanding of what is going on[10]. For adaptive learning, the ethical problem is much greater because of the impossibility of ensuring the security of this data. Everything is hackable.


Behaviourism was seen as a god-send by the world of advertising. J. B. Watson, after a front-page scandal about his affair with a student, and losing his job at Johns Hopkins University, quickly found employment on Madison Avenue. ‘Scientific advertising’, as practised by the Mad Men from the 1920s onwards, was based on behaviourism. The use of data analytics by Google, Amazon, et al is a direct descendant of scientific advertising, so it is richly appropriate that adaptive learning is the child of data analytics.

[1] Hunter, D. and Smith, R. (2012) ‘Unpacking the past: “CLT” through ELTJ keywords’. ELT Journal, 66/4: 430-439.

[2] Kara, N. & Sevim, N. (2013) ‘Adaptive learning systems: beyond teaching machines’. Contemporary Educational Technology, 4(2): 108-120.

[3] Mills, J. A. (1998) Control: A History of Behavioral Psychology. New York: New York University Press, p.5

[4] Davies, W. (2015) The Happiness Industry. London: Verso. p.91

[5] Mayer-Schönberger, V. & Cukier, K. (2013) Big Data. London: John Murray, p.7

[6] Davies, W. (2015) The Happiness Industry. London: Verso. p.87

[7] Mills, J. A. (1998) Control: A History of Behavioral Psychology. New York: New York University Press, p.2

[8] Watson, J. B. (1913) ‘Psychology as the Behaviorist Views It’. Psychological Review 20: 158

[9] Davies, W. (2015) The Happiness Industry. London: Verso. p.88

[10] Davies, W. (2015) The Happiness Industry. London: Verso. p.92

Back in December 2013, in an interview with eltjam, David Liu, COO of the adaptive learning company, Knewton, described how his company’s data analysis could help ELT publishers ‘create more effective learning materials’. He focused on what he calls ‘content efficacy[i]’ (he uses the word ‘efficacy’ five times in the interview), a term which he explains below:

A good example is when we look at the knowledge graph of our partners, which is a map of how concepts relate to other concepts and prerequisites within their product. There may be two or three prerequisites identified in a knowledge graph that a student needs to learn in order to understand a next concept. And when we have hundreds of thousands of students progressing through a course, we begin to understand the efficacy of those said prerequisites, which quite frankly were made by an author or set of authors. In most cases they’re quite good because these authors are actually good in what they do. But in a lot of cases we may find that one of those prerequisites actually is not necessary, and not proven to be useful in achieving true learning or understanding of the current concept that you’re trying to learn. This is interesting information that can be brought back to the publisher as they do revisions, as they actually begin to look at the content as a whole.

One commenter on the post, Tom Ewens, found the idea interesting. It could, potentially, he wrote, give us new insights into how languages are learned much in the same way as how corpora have given us new insights into how language is used. Did Knewton have any plans to disseminate the information publicly, he asked. His question remains unanswered.

At the time, Knewton had just raised $51 million (bringing their total venture capital funding to over $105 million). Now, 16 months later, Knewton have launched their new product, which they are calling Knewton Content Insights. They describe it as the world’s first and only web-based engine to automatically extract statistics comparing the relative quality of content items — enabling us to infer more information about student proficiency and content performance than ever before possible.

The software analyses particular exercises within the learning content (and particular items within them). It measures the relative difficulty of individual items by, for example, analysing how often a question is answered incorrectly and how many tries it takes each student to answer correctly. It also looks at what they call ‘exhaustion’ – how much content students are using in a particular area – and whether they run out of content. The software can correlate difficulty with exhaustion. Lastly, it analyses what they call ‘assessment quality’ – how well individual questions assess a student’s understanding of a topic.
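
To make the kind of number-crunching described above a little more concrete, here is a minimal sketch in Python. It is not Knewton's method, which has not been published: the log format, the item names and the function `item_stats` are all invented for illustration. It simply works out, for each item, how often students get it wrong at the first attempt and how many tries those who eventually succeed need on average.

```python
from collections import defaultdict

# Hypothetical response log: (student_id, item_id, attempts_until_correct).
# attempts_until_correct is None if the student never answered correctly.
responses = [
    ("s1", "present_simple_q1", 1),
    ("s2", "present_simple_q1", 1),
    ("s3", "present_simple_q1", 2),
    ("s1", "present_perfect_q1", 3),
    ("s2", "present_perfect_q1", None),
    ("s3", "present_perfect_q1", 2),
]

def item_stats(log):
    """Return {item_id: (first_try_error_rate, mean_attempts_among_solvers)}."""
    by_item = defaultdict(list)
    for _student, item, attempts in log:
        by_item[item].append(attempts)

    stats = {}
    for item, attempts_list in by_item.items():
        # Anything other than a correct first attempt counts as a first-try error.
        wrong_first = sum(1 for a in attempts_list if a != 1)
        # Mean number of tries, counting only students who got there in the end.
        solved = [a for a in attempts_list if a is not None]
        mean_attempts = sum(solved) / len(solved) if solved else None
        stats[item] = (wrong_first / len(attempts_list), mean_attempts)
    return stats

stats = item_stats(responses)
for item, (error_rate, mean_attempts) in stats.items():
    print(item, round(error_rate, 2), mean_attempts)
```

Even in this toy form, the figures describe nothing more than discrete items with right and wrong answers.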

Knewton’s approach is premised on the idea that learning (in this case language learning) can be broken down into knowledge graphs, in which the information that needs to be learned can be arranged and presented hierarchically. The ‘granular’ concepts are then ‘delivered’ to the learner, and Knewton’s software can optimise the delivery. The first problem, as I explored in a previous post, is that language is a messy, complex system: it doesn’t lend itself terribly well to granularisation. The second problem is that language learning does not proceed in a linear, hierarchical way: it is also messy and complex. The third is that ‘language learning content’ cannot simply be delivered: a process of mediation is unavoidable. Are the people at Knewton unaware of the extensive literature devoted to the differences between synthetic and analytic syllabuses, of the differences between product-oriented and process-oriented approaches? It would seem so.

Knewton’s ‘Content Insights’ can only, at best, provide some sort of insight into the ‘language knowledge’ part of any learning content. It can say nothing about the work that learners do to practise language skills, since these are not susceptible to granularisation: you simply can’t take a piece of material that focuses on reading or listening and analyse its ‘content efficacy at the concept level’. Because of this, I predicted (in the post about Knowledge Graphs) that the likely focus of Knewton’s analytics would be discrete item, sentence-level grammar (typically tenses). It turns out that I was right.

Knewton illustrate their new product with screenshots such as those below.

[Screenshots: Knewton Content Insights]

They give a specific example of the sort of questions their software can answer. It is: do students generally find the present simple tense easier to understand than the present perfect tense? Doh!

Knewton Content Insights may well optimise the presentation of this kind of grammar, but optimisation of this presentation and practice is highly unlikely to have any impact on the rate of language acquisition. Students are typically required to study the present perfect at every level from ‘elementary’ upwards. They have to do this, not because the presentation in, say, Headway, is not optimised. What they need is to spend a significantly greater proportion of their time on ‘language use’ and less on ‘language knowledge’. This is not just my personal view: it has been extensively researched, and I am unaware of any dissenting voices.

The number-crunching in Knewton Content Insights is unlikely, therefore, to lead to any actionable insights. It is, however, very likely to lead (as writer colleagues at Pearson and other publishers are finding out) to an obsession with measuring the ‘efficacy’ of material which, quite simply, cannot meaningfully be measured in this way. It is likely to distract from much more pressing issues, notably the question of how we can move further and faster away from peddling sentence-level, discrete-item grammar.

In the long run, it is reasonable to predict that the attempt to optimise the delivery of language knowledge will come to be seen as an attempt to tackle the wrong question. It will make no significant difference to language learners and language learning. In the short term, how much time and money will be wasted?

[i] ‘Efficacy’ is the buzzword around which Pearson has built its materials creation strategy, a strategy which was launched around the same time as this interview. Pearson is a major investor in Knewton.

Jose Ferreira, the fast-talking sales rep-in-chief of Knewton, likes to dazzle with numbers. In a 2012 talk hosted by the US Department of Education, Ferreira rattles off the stats: ‘So Knewton students today, we have about 125,000, 180,000 right now, by December it’ll be 650,000, early next year it’ll be in the millions, and next year it’ll be close to 10 million. And that’s just through our Pearson partnership.’ For each of these students, Knewton gathers millions of data points every day. That, brags Ferreira, is ‘five orders of magnitude more data about you than Google has. … We literally have more data about our students than any company has about anybody else about anything, and it’s not even close.’ With just a touch of breathless exaggeration, Ferreira goes on: ‘We literally know everything about what you know and how you learn best, everything.’

The data is mined to find correlations between learning outcomes and learning behaviours, and, once correlations have been established, learning programmes can be tailored to individual students. Ferreira explains: ‘We take the combined data problem all hundred million to figure out exactly how to teach every concept to each kid. So the 100 million first shows up to learn the rules of exponents, great let’s go find a group of people who are psychometrically equivalent to that kid. They learn the same ways, they have the same learning style, they know the same stuff, because Knewton can figure out things like you learn math best in the morning between 8:40 and 9:13 am. You learn science best in 42 minute bite sizes the 44 minute mark you click right, you start missing questions you would normally get right.’

The basic premise here is that the more data you have, the more accurately you can predict what will work best for any individual learner. But how accurate is it? In the absence of any decent, independent research (or, for that matter, any verifiable claims from Knewton), how should we respond to Ferreira’s contribution to the White House Education Datapalooza?

A new book by Stephen Finlay, Predictive Analytics, Data Mining and Big Data (Palgrave Macmillan, 2014) suggests that predictive analytics are typically about 20 – 30% more accurate than humans attempting to make the same judgements. That’s pretty impressive and perhaps Knewton does better than that, but the key thing to remember is that, however much data Knewton is playing with, and however good their algorithms are, we are still talking about predictions and not certainties. If an adaptive system could predict with 90% accuracy (and the actual figure is typically much lower than that) what learning content and what learning approach would be effective for an individual learner, it would still mean that it was wrong 10% of the time. When this is scaled up to the numbers of students that use Knewton software, it means that millions of students are getting faulty recommendations. Beyond a certain point, further expansion of the data that is mined is unlikely to make any difference to the accuracy of predictions.
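
The scaling-up is simple arithmetic, and a few lines of Python show how quickly the errors mount. This is only a back-of-the-envelope sketch. The student numbers are Ferreira's own projections, quoted above; the 90% figure is the generous case just described; the 70% figure is purely hypothetical and is there only to show how the numbers move. The function name `expected_faulty` is my own, and the sketch assumes, for simplicity, one recommendation per student.

```python
def expected_faulty(n_students, accuracy):
    """Expected number of students who receive a wrong recommendation."""
    return round(n_students * (1 - accuracy))

# 650,000 and 10 million are Ferreira's projected student numbers.
# 0.90 is the generous accuracy from the text; 0.70 is a hypothetical lower one.
for n in (650_000, 10_000_000):
    for accuracy in (0.90, 0.70):
        faulty = expected_faulty(n, accuracy)
        print(f"{n:>10,} students at {accuracy:.0%} accuracy: {faulty:,} faulty")
```

At 90% accuracy, ten million students means a million faulty recommendations; at the hypothetical 70%, it means three million.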

A further problem identified by Stephen Finlay is the tendency of people in predictive analytics to confuse correlation and causation. Certain students may have learnt maths best between 8.40 and 9.13, but it does not follow that they learnt it best because they studied at that time. If strong correlations do not involve causality, then actionable insights (such as individualised course design) can be no more than an informed gamble.

Knewton’s claim that they know how every student learns best is marketing hyperbole and should set alarm bells ringing. When it comes to language learning, we simply do not know how students learn (we do not have any generally accepted theory of second language acquisition), let alone how they learn best. More data won’t help our theories of learning! Ferreira’s claim that, with Knewton, ‘every kid gets a perfectly optimized textbook, except it’s also video and other rich media dynamically generated in real time’ is equally preposterous, not least since the content of the textbook will be at least as significant as the way in which it is ‘optimized’. And, as we all know, textbooks have their faults.

Cui bono? Perhaps huge data and predictive analytics will benefit students; perhaps not. We will need to wait and find out. But Stephen Finlay reminds us that in gold rushes (and internet booms and the exciting world of Big Data) the people who sell the tools make a lot of money. Far more strike it rich selling picks and shovels to prospectors than do the prospectors. Likewise, there is a lot of money to be made selling Big Data solutions. Whether the buyer actually gets any benefit from them is not the primary concern of the sales people. (p.16/17) Which is, perhaps, one of the reasons that some sales people talk so fast.

(This post was originally published at eltjam.)

We now have young learners and very young learners, learner differences and learner profiles, learning styles, learner training, learner independence and autonomy, learning technologies, life-long learning, learning management systems, virtual learning environments, learning outcomes, learning analytics and adaptive learning. Much, but not perhaps all, of this is to the good, but it’s easy to forget that it wasn’t always like this.

The rise in the use of the terms ‘learner’ and ‘learning’ can be seen in policy documents, educational research and everyday speech, and it really got going in the mid 1980s[1]. Duncan Hunter and Richard Smith[2] have identified a similar trend in ELT after analysing a corpus of articles from the English Language Teaching Journal. They found that ‘learner’ had risen to near the top of the key-word pile in the mid 1980s, but had been practically invisible 15 years previously. Accompanying this rise has been a relative decline of words like ‘teacher’, ‘teaching’, ‘pupil’ and, even, ‘education’. Gert Biesta has described this shift in discourse as a ‘new language of learning’ and the ‘learnification of education’.

It’s not hard to see the positive side of this change in focus towards the ‘learner’ and away from the syllabus, the teachers and the institution in which the ‘learning’ takes place. We can, perhaps, be proud of our preference for learner-centred approaches over teacher-centred ones. We can see something liberating (for our students) in the change of language that we use. But, as Bingham and Biesta[3] have pointed out, this gain is also a loss.

The language of ‘learners’ and ‘learning’ focusses our attention on process – how something is learnt. This was a much-needed corrective after an uninterrupted history of focussing on end-products, but the corollary is that it has become very easy to forget not only about the content of language learning, but also its purposes and the social relationships through which it takes place.

There has been some recent debate about the content of language learning, most notably in the work of the English as a Lingua Franca scholars. But there has been much more attention paid to the measurement of the learners’ acquisition of that content (through the use of tools like the Pearson Global Scale of English). There is a growing focus on ‘granularized’ content – lists of words and structures, and to a lesser extent language skills, that can be easily measured. It looks as though other things that we might want our students to be learning – critical thinking skills and intercultural competence, for example – are being sidelined.

More significant is the neglect of the purposes of language learning. The discourse of ELT is massively dominated by the paying sector of private language schools and semi-privatised universities. In these contexts, questions of purpose are not, perhaps, terribly important, as the whole point of the enterprise can be assumed to be primarily instrumental. But the vast majority of English language learners around the world are studying in state-funded institutions as part of a broader educational programme, which is as much social and political as it is to do with ‘learning’. The ultimate point of English lessons in these contexts is usually stated in much broader terms. The Council of Europe’s Common European Framework of Reference, for example, states that the ultimate point of the document is to facilitate better intercultural understanding. It is very easy to forget this when we are caught up in the business of levels and scales and measuring learning outcomes.

Lastly, a focus on ‘learners’ and ‘learning’ distracts attention away from the social roles that are enacted in classrooms. 25 years ago, Henry Widdowson[4] pointed out that there are two quite different kinds of role. The first of these is concerned with occupation (student / pupil vs teacher / master / mistress) and is identifying. The second (the learning role) is actually incidental and cannot be guaranteed. He reminds us that the success of the language learning / teaching enterprise depends on ‘recognizing and resolving the difficulties inherent in the dual functioning of roles in the classroom encounter’[5]. Again, this may not matter too much in the private sector, but, elsewhere, any attempt to tackle the learning / teaching conundrum through an exclusive focus on learning processes is unlikely to succeed.

The ‘learnification’ of education has been accompanied by two related developments: the casting of language learners as consumers of a ‘learning experience’ and the rise of digital technologies in education. For reasons of space, I will limit myself to commenting on the second of these[6]. Research by Geir Haugsbakk and Yngve Nordkvelle[7] has documented a clear and critical link between the new ‘language of learning’ and the rhetoric of edtech advocacy. These researchers suggest that these discourses are mutually reinforcing, that both contribute to the casting of the ‘learner’ as a consumer, and that the coupling of learning and digital tools is often purely rhetorical.

One of the net results of ‘learnification’ is the transformation of education into a technical or technological problem to be solved. It suggests, wrongly, that approaches to education can be derived purely from theories of learning. By adopting an ahistorical and apolitical standpoint, it hides ‘the complex nexus of political and economic power and resources that lies behind a considerable amount of curriculum organization and selection’[8]. The very real danger, as Biesta[9] has observed, is that ‘if we fail to engage with the question of good education head-on – there is a real risk that data, statistics and league tables will do the decision-making for us’.

[1] 2004 Biesta, G.J.J. ‘Against learning. Reclaiming a language for education in an age of learning’ Nordisk Pedagogik 24 (1), 70-82 & 2010 Biesta, G.J.J. Good Education in an Age of Measurement (Boulder, Colorado: Paradigm Publishers)

[2] 2012 Hunter, D. & R. Smith ‘Unpackaging the past: ‘CLT’ through ELTJ keywords’ ELTJ 66/4 430-439

[3] 2010 Bingham, C. & Biesta, G.J.J. Jacques Rancière: Education, Truth, Emancipation (London: Continuum) 134

[4] 1990 Widdowson, H.G. Aspects of Language Teaching (Oxford: OUP) 182 ff

[5] 1987 Widdowson, H.G. ‘The roles of teacher and learner’ ELTJ 41/2

[6] A compelling account of the way that students have become ‘consumers’ can be found in 2013 Williams, J. Consuming Higher Education (London: Bloomsbury)

[7] 2007 Haugsbakk, G. & Nordkvelle, Y. ‘The Rhetoric of ICT and the New Language of Learning: a critical analysis of the use of ICT in the curricular field’ European Educational Research Journal 6/1 1 – 12

[8] 2004 Apple, M. W. Ideology and Curriculum 3rd edition (New York: Routledge) 28

[9] 2010 Biesta, G.J.J. Good Education in an Age of Measurement (Boulder, Colorado: Paradigm Publishers) 27



(This post won’t make a lot of sense unless you read the previous one – Researching research: part 1!)

I suggested in the previous post that the research of Jayaprakash et al had confirmed something that we already knew concerning the reasons why some students drop out of college. However, predictive analytics are only part of the story. As the authors of this paper point out, they ‘do not influence course completion and retention rates without being combined with effective intervention strategies aimed at helping at-risk students succeed’. The point of predictive analytics is to facilitate the deployment of effective and appropriate intervention strategies, and to do this sooner than would be possible without the use of the analytics. So, it is to these intervention strategies that I now turn.

Interventions to help at-risk students included the following:

  • Sending students messages to inform them that they are at risk of not completing the course (‘awareness messaging’)
  • Making students more aware of the available academic support services (which could, for example, direct them to a variety of campus-based or online resources)
  • Promoting peer-to-peer engagement (e.g. with an online ‘student lounge’ discussion forum)
  • Providing access to self-assessment tools

The design of these interventions was based on the work that had been done at Purdue, which was, in turn, inspired by the work of Vince Tinto, one of the world’s leading experts on student retention issues.

The work done at Purdue had shown that simple notifications to students that they were at risk could have a significant, and positive, effect on student behaviour. Jayaprakash and the research team took the students who had been identified as at-risk by the analytics and divided them into three groups: the first were issued with ‘awareness messages’, the second were offered a combination of the other three interventions in the bullet point list above, and the third, a control group, had no interventions at all. The results showed that the students who were in treatment groups (of either kind of intervention) showed a statistically significant improvement compared to those who received no treatment at all. However, there seemed to be no difference in the effectiveness of the different kinds of intervention.

So far, so good, but, once again, I was left thinking that I hadn’t really learned very much from all this. But then, in the last five pages, the article suddenly got very interesting. Remember that the primary purpose of this whole research project was to find ways of helping not just at-risk students, but specifically socioeconomically disadvantaged at-risk students (such as those receiving Pell Grants). Accordingly, the researchers then focussed on this group. What did they find?

Once again, interventions proved more effective at raising student scores than no intervention at all. However, the averages of final scores are inevitably affected by drop-out rates (since students who drop out do not have final scores which can be included in the averages). At Purdue, the effect of interventions on drop-out rates had not been found to be significant. Remember that Purdue has a relatively well-off student demographic. However, in this research, which focussed on colleges with a much higher proportion of students on Pell Grants, the picture was very different. Of the Pell Grant students who were identified as at-risk and who were given some kind of treatment, 25.6% withdrew from the course. Of the Pell Grant students who were identified as at-risk but who were not ‘treated’ in any way (i.e. those in the control group), only 14.1% withdrew from the course. I recommend that you read those numbers again!
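
For anyone who would rather see those two numbers side by side, here is the arithmetic as a short Python sketch. Only the two percentages come from the paper as reported above; the variable names are mine.

```python
# Withdrawal rates for at-risk Pell Grant students, as reported above.
treated_withdrawal_pct = 25.6   # received some kind of intervention
control_withdrawal_pct = 14.1   # control group, no intervention

# Absolute gap in percentage points, and the ratio between the two rates.
difference_points = round(treated_withdrawal_pct - control_withdrawal_pct, 1)
relative_risk = treated_withdrawal_pct / control_withdrawal_pct

print(f"Difference: {difference_points} percentage points")
print(f"Treated students withdrew {relative_risk:.2f} times as often")
```

In other words, the treated students withdrew at a rate 11.5 percentage points higher than the control group, or roughly 1.8 times as often.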

The research programme had resulted in substantially higher drop-out rates for socioeconomically disadvantaged students – the precise opposite of what it had set out to achieve. Jayaprakash et al devote one page of their article to the ethical issues this raises. They suggest that early intervention, resulting in withdrawal, might actually be to the benefit of some students who were going to fail whatever happened. It is better to get a ‘W’ (withdrawal) grade on your transcript than an ‘F’ (fail), and you may avoid wasting your money at the same time. This may be true, but it would be equally true that not allowing at-risk students (who, of course, are disproportionately from socioeconomically disadvantaged backgrounds) into college at all might also be to their ‘benefit’. The question, though, is: who has the right to make these decisions on behalf of other people?

The authors also acknowledge another ethical problem. The predictive analytics which will prompt the interventions are not 100% accurate. 85% accuracy could be considered a pretty good figure. This means that some students who are not at-risk are labelled as at-risk, and others who are at-risk are not identified. Of these two possibilities, I find the first far more worrying. We are talking about the very real possibility of individual students being pushed into making potentially life-changing decisions on the basis of dodgy analytics. How ethical is that? The authors’ conclusion is that the situation forces them ‘to develop the most accurate predictive models possible, as well as to take steps to reduce the likelihood that any intervention would result in the unnecessary withdrawal of a student’.
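
A rough sketch in Python shows what the two kinds of error might look like in practice. Everything here apart from the 85% figure is an assumption made for illustration: the cohort of 1,000 students, the 20% who are genuinely at risk, and the idea that the model is equally accurate for both groups. None of these numbers are taken from the paper.

```python
# Hypothetical cohort. Only the 85% accuracy figure comes from the text above.
cohort = 1000
at_risk_pct = 20     # assumed share of students who really are at risk
accuracy_pct = 85    # assumed to apply equally to both groups

at_risk = cohort * at_risk_pct // 100
not_at_risk = cohort - at_risk

correctly_flagged = at_risk * accuracy_pct // 100         # at risk and flagged
missed = at_risk - correctly_flagged                       # at risk, not flagged
correctly_cleared = not_at_risk * accuracy_pct // 100      # not at risk, not flagged
wrongly_flagged = not_at_risk - correctly_cleared          # not at risk, but flagged

flagged = correctly_flagged + wrongly_flagged
wrongly_flagged_share = wrongly_flagged / flagged

print(f"Flagged as at-risk: {flagged}")
print(f"Wrongly flagged: {wrongly_flagged} ({wrongly_flagged_share:.0%} of those flagged)")
print(f"At-risk students missed: {missed}")
```

Under these assumptions, 120 of the 290 students who receive an alert, about four in ten, were never at risk in the first place.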

I find this extraordinary. It is premised on the assumption that predictive models can be made much, much more accurate. They seem to be confusing prediction and predeterminism. A predictive model is, by definition, only predictive. There will always be error. How many errors are ethically justifiable? And, the desire to reduce the likelihood of unnecessary withdrawals is a long way from the need to completely eliminate the likelihood of unnecessary withdrawals, which seems to me to be the ethical position. More than anything else in the article, this sentence illustrates that the a priori assumption is that predictive analytics can be a force for good, and that the only real problem is getting the science right. If a number of young lives are screwed up along the way, we can at least say that science is getting better.

In the authors’ final conclusion, they describe the results of their research as ‘promising’. They do not elaborate on who it is promising for. They say that relatively simple intervention strategies can positively impact student learning outcomes, but they could equally well have said that relatively simple intervention strategies can negatively impact learning outcomes. They could have said that predictive analytics and intervention programmes are fine for the well-off, but more problematic for the poor. Remembering once more that the point of the study was to look at the situation of socioeconomically disadvantaged at-risk students, it is striking that there is no mention of this group in the researchers’ eight concluding points. The vast bulk of the paper is devoted to technical descriptions of the design and training of the software; the majority of the conclusions are about the validity of that design and training. The ostensibly intended beneficiaries have got lost somewhere along the way.

How and why is it that a piece of research such as this can so positively slant its results? In the third and final part of this mini-series, I will turn my attention to answering that question.

In the 8th post on this blog (‘Theory, Research and Practice’), I referred to the lack of solid research into learning analytics. Whilst adaptive learning enthusiasts might disagree with much, or even most, of what I have written on this subject, here, at least, was an area of agreement. May of this year, however, saw the launch of the inaugural issue of the Journal of Learning Analytics, the first journal ‘dedicated to research into the challenges of collecting, analysing and reporting data with the specific intent to improve learning’. It is a peer-reviewed, open-access journal, available here, which is published by the Society for Learning Analytics Research (SoLAR), a consortium of academics from 9 universities in the US, Canada, Britain and Australia.

I decided to take a closer look. In this and my next two posts, I will focus on one article from this inaugural issue. It’s called Early Alert of Academically At‐Risk Students: An Open Source Analytics Initiative and it is co-authored by Sandeep M. Jayaprakash, Erik W. Moody, Eitel J.M. Lauría, James R. Regan, and Joshua D. Baron of Marist College in the US. Bear with me, please – it’s more interesting than it might sound!

The background to this paper is the often referred to problem of college drop-outs in the US, and the potential of learning analytics to address what is seen as a ‘national challenge’. The most influential work that has been done in this area to date was carried out at Purdue University. Purdue developed an analytical system, called Course Signals, which identified students at risk of course failure and offered a range of interventions (more about these in the next post) which were designed to improve student outcomes. I will have more to say about the work at Purdue in my third post, but, for the time being, it is enough to say that, in the field, it has been considered very successful, and that the authors of the paper I looked at have based their approach on the work done at Purdue.

Jayaprakash et al developed their own analytical system, based on Purdue’s Course Signals, and used it at their own institution, Marist College. Basically, they wanted to know if they could replicate the good results that had been achieved at Purdue. They then took the same analytical system to four different institutions, of very different kinds (public, as opposed to private; community colleges offering 2-year programmes rather than universities) to see if the results could be replicated there, too. They also wanted to find out if the interventions with students who had been signalled as at-risk would be as effective as they had been at Purdue. So far, so good: it is clearly very important to know if one particular piece of research has any significance beyond its immediate local context.

So, what did Jayaprakash et al find out? Basically, they learnt that their software worked as well at Marist as Course Signals had done at Purdue. They collected data on student demographics and aptitude, course grades and course related data, data on students’ interactions with the LMS they were using and performance data captured by the LMS. Oh, yes, and absenteeism. At the other institutions where they trialled their software, the system was 10% less accurate in predicting drop-outs, but the authors of the research still felt that ‘predictive models developed based on data from one institution may be scalable to other institutions’.

But more interesting than the question of whether or not the predictive analytics worked is the question of which specific features of the data were the most powerful predictors. What they discovered was that absenteeism was highly significant. No surprises there. They also learnt that the other most powerful predictors were (1) the students’ cumulative grade point average (GPA), an average of a student’s academic scores over their entire academic career, and (2) the scores recorded by the LMS of the work that students had done during the course which would contribute to their final grade. No surprises there, either. As the authors point out, ‘given that these two attributes are such fundamental aspects of academic success, it is not surprising that the predictive model has fared so well across these different institutions’.

Agreed, it is not surprising at all that students with lower scores and a history of lower scores are more likely to drop out of college than students with higher scores. But, I couldn’t help wondering, do we really need sophisticated learning analytics to tell us this? Wouldn’t any teacher know this already? They would, of course, if they knew their students, but if the teacher:student ratio is in the order of 1:100 (not unheard of in lower-funded courses delivered primarily through an LMS), many teachers (and their students) might benefit from automated alert systems.

But back to the differences between the results at Purdue and Marist and at the other institutions. Why were the predictive analytics less successful at the latter? The answer is in the nature of the institutions. Essentially, it boils down to this. In institutions with low drop-out rates, the analytics are more reliable than in institutions with high drop-out rates, because the more at-risk students there are, the harder it is to predict the particular individuals who will actually drop out. Jayaprakash et al provide the key information in a useful table. Students at Marist College are relatively well-off (only 16% receive Pell Grants, which are awarded to students in financial need), and only a small number (12%) are from ‘ethnic minorities’. The rate of course non-completion in normal time is relatively low (at 20%). In contrast, at one of the other institutions, the College of the Redwoods in California, 44% of the students receive Pell Grants and 22% of them are from ‘ethnic minorities’. The non-completion rate is a staggering 96%. At Savannah State University, 78% of the students receive Pell Grants, and the non-completion rate is 70%. The table also shows the strong correlation between student poverty and high student:faculty ratios.

In other words, the poorer you are, the less likely you are to complete your course of study, and the less likely you are to know your tutors (these two factors also correlate). In other other words, the whiter you are, the more likely you are to complete your course of study (because of the strong correlations between race and poverty). While we are playing the game of statistical correlations, let’s take it a little further. As the authors point out, ‘there is considerable evidence that students with lower socio-economic status have lower GPAs and graduation rates’. If, therefore, GPAs are one of the most significant predictors of academic success, we can say that socio-economic status (and therefore race) is one of the most significant predictors of academic success … even if the learning analytics do not capture this directly.

Actually, we have known this for a long time. The socio-economic divide in education is frequently cited as one of the big reasons for moving towards digitally delivered courses. This particular piece of research was funded (more about this in the next posts) with the stipulation that it ‘investigated and demonstrated effective techniques to improve student retention in socio-economically disadvantaged populations’. We have also known for some time that digitally delivered education increases the academic divide between socio-economic groups. So what we now have is a situation where a digital technology (learning analytics) is being used as a partial solution to a problem that has always been around, but which has been exacerbated by the increasing use of another digital technology (LMSs) in education. We could say, then, that if we weren’t using LMSs, learning analytics would not be possible … but we would need them less, anyway.

My next post will look at the results of the interventions with students that were prompted by the alerts generated by the learning analytics. Advance warning: it will make what I have written so far seem positively rosy.

Pearson’s ‘Efficacy’ initiative is a series of ‘commitments designed to measure and increase the company’s impact on learning outcomes around the world’. The company’s dedicated website offers two glossy brochures with a wide range of interesting articles, a good questionnaire tool that can be used by anyone to measure the efficacy of their own educational products or services, as well as an excellent selection of links to other articles, some of which are critical of the initiative. These include Michael Feldstein’s long blog post ‘Can Pearson Solve the Rubric’s Cube?’, which should be a first port of call for anyone wanting to understand better what is going on.

What does it all boil down to? The preface to Pearson’s ‘Asking More: the Path to Efficacy’ by CEO John Fallon provides a succinct introduction. Efficacy in education, says Fallon, is ‘making a measurable impact on someone’s life through learning’. ‘Measurable’ is the key word, because, as Fallon continues, ‘it is increasingly possible to determine what works and what doesn’t in education, just as in healthcare.’ We need ‘a relentless focus’ on ‘the learning outcomes we deliver’ because it is these outcomes that can be measured in ‘a systematic, evidence-based fashion’. Measurement, of course, is all the easier when education is delivered online, ‘real-time learner data’ can be captured, and the power of analytics can be deployed.

Pearson are very clearly aligning themselves with recent moves towards a more evidence-based education. In the US, Obama’s Race to the Top is one manifestation of this shift. Britain (with, for example, the Education Endowment Foundation) and France (with its Fonds d’Expérimentation pour la Jeunesse) are both going in the same direction. Efficacy is all about evidence-based practice.

Both the terms ‘efficacy’ and ‘evidence-based practice’ come originally from healthcare. Fallon references this connection in the quote two paragraphs above. In the UK last year, Ben Goldacre (medical doctor, author of ‘Bad Science’ and a relentless campaigner against pseudo-science) was commissioned by the UK government to write a paper entitled ‘Building Evidence into Education’. In this, he argued for the need to introduce randomized controlled trials into education in a similar way to their use in medicine.

As Fallon observed in the preface to the Pearson ‘Efficacy’ brochure, this all sounds like ‘common sense’. But, as Ben Goldacre discovered, things are not so straightforward in education. An excellent article in The Guardian outlined some of the problems in Goldacre’s paper.

With regard to ELT, Pearson’s ‘Efficacy’ initiative will stand or fall with the validity of their Global Scale of English, discussed in my March post ‘Knowledge Graphs’. However, there are a number of other considerations that make the whole evidence-based / efficacy business rather less common-sensical than might appear at first glance.

  • The purpose of English language teaching and learning (at least, in compulsory education) is rather more than simply the mastery of grammatical and lexical systems, or the development of particular language skills. Some of these other purposes (e.g. the development of intercultural competence or the acquisition of certain 21st century skills, such as creativity) continue to be debated. There is very little consensus about the details of what these purposes (or outcomes) might be, or how they can be defined. Without consensus about these purposes / outcomes, it is not possible to measure them.
  • Even if we were able to reach a clear consensus, many of these outcomes do not easily lend themselves to measurement, and even less to low-cost measurement.
  • Although we clearly need to know what ‘works’ and what ‘doesn’t work’ in language teaching, there is a problem in assigning numerical values. As the EduThink blog observes, ‘the assignation of numerical values is contestable, problematic and complex. As teachers and researchers we should be engaging with the complexity [of education] rather than the reductive simplicities of [assigning numerical values]’.
  • Evidence-based medicine has resulted in unquestionable progress, but it is not without its fierce critics. A short summary of the criticisms can be found here. It would be extremely risky to assume that a contested research procedure from one discipline can be uncritically applied to another.
  • Kathleen Graves, in her plenary at IATEFL 2014, ‘The Efficiency of Inefficiency’, explicitly linked health care and language teaching. She described a hospital where patient care was as much about human relationships as it was about medical treatment, an aspect of the hospital that went unnoticed by efficiency experts, since this could not be measured. See this blog for a summary of her talk.

These issues need to be discussed much further before we get swept away by the evidence-based bandwagon. If they are not, the real danger is that, as John Fallon cautions, we end up counting things that don’t really count, and we don’t count the things that really do count. Somehow, I doubt that an instrument like the Global Scale of English will do the trick.