
One of the most common criticisms of schooling is that it typically requires learners to study in lockstep, with everyone expected to use the same learning material at the same pace to achieve the same learning objectives. From everything we know about individual learner differences, this is an unreasonable and unrealisable expectation. It is only natural, therefore, that we should assume that self-paced learning is a better option. Self-paced learning is at the heart of technology-driven personalized learning. Often, it is the only meaningfully personalized aspect of technology-delivered courses.

Unfortunately, almost one hundred years of attempts to introduce elements of self-pacing into formal language instruction have failed to produce conclusive evidence of its benefits. For a more detailed look at the history of these failures, see my blog post on the topic, and for a more detailed look at Programmed Learning, a 1960s attempt to introduce self-pacing, see this post. This is not to say that self-pacing does not have a potentially important role to play. However, history should act as a warning that the simple provision of self-pacing opportunities through technology may be a necessary condition for successful self-pacing, but it is not a sufficient condition.

Of all the different areas of language learning that can be self-paced, I’ve long thought that technology might help the development of listening skills the most. Much contemporary real-world listening is, in any case, self-paced: why should the classroom not be? With online listening, we can use a variety of help options (Cross, 2017) – pause, rewind, speed control, speech-to-text, dictionary look-up, video / visual support – and we control the frequency and timing of this use. Online listening has become a ‘semi-recursive activity, less dependent on transient memory, inching its way closer to reading’ (Robin, 2007: 110). We don’t know which of these help options and which permutations of these options are most likely to lead to gains in listening skills, but it seems reasonable to believe that some of these options have strong potential. It is perhaps unlikely that research could ever provide a definitive answer to the question of optimal help options: different learners have different needs and different preferences (Cárdenas-Claros & Gruba, 2014). But what is clear is that self-pacing is necessary for these options to be used.

Moving away from whole-class lockstep listening practice towards self-paced independent listening has long been advocated by experts. John Field (2008: 47) identified a key advantage of independent listening: a learner ‘can replay the recording as often as she needs (achieving the kind of recursion that reading offers) and can focus upon specific stretches of the input which are difficult for her personally rather than for the class as a whole’. More recently, interest has also turned to the possibility of self-paced listening in assessment practices (Goodwin, 2017).

So, self-paced listening: what’s not to like? I’ve been pushing it with the teachers I work with for some time. But a recent piece of research from Kathrin Eberharter and colleagues (Eberharter et al., 2023) has given me pause for thought. The researchers wanted to know what effect self-pacing would have on the assessment of listening comprehension in a group of young teenage Austrian learners. They were particularly interested in how learners with SpLDs would be affected, and assumed that self-pacing would boost the performance of these learners. Disappointingly, they were wrong. Not only did self-pacing have, on average, no measurable impact on performance, it also seems that self-pacing may have put learners with shorter working-memory capacity and L1 literacy-related challenges at a disadvantage.

This research concerned self-paced listening in assessment (in this case the TOEFL Junior Standard test), not in learning. But might self-paced listening as part of a learning programme not be quite as beneficial as we might hope? The short answer, as ever, is probably that it depends. Eberharter et al. speculate that young learners ‘might need explicit training and more practice in regulating their strategic listening behaviour in order to be able to improve their performance with the help of self-pacing’. This probably holds true for many older learners, too. In other words, it’s not the possibility of self-pacing in itself that will make a huge difference: it’s what a learner does or does not do while they are self-pacing that matters. To benefit from the technological affordances of online listening, learners need to know which strategies (and which tools) may help them. They may need ‘explicit training in exploiting the benefits of navigational freedom to enhance their metacognitive strategy use’ (Eberharter et al., 2023: 17). This shouldn’t surprise us: the role of metacognition is well established (Goh & Vandergrift, 2021).

As noted earlier, we do not really know which permutations of help options are likely to be of most help, but it is a relatively straightforward matter to encourage learners to experiment with them. We do, however, have a much clearer idea of the kinds of listening strategies that are likely to have a positive impact, and the most obvious way of providing this training is in the classroom. John Field (2008) suggested many approaches; Richard Cauldwell (2013) offers more; and Sheila Thorn’s recent ‘Integrating Authentic Listening into the Language Classroom’ (2021) adds yet more. If learners’ metacognitive knowledge, effective listening and help-option skills are going to develop, the training will need to involve ‘a cyclic approach […] throughout an entire course’ (Cross, 2017: 557).

If, on the other hand, our approach to listening in the classroom continues to be (as it is in so many coursebooks) one of testing listening through comprehension questions, we should not be too surprised when learners have little idea what strategy to adopt when technology allows self-pacing. Self-paced self-testing of listening comprehension is likely to be of limited value.

References

Cárdenas-Claros, M. S. & Gruba, P. A. (2014) Listeners’ interactions with help options in CALL. Computer Assisted Language Learning, 27 (3): 228 – 245

Cauldwell, R. (2013) Phonology for Listening: Teaching the Stream of Speech. Speech in Action

Cross, J. (2017) Help options for L2 listening in CALL: A research agenda. Language Teaching, 50 (4), 544–560. https://doi.org/10.1017/S0261444817000209

Eberharter, K., Kormos, J., Guggenbichler, E., Ebner, V. S., Suzuki, S., Moser-Frötscher, D., Konrad, E. & Kremmel, B. (2023) Investigating the impact of self-pacing on the L2 listening performance of young learner candidates with differing L1 literacy skills. Language Testing. https://doi.org/10.1177/02655322221149642

Field, J. (2008) Listening in the Language Classroom. Cambridge: Cambridge University Press

Goh, C. C. M. & Vandergrift, L. (2021) Teaching and learning second language listening: Metacognition in action (2nd ed.). Routledge. https://doi.org/10.4324/9780429287749

Goodwin, S. J. (2017) Locus of control in L2 English listening assessment [Doctoral dissertation]. Georgia State University. https://scholarworks.gsu.edu/cgi/viewcontent.cgi?article=1037&context=alesl_diss

Robin, R. (2007) Commentary: Learner-based listening and technological authenticity. Language Learning & Technology, 11 (1): 109-115. https://www.lltjournal.org/item/461/

Thorn, S. (2021) Integrating Authentic Listening into the Language Classroom. Shoreham-by-Sea: Pavilion

The paragraph above was written by an AI-powered text generator called neuroflash https://app.neuro-flash.com/home which I told to produce a text on the topic ‘AI and education’. As texts on this topic go, it is both remarkable (in that it was not written by a human) and entirely unremarkable (in that it is practically indistinguishable from hundreds of human-written texts on the same subject). Neuroflash uses a neural network technology called GPT-3 – ‘a large language model’ – and ‘one of the most interesting and important AI systems ever produced’ (Chalmers, 2020). Basically, it generates text by predicting sequences of words based on huge databases. The nature of the paragraph above tells you all you need to know about the kinds of content that are usually found in texts about AI and education.

Not dissimilar from the neuroflash paragraph, educational commentary on uses of AI is characterised by (1) descriptions of AI tools already in use (e.g. speech recognition and machine translation) and (2) vague predictions which invariably refer to ‘the promise of personalised learning, adjusting what we give learners according to what they need to learn and keeping them motivated by giving them content that is of interest to them’ (Hughes, 2022). The question of what precisely will be personalised is unanswered: providing learners with optimal sets of resources (but which ones?), providing counselling services, recommendations or feedback for learners and teachers (but of what kind?) (Luckin, 2022). Nearly four years ago, I wrote https://adaptivelearninginelt.wordpress.com/2018/08/13/ai-and-language-teaching/ about the reasons why these questions remain unanswered. The short answer is that AI in language learning requires a ‘domain knowledge model’. This specifies what is to be learnt and includes an analysis of the steps that must be taken to reach that learning goal. This is lacking in SLA, or, at least, there is no general agreement on what it is. Worse, the models that are most commonly adopted in AI-driven programs (e.g. the deliberate learning of discrete items of grammar and vocabulary) are not supported by either current theory or research (see, for example, VanPatten & Smith, 2022).

In 2021, the IATEFL Learning Technologies SIG organised an event dedicated to AI in education. Unsurprisingly, there was a fair amount of input on AI in assessment, but my interest is in how AI might revolutionize how we learn and teach, not how we assess. What concrete examples did speakers provide?

Rose Luckin, the most well-known British expert on AI in education, kicked things off by mentioning three tools. One of these, Carnegie Learning, is a digital language course that looks very much like any of the ELT courses on offer from the big publishers – a fully blendable, multimedia (e.g. flashcards and videos) synthetic syllabus. This ‘blended learning solution’ is personalizable, since ‘no two students learn alike’, and, it claims, will develop a ‘lifelong love of language’. It appears to be premised on the idea of language learning as optimizing the delivery of ‘content’, of this content consisting primarily of discrete items, and of equating input with uptake. Been there, done that.

A second was Alelo Enskill (https://www.alelo.com/about-us/), a chatbot / avatar roleplay program, first developed by the US military to teach Iraqi Arabic and aspects of Iraqi culture to Marines. I looked at the limitations of chatbot technology for language learning in an earlier post (https://adaptivelearninginelt.wordpress.com/2016/12/01/chatbots/). The third tool mentioned by Luckin was Duolingo. Enough said.

Another speaker at this event was the founder and CEO of Edugo.AI https://www.edugo.ai/ , an AI-powered LMS which uses GPT-3. It allows schools to ‘create and upload on the platform any kind of language material (audio, video, text…). Our AI algorithms process and convert it in gamified exercises, which engage different parts of the brain, and gets students eager to practice’. Does this speaker know anything about gamification (for a quick read, I’d recommend Paul Driver (2012)) or neuroscience, I wonder. What, for that matter, does he know about language learning? Apparently, ‘language is not just about words, language is about sentences’ (Tomasello, 2022). Hmm, this doesn’t inspire confidence.

When you look at current uses of AI in language learning, there is very little (outside of testing, translation and speech ↔ text applications) that could justify enthusiastic claims that AI has any great educational potential. Skepticism seems to me a more reasonable and scientific response: de omnibus dubitandum.

Education is not the only field where AI has been talked up. When Covid hit us, AI was seen as the game-changing technology. It ‘could be deployed to make predictions, enhance efficiencies, and free up staff through automation; it could help rapidly process vast amounts of information and make lifesaving decisions’ (Chakravorti, 2022). The contribution of AI to the development of vaccines has been huge, but its role in diagnosing and triaging patients has been another matter altogether. Hundreds of predictive tools were developed: ‘none of them made a real difference, and some were potentially harmful’ (Heaven, 2021). Expectations were unrealistic and led to the deployment of tools before they were properly trialled. Thirty months down the line, a much more sober understanding of the potential of AI has emerged. Here, then, are the main lessons that have been learnt (I draw particularly on Engler, 2020, and Chakravorti, 2022) that are also relevant to education and language learning.

  • Anticipate what could go wrong before anticipating what might go right. Engler (2020) writes that ‘a poorly kept secret of AI practitioners is that 96% accuracy is suspiciously high for any machine learning problem’. In language learning, it is highly unlikely that personalized recommendations will ever reach anything even approaching this level of reliability. What are the implications for individual learners whose learning is inappropriately personalised?
  • We also know that a significant problem with AI systems is bias (O’Neil, 2016). There is a well-documented history of discriminatory outcomes because of people’s race, gender, social class or disability profile. Bias needs to be addressed proactively, not reactively.
  • Acknowledge from the outset that, for AI to work, huge amounts of data related to prior outcomes will be needed. In the cases of both Covid and language learning, much of this data will be personal. This raises immediate questions of privacy and consent, especially for learners who are children. Don’t minimise this issue.
  • Common by-products of AI use include insecure data storage, digital surveillance and consolidation of market power (Engler, 2020). Since such issues are so pervasive, assume they will be a problem, and need to be considered in evaluating the potential of any piece of AI edtech.
  • Find out what the subject matter experts have to say: ‘AI is only helpful when applied judiciously by subject-matter experts—people with long-standing experience with the problem that they are trying to solve […] Effectively predicting a badly defined problem is worse than doing nothing at all’ (Engler, 2020). It may be that the problem the AI is trying to help solve is the wrong problem and not worth solving at all (e.g. more efficient ‘delivery’ of explicit grammatical knowledge, gamifying gapfills).

To bring this post to an end, I returned to the AI / GPT-3 driven neuroflash program. This time, I asked it to write a longer piece entitled ‘The dangers of artificial intelligence in education: what could go wrong?’. Here’s what it came up with:

There are a few potential dangers of artificial intelligence in education. One is that AI could be used to replace human teachers. This could lead to a decline in the quality of education, as machines are not yet capable of replicating the full range of human emotions and social interactions that are necessary for teaching. Additionally, AI could be used to create biased or “racist” educational materials, as has already happened with some commercial software products. Finally, AI could be used to track and monitor students in ways that invade their privacy, or that could be used to manipulate their behavior.

All of these dangers are real and should be taken seriously. However, it’s important to remember that AI is still a new technology, and its impact on education is not yet fully understood. As such, we should tread cautiously when implementing AI into our educational systems.

When it comes to education, artificial intelligence could be a double-edged sword. While AI has the potential to revolutionize the field of education and help students learn in more efficient ways, there are also dangers that need to be considered.

One of the dangers of AI in education is that it could lead to a more standardized and homogenized form of learning. If all students are relying on AI-based learning tools, they may all end up learning the same things in the same way. This could lead to a loss of creativity and individuality in education.

Another danger is that AI-based learning tools could become too reliant on data and algorithms. This could lead to a situation where students are simply being taught what the machines think they should know, rather than being allowed to explore and discover new things for themselves.

Finally, there is the danger that AI could eventually replace human teachers altogether. While this may seem like a far-fetched scenario, it is not outside the realm of possibility. If AI-based learning tools become sophisticated enough, they may one day be able to do the job of teaching just as well as humans can.

These are just some of the dangers that need to be considered when it comes to artificial intelligence in education. While AI has the potential to revolutionize education, we need to be careful about how we implement it so that we don’t end up doing more harm than good.

References

Chakravorti, B. (2022) Why AI Failed to Live Up to Its Potential During the Pandemic. Harvard Business Review, March 17, 2022. https://hbr.org/2022/03/why-ai-failed-to-live-up-to-its-potential-during-the-pandemic

Chalmers, D. (2020) GPT-3 and General Intelligence. In Weinberg, J. (ed.) Philosophers On GPT-3 (updated with replies by GPT-3). Daily Nous, July 30, 2020. https://dailynous.com/2020/07/30/philosophers-gpt-3/#chalmers

Driver, P. (2012) The Irony of Gamification. In English Digital Magazine 3, British Council Portugal, pp. 21 – 24 http://digitaldebris.info/digital-debris/2011/12/31/the-irony-of-gamification-written-for-ied-magazine.html

Engler, A. (2020) A guide to healthy skepticism of artificial intelligence and coronavirus. Washington D.C.: Brookings Institution https://www.brookings.edu/research/a-guide-to-healthy-skepticism-of-artificial-intelligence-and-coronavirus/

Heaven, W. D. (2021) Hundreds of AI tools have been built to catch covid. None of them helped. MIT Technology Review, July 30, 2021. https://www.technologyreview.com/2021/07/30/1030329/machine-learning-ai-failed-covid-hospital-diagnosis-pandemic/

Hughes, G. (2022) What lies at the end of the AI rainbow? IATEFL LTSIG Newsletter Issue April 2022

Luckin, R. (2022) The implications of AI for language learning and teaching. IATEFL LTSIG Newsletter Issue April 2022

O’Neil, C. (2016) Weapons of Math Destruction. London: Allen Lane

Tomasello, G. (2022) Next Generation of AI-Language Education Software: NLP & Language Modules (GPT3). IATEFL LTSIG Newsletter Issue April 2022

VanPatten, B. & Smith, M. (2022) Explicit and Implicit Learning in Second Language Acquisition. Cambridge: Cambridge University Press

I was intrigued to learn earlier this year that Oxford University Press had launched a new online test of English language proficiency, called the Oxford Test of English (OTE). At the conference where I first heard about it, I was struck by the fact that the presentation of the OUP-sponsored plenary speaker was entitled ‘The Power of Assessment’ and dealt with formative assessment / assessment for learning. Oxford clearly want to position themselves as serious competitors to Pearson and Cambridge English in the testing business.

The brochure for the exam kicks off with a gem of a marketing slogan, ‘Smart. Smarter. SmarTest’ (geddit?), and the next few pages give us all the key information.

Faster and more flexible

‘Traditional language proficiency tests’ is presumably intended to refer to the main competition (Pearson and Cambridge English). Cambridge First takes, in total, 3½ hours; the Pearson Test of English Academic takes 3 hours. The OTE takes, in total, 2 hours and 5 minutes. It can be taken, in theory, on any day of the year, although this depends on the individual Approved Test Centres, and, again, in theory, it can be booked as little as 14 days in advance. Results should take only two weeks to arrive. Further flexibility is offered in the way that candidates can pick ’n’ choose which of the four skills they want to have tested, just one or all four, although, as an incentive to go the whole hog, they will only get a ‘Certificate of Proficiency’ if they do all four.

A further incentive to do all four skills at the same time can be found in the price structure. One centre in Spain is currently offering the test for one single skill at €41.50, but do the whole lot, and it will only set you back €89. For a high-stakes test, this is cheap. In the UK right now, both Cambridge First and Pearson Academic cost in the region of £150, and IELTS a bit more than that. So, faster, more flexible and cheaper … Oxford means business.

Individual experience

The ‘individual experience’ on the next page of the brochure is pure marketing guff. This is, after all, a high-stakes, standardised test. It may be true that ‘the Speaking and Writing modules provide randomly generated tasks, making the overall test different each time’, but there can only be a certain number of permutations. What’s more, in ‘traditional tests’, like Cambridge First, where there is a live examiner or two, an individualised experience is unavoidable.

More interesting to me is the reference to adaptive technology. According to the brochure, ‘The Listening and Reading modules are adaptive, which means the test difficulty adjusts in response to your answers, quickly finding the right level for each test taker. This means that the questions are at just the right level of challenge, making the test shorter and less stressful than traditional proficiency tests’.

My curiosity piqued, I decided to look more closely at the Reading module. I found one practice test online which is the same as the demo that is available at the OTE website. Unfortunately, this example is not adaptive: it is at B1 level. The actual test records scores between 51 and 140, corresponding to levels A2, B1 and B2.

Test scores

The tasks in the Reading module are familiar from coursebooks and other exams: multiple choice, multiple matching and gapped texts.

Reading tasks

According to the exam specifications, these tasks are designed to measure the following skills:

  • Reading to identify main message, purpose, detail
  • Expeditious reading to identify specific information, opinion and attitude
  • Reading to identify text structure, organizational features of a text
  • Reading to identify attitude / opinion, purpose, reference, the meanings of words in context, global meaning

The ability to perform these skills depends, ultimately, on the candidate’s knowledge of vocabulary and grammar, as can be seen in the examples below.

[Images: Task 1 and Task 2]

How exactly, I wonder, does the test difficulty adjust in response to the candidate’s answers? The algorithm that is used depends on measures of the difficulty of the test items. If these items are to be made harder or easier, the only significant way that I can see of doing this is by making the key vocabulary lower- or higher-frequency. This, in turn, is only possible if vocabulary and grammar have been tagged as being at a particular level. The most well-known tools for doing this have been developed by Pearson (with the GSE Teacher Toolkit) and Cambridge (with English Profile). To the best of my knowledge, Oxford does not yet have a tool of this kind (at least, none that is publicly available). However, the data that OUP will accumulate from OTE scripts and recordings will be invaluable in building a database which their lexicographers can use in developing such a tool.
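To make the general idea of adaptivity more concrete, here is a minimal sketch in Python of one very simple way a test could adjust difficulty in response to answers. It is emphatically not the OTE algorithm, which is not public. It is a basic 'staircase' procedure that assumes every item in the bank has already been tagged with a difficulty level from 1 to 5. The names `Item`, `next_level` and `run_adaptive_test`, the five-level scale and the one-step-up, one-step-down rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Item:
    item_id: str
    difficulty: int  # hypothetical pre-calibrated level: 1 (easiest) to 5 (hardest)


def next_level(level: int, correct: bool, lowest: int = 1, highest: int = 5) -> int:
    """Move one level up after a correct answer, one level down after a wrong one."""
    step = 1 if correct else -1
    return max(lowest, min(highest, level + step))


def run_adaptive_test(
    bank: List[Item],
    answers_correctly: Callable[[Item], bool],
    start_level: int = 3,
    n_items: int = 6,
) -> Tuple[int, List[Tuple[str, int, bool]]]:
    """Administer up to n_items, always choosing the unused item whose
    tagged difficulty is closest to the current target level."""
    level = start_level
    used = set()
    history = []
    for _ in range(n_items):
        candidates = [item for item in bank if item.item_id not in used]
        if not candidates:
            break  # the bank is exhausted
        item = min(candidates, key=lambda it: abs(it.difficulty - level))
        used.add(item.item_id)
        correct = answers_correctly(item)
        history.append((item.item_id, item.difficulty, correct))
        level = next_level(level, correct)
    return level, history
```

Even in this toy version, the dependency described above is plain: the procedure only works if every item already carries a trustworthy difficulty tag. A candidate who can handle items up to level 2 will simply bounce between levels 2 and 3, which is how such a procedure homes in on a level.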

Even when a data-driven (and numerically precise) tool is available for modifying the difficulty of test items, I still find it hard to understand how the adaptivity will impact on the length or the stress of the reading test. The Reading module is only 35 minutes long and contains only 22 items. Anything that is significantly shorter must surely impact on the reliability of the test.

My conclusion from this is that the adaptive element of the Reading and Listening modules in the OTE is less important to the test itself than it is to building a sophisticated database (not dissimilar to the GSE Teacher Toolkit or Cambridge English Profile). The value of this will be found, in due course, in calibrating all OUP materials. The OTE has already been aligned to the Oxford Online Placement Test (OOPT) and, presumably, coursebooks will soon follow. This, in turn, will facilitate a vertically integrated business model, like Pearson and CUP, where everything from placement test, to coursework, to formative assessment, to final proficiency testing can be on offer.

It’s hype time again. Spurred on, no doubt, by the current spate of books and articles about AIED (artificial intelligence in education), the IATEFL Learning Technologies SIG is organising an online event on the topic in November of this year. Currently, the most visible online references to AI in language learning are related to Glossika, basically a language learning system that uses spaced repetition, whose marketing department has realised that references to AI might help sell the product. They’re not alone – see, for example, Knowble, which I reviewed earlier this year.

In the wider world of education, where AI has made greater inroads than in language teaching, every day brings more stuff: ‘How artificial intelligence is changing teaching’, ‘32 Ways AI is Improving Education’, ‘How artificial intelligence could help teachers do a better job’, etc., etc. There’s a full-length book by Anthony Seldon, The Fourth Education Revolution: will artificial intelligence liberate or infantilise humanity? (2018, University of Buckingham Press) – one of the most poorly researched and badly edited books on education I’ve ever read, although that won’t stop it selling – and, no surprises here, there’s a Pearson commissioned report called Intelligence Unleashed: An argument for AI in Education (2016) which is available free.

Common to all these publications is the claim that AI will radically change education. When it comes to language teaching, a similar claim has been made by Donald Clark (described by Anthony Seldon as an education guru but perhaps best-known to many in ELT for his demolition of Sugata Mitra). In 2017, Clark wrote a blog post for Cambridge English (now unavailable) entitled How AI will reboot language learning, and a more recent version of this post, called AI has and will change language learning forever (sic) is available on Clark’s own blog. Given the history of the failure of education predictions, Clark is making bold claims. Thomas Edison (1922) believed that movies would revolutionize education. Radios were similarly hyped in the 1940s and in the 1960s it was the turn of TV. In the 1980s, Seymour Papert predicted the end of schools – ‘the computer will blow up the school’, he wrote. Twenty years later, we had the interactive possibilities of Web 2.0. As each technology failed to deliver on the hype, a new generation of enthusiasts found something else to make predictions about.

But is Donald Clark onto something? Developments in AI and computational linguistics have recently resulted in enormous progress in machine translation. Impressive advances in automatic speech recognition and generation, coupled with the power that can be packed into a handheld device, mean that we can expect some re-evaluation of the value of learning another language. Stephen Heppell, a specialist at Bournemouth University in the use of ICT in Education, has said: ‘Simultaneous translation is coming, making language teachers redundant. Modern languages teaching in future may be more about navigating cultural differences’ (quoted by Seldon, p.263). Well, maybe, but this is not Clark’s main interest.

Less a matter of opinion and much closer to the present day is the issue of assessment. AI is becoming ubiquitous in language testing. Cambridge, Pearson, TELC, Babbel and Duolingo are all using or exploring AI in their testing software, and we can expect to see this increase. Current, paper-based systems of testing subject knowledge are, according to Rosemary Luckin and Kristen Weatherby, outdated, ineffective, time-consuming, the cause of great anxiety and can easily be automated (Luckin, R. & Weatherby, K. 2018. ‘Learning analytics, artificial intelligence and the process of assessment’ in Luckin, R. (ed.) Enhancing Learning and Teaching with Technology, 2018. UCL Institute of Education Press, p.253). By capturing data of various kinds throughout a language learner’s course of study and by using AI to analyse learning development, continuous formative assessment becomes possible in ways that were previously unimaginable. ‘Assessment for Learning (AfL)’ or ‘Learning Oriented Assessment (LOA)’ are two terms used by Cambridge English to refer to the potential that AI offers which is described by Luckin (who is also one of the authors of the Pearson paper mentioned earlier). In practical terms, albeit in a still very limited way, this can be seen in the CUP course ‘Empower’, which combines CUP course content with validated LOA from Cambridge Assessment English.

Will this reboot or revolutionise language teaching? Probably not and here’s why. AIED systems need to operate with what is called a ‘domain knowledge model’. This specifies what is to be learnt and includes an analysis of the steps that must be taken to reach that learning goal. Some subjects (especially STEM subjects) ‘lend themselves much more readily to having their domains represented in ways that can be automatically reasoned about’ (du Boulay, D. et al., 2018. ‘Artificial intelligences and big data technologies to close the achievement gap’ in Luckin, R. (ed.) Enhancing Learning and Teaching with Technology, 2018. UCL Institute of Education Press, p.258). This is why most AIED systems have been built to teach these areas. Languages are rather different. We simply do not have a domain knowledge model, except perhaps for the very lowest levels of language learning (and even that is highly questionable). Language learning is probably not, or not primarily, about acquiring subject knowledge. Debate still rages about the relationship between explicit language knowledge and language competence. AI-driven formative assessment will likely focus most on explicit language knowledge, as does most current language teaching. This will not reboot or revolutionise anything. It will more likely reinforce what is already happening: a model of language learning that assumes there is a strong interface between explicit knowledge and language competence. It is not a model that is shared by most SLA researchers.

So, one thing that AI can do (and is doing) for language learning is to improve the algorithms that determine the way that grammar and vocabulary are presented to individual learners in online programs. AI-optimised delivery of ‘English Grammar in Use’ may lead to some learning gains, but they are unlikely to be significant. It is not, in any case, what language learners need.

AI, Donald Clark suggests, can offer personalised learning. Precisely what kind of personalised learning this might be, and whether or not this is a good thing, remains unclear. A 2015 report funded by the Gates Foundation found that we currently lack evidence about the effectiveness of personalised learning. We do not know which aspects of personalised learning (learner autonomy, individualised learning pathways and instructional approaches, etc.) or which combinations of these will lead to gains in language learning. The complexity of the issues means that we may never have a satisfactory explanation. You can read my own exploration of the problems of personalised learning starting here.

What’s left? Clark suggests that chatbots are one area with ‘huge potential’. I beg to differ and I explained my reasons eighteen months ago. Chatbots work fine in very specific domains. As Clark says, they can be used for ‘controlled practice’, but ‘controlled practice’ means practice of specific language knowledge, the practice of limited conversational routines, for example. It could certainly be useful, but more than that? Taking things a stage further, Clark then suggests more holistic speaking and listening practice with Amazon Echo, Alexa or Google Home. If and when the day comes that we have general, as opposed to domain-specific, AI, chatting with one of these tools would open up vast new possibilities. Unfortunately, general AI does not exist, and until then Alexa and co will remain a poor substitute for human-human interaction (which is readily available online, anyway). Incidentally, AI could be used to form groups of online language learners to carry out communicative tasks – ‘the aim might be to design a grouping of students all at a similar cognitive level and of similar interests, or one where the participants bring different but complementary knowledge and skills’ (Luckin, R., Holmes, W., Griffiths, M. & Forcier, L.B. 2016. Intelligence Unleashed: An argument for AI in Education. London: Pearson, p.26).

Predictions about the impact of technology on education have a tendency to be made by people with a vested interest in the technologies. Edison was a businessman who had invested heavily in motion pictures. Donald Clark is an edtech entrepreneur whose company, Wildfire, uses AI in online learning programs. Stephen Heppell is executive chairman of LP+ who are currently developing a Chinese language learning community for 20 million Chinese school students. The reporting of AIED is almost invariably in websites that are paid for, in one way or another, by edtech companies. Predictions need, therefore, to be treated sceptically. Indeed, the safest prediction we can make about hyped educational technologies is that inflated expectations will be followed by disillusionment, before the technology finds a smaller niche.

 

by Philip Kerr & Andrew Wickham

from IATEFL 2016 Birmingham Conference Selections (ed. Tania Pattison) Faversham, Kent: IATEFL pp. 75 – 78

ELT publishing, international language testing and private language schools are all industries: products are produced, bought and sold for profit. English language teaching (ELT) is not. It is an umbrella term that is used to describe a range of activities, some of which are industries, and some of which (such as English teaching in high schools around the world) might better be described as public services. ELT, like education more generally, is, nevertheless, often referred to as an ‘industry’.

Education in a neoliberal world

The framing of ELT as an industry is both a reflection of how we understand the term and a force that shapes our understanding. Associated with the idea of ‘industry’ is a constellation of other ideas and words (such as efficacy, productivity, privatization, marketization, consumerization, digitalization and globalization) which become a part of ELT once it is framed as an industry. Repeated often enough, ‘ELT as an industry’ can become a metaphor that we think and live by. Those activities that fall under the ELT umbrella, but which are not industries, become associated with the desirability of industrial practices through such discourse.

The shift from education, seen as a public service, to educational managerialism (where education is seen in industrial terms with a focus on efficiency, free market competition, privatization and a view of students as customers) can be traced to the 1980s and 1990s (Gewirtz, 2001). In 1999, under pressure from developed economies, the General Agreement on Trade in Services (GATS) transformed education into a commodity that could be traded like any other in the marketplace (Robertson, 2006). The global industrialisation and privatization of education continues to be promoted by transnational organisations (such as the World Bank and the OECD), well-funded free-market think-tanks (such as the Cato Institute), philanthro-capitalist foundations (such as the Gates Foundation) and educational businesses (such as Pearson) (Ball, 2012).

Efficacy and learning outcomes

Managerialist approaches to education require educational products and services to be measured and compared. In ELT, the most visible manifestation of this requirement is the current ubiquity of learning outcomes. Contemporary coursebooks are full of ‘can-do’ statements, although these are not necessarily of any value to anyone. Examples from one unit of one best-selling course include ‘Now I can understand advice people give about hotels’ and ‘Now I can read an article about unique hotels’ (McCarthy et al. 2014: 74). However, in a world where accountability is paramount, they are deemed indispensable. The problem from a pedagogical perspective is that teaching input does not necessarily equate with learning uptake. Indeed, there is no reason why it should.

Drawing on the Common European Framework of Reference for Languages (CEFR) for inspiration, new performance scales have emerged in recent years. These include the Cambridge English Scale and the Pearson Global Scale of English. Moving away from the broad six categories of the CEFR, such scales permit finer-grained measurement and we now see individual vocabulary and grammar items tagged to levels. Whilst such initiatives undoubtedly support measurements of efficacy, the problem from a pedagogical perspective is that they assume that language learning is linear and incremental, as opposed to complex and jagged.

Given the importance accorded to the measurement of language learning (or what might pass for language learning), it is unsurprising that attention is shifting towards the measurement of what is probably the most important factor impacting on learning: the teaching. Teacher competency scales have been developed by Cambridge Assessment, the British Council and EAQUALS (Evaluation and Accreditation of Quality Language Services), among others.

The backwash effects of the deployment of such scales are yet to be fully experienced, but the likely increase in the perception of both language learning and teacher learning as the synthesis of granularised ‘bits of knowledge’ is cause for concern.

Digital technology

Digital technology may offer advantages to both English language teachers and learners, but its rapid growth in language learning is the result, primarily but not exclusively, of the way it has been promoted by those who stand to gain financially. In education, generally, and in English language teaching, more specifically, advocacy of the privatization of education is always accompanied by advocacy of digitalization. The global market for digital English language learning products was reported to be $2.8 billion in 2015 and is predicted to reach $3.8 billion by 2020 (Ambient Insight, 2016).

In tandem with the increased interest in measuring learning outcomes, there is fierce competition in the market for high-stakes examinations, and these are increasingly digitally delivered and marked. In the face of this competition and in a climate of digital disruption, companies like Pearson and Cambridge English are developing business models of vertical integration where they can provide and sell everything from placement testing, to courseware (either print or delivered through an LMS), teaching, assessment and teacher training. Huge investments are being made in pursuit of such models. Pearson, for example, recently bought GlobalEnglish and Wall Street English, and set up a partnership with Busuu, thus covering all aspects of language learning from resources provision and publishing to off- and online training delivery.

As regards assessment, the most recent adult coursebook from Cambridge University Press (in collaboration with Cambridge English Language Assessment), ‘Empower’ (Doff et al., 2015), sells itself on a combination of course material with integrated, validated assessment.

Besides its potential for scalability (and therefore greater profit margins), the appeal (to some) of platform-delivered English language instruction is that it facilitates assessment that is much finer-grained and actionable in real time. Digitization and testing go hand in hand.

Few English language teachers have been unaffected by the move towards digital. In the state sectors, large-scale digitization initiatives (such as the distribution of laptops for educational purposes, the installation of interactive whiteboards, the move towards blended models of instruction or the move away from printed coursebooks) are becoming commonplace. In the private sectors, online (or partially online) language schools are taking market share from the traditional bricks-and-mortar institutions.

These changes have entailed modifications to the skill-sets that teachers need to have. Two announcements at this conference reflect this shift. First of all, Cambridge English launched their ‘Digital Framework for Teachers’, a matrix of six broad competency areas organised into four levels of proficiency. Secondly, Aqueduto, the Association for Quality Education and Training Online, was launched, setting itself up as an accreditation body for online or blended teacher training courses.

Teachers’ pay and conditions

In the United States, and likely soon in the UK, the move towards privatization is accompanied by an overt attack on teachers’ unions, rights, pay and conditions (Selwyn, 2014). As English language teaching in both public and private sectors is commodified and marketized, it is no surprise to find that the drive to bring down costs has a negative impact on teachers worldwide. Gwynt (2015), for example, catalogues cuts in funding, large-scale redundancies, a narrowing of the curriculum, intensified workloads (including the need to comply with ‘quality control measures’), the deskilling of teachers, dilapidated buildings, minimal resources and low morale in an ESOL department in one British further education college. In France, a large-scale study by Wickham, Cagnol, Wright and Oldmeadow (Linguaid, 2015; Wright, 2016) found that EFL teachers in the very competitive private sector typically had multiple employers, limited or no job security, limited sick pay and holiday pay, very little training and low hourly rates that were deteriorating. One of the principal drivers of the pressure on salaries is the rise of online training delivery through Skype and other online platforms, using offshore teachers in low-cost countries such as the Philippines. This type of training represents 15% in value and up to 25% in volume of all language training in the French corporate sector and is developing fast in emerging countries. These examples are illustrative of a broad global trend.

Implications

Given the current climate, teachers will benefit from closer networking with fellow professionals in order, not least, to be aware of the rapidly changing landscape. It is likely that they will need to develop and extend their skill sets (especially their online skills and visibility and their specialised knowledge), to differentiate themselves from competitors and to be able to demonstrate that they are in tune with current demands. More generally, it is important to recognise that current trends have yet to run their full course. Conditions for teachers are likely to deteriorate further before they improve. More than ever before, teachers who want to have any kind of influence on the way that marketization and industrialization are shaping their working lives will need to do so collectively.

References

Ambient Insight. 2016. The 2015-2020 Worldwide Digital English Language Learning Market. http://www.ambientinsight.com/Resources/Documents/AmbientInsight_2015-2020_Worldwide_Digital_English_Market_Sample.pdf

Ball, S. J. 2012. Global Education Inc. Abingdon, Oxon.: Routledge

Doff, A., Thaine, C., Puchta, H., Stranks, J. and P. Lewis-Jones 2015. Empower. Cambridge: Cambridge University Press

Gewirtz, S. 2001. The Managerial School: Post-welfarism and Social Justice in Education. Abingdon, Oxon.: Routledge

Gwynt, W. 2015. ‘The effects of policy changes on ESOL’. Language Issues 26 / 2: 58 – 60

Linguaid, 2015. Le Marché de la Formation Langues à l’Heure de la Mondialisation. Guildford: Linguaid

McCarthy, M., McCarten, J. and H. Sandiford 2014. Touchstone 2 Student’s Book Second Edition. Cambridge: Cambridge University Press

Robertson, S. L. 2006. ‘Globalisation, GATS and trading in education services.’ published by the Centre for Globalisation, Education and Societies, University of Bristol, Bristol BS8 1JA, UK at http://www.bris.ac.uk/education/people/academicStaff/edslr/publications/04slr

Selwyn, N. 2014. Distrusting Educational Technology. New York: Routledge

Wright, R. 2016. ‘My teacher is rich … or not!’ English Teaching Professional 103: 54 – 56

 

 

About two and a half years ago, when I started writing this blog, there was a lot of hype around adaptive learning and the big data which might drive it. Two and a half years is a long time in technology. A look at Google Trends suggests that interest in adaptive learning has been pretty static for the last couple of years. It’s interesting to note that 3 of the 7 lettered points on this graph are Knewton-related media events (including the most recent, A, which is Knewton’s latest deal with Hachette) and 2 of them concern McGraw-Hill. It would be interesting to know whether these companies follow both parts of Simon Cowell’s dictum of ‘Create the hype, but don’t ever believe it’.

[Image: Google Trends graph of search interest in adaptive learning]

A look at the Hype Cycle (see here for Wikipedia’s entry on the topic and for criticism of the hype of Hype Cycles) of the IT research and advisory firm, Gartner, indicates that both big data and adaptive learning have now slid into the ‘trough of disillusionment’, which means that the market has started to mature, becoming more realistic about how useful the technologies can be for organizations.

A few years ago, the Gates Foundation, one of the leading cheerleaders and financial promoters of adaptive learning, launched its Adaptive Learning Market Acceleration Program (ALMAP) to ‘advance evidence-based understanding of how adaptive learning technologies could improve opportunities for low-income adults to learn and to complete postsecondary credentials’. It’s striking that the program’s aims referred to how such technologies could lead to learning gains, not whether they would. Now, though, with the publication of a report commissioned by the Gates Foundation to analyze the data coming out of the ALMAP Program, things are looking less rosy. The report is inconclusive. There is no firm evidence that adaptive learning systems are leading to better course grades or course completion. ‘The ultimate goal – better student outcomes at lower cost – remains elusive’, the report concludes. Rahim Rajan, a senior program officer for Gates, is clear: ‘There is no magical silver bullet here.’

The same conclusion is being reached elsewhere. A report for the National Education Policy Center (in Boulder, Colorado) concludes: ‘Personalized Instruction, in all its many forms, does not seem to be the transformational technology that is needed, however. After more than 30 years, Personalized Instruction is still producing incremental change. The outcomes of large-scale studies and meta-analyses, to the extent they tell us anything useful at all, show mixed results ranging from modest impacts to no impact. Additionally, one must remember that the modest impacts we see in these meta-analyses are coming from blended instruction, which raises the cost of education rather than reducing it’ (Enyedy, 2014: 15; see reference at the foot of this post). In the same vein, a recent academic study by Meg Coffin Murray and Jorge Pérez (2015, ‘Informing and Performing: A Study Comparing Adaptive Learning to Traditional Learning’) found that ‘adaptive learning systems have negligible impact on learning outcomes’.

In the latest educational technology plan from the U.S. Department of Education (‘Future Ready Learning: Reimagining the Role of Technology in Education’, 2016) the only mentions of the word ‘adaptive’ are in the context of testing. And the latest OECD report on ‘Students, Computers and Learning: Making the Connection’ (2015), finds, more generally, that information and communication technologies, when they are used in the classroom, have, at best, a mixed impact on student performance.

There is, however, too much money at stake for the earlier hype to disappear completely. Sponsored cheerleading for adaptive systems continues to find its way into blogs and national magazines and newspapers. EdSurge, for example, recently published a report called ‘Decoding Adaptive’ (2016), sponsored by Pearson, that continues to wave the flag. Enthusiastic anecdotes take the place of evidence, but, for all that, it’s a useful read.

In the world of ELT, there are plenty of sales people who want new products which they can call ‘adaptive’ (and gamified, too, please). But it’s striking that three years after I started following the hype, such products are rather thin on the ground. Pearson was the first of the big names in ELT to do a deal with Knewton, and invested heavily in the company. Their relationship remains close. But, to the best of my knowledge, the only truly adaptive ELT product that Pearson offers is the PTE test.

Macmillan signed a contract with Knewton in May 2013 ‘to provide personalized grammar and vocabulary lessons, exam reviews, and supplementary materials for each student’. In December of that year, they talked up their new ‘big tree online learning platform’: ‘Look out for the Big Tree logo over the coming year for more information as to how we are using our partnership with Knewton to move forward in the Language Learning division and create content that is tailored to students’ needs and reactive to their progress.’ I’ve been looking out, but it’s all gone rather quiet on the adaptive / platform front.

In September 2013, it was the turn of Cambridge to sign a deal with Knewton ‘to create personalized learning experiences in its industry-leading ELT digital products for students worldwide’. This year saw the launch of a major new CUP series, ‘Empower’. It has an online workbook with personalized extra practice, but there’s nothing (yet) that anyone would call adaptive. More recently, Cambridge has launched the online version of the 2nd edition of Touchstone. Nothing adaptive there, either.

Earlier this year, Cambridge published The Cambridge Guide to Blended Learning for Language Teaching, edited by Mike McCarthy. It contains a chapter by M.O.Z. San Pedro and R. Baker on ‘Adaptive Learning’. It’s an enthusiastic account of the potential of adaptive learning, but it doesn’t contain a single reference to language learning or ELT!

So, what’s going on? Skepticism is becoming the order of the day. The early hype of people like Knewton’s Jose Ferreira is now understood for what it was. Companies like Macmillan got their fingers badly burnt when they barked up the wrong tree with their ‘Big Tree’ platform.

Noel Enyedy captures a more contemporary understanding when he writes: Personalized Instruction is based on the metaphor of personal desktop computers—the technology of the 80s and 90s. Today’s technology is not just personal but mobile, social, and networked. The flexibility and social nature of how technology infuses other aspects of our lives is not captured by the model of Personalized Instruction, which focuses on the isolated individual’s personal path to a fixed end-point. To truly harness the power of modern technology, we need a new vision for educational technology (Enyedy, 2014: 16).

Adaptive solutions aren’t going away, but there is now a much better understanding of what sorts of problems might have adaptive solutions. Testing is certainly one. As the educational technology plan from the U.S. Department of Education (‘Future Ready Learning: Re-imagining the Role of Technology in Education’, 2016) puts it: ‘Computer adaptive testing, which uses algorithms to adjust the difficulty of questions throughout an assessment on the basis of a student’s responses, has facilitated the ability of assessments to estimate accurately what students know and can do across the curriculum in a shorter testing session than would otherwise be necessary.’ In ELT, Pearson and EF have adaptive tests that have been well researched and designed.
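For readers who want to see the basic idea in concrete form, here is a deliberately crude sketch of a test that adjusts question difficulty in response to answers. It is my own illustration, not how Pearson, EF or anyone else actually does it: real adaptive tests rely on calibrated item banks and statistical models, whereas this uses a simple ‘staircase’ rule. The function name, the ten difficulty levels and the way the final estimate is calculated are all assumptions made for the example.

```python
def run_adaptive_test(answers_correctly, num_items=10, levels=10, start=5):
    """Toy staircase test (hypothetical, for illustration only).

    answers_correctly: a function that takes a difficulty level (1..levels)
    and returns True if the test-taker gets an item at that level right.
    """
    level = start
    history = []
    for _ in range(num_items):
        correct = answers_correctly(level)
        history.append((level, correct))
        # Move up one level after a correct answer, down one after an error,
        # staying inside the range of available difficulty levels.
        if correct:
            level = min(levels, level + 1)
        else:
            level = max(1, level - 1)
    # Rough ability estimate: the average difficulty of the second half of
    # the items, by which point the staircase has usually settled.
    tail = [lvl for lvl, _ in history[num_items // 2:]]
    return sum(tail) / len(tail), history


# A simulated test-taker who can handle items up to level 7 but no higher.
estimate, history = run_adaptive_test(lambda level: level <= 7)
print(estimate)
print(history)
```

With the simulated test-taker above, the difficulty climbs from the starting level and then hovers around the point where answers switch from right to wrong, which is why fewer questions are needed than in a fixed test that covers every level.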

Vocabulary apps which deploy adaptive technology continue to become more sophisticated, although empirical research is lacking. Automated writing tutors with adaptive corrective feedback are also developing fast, and I’ll be writing a post about these soon. Similarly, as speech recognition software improves, we can expect to see better and better automated adaptive pronunciation tutors. But going beyond such applications, there are bigger questions to ask, and answers to these will impact on whatever direction adaptive technologies take. Large platforms (LMSs), with or without adaptive software, are already beginning to look rather dated. Will they be replaced by integrated apps, or are apps themselves going to be replaced by bots (currently riding high in the Hype Cycle)? In language learning and teaching, the future of bots is likely to be shaped by developments in natural language processing (another topic about which I’ll be blogging soon). Nobody really has a clue where the next two and a half years will take us (if anywhere), but it’s becoming increasingly likely that adaptive learning will be only one very small part of it.

 

Enyedy, N. 2014. Personalized Instruction: New Interest, Old Rhetoric, Limited Results, and the Need for a New Direction for Computer-Mediated Learning. Boulder, CO: National Education Policy Center. Retrieved 17.07.16 from http://nepc.colorado.edu/publication/personalized-instruction

There are a number of reasons why we sometimes need to describe a person’s language competence using a single number. Most of these are connected to the need for a shorthand to differentiate people, in summative testing or in job selection, for example. Numerical (or grade) allocation of this kind is so common (and especially in times when accountability is greatly valued) that it is easy to believe that this number is an objective description of a concrete entity, rather than a shorthand description of an abstract concept. In the process, the abstract concept (language competence) becomes reified and there is a tendency to stop thinking about what it actually is.

Language is messy. It’s a complex, adaptive system of communication which has a fundamentally social function. As Diane Larsen-Freeman and others have argued, ‘patterns of use strongly affect how language is acquired, is used, and changes. These processes are not independent of one another but are facets of the same complex adaptive system. […] The system consists of multiple agents (the speakers in the speech community) interacting with one another [and] the structures of language emerge from interrelated patterns of experience, social interaction, and cognitive mechanisms.’

As such, competence in language use is difficult to measure. There are ways of capturing some of it. Think of the pages and pages of competency statements in the Common European Framework, but there has always been something deeply unsatisfactory about documents of this kind. How, for example, are we supposed to differentiate, exactly and objectively, between, say, can participate fully in an interview (C1) and can carry out an effective, fluent interview (B2)? The short answer is that we can’t. There are too many of these descriptors anyway and, even if we did attempt to use such a detailed tool to describe language competence, we would still be left with a very incomplete picture. There is at least one whole book devoted to attempts to test the untestable in language education (edited by Amos Paran and Lies Sercu, Multilingual Matters, 2010).

So, here is another reason why we are tempted to use shorthand numerical descriptors (such as A1, A2, B1, etc.) to describe something which is very complex and abstract (‘overall language competence’) and to reify this abstraction in the process. From there, it is a very short step to making things even more numerical, more scientific-sounding. Number-creep in recent years has brought us the Pearson Global Scale of English which can place you at a precise point on a scale from 10 to 90. Not to be outdone, Cambridge English Language Assessment now has a scale that runs from 80 points to 230, although Cambridge does, at least, allocate individual scores for four language skills.

As the title of this post suggests (in its reference to Stephen Jay Gould’s The Mismeasure of Man), I am suggesting that there are parallels between attempts to measure language competence and the sad history of attempts to measure ‘general intelligence’. Both are guilty of the twin fallacies of reification and ranking – the ordering of complex information as a gradual ascending scale. These conceptual fallacies then lead us, through the way that they push us to think about language, into making further conceptual errors about language learning. We start to confuse language testing with the ways that language learning can be structured.

We begin to granularise language. We move inexorably away from difficult-to-measure hazy notions of language skills towards what, on the surface at least, seem more readily measurable entities: words and structures. We allocate to them numerical values on our testing scales, so that an individual word can be deemed to be higher or lower on the scale than another word. And then we have a syllabus, a synthetic syllabus, that lends itself to digital delivery and adaptive manipulation. We find ourselves in a situation where materials writers for Pearson, writing for a particular ‘level’, are only allowed to use vocabulary items and grammatical structures that correspond to that ‘level’. We find ourselves, in short, in a situation where the acquisition of a complex and messy system is described as a linear, additive process. Here’s an example from the Pearson website: ‘If you score 29 on the scale, you should be able to identify and order common food and drink from a menu; at 62, you should be able to write a structured review of a film, book or play. And because the GSE is so granular in nature, you can conquer smaller steps more often; and you are more likely to stay motivated as you work towards your goal.’ It’s a nonsense, a nonsense that is dictated by the needs of testing and adaptive software, but the sciency-sounding numbers help to hide the conceptual fallacies that lie beneath.

Perhaps, though, this doesn’t matter too much for most language learners. In the early stages of language learning (where most language learners are to be found), there are countless millions of people who don’t seem to mind the granularised programmes of Duolingo or Rosetta Stone, or the Grammar McNuggets of coursebooks. In these early stages, anything seems to be better than nothing, and the testing is relatively low-stakes. But as a learner’s interlanguage becomes more complex, and as the language she needs to acquire becomes more complex, attempts to granularise it and to present it in a linearly additive way become more problematic. It is for this reason, I suspect, that the appeal of granularised syllabuses declines so rapidly the more progress a learner makes. It comes as no surprise that, the further up the scale you get, the more that both teachers and learners want to get away from pre-determined syllabuses in coursebooks and software.

Adaptive language learning software is continuing to gain traction in the early stages of learning, in the initial acquisition of basic vocabulary and structures and in coming to grips with a new phonological system. It will almost certainly gain even more. But the challenge for the developers and publishers will be to find ways of making adaptive learning work for more advanced learners. Can it be done? Or will the mismeasure of language make it impossible?

Duolingo testing

Posted: September 6, 2014 in testing

After a break of two years, I recently returned to Duolingo in an attempt to build my German vocabulary. The attempt lasted a week. A few small things had changed, but the essentials had not, and my amusement at translating sentences like ‘The duck eats oranges’, ‘A red dog wears white clothes’ or ‘The fly is important’ soon turned to boredom and irritation. There are better, free ways of building vocabulary in another language.

Whilst little is new in the learning experience of Duolingo, there are significant developments at the company. The first of these is a new funding round in which they raised a further $20 million, bringing total investment to close to $40 million. Duolingo now has more than 25 million users, half of whom are described as ‘active’, and, according to Luis von Ahn, the company’s founder, their ambition is to dominate the language learning market. Approaching their third anniversary, though, Duolingo will need, before long, to turn a profit or, at least, to break even. The original plan, to use the language data generated by users of the site to power a paying translation service, is beginning to bear fruit, with contracts with CNN and BuzzFeed. But Duolingo is going to need other income streams. This may well be part of the reason behind their decision to develop and launch their own test.

Duolingo’s marketing people, however, are trying to get another message across: ‘Every year, over 30 million job seekers and students around the world are forced to take a test to prove that they know English in order to apply for a job or school. For some, these tests can cost their family an entire month’s salary. And not only that, taking them typically requires traveling to distant examination facilities and waiting weeks for the results. We believe there should be a better way. This is why today I’m proud to announce the beta release of the Duolingo Test Center, which was created to give everyone equal access to jobs and educational opportunities. Now anyone can conveniently certify their English skills from home, on their mobile device, and for only $20. That’s 1/10th the cost of existing tests.’ Talking the creative disruption talk, Duolingo wants to break into the “archaic” industry of language proficiency tests. Basically, then, they want to make the world a better place. I seem to have heard this kind of thing before.

The tests will cost $20. Gina Gotthilf, Duolingo’s head of marketing, explains the pricing strategy: ‘We came up with the smallest value that works for us and that a lot of people can pay.’ Duolingo’s main markets are now the BRICS countries. In China, for example, 1.5 million people signed up with Duolingo in just one week in April of this year, according to @TECHINASIA. Besides China, Duolingo has expanded into India, Japan, Korea, Taiwan, Hong Kong, Vietnam and Indonesia this year. (Brazil already has 2.4 million users, and there are 1.5 million in Mexico.) That’s a lot of potential customers.

So, what do you get for your twenty bucks? Not a lot, is the short answer. The test lasts about 18 minutes. There are four sections, and adaptive software analyses the testee’s responses to determine the level of difficulty of subsequent questions. The first section requires users to select real English words from a list which includes invented words. The second is a short dictation, the third is a gapfill, and the fourth is a read-aloud task which is recorded and compared to a native-speaker norm. That’s it.

Duolingo claims that the test scores correlate very well with TOEFL, but the claim is based on a single study by a University of Pittsburgh professor that was sponsored by Duolingo. Will further studies replicate the findings? I, for one, wouldn’t bet on it, but I won’t insult your intelligence by explaining my reasons. Test validity and reliability, then, remain to be proved, but even John Lehoczky, interim executive vice president of Carnegie Mellon University (Duolingo was developed by researchers from Carnegie Mellon’s computer science department), acknowledges that ‘at this point [the test] is not a fit vehicle for undergraduate admissions’.

Even more of a problem than validity and reliability, however, is the question of security. The test is delivered via the web or smartphone apps (Android and iOS). Testees have to provide photo ID and a photo taken on the device they are using. There are various rules (they must be alone, no headphones, etc.) and a human proctor reviews the test after it has been completed. This is unlikely to impress bodies like the British immigration authorities, which recently refused to recognise online TOEFL and TOEIC qualifications, after a BBC documentary revealed ‘systematic fraud’ in the taking of these tests.

There will always be a market of sorts for valueless qualifications (think, for example, of all the cheap TEFL courses that can be taken online), but to break into the monopoly of TOEFL and IELTS (and soon perhaps Pearson), Duolingo will need to deal with the issues of validity, reliability and security. If they don’t, few – if any – institutions of higher education will recognise the test. But if they do, they’ll need to spend more money: a team of applied linguists with expertise in testing would be a good start, and serious proctoring doesn’t come cheap. Will they be able to do this and keep the price down to $20?