Posts Tagged ‘apps’

In the world of ELT teacher blogs, magazines, webinars and conferences right now, you would be hard pressed to avoid the topic of generative AI. Ten years ago, the hot topic was ‘mobile learning’. Might there be some lessons to be learnt from casting our gaze back a little more than a decade?

One of the first ELT-related conferences about mobile learning took place in Japan in 2006. Reporting on this a year later, Dudeney and Hockly (2007: 156) observed that ‘m-learning appears to be here to stay’. By 2009, Agnes Kukulska-Hulme was asking ‘will mobile learning change language learning?’ Her answer, of course, was yes, but, a few early apps aside, it took a little time for the world of ELT to latch onto this next big thing. Relatively quick out of the blocks was Caroline Moore with an article in the Guardian (8 March 2011) arguing for wider use of mobile learning in ELT. As is so often the case with early promoters of edtech, Caroline had a vested interest, as a consultant in digital language learning, in advancing her basic argument. This was that the technology was so ubiquitous and so rich in potential that it would be foolish not to make the most of it.

The topic gained traction with an IATEFL LT SIG webinar in December 2011, a full-day pre-conference event at the main IATEFL conference early the following year, and a ‘Macmillan Education Mobile Learning Debate’. Suddenly, mobile learning was everywhere and, by the end of the year, it was being described as ‘the future of learning’ (Kukulska-Hulme, A., 2012). In early 2013, ELT Journal published a defining article, ‘Mobile Learning’ (Hockly, N., 2013). By this point, it wasn’t just a case of recommending that teachers try out a few apps with their learners. The article concludes by saying that ‘the future is increasingly mobile, and it behoves us to reflect this in our teaching practice’ (Hockly, 2013: 83). The rhetorical force was easier to understand than the logical connection.

It wasn’t long before mobile learning was routinely described as the ‘future of language learning’ and apps like Duolingo and Busuu were said to be ‘revolutionising language learning’. Kukulska-Hulme (Kukulska-Hulme et al., 2017) contributed a chapter entitled ‘Mobile Learning Revolution’ to a handbook of technology and second language learning.

In 2017 (books take a while to produce), OUP brought out ‘Mobile Learning’ by Shaun Wilden. Shaun’s book is the place to go for practical ideas: playing around with photos, using QR codes, audio / video recording and so on. The reasons for using mobile learning continue to grow (developing 21st century skills like creativity, critical thinking and digital literacy in ‘student-centred, dynamic, and motivating ways’).

Unlike Nicky Hockly’s article (2013), Shaun acknowledges that there may be downsides to mobile technology in the classroom. The major downside, as everybody who has ever been in a classroom where phones are permitted knows, is that the technology may be a bigger source of distraction than it is of engagement. Shaun offers a page about ‘acceptable use policies’ for mobile phones in classrooms, but does not let (what he describes as) ‘media scare stories’ get in the way of his enthusiasm.

There are undoubtedly countless examples of ways in which mobile phones can (and even should) be used to further language learning, although I suspect that the QR reader would struggle to make the list. The problem is that these positive examples are all we ever hear about. The topic of distraction does not even get a mention in the chapter on mobile language learning in ‘The Routledge Handbook of Language Learning and Technology’ (Stockwell, 2016). Neither does it appear in Li Li’s (2017) ‘New Technologies and Language Learning’.

Glenda Morgan (2023) has described this as ‘Success Porn in EdTech’, where success is exaggerated, failures minimized and challenges downplayed to the point that they are pretty much invisible. ‘Success porn’ is a feature of conference presentations and blog posts, genres which require relentless positivity and a ‘constructive sense of hope, optimism and ambition’ (Selwyn, 2016). Edtech Kool-Aid (ibid) is also a feature of academic writing. Do a Google Scholar search for ‘mobile learning language learning’ to see what I mean. The first article that comes up is entitled ‘Positive effects of mobile learning on foreign language learning’. Scepticism is in very short supply, as it is in most research into edtech. There are a number of reasons for this, one of which (that ‘locating one’s work in the pro-edtech zeitgeist may be a strategic choice to be part of the mainstream of the field’ (Mertala et al., 2022)) will resonate with colleagues who wish to give conference presentations and write blogs for publishers. The discourse around AI is, of course, no different (see Nemorin et al., 2022).

Anyway, back to the downside of mobile learning and the ‘media scare stories’. Most language learning takes place in primary and secondary schools. According to a recent report from Common Sense (Radesky et al., 2023), US teens use their smartphones for a median of 4 ½ hours per day, checking for notifications a median of 51 times. Almost all of them (97%) use their phones at school, mostly for social media, videos or gaming. Schools have a variety of policies, and enforcement of those policies varies widely. Your country may not be quite the same as the US, but it’s probably heading that way.

Research suggests that excessive (which is to say typical) mobile phone use has a negative impact on learning outcomes and wellbeing, and is associated with problems such as bullying (see this brief summary of global research). This comes as no surprise to most people – the participants at the 2012 Macmillan debate were aware of these problems. The question that needs to be asked, therefore, is not whether mobile learning can assist language learning, but whether the potential gains outweigh the potential disadvantages. Is language learning a special case?

One in four countries around the world has decided to ban phones in school. A new report from UNESCO (2023) calls for a global smartphone ban in education, pointing out that there is ‘little robust research to demonstrate digital technology inherently added value to education’. The same report delves a little into generative AI, and a summary begins ‘Generative AI may not bring the kind of change in education often discussed. Whether and how AI would be used in education is an open question (Gillani et al., 2023)’ (UNESCO, 2023: 13).

The history of the marketing of edtech has always been ‘this time it’s different’. It relies on a certain number of people repeating the mantra, since the more it is repeated, the more likely it will be perceived to be true (Fazio et al., 2019): this is the illusory truth effect or the ‘Snark rule[1]’. Mobile learning changed things for the better for some learners in some contexts: claims that it was the future of, or would revolutionize, language learning have proved somewhat exaggerated. Indeed, the proliferation of badly-designed language learning apps suggests that much mobile learning reinforces the conventional past of language learning (drilling, gamified rote learning, native-speaker models, etc.) rather than leading to positive change (see Kohn, 2023). The history of edtech is a history of broken promises and unfulfilled potential and there is no good reason why generative AI will be any different.

Perhaps, then, it behoves us to be extremely sceptical about the current discourse surrounding generative AI in ELT. Like mobile technology, it may well be an extremely useful tool, but the chances that it will revolutionize language teaching are extremely slim – much like the radio, TV, audio / video recording and playback, the photocopier, the internet and VR before it. A few people will make some money for a while, but truly revolutionary change in teaching / learning will not come about through technological innovation.

References

Dudeney, G. & Hockly, N. (2007) How to Teach English with Technology. Harlow: Pearson Education

Fazio, L. K., Rand, D. G. & Pennycook, G. (2019) Repetition increases perceived truth equally for plausible and implausible statements. Psychonomic Bulletin and Review 26: 1705–1710. https://doi.org/10.3758/s13423-019-01651-4

Hockly, N. (2013) Mobile Learning. ELT Journal, 67 (1): 80 – 84

Kohn, A. (2023) How ‘Innovative’ Ed Tech Actually Reinforces Convention. Education Week, 19 September 2023.

Kukulska-Hulme, A. (2009) Will Mobile Learning Change Language Learning? ReCALL, 21 (2): 157 – 165

Kukulska-Hulme, A. (2012) Mobile Learning and the Future of Learning. International HETL Review, 2: 13 – 18

Kukulska-Hulme, A., Lee, H. & Norris, L. (2017) Mobile Learning Revolution: Implications for Language Pedagogy. In Chapelle, C. A. & Sauro, S. (Eds.) The Handbook of Technology and Second Language Teaching and Learning. John Wiley & Sons

Li, L. (2017) New Technologies and Language Learning. London: Palgrave

Mertala, P., Moens, E. & Teräs, M. (2022) Highly cited educational technology journal articles: a descriptive and critical analysis, Learning, Media and Technology, DOI: 10.1080/17439884.2022.2141253

Nemorin, S., Vlachidis, A., Ayerakwa, H. M. & Andriotis, P. (2022): AI hyped? A horizon scan of discourse on artificial intelligence in education (AIED) and development, Learning, Media and Technology, DOI: 10.1080/17439884.2022.2095568

Radesky, J., Weeks, H.M., Schaller, A., Robb, M., Mann, S., and Lenhart, A. (2023) Constant Companion: A Week in the Life of a Young Person’s Smartphone Use. San Francisco, CA: Common Sense.

Selwyn, N. (2016) Minding our Language: Why Education and Technology is Full of Bullshit … and What Might be Done About it. Learning, Media and Technology, 41 (3): 437–443

Stockwell, G. (2016) Mobile Language Learning. In Farr, F. & Murray, L. (Eds.) The Routledge Handbook of Language Learning and Technology. Abingdon: Routledge. pp. 296 – 307

UNESCO (2023) Global Education Monitoring Report 2023: Technology in Education – A Tool on whose Terms? Paris: UNESCO

Wilden, S. (2017) Mobile Learning. Oxford: OUP


[1] Named after Lewis Carroll’s poem ‘The Hunting of the Snark’ in which the Bellman cries ‘I have said it thrice: What I tell you three times is true.’

Recent years have seen a proliferation of computer-assisted pronunciation trainers (CAPTs), both as stand-alone apps and as part of broader language courses. The typical CAPT records the learner’s voice, compares this to a model of some kind, detects differences between the learner and the model, and suggests ways that the learner may more closely approximate to the model (Agarwal & Chakraborty, 2019). Most commonly, the focus is on individual phonemes, rather than, as in Richard Cauldwell’s ‘Cool Speech’ (2012), on the features of fluent natural speech (Rogerson-Revell, 2021).
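
To make that comparison step a little more concrete, here is a minimal sketch in Python. It assumes, purely hypothetically, that an ASR front-end has already aligned the learner’s recording with the expected phoneme sequence and returned a confidence score for each phoneme; the threshold, the scores and the function name are invented for illustration, and real CAPT scoring (e.g. ‘goodness of pronunciation’ measures) is considerably more sophisticated.

```python
from typing import List, Tuple

def flag_problem_phonemes(
    scores: List[Tuple[str, float]],  # (phoneme, confidence) pairs from a hypothetical aligner
    threshold: float = 0.6,           # arbitrary cut-off, for illustration only
) -> List[str]:
    """Return the phonemes whose confidence falls below the threshold."""
    return [phoneme for phoneme, confidence in scores if confidence < threshold]

# Invented scores for a learner saying 'ship' whose vowel drifts towards 'sheep'.
learner_scores = [("ʃ", 0.92), ("ɪ", 0.41), ("p", 0.88)]
print(flag_problem_phonemes(learner_scores))  # ['ɪ'] – the app would then offer remedial drills
```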

The fact that CAPTs are increasingly available and attractive ‘does not of course ensure their pedagogic value or effectiveness’ … ‘many are technology-driven rather than pedagogy-led’ (Rogerson-Revell, 2021). Rogerson-Revell (2021) points to two common criticisms of CAPTs. Firstly, their pedagogic accuracy sometimes falls woefully short. She gives the example of a unit on intonation in one app, where users are told that ‘when asking questions in English, our voice goes up in pitch’ and ‘we lower the pitch of our voice at the end of questions’. Secondly, she observes that CAPTs often adopt a one-size-fits-all approach, despite the fact that we know that issues of pronunciation are extremely context-sensitive: ‘a set of learners in one context will need certain features that learners in another context do not’ (Levis, 2018: 239).

There are, in addition, technical challenges that are not easy to resolve. Many CAPTs rely on automatic speech recognition (ASR), which can be very accurate with some accents, but much less so with other accents (including many non-native-speaker accents) (Korzekwa et al., 2022). Anyone using a CAPT will experience instances of the software identifying pronunciation problems that are not problems, and failing to identify potentially more problematic issues (Agarwal & Chakraborty, 2019).

We should not, therefore, be too surprised if these apps don’t always work terribly well. Some apps, like the English File Pronunciation app, have been shown to be effective in helping the perception and production of certain phonemes by a very unrepresentative group of Spanish learners of English (Fouz-González, 2020), but this tells us next to nothing about the overall effectiveness of the app. Most CAPTs have not been independently reviewed, and, according to a recent meta-analysis of CAPTs (Mahdi & Al Khateeb, 2019), the small number of studies are ‘all of very low quality’. This, unfortunately, renders their meta-analysis useless.

Even if the studies in the meta-analysis had not been of very low quality, we would need to pause before digesting any findings about CAPTs’ effectiveness. Before anything else, we need to develop a good understanding of what they might be effective at. It’s here that we run headlong into the problem of native-speakerism (Holliday, 2006; Kiczkowiak, 2018).

The pronunciation model that CAPTs attempt to push learners towards is a native-speaker model. In the case of ELSA Speak, for example, this is a particular kind of American accent, although ‘British and other accents’ will apparently soon be added. Xavier Anguera, co-founder and CTO of ELSA Speak, in a fascinating interview with Paul Raine of TILTAL, happily describes his product as ‘an app that is for accent reduction’. Accent reduction is certainly a more accurate way of describing CAPTs than accent promotion.

Accent reduction, or the attempt to mimic an imagined native-speaker pronunciation, is now ‘rarely put forward by teachers or researchers as a worthwhile goal’ (Levis, 2018: 33) because it is only rarely achievable and, in many contexts, inappropriate. In addition, accent reduction cannot easily be separated from accent prejudice. Accent reduction courses and products ‘operate on the assumption that some accents are more legitimate than others’ (Ennser-Kananen, et al., 2021) and there is evidence that they can ‘reinscribe racial inequalities’ (Ramjattan, 2019). Accent reduction is quintessentially native-speakerist.

Rather than striving towards native-speaker accentedness, there is a growing recognition among teachers, methodologists and researchers that intelligibility may be a more appropriate learning goal (Levis, 2018). It has been over 20 years since Jennifer Jenkins (2000) developed her Lingua Franca Core (LFC), a relatively short list of pronunciation features that she considered central to intelligibility in English as a Lingua Franca contexts (i.e. the majority of contexts in which English is used). Intelligibility as the guiding principle of pronunciation teaching continues to grow in influence, spurred on by the work of Walker (2010), Kiczkowiak & Lowe (2018), Patsko & Simpson (2019) and Hancock (2020), among others.

Unfortunately, intelligibility is a deceptively simple concept. What exactly it is, is ‘not an easy question to answer’, writes John Levis (2018), before attempting his own answer over the next 250 pages. As admirable as the LFC may be as an attempt to offer a digestible and actionable list of key pronunciation features, it ‘remains controversial in many of its recommendations. It lacks robust empirical support, assumes that all NNS contexts are similar, and does not take into account the importance of stigma associated with otherwise intelligible pronunciations’ (Levis, 2018: 47). Other attempts to list features of intelligibility fare no better in Levis’s view: they are ‘a mishmash of incomplete and contradictory recommendations’ (Levis, 2018: 49).

Intelligibility is also complex because of the relationship between intelligibility and comprehensibility, or the listener’s willingness to understand – their attitude or stance towards the speaker. Comprehensibility is a mediation concept (Ennser-Kananen, et al., 2021). It is a two-way street, and intelligibility-driven approaches need to take this into account (unlike the accent-reduction approach which places all the responsibility for comprehensibility on the shoulders of the othered speaker).

The problem of intelligibility becomes even more thorny when it comes to designing a pronunciation app. Intelligibility and comprehensibility cannot easily be measured (if at all!), and an app’s algorithms need a concrete numerically-represented benchmark towards which a user / learner can be nudged. Accentedness can be measured (even if the app has to reify a ‘native-speaker accent’ to do so). Intelligibility / Comprehensibility is simply not something, as Xavier Anguera acknowledges, that technology can deal with. In this sense, CAPTs cannot avoid being native-speakerist.

At this point, we might ride off indignantly into the sunset, but a couple of further observations are in order. First of all, accentedness and comprehensibility are not mutually exclusive categories. Anguera notes that intelligibility can be partly improved by reducing accentedness, and some of the research cited by Levis (2018) backs him up on this. But precisely how much and what kind of accent reduction improves intelligibility is not knowable, so the use of CAPTs is something of an optimistic stab in the dark – and, like all stabs in the dark, it carries dangers. Secondly, individual language learners may be forgiven for not wanting to wait for accent prejudice to become a thing of the past: if they feel that they will suffer less from prejudice by attempting here and now to reduce their ‘foreign’ accent, it is not for me, I think, to pass judgement. The trouble, of course, is that CAPTs contribute to the perpetuation of the prejudices.

There is, however, one area where the digital evaluation of accentedness is, I think, unambiguously unacceptable. According to Rogerson-Revell (2021), ‘Australia’s immigration department uses the Pearson Test of English (PTE) Academic as one of five tests. The PTE tests speaking ability using voice recognition technology and computer scoring of test-takers’ audio recordings. However, L1 English speakers and highly proficient L2 English speakers have failed the oral fluency section of the English test, and in some cases it appears that L1 speakers achieve much higher scores if they speak unnaturally slowly and carefully’. Human evaluations are not necessarily any better.

References

Agarwal, C. & Chakraborty, P. (2019) A review of tools and techniques for computer aided pronunciation training (CAPT) in English. Education and Information Technologies, 24: 3731–3743. https://doi.org/10.1007/s10639-019-09955-7

Cauldwell, R. (2012) Cool Speech app. Available at: http://www.speechinaction.org/cool-speech-2

Fouz-González, J. (2020) Using apps for pronunciation training: An empirical evaluation of the English File Pronunciation app. Language Learning & Technology, 24(1): 62–85.

Ennser-Kananen, J., Halonen, M. & Saarinen, T. (2021) “Come Join Us and Lose Your Accent!” Accent Modification Courses as Hierarchization of International Students. Journal of International Students 11 (2): 322 – 340

Holliday, A. (2006) Native-speakerism. ELT Journal, 60 (4): 385 – 387

Jenkins, J. (2000) The Phonology of English as a Lingua Franca. Oxford: Oxford University Press

Hancock, M. (2020) 50 Tips for Teaching Pronunciation. Cambridge: Cambridge University Press

Kiczkowiak, M. (2018) Native Speakerism in English Language Teaching: Voices From Poland. Doctoral dissertation.

Kiczkowiak, M & Lowe, R. J. (2018) Teaching English as a Lingua Franca. Stuttgart: DELTA Publishing

Korzekwa, D., Lorenzo-Trueba, J., Drugman, T. & Kostek, B. (2022) Computer-assisted pronunciation training—Speech synthesis is almost all you need. Speech Communication, 142: 22 – 33

Levis, J. M. (2018) Intelligibility, Oral Communication, and the Teaching of Pronunciation. Cambridge: Cambridge University Press

Mahdi, H. S. & Al Khateeb, A. A. (2019) The effectiveness of computer-assisted pronunciation training: A meta-analysis. Review of Education, 7 (3): 733 – 753

Patsko, L. & Simpson, K. (2019) How to Write Pronunciation Activities. ELT Teacher 2 Writer https://eltteacher2writer.co.uk/our-books/how-to-write-pronunciation-activities/

Ramjattan, V. A. (2019) Racializing the problem of and solution to foreign accent in business. Applied Linguistics Review, 13 (4). https://doi.org/10.1515/applirev2019-0058

Rogerson-Revell, P. M. (2021) Computer-Assisted Pronunciation Training (CAPT): Current Issues and Future Directions. RELC Journal, 52(1), 189–205. https://doi.org/10.1177/0033688220977406

Walker, R. (2010) Teaching the Pronunciation of English as a Lingua Franca. Oxford: Oxford University Press

Vocab Victor is a very curious vocab app. It’s not a flashcard system, designed to extend vocabulary breadth. Rather it tests the depth of a user’s vocabulary knowledge.

The app’s website refers to the work of Paul Meara (see, for example, Meara, P. 2009. Connected Words. Amsterdam: John Benjamins). Meara explored the ways in which an analysis of the words that we associate with other words can shed light on the organisation of our mental lexicon. Described as ‘gigantic multidimensional cobwebs’ (Aitchison, J. 1987. Words in the Mind. Oxford: Blackwell, p.86), our mental lexicons do not appear to store lexical items in individual slots, but rather they are distributed across networks of associations.

The size of the web (i.e. the number of words, or the level of vocabulary breadth) is important, but equally important is the strength of the connections within the web (or vocabulary depth), as this determines the robustness of vocabulary knowledge. These connections or associations are between different words and concepts and experiences, and they are developed by repeated, meaningful, contextualised exposure to a word. In other words, the connections are firmed up through extensive opportunities to use language.

In word association research, a person is given a prompt word and asked to say the first other word that comes to their mind. For an entertaining example of this process at work, you might enjoy this clip from the comedy show ‘Help’. The research has implications for a wide range of questions, not least second language acquisition. For example, given a particular prompt, native speakers produce a relatively small number of associative responses, and these are reasonably predictable. Learners, on the other hand, typically produce a much greater variety of responses (which might seem surprising, given that they have a smaller vocabulary store to select from).

One way of classifying the different kinds of response is to divide them into two categories: syntagmatic (words that are discoursally connected to the prompt, such as collocations) and paradigmatic (words that are semantically close to the prompt and are the same part of speech). Linguists have noted that learners (both L1 children and L2 learners) show a shift from predominantly syntagmatic responses to more paradigmatic responses as their mental lexicon develops.

The developers of Vocab Victor say that the app builds ‘more and stronger associations for the words your students already know, and teaches new words by associating them with existing, known words, helping students acquire native-like word networks. Furthermore, Victor teaches different types of knowledge, including synonyms, “type-of” relationships, collocations, derivations, multiple meanings and form-focused knowledge’. Since we know how important vocabulary depth is, this seems like a pretty sensible learning target.

The app attempts to develop this depth in two main ways (see below). The ‘core game’ is called ‘Word Strike’, where learners have to pick the word on the arrow which most closely matches the word on the target. The second is called ‘Word Drop’, where a bird holds a word card and the user has to decide if it relates more to one of two other words below. Significantly, learners carry out these tasks before any kind of association between form and meaning has been established. The meaning of unknown items can be checked in a monolingual dictionary later. There are a couple of other, less important games that I won’t describe now. The graphics are attractive, if a little juvenile. The whole thing is gamified with levels, leaderboards and so on. It’s free and, presumably, still under development.

[Screenshots: the ‘Word Strike’ and ‘Word Drop’ tasks]

The app claims to be for ‘English language learners of all ages [to] develop a more native-like vocabulary’. It also says that it is appropriate for ‘native speaking primary students [to] build and strengthen vocabulary for better test performance and stronger reading skills’, as well as ‘secondary students [to] prepare for the PSAT and SAT’. It was the scope of these claims that first set my alarm bells ringing. How could one app be appropriate for such diverse users? (Spoiler: it can’t, and attempts to make an edtech product suitable for everyone inevitably end up with a product that is suitable for no one.)

Rich, associative lexical networks are the result of successful vocabulary acquisition, but neither Paul Meara nor anyone else in the word association field has, to the best of my knowledge, ever suggested that deliberate study is the way to develop the networks. It is uncontentious to say that vocabulary depth (as shown by associative networks) is best developed through extensive exposure to input – reading and listening.

It is also reasonably uncontentious to say that deliberate study of vocabulary pays greatest dividends in developing vocabulary breadth (not depth), especially at lower levels, with a focus on the top three to eight thousand words in terms of frequency. It may also be useful at higher levels when a learner needs to acquire a limited number of new words for a particular purpose. An example of this would be someone who is going to study in an EMI context and would benefit from rapid learning of the words of the Academic Word List.

The Vocab Victor website says that the app ‘is uniquely focused on intermediate-level vocabulary. The app helps get students beyond this plateau by selecting intermediate-level vocabulary words for your students’. At B1 and B2 levels, learners typically know words that fall between #2500 and #3750 in the frequency tables. At level C2, they know most of the most frequent 5000 items. The less frequent a word is, the less point there is in studying it deliberately.

For deliberate study of vocabulary to serve any useful function, the target language needs to be carefully selected, with a focus on high-frequency items. It makes little sense to study words that will already be very familiar. And it makes no sense to deliberately study apparently random words that are so infrequent (i.e. outside the top 10,000) that it is unlikely they will be encountered again before the deliberate study has been forgotten. Take a look at the examples below and judge for yourself how well chosen the items are.

[Screenshots: example items for ‘year’ and ‘smashed’]

Vocab Victor appears to focus primarily on semantic fields, as in the example above with ‘smashed’ as a key word. ‘Smashed’, ‘fractured’, ‘shattered’ and ‘cracked’ are all very close in meaning. In order to disambiguate them, it would help learners to see which nouns typically collocate with these words. But they don’t get this with the app – all they get are English-language definitions from Merriam-Webster. What this means is that learners are (1) unlikely to develop a sufficient understanding of target items to allow them to incorporate them into their productive lexicon, and (2) likely to get completely confused with a huge number of similar, low-frequency words (that weren’t really appropriate for deliberate study in the first place). What’s more, lexical sets of this kind may not be a terribly good idea, anyway (see my blog post on the topic).

Vocab Victor takes words, as opposed to lexical items, as the target learning objects. Users may be tested on the associations of any of the meanings of polysemantic items. In the example below (not perhaps the most appropriate choice for primary students!), there are two main meanings, but with other items, things get decidedly more complex (see the example with ‘toss’). Learners are also asked to do the associative tasks ‘Word Strike’ and ‘Word Drop’ before they have had a chance to check the possible meanings of either the prompt item or the associative options.

[Screenshots: definitions and tasks for ‘stripper’ and ‘toss’]

How anyone could learn from any of this is quite beyond me. I often struggled to choose the correct answer myself; there were also a small number of items whose meaning I wasn’t sure of. I could see no clear way in which items were being recycled (there’s no spaced repetition here). The website claims that ‘adaptating [sic] to your student’s level happens automatically from the very first game’, but I could not see this happening. In fact, it’s very hard to adapt target item selection to an individual learner, since right / wrong or multiple choice answers tell us so little. Does a correct answer tell us that someone knows an item or just that they made a lucky guess? Does an incorrect answer tell us that an item is unknown or just that, under game pressure, someone tapped the wrong button? And how do you evaluate a learner’s lexical level (as a starting point),  even with very rough approximation,  without testing knowledge of at least thirty items first? All in all, then, a very curious app.

One of the most powerful associative responses to a word (especially with younger learners) is what is called a ‘klang’ response: another word which rhymes with or sounds like the prompt word. So, if someone says the word ‘app’ to you, what’s the first klang response that comes to mind?

The most widely-used and popular tool for language learners is the bilingual dictionary (Levy & Steel, 2015), and the first of its kind appeared about 4,000 years ago (2,000 years earlier than the first monolingual dictionaries), offering wordlists in Sumerian and Akkadian (Wheeler, 2013: 9 -11). Technology has come a long way since the clay tablets of the Bronze Age. Good online dictionaries now contain substantially more information (in particular audio recordings) than their print equivalents of a few decades ago. In addition, they are usually quicker and easier to use, more popular, and lead to retention rates that are comparable to, or better than, those achieved with print (Töpel, 2014). The future of dictionaries is likely to be digital, and paper dictionaries may well disappear before very long (Granger, 2012: 2).

English language learners are better served than learners of other languages, and the number of free, online bilingual dictionaries is now enormous. Speakers of less widely-spoken languages may still struggle to find a good quality service, but speakers of, for example, Polish (with approximately 40 million speakers, and a ranking of #33 in the list of the world’s most widely spoken languages) will find over twenty free, online dictionaries to choose from (Lew & Szarowska, 2017). Speakers of languages that are more widely spoken (Chinese, Spanish or Portuguese, for example) will usually find an even greater range. The choice can be bewildering and neither search engine results nor rankings from app stores can be relied on to suggest the product of the highest quality.

Language teachers are not always as enthusiastic about bilingual dictionaries as their learners. Folse (2004: 114 – 120) reports on an informal survey of English teachers which indicated that 11% did not allow any dictionaries in class at all, 37% allowed monolingual dictionaries and only 5% allowed bilingual dictionaries. Other researchers (e.g. Boonmoh & Nesi, 2008), have found a similar situation, with teachers overwhelmingly recommending the use of a monolingual learner’s dictionary: almost all of their students bought one, but the great majority hardly ever used it, preferring instead a digital bilingual version.

Teachers’ preferences for monolingual dictionaries are usually motivated in part by a fear that their students will become too reliant on translation. Whilst this concern remains widespread, much recent research suggests that this fear is misguided (Nation, 2013: 424) and that monolingual dictionaries do not actually lead to greater learning gains than their bilingual counterparts. This is, in part, due to the fact that learners typically use these dictionaries in very limited ways – to see if a word exists, check spelling or look up meaning (Harvey & Yuill, 1997). If they made fuller use of the information (about frequency, collocations, syntactic patterns, etc.) on offer, it is likely that learning gains would be greater: ‘it is accessing multiplicity of information that is likely to enhance retention’ (Laufer & Hill, 2000: 77). Without training, however, this is rarely the case. With lower-level learners, a monolingual learner’s dictionary (even one designed for Elementary level students) can be a frustrating experience, because until they have reached a vocabulary size of around 2,000 – 3,000 words, they will struggle to understand the definitions (Webb & Nation, 2017: 119).

The second reason for teachers’ preference for monolingual dictionaries is that the quality of many bilingual dictionaries is undoubtedly very poor, compared to monolingual learner’s dictionaries such as those produced by Oxford University Press, Cambridge University Press, Longman Pearson, Collins Cobuild, Merriam-Webster and Macmillan, among others. The situation has changed, however, with the rapid growth of bilingualized dictionaries. These contain all the features of a monolingual learner’s dictionary, but also include translations into the learner’s own language. Because of the wealth of information provided by a good bilingualized dictionary, researchers (e.g. Laufer & Hadar, 1997; Chen, 2011) generally consider them preferable to monolingual or normal bilingual dictionaries. They are also popular with learners. Good bilingualized online dictionaries (such as the Oxford Advanced Learner’s English-Chinese Dictionary) are not always free, but many are, and with some language pairings free software can be of a higher quality than services that incur a subscription charge.

If a good bilingualized dictionary is available, there is no longer any compelling reason to use a monolingual learner’s dictionary, unless it contains features which cannot be found elsewhere. In order to compete in a crowded marketplace, many of the established monolingual learner’s dictionaries do precisely that. Examples of good, free online dictionaries include:

Students need help in selecting a dictionary that is right for them. Without this, many end up using as a dictionary a tool such as Google Translate, which, for all its value, is of very limited use as a dictionary. They need to understand that the most appropriate dictionary will depend on what they want to use it for (receptive, reading purposes or productive, writing purposes). Teachers can help in this decision-making process by addressing the issue in class (see the activity below).

In addition to the problem of selecting an appropriate dictionary, it appears that many learners have inadequate dictionary skills (Niitemaa & Pietilä, 2018). In one experiment (Tono, 2011), only one third of the vocabulary searches in a dictionary that were carried out by learners resulted in success. The reasons for failure include focussing on only the first meaning (or translation) of a word that is provided, difficulty in finding the relevant information in long word entries, an inability to find the lemma that is needed, and spelling errors (when they had to type in the word) (Töpel, 2014). As with monolingual dictionaries, learners often only check the meaning of a word in a bilingual dictionary and fail to explore the wider range of information (e.g. collocation, grammatical patterns, example sentences, synonyms) that is available (Laufer & Kimmel, 1997; Laufer & Hill, 2000; Chen, 2010). This information is both useful and may lead to improved retention.

Most learners receive no training in dictionary skills, but would clearly benefit from it. Nation (2013: 333) suggests that at least four or five hours, spread out over a few weeks, would be appropriate. He suggests (ibid: 419 – 421) that training should encourage learners, first, to look closely at the context in which an unknown word is encountered (in order to identify the part of speech, the lemma that needs to be looked up, its possible meaning and to decide whether it is worth looking up at all), then to help learners in finding the relevant entry or sub-entry (by providing information about common dictionary abbreviations (e.g. for parts of speech, style and register)), and, finally, to check this information against the original context.

Two good resource books full of practical activities for dictionary training are available: ‘Dictionary Activities’ by Cindy Leaney (Cambridge: Cambridge University Press, 2007) and ‘Dictionaries’ by Jon Wright (Oxford: Oxford University Press, 1998). Many of the good monolingual dictionaries offer activity guides to promote effective dictionary use and I have suggested a few activities here.

Activity: Understanding a dictionary

Outline: Students explore the use of different symbols in good online dictionaries.

Level: All levels, but not appropriate for very young learners. The activity ‘Choosing a dictionary’ is a good follow-up to this activity.

1 Distribute the worksheet and ask students to follow the instructions.

[Worksheet 1]

2 Check the answers.

[Worksheet 1: answer key]

Activity: Choosing a dictionary

Outline: Students explore and evaluate the features of different free, online bilingual dictionaries.

Level: All levels, but not appropriate for very young learners. The text in stage 3 is appropriate for use with levels A2 and B1. For some groups of learners, you may want to adapt (or even translate) the list of features. It may be useful to do the activity ‘Understanding a dictionary’ before this activity.

1 Ask the class which free, online bilingual dictionaries they like to use. Write some of their suggestions on the board.

2 Distribute the list of features. Ask students to work individually and tick the boxes that are important for them. Ask students to work with a partner to compare their answers.

[Worksheet: list of dictionary features]

3 Give students a list of free, online bilingual (English and the students’ own language) dictionaries. You can use suggestions from the list below, add the suggestions that your students made in stage 1, or add your own ideas. (For many language pairings, better resources are available than those in the list below.) Give students the following short text and ask them to use two of these dictionaries to look up the underlined words. Ask them to decide which dictionary they found most useful and / or easiest to use.

[Short text with underlined words]

[List of suggested dictionaries]

4 Conduct feedback with the whole class.

Activity: Getting more out of a dictionary

Outline: Students use a dictionary to help them to correct a text

Level: Levels B1 and B2, but not appropriate for very young learners. For higher levels, a more complex text (with less obvious errors) would be appropriate.

1 Distribute the worksheet below and ask students to follow the instructions.

[Worksheet: text to correct]

2 Check answers with the whole class. Ask how easy it was to find the information in the dictionary that they were using.

[Answer key]

When you are reading, you probably only need a dictionary when you don’t know the meaning of a word and you want to look it up. For this, a simple bilingual dictionary is good enough. But when you are writing or editing your writing, you will need something that gives you more information about a word: grammatical patterns, collocations (the words that usually go with other words), how formal the word is, and so on. For this, you will need a better dictionary. Many of the better dictionaries are monolingual (see the box), but there are also some good bilingual ones.

Use one (or more) of the online dictionaries in the box (or a good bilingual dictionary) and make corrections to this text. There are eleven mistakes (they have been underlined) in total.

References

Boonmoh, A. & Nesi, H. 2008. ‘A survey of dictionary use by Thai university staff and students with special reference to pocket electronic dictionaries’ Horizontes de Linguística Aplicada, 6(2), 79 – 90

Chen, Y. 2011. ‘Studies on Bilingualized Dictionaries: The User Perspective’. International Journal of Lexicography, 24 (2): 161–197

Folse, K. 2004. Vocabulary Myths. Ann Arbor: University of Michigan Press

Granger, S. 2012. Electronic Lexicography. Oxford: Oxford University Press

Harvey, K. & Yuill, D. 1997. ‘A study of the use of a monolingual pedagogical dictionary by learners of English engaged in writing’ Applied Linguistics, 51 (1): 253 – 78

Laufer, B. & Hadar, L. 1997. ‘Assessing the effectiveness of monolingual, bilingual and ‘bilingualized’ dictionaries in the comprehension and production of new words’. Modern Language Journal, 81 (2): 189 – 96

Laufer, B. & M. Hill 2000. ‘What lexical information do L2 learners select in a CALL dictionary and how does it affect word retention?’ Language Learning & Technology 3 (2): 58–76

Laufer, B. & Kimmel, M. 1997. ‘Bilingualised dictionaries: How learners really use them’, System, 25 (3): 361 -369

Leaney, C. 2007. Dictionary Activities. Cambridge: Cambridge University Press

Levy, M. and Steel, C. 2015. ‘Language learner perspectives on the functionality and use of electronic language dictionaries’. ReCALL, 27(2): 177–196

Lew, R. & Szarowska, A. 2017. ‘Evaluating online bilingual dictionaries: The case of popular free English-Polish dictionaries’ ReCALL 29(2): 138–159

Nation, I.S.P. 2013. Learning Vocabulary in Another Language 2nd edition. Cambridge: Cambridge University Press

Niitemaa, M.-L. & Pietilä, P. 2018. ‘Vocabulary Skills and Online Dictionaries: A Study on EFL Learners’ Receptive Vocabulary Knowledge and Success in Searching Electronic Sources for Information’, Journal of Language Teaching and Research, 9 (3): 453-462

Tono, Y. 2011. ‘Application of eye-tracking in EFL learners’ dictionary look-up process research’, International Journal of Lexicography 24 (1): 124–153

Töpel, A. 2014. ‘Review of research into the use of electronic dictionaries’ in Müller-Spitzer, C. (Ed.) 2014. Using Online Dictionaries. Berlin: De Gruyter, pp. 13 – 54

Webb, S. & Nation, P. 2017. How Vocabulary is Learned. Oxford: Oxford University Press

Wheeler, G. 2013. Language Teaching through the Ages. New York: Routledge

Wright, J. 1998. Dictionaries. Oxford: Oxford University Press

There has been wide agreement for a long time that one of the most important ways of building the mental lexicon is by having extended exposure to language input through reading and listening. Some researchers (e.g. Krashen, 2008) have gone as far as to say that direct vocabulary instruction serves little purpose, as there is no interface between explicit and implicit knowledge. This remains, however, a minority position, with a majority of researchers agreeing with Barcroft (2015) that deliberate learning plays an important role, even if it is only ‘one step towards knowing the word’ (Nation, 2013: 46).

There is even more agreement when it comes to the differences between deliberate study and extended exposure to language input, in terms of the kinds of learning that take place. Whilst basic knowledge of lexical items (the pairings of meaning and form) may be developed through deliberate learning (e.g. flash cards), it is suggested that ‘the more ‘contextualized’ aspects of vocabulary (e.g. collocation) cannot be easily taught explicitly and are best learned implicitly through extensive exposure to the use of words in context’ (Schmitt, 2008: 333). In other words, deliberate study may develop lexical breadth, but, for lexical depth, reading and listening are the way to go.

This raises the question of how many times a learner would need to encounter a word (in reading or listening) in order to learn its meaning. Learners may well be developing other aspects of word knowledge at the same time, of course, but a precondition for this is probably that the form-meaning relationship is sorted out. Laufer and Nation (2012: 167) report that ‘researchers seem to agree that with ten exposures, there is some chance of recognizing the meaning of a new word later on’. I’ve always found this figure interesting, but strangely unsatisfactory, unsure of what, precisely, it was actually telling me. Now, with the recent publication of a meta-analysis looking at the effects of repetition on incidental vocabulary learning (Uchihara, Webb & Yanagisawa, 2019), things are becoming a little clearer.

First of all, the number ten is a ballpark figure, rather than a scientifically proven statistic. In their literature review, Uchihara et al. report that ‘the number of encounters necessary to learn words rang[es] from 6, 10, 12, to more than 20 times’. That is to say, ‘the number of encounters necessary for learning of vocabulary to occur during meaning-focussed input remains unclear’. If you ask a question to which there is a great variety of answers, there is a strong probability that there is something wrong with the question. That, it would appear, is the case here.

Unsurprisingly, there is, at least, a correlation between repeated encounters of a word and learning, described by Uchihara et al as statistically significant (with a medium effect size). More interesting are the findings about the variables in the studies that were looked at. These included ‘learner variables’ (age and the current size of the learner’s lexicon), ‘treatment variables’ (the amount of spacing between the encounters, listening versus reading, the presence or absence of visual aids, the degree to which learners ‘engage’ with the words they encounter) and ‘methodological variables’ in the design of the research (the kinds of words that are being looked at, word characteristics, the use of non-words, the test format and whether or not learners were told that they were going to be tested).

Here is a selection of the findings:

  • Older learners tend to benefit more from repeated encounters than younger learners.
  • Learners with a smaller vocabulary size tend to benefit more from repeated encounters with L2 words, but this correlation was not statistically significant. ‘Beyond a certain point in vocabulary growth, learners may be able to acquire L2 words in fewer encounters and need not receive as many encounters as learners with smaller vocabulary size’.
  • Learners made greater gains when the repeated exposure took place under massed conditions (e.g. on the same day), rather than under ‘spaced conditions’ (spread out over a longer period of time).
  • Repeated exposure during reading and, to a slightly lesser extent, listening resulted in more gains than reading while listening and viewing.
  • ‘Learners presented with visual information during meaning-focused tasks benefited less from repeated encounters than those who had no access to the information’. This does not mean that visual support is counter-productive: only that the positive effect of repeated encounters is not enhanced by visual support.
  • ‘A significantly larger effect was found for treatments involving no engagement compared to treatment involving engagement’. Again, this does not mean that ‘no engagement’ is better than ‘engagement’: only that the positive effect of repeated encounters is not enhanced by ‘engagement’.
  • ‘The frequency-learning correlation does not seem to increase beyond a range of around 20 encounters with a word’.
  • Experiments using non-words may exaggerate the effect of frequent encounters (i.e. in the real world, with real words, the learning potential of repeated encounters may be less than indicated by some research).
  • Forewarning learners of an upcoming comprehension test had a positive impact on gains in vocabulary learning. Again, this does not mean that teachers should systematically test their students’ comprehension of what they have read.

For me, the most interesting finding was that ‘about 11% of the variance in word learning through meaning-focused input was explained by frequency of encounters’. This means, quite simply, that a wide range of other factors, beyond repeated encounters, will determine the likelihood of learners acquiring vocabulary items from extensive reading and listening. The frequency of word encounters is just one factor among many.
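
(For readers who prefer the more familiar correlation scale: assuming the 11% is a simple variance-explained figure, i.e. r², the underlying correlation works out at roughly r = √0.11 ≈ 0.33.)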

I’m still not sure what the takeaways from this meta-analysis should be, besides the fact that it’s all rather complex. The research does not, in any way, undermine the importance of massive exposure to meaning-focussed input in learning a language. But I will be much more circumspect in my teacher training work about making specific claims concerning the number of times that words need to be encountered before they are ‘learnt’. And I will be even more sceptical about claims for the effectiveness of certain online language learning programs which use algorithms to ensure that words reappear a certain number of times in written, audio and video texts that are presented to learners.

References

Barcroft, J. 2015. Lexical Input Processing and Vocabulary Learning. Amsterdam: John Benjamins

Laufer, B. & Nation, I.S.P. 2012. Vocabulary. In Gass, S.M. & Mackey, A. (Eds.) The Routledge Handbook of Second Language Acquisition (pp.163 – 176). Abingdon, Oxon.: Routledge

Nation, I.S.P. 2013. Learning Vocabulary in Another Language 2nd edition. Cambridge: Cambridge University Press

Krashen, S. 2008. The comprehension hypothesis extended. In T. Piske & M. Young-Scholten (Eds.), Input Matters in SLA (pp.81 – 94). Bristol, UK: Multilingual Matters

Schmitt, N. 2008. Review article: instructed second language vocabulary learning. Language Teaching Research 12 (3): 329 – 363

Uchihara, T., Webb, S. & Yanagisawa, A. 2019. The Effects of Repetition on Incidental Vocabulary Learning: A Meta-Analysis of Correlational Studies. Language Learning, 69 (3): 559 – 599. Available online: https://www.researchgate.net/publication/330774796_The_Effects_of_Repetition_on_Incidental_Vocabulary_Learning_A_Meta-Analysis_of_Correlational_Studies

A personalized language learning programme that is worth its name needs to offer a wide variety of paths to accommodate the varying interests, priorities, levels and preferred approaches to learning of the users of the programme. For this to be possible, a huge quantity of learning material is needed (Iwata et al., 2011: 1): the preparation and curation of this material is extremely time-consuming and expensive (despite the pittance that is paid to writers and editors). It’s not surprising, then, that a growing amount of research is being devoted to the exploration of ways of automatically generating language learning material. One area that has attracted a lot of attention is the learning of vocabulary.

Many simple vocabulary learning tasks are relatively easy to generate automatically. These include matching tasks of various kinds, such as the matching of words or phrases to meanings (either in English or the L1), pictures or collocations, as in many flashcard apps. Doing it well is rather harder: the definitions or translations have to be good and appropriate for learners of the level, and the pictures need to be appropriate too. If, as is often the case, the lexical items have come from a text or form part of a group of some kind, sense disambiguation software will be needed to ensure that the right meaning is being practised. Anyone who has used flashcard apps knows that the major problem is usually the quality of the content (whether it has been automatically generated or written by someone).

[Screenshot: a multiple-choice task from Memrise]

A further challenge is the generation of distractors. In the example here (from Memrise), the distractors have been so badly generated as to render the task more or less a complete waste of time. Distractors must, in some way, be viable alternatives (Smith et al., 2010) but still clearly wrong. That means they should normally be the same part of speech, and true cognates should be avoided. Research into the automatic generation of distractors is well-advanced (see, for instance, Kumar et al., 2015), with Smith et al. (2010), for example, using a very large corpus and various functions of Sketch Engine (the most well-known corpus query tool) to find collocates and other distractors. Their TEDDCLOG (Testing English with Data-Driven CLOze Generation) system produced distractors that were deemed acceptable 91% of the time. Whilst impressive, there is still a long way to go before human editing / rewriting is no longer needed.
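
To give a flavour of what these constraints look like in practice, here is a small, hypothetical sketch of distractor selection. It is not the TEDDCLOG system: the data structure, the frequency-band threshold and the toy word pool are all invented for illustration, and a real system would also need to handle cognates, morphological variants and sense disambiguation.

```python
import random
from dataclasses import dataclass

@dataclass
class Entry:
    word: str
    pos: str   # part of speech
    rank: int  # corpus frequency rank (1 = most frequent)

def pick_distractors(key: Entry, pool: list, synonyms: set, n: int = 3) -> list:
    """Choose distractors that are plausible (same part of speech, similar
    frequency band) but clearly wrong (not the key and not one of its synonyms)."""
    candidates = [
        e.word for e in pool
        if e.pos == key.pos
        and e.word != key.word
        and e.word not in synonyms
        and abs(e.rank - key.rank) < 2000  # 'similar frequency' is an invented threshold
    ]
    return random.sample(candidates, min(n, len(candidates)))

pool = [Entry("purchase", "verb", 1800), Entry("borrow", "verb", 2100),
        Entry("acquire", "verb", 2500), Entry("banana", "noun", 3000),
        Entry("lend", "verb", 2300)]
key = Entry("buy", "verb", 400)
print(pick_distractors(key, pool, synonyms={"purchase", "acquire"}))  # e.g. ['lend', 'borrow']
```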

One area that has attracted attention is, of course, testing, including tasks such as those in TOEFL (see image). Susanti et al (2015, 2017) were able, given a target word, to automatically generate a reading passage from web sources along with questions of the TOEFL kind. However, only about half of these were considered good enough to be used in actual tests. Again, that is some way off avoiding human intervention altogether, but the automatically generated texts and questions can greatly facilitate the work of human item writers.

[Image: a TOEFL-style vocabulary question]

Other tools that might be useful include the University of Nottingham AWL (Academic Word List) Gapmaker. This allows users to type or paste in a text, from which items from the AWL are extracted and replaced with gaps. See the example below. It would, presumably, not be too difficult to combine this approach with automatic distractor generation and to create multiple choice tasks.

[Image: output from the University of Nottingham AWL Gapmaker]
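
The gapping step itself is straightforward to sketch. The code below is not the Nottingham tool’s actual implementation, just a guess at the general idea: any word that appears in a target wordlist (here a tiny invented stand-in for the AWL) is replaced with a numbered gap.

```python
import re

ACADEMIC_WORDS = {"analyse", "concept", "data", "research", "theory"}  # tiny sample, not the real AWL

def make_gaps(text: str, wordlist: set) -> tuple:
    """Replace wordlist items in the text with numbered gaps; return the gapped
    text and the removed items (i.e. the answer key)."""
    gapped, removed = [], []
    for token in re.findall(r"\w+|\W+", text):  # split into word / non-word chunks
        if token.lower() in wordlist:
            removed.append(token)
            gapped.append(f"({len(removed)}) ______")
        else:
            gapped.append(token)
    return "".join(gapped), removed

text = "The research tested a new concept using classroom data."
print(make_gaps(text, ACADEMIC_WORDS))
# ('The (1) ______ tested a new (2) ______ using classroom (3) ______.', ['research', 'concept', 'data'])
```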

[Screenshot: a WordGap exercise]

There are a number of applications that offer the possibility of generating cloze tasks from texts selected by the user (learner or teacher). These have not always been designed with the language learner in mind, but one that was is the Android app WordGap (Knoop & Wilske, 2013), described by its developers as a tool that ‘provides highly individualized exercises to support contextualized mobile vocabulary learning …. It matches the interests of the learner and increases the motivation to learn’. It may well do all that, but then again, perhaps not. As Knoop & Wilske acknowledge, it is only appropriate for adult, advanced learners, and its value as a learning task is questionable. The target item that has been automatically selected is ‘novel’, a word that features in the Oxford 2000 Keywords list (as do all three distractors), and therefore ought to be well below the level of the users. Some people might find this fun, but, in terms of learning, they would probably be better off using an app that made instant look-up of words in the text possible.

More interesting, in my view, is TEDDCLOG (Smith et al., 2010), a system that, given a target learning item (here the focus is on collocations), trawls a large corpus to find the best sentence that illustrates it. ‘Good sentences’ were defined as those which are short (but not too short, or there is not enough useful context), begin with a capital letter and end with a full stop, have a maximum of two commas, and otherwise contain only the 26 lowercase letters. A sentence must also be at a lexical and grammatical level that an intermediate-level learner of English could be expected to understand, and it must be well-formed, without too much superfluous material. All others were rejected. TEDDCLOG uses Sketch Engine’s GDEX function (Good Dictionary Example Extractor, Kilgarriff et al 2008) to do this.
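
Those criteria translate fairly directly into a filter. The sketch below is my own illustration of a GDEX-style ‘good sentence’ check, not the actual TEDDCLOG or GDEX code; the length thresholds are invented, and the lexical-level and well-formedness requirements, which are much harder to automate, are not attempted here.

```python
import re

def is_good_carrier_sentence(sentence: str, min_words: int = 6, max_words: int = 20) -> bool:
    """A crude approximation of the surface criteria described above."""
    words = sentence.split()
    if not (min_words <= len(words) <= max_words):           # short, but not too short
        return False
    if not (sentence[0].isupper() and sentence.endswith(".")):
        return False                                          # capital letter ... full stop
    if sentence.count(",") > 2:                               # at most two commas
        return False
    # apart from the opening capital, commas, spaces and the final stop,
    # only the 26 lowercase letters are allowed
    body = sentence[1:-1].replace(",", "").replace(" ", "")
    return bool(re.fullmatch(r"[a-z]*", body))

print(is_good_carrier_sentence("She made a serious mistake in the final report."))  # True
print(is_good_carrier_sentence("OK – see Fig. 3 (below)!"))                          # False
```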

My own interest in this area came about as a result of my work in the development of the Oxford Vocabulary Trainer. The app offers the possibility of studying both pre-determined lexical items (e.g. the vocabulary list of a coursebook that the learner is using) and free choice (any item could be activated and sent to a learning queue). In both cases, practice takes the form of sentences with the target item gapped. There are a range of hints and help options available to the learner, and feedback is both automatic and formative (i.e. if the supplied answer is not correct, hints are given to push the learner to do better on a second attempt). Leveraging some fairly heavy technology, we were able to achieve a fair amount of success in the automation of intelligent feedback, but what had, at first sight, seemed a lesser challenge – the generation of suitable ‘carrier sentences’ – proved more difficult.

The sentences which ‘carry’ the gap should, ideally, be authentic: invented examples often ‘do not replicate the phraseology and collocational preferences of naturally-occurring text’ (Smith et al., 2010). The technology of corpus search tools should allow us to do a better job than human item writers. For that to be the case, we need not only good search tools but also a good corpus … and some corpora are better than others for the purposes of language learning. As Fenogenova & Kuzmenko (2016) discovered when using different corpora to generate multiple-choice vocabulary exercises automatically, the British Academic Written English corpus (BAWE) proved almost 50% more useful than the British National Corpus (BNC). In the development of the Oxford Vocabulary Trainer, we thought we had the best corpus we could get our hands on: the tagged corpus used for the production of the Oxford suite of dictionaries. We could, in addition and when necessary, turn to other corpora, including the BAWE and the BNC. Our requirements for acceptable carrier sentences were similar to those of Smith et al (2010), but considerably more stringent.

To cut quite a long story short, we learnt fairly quickly that we simply couldn’t automate the generation of carrier sentences with sufficient consistency or reliability. As with some of the other examples discussed in this post, we were able to use the technology to help the writers in their work. We also learnt (rather belatedly, it has to be admitted) that we were trying to find technological solutions to problems that we hadn’t adequately analysed at the start. We hadn’t, for example, given sufficient thought to learner differences, especially the role of L1 (and other languages) in learning English. We hadn’t thought enough about the ‘messiness’ of either language or language learning. It’s possible, given enough resources, that we could have found ways of improving the algorithms, of leveraging other tools, or of deploying additional databases (especially learner corpora) in our quest for a personalised vocabulary learning system. But, in the end, it became clear to me that we were only nibbling at the problem of vocabulary learning. Deliberate learning of vocabulary may be an important part of acquiring a language, but it remains only a relatively small part. Technology may be able to help us in a variety of ways (and much more so in testing than learning), but the dreams of the data scientists (who wrote much of the research cited here) are likely to be short-lived. Experienced writers and editors of learning materials will be needed for the foreseeable future. And truly personalized vocabulary learning, fully supported by technology, will not be happening any time soon.

 

References

Fenogenova, A. & Kuzmenko, E. 2016. Automatic Generation of Lexical Exercises Available online at http://www.dialog-21.ru/media/3477/fenogenova.pdf

Iwata, T., Goto, T., Kojiri, T., Watanabe, T. & T. Yamada. 2011. ‘Automatic Generation of English Cloze Questions Based on Machine Learning’. NTT Technical Review Vol. 9 No. 10 Oct. 2011

Kilgarriff, A. et al. 2008. ‘GDEX: Automatically Finding Good Dictionary Examples in a Corpus.’ In E. Bernal and J. DeCesaris (eds.), Proceedings of the XIII EURALEX International Congress: Barcelona, 15-19 July 2008. Barcelona: l’Institut Universitari de Lingüística Aplicada (IULA) de la Universitat Pompeu Fabra, 425–432.

Knoop, S. & Wilske, S. 2013. ‘WordGap – Automatic generation of gap-filling vocabulary exercises for mobile learning’. Proceedings of the second workshop on NLP for computer-assisted language learning at NODALIDA 2013. NEALT Proceedings Series 17 / Linköping Electronic Conference Proceedings 86: 39–47. Available online at http://www.ep.liu.se/ecp/086/004/ecp13086004.pdf

Kumar, G., Banchs, R.E. & D’Haro, L.F. 2015. ‘RevUP: Automatic Gap-Fill Question Generation from Educational Texts’. Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications, 2015, pp. 154–161, Denver, Colorado, June 4, Association for Computational Linguistics

Smith, S., Avinesh, P.V.S. & Kilgarriff, A. 2010. ‘Gap-fill tests for Language Learners: Corpus-Driven Item Generation’. Proceedings of ICON-2010: 8th International Conference on Natural Language Processing, Macmillan Publishers, India. Available online at https://curve.coventry.ac.uk/open/file/2b755b39-a0fa-4171-b5ae-5d39568874e5/1/smithcomb2.pdf

Susanti, Y., Iida, R. & Tokunaga, T. 2015. ‘Automatic Generation of English Vocabulary Tests’. Proceedings of 7th International Conference on Computer Supported Education. Available online https://pdfs.semanticscholar.org/aead/415c1e07803756902b859e8b6e47ce312d96.pdf

Susanti, Y., Tokunaga, T., Nishikawa, H. & H. Obari 2017. ‘Evaluation of automatically generated English vocabulary questions’ Research and Practice in Technology Enhanced Learning 12 / 11

 

Every now and then, someone recommends that I take a look at a flashcard app. It’s often interesting to see what developers have done with design, gamification and UX features, but the content is almost invariably awful. Most recently, I was encouraged to look at Word Pash. The screenshots below are from their promotional video.

[Images: four screenshots from the Word Pash promotional video]

The content problems are immediately apparent: an apparently random selection of target items, a random mix of high- and low-frequency items, unidiomatic language examples, and definitions and distractors that are less frequent than the target items themselves. I don’t know if these are representative of the rest of the content. The examples seem to come from ‘Stage 1 Level 3’, whatever that means. (My confidence in the product was also damaged by the fact that the Word Pash website includes one testimonial from a certain ‘Janet Reed – Proud Mom’, whose son ‘was able to increase his score and qualify for academic scholarships at major universities’ after using the app. The picture accompanying ‘Janet Reed’ is a free stock image from Pexels and ‘Janet Reed’ is presumably fictional.)

According to the website, ‘WordPash is a free-to-play mobile app game for everyone in the global audience whether you are a 3rd grader or PhD, wordbuff or a student studying for their SATs, foreign student or international business person, you will become addicted to this fast paced word game’. On the basis of the promotional video, the app couldn’t be less appropriate for English language learners. It seems unlikely that it would help anyone improve their ACT or SAT test scores. The suggestion that the vocabulary development needs of 9-year-olds and doctoral students are comparable is pure chutzpah.

The deliberate study of more or less random words may be entertaining, but it’s unlikely to lead to very much in practical terms. For general purposes, the deliberate learning of the highest frequency words, up to about a frequency ranking of #7,500, makes sense, because there’s a reasonably high probability that you’ll come across these items again before you’ve forgotten them. Beyond that level, the value of each additional 1,000 words tails off very quickly. Adding 1,000 words from frequency ranking #8,000 to #9,000 is likely to increase lexical understanding of general purpose texts by about 0.2%. By the time we get to frequency ranks #19,000 to #20,000, the gain in understanding falls to 0.01%[1]. In other words, deliberate vocabulary learning needs to be targeted. The data is relatively recent, but the principle goes back to at least the middle of the last century, when Michael West argued that a principled approach to vocabulary development should be driven by a comparison of the usefulness of a word and its ‘learning cost’[2]. Three hundred years before that, Comenius had articulated something very similar: ‘in compiling vocabularies, my […] concern was to select the words in most frequent use’[3].
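The arithmetic behind these coverage figures is easy to reproduce for any collection of texts you happen to have. The sketch below is my own illustration (not Nation’s method, and on a toy scale): it simply counts what share of the running words is accounted for by each successive frequency band of the corpus.

```python
from collections import Counter
import re

def band_coverage(texts, band_size=1000):
    """Share of running words accounted for by each successive frequency band of the same corpus."""
    tokens = [w.lower() for t in texts for w in re.findall(r"[a-zA-Z']+", t)]
    counts = Counter(tokens)
    ranked = [w for w, _ in counts.most_common()]
    total = len(tokens)
    coverage = []
    for start in range(0, len(ranked), band_size):
        band = ranked[start:start + band_size]
        coverage.append(sum(counts[w] for w in band) / total)
    return coverage  # e.g. [0.82, 0.06, 0.02, ...]: rapidly diminishing returns per band

# Toy usage; with a real corpus and band_size=1000, the tail-off beyond the first few bands is dramatic.
print(band_coverage(["the cat sat on the mat", "the dog sat on the log"], band_size=3))
```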

I’ll return to ‘general purposes’ later in this post, but, for now, we should remember that very few language learners actually study a language for general purposes. Globally, the vast majority of English language learners study English in an academic (school) context and their immediate needs are usually exam-specific. For them, general purpose frequency lists are unlikely to be adequate. If they are studying with a coursebook and are going to be tested on the lexical content of that book, they will need to use the wordlist that matches the book. Increasingly, publishers make such lists available and content producers for vocabulary apps like Quizlet and Memrise often use them. Many examinations, both national and international, also have accompanying wordlists. Examples of such lists produced by examination boards include the Cambridge English young learners’ exams (Starters, Movers and Flyers) and Cambridge English Preliminary. Other exams do not have official word lists, but reasonably reliable lists have been produced by third parties. Examples include Cambridge First, IELTS and SAT. There are, in addition, well-researched wordlists for academic English, including the Academic Word List (AWL)  and the Academic Vocabulary List  (AVL). All of these make sensible starting points for deliberate vocabulary learning.

When we turn to other, out-of-school learners, the number of reasons for studying English is huge. Different learners have different lexical needs, and working with a general purpose frequency list may be, at least in part, a waste of time. EFL and ESL learners are likely to have very different needs, as will EFL and ESP learners, as will older and younger learners, learners in different parts of the world, learners who will find themselves in English-speaking countries and those who won’t, etc., etc. For some of these demographics, specialised corpora (from which frequency-based wordlists can be drawn) exist. For most learners, though, the ideal list simply does not exist. Either it will have to be created (requiring a significant amount of time and expertise[4]) or an available best-fit will have to suffice. Paul Nation, in his recent ‘Making and Using Word Lists for Language Learning and Testing’ (John Benjamins, 2016), includes a useful chapter on critiquing wordlists. For anyone interested in better understanding the issues surrounding the development and use of wordlists, three good articles are freely available online. These are:

Lessard-Clouston, M. 2012 / 2013. ‘Word Lists for Vocabulary Learning and Teaching’ The CATESOL Journal 24.1: 287- 304

Lessard-Clouston, M. 2016. ‘Word lists and vocabulary teaching: options and suggestions’ Cornerstone ESL Conference 2016

Sorell, C. J. 2013. A study of issues and techniques for creating core vocabulary lists for English as an International Language. Doctoral thesis.

But, back to ‘general purposes’ …. Frequency lists are the obvious starting point for preparing a wordlist for deliberate learning, but they are very problematic. Frequency rankings depend on the corpus on which they are based and, since these are different, rankings vary from one list to another. Even drawing on just one corpus, rankings can be a little strange. In the British National Corpus, for example, ‘May’ (the month) is about twice as frequent as ‘August’[5], but we would be foolish to infer from this that the learning of ‘May’ should be prioritised over the learning of ‘August’. An even more striking example from the same corpus is the fact that ‘he’ is about twice as frequent as ‘she’[6]: should, therefore, ‘he’ be learnt before ‘she’?

List compilers have to make a number of judgement calls in their work. There is not space here to consider these in detail, but two particularly tricky questions about how words are counted are worth mentioning: is a verb like ‘list’, with two different and unrelated meanings, one word or two? And should inflected forms be counted as separate words? These judgements are not usually informed by considerations of learners’ needs. Learners will probably best approach vocabulary development by building their store of word senses: attempting to learn all the meanings and related forms of any given word is unlikely to be either useful or successful.

Frequency lists, in other words, are not statements of scientific ‘fact’: they are interpretative documents. They have been compiled for descriptive purposes, not as ways of structuring vocabulary learning, and it cannot be assumed they will necessarily be appropriate for a purpose for which they were not designed.

A further major problem concerns the corpus on which the frequency list is based. Large databases, such as the British National Corpus or the Corpus of Contemporary American English, are collections of language used by native speakers in certain parts of the world, usually of a restricted social class. As such, they are of relatively little value to learners who will be using English in contexts that are not covered by the corpus. A context where English is a lingua franca is one such example.

A different kind of corpus is the Cambridge Learner Corpus (CLC), a collection of exam scripts produced by candidates in Cambridge exams. This has led to the development of the English Vocabulary Profile (EVP), where word senses are tagged as corresponding to particular levels in the Common European Framework scale. At first glance, this looks like a good alternative to frequency lists based on native-speaker corpora. But closer consideration reveals many problems. The design of examination tasks inevitably results in the production of language of a very different kind from that produced in other contexts. Many high frequency words simply do not appear in the CLC because it is unlikely that a candidate would use them in an exam. Other items are very frequent in this corpus just because they are likely to be produced in examination tasks. Unsurprisingly, frequency rankings in EVP do not correlate very well with frequency rankings from other corpora. The EVP, then, like other frequency lists, can only serve, at best, as a rough guide for the drawing up of target item vocabulary lists in general purpose apps or coursebooks[7].

There is no easy solution to the problems involved in devising suitable lexical content for the ‘global audience’. Tagging words to levels (i.e. grouping them into frequency bands) will always be problematic, unless very specific user groups are identified. Writers, like myself, of general purpose English language teaching materials are justifiably irritated by some publishers’ insistence on allocating words to levels with numerical values. The policy, taken to extremes (as is increasingly the case), has little to recommend it in linguistic terms. But it’s still a whole lot better than the aleatory content of apps like Word Pash.

[1] See Nation, I.S.P. 2013. Learning Vocabulary in Another Language 2nd edition. (Cambridge: Cambridge University Press) p. 21 for statistical tables. See also Nation, P. & R. Waring 1997. ‘Vocabulary size, text coverage and word lists’ in Schmitt & McCarthy (eds.) 1997. Vocabulary: Description, Acquisition and Pedagogy. (Cambridge: Cambridge University Press) pp. 6 -19

[2] See Kelly, L.G. 1969. 25 Centuries of Language Teaching. (Rowley, Mass.: Rowley House) p.206 for a discussion of West’s ideas.

[3] Kelly, L.G. 1969. 25 Centuries of Language Teaching. (Rowley, Mass.: Rowley House) p. 184

[4] See Timmis, I. 2015. Corpus Linguistics for ELT (Abingdon: Routledge) for practical advice on doing this.

[5] Nation, I.S.P. 2016. Making and Using Word Lists for Language Learning and Testing. (Amsterdam: John Benjamins) p.58

[6] Taylor, J.R. 2012. The Mental Corpus. (Oxford: Oxford University Press) p.151

[7] For a detailed critique of the limitations of using the CLC as a guide to syllabus design and textbook development, see Swan, M. 2014. ‘A Review of English Profile Studies’ ELTJ 68/1: 89-96

Chatbots

Posted: December 1, 2016 in ed tech

 

Here’s a conversation I had recently with Cleverbot.


User (me): Hi Cleverbot. What are your views on the future of chatbots in language learning?

Cleverbot: Are you talking to yourself?

User: No, I’m asking you a question.

Cleverbot: Then what’s the question again? I forgot.

User: What are your views on the future of chatbots in language learning?

Cleverbot: Don’t you want me to finish my story?

User: No, I want you to answer my question.

Cleverbot: What is your question?

User: Do you think there is a future for chatbots in language education?

Cleverbot: Sure, if you are talking about the moon.

I also tried to have a conversation with Alice, the avatar from EFL Classroom Bot, listed by Larry Ferlazzo as one of ‘the best online chatbots for practicing English’. I didn’t get any more sense out of her than out of Cleverbot.

Chatbots, apparently, are the next big thing. Again. David Mattin, head of trends and insights at trendwatching.com, writes (in the September 2016 issue of ‘Business Life’) that ‘the chatbot revolution is coming’ and that chatbots are a step towards the dream of an interface between user and technology that is so intuitive that the interface ‘simply fades away’. Chatbots have been around for some time. Remember Clippy – the Microsoft Office bot in the late 1990s – which you had to disable in order to stop yourself punching your computer screen? Since then, bots have become ubiquitous. There have been problems, such as Microsoft’s Tay bot that had to be taken down after sixteen hours earlier this year, when, after interacting with other Twitter users, it developed into an abusive Nazi. But chatbots aren’t going away and you’ve probably interacted with one to book a taxi, order food or attempt to talk to your bank. In September this year, the Guardian described them as ‘the talk of the town’ and ‘hot property in Silicon Valley’.

The real interest in chatbots lies not, however, in the ‘exciting interface’ possibilities (both user interface and user experience remain pretty crude), but in the fact that they are leaner, that they sit comfortably with the things we actually do on a phone, and that they offer a way of cutting out the high fees that developers have to pay to app stores. After so many start-up failures, chatbots offer a glimmer of financial hope to developers.

It’s no surprise, of course, to find the world of English language teaching beginning to sit up and take notice of this technology. A 2012 article by Ben Lehtinen in PeerSpectives enthuses about the possibilities in English language learning and reports the positive feedback of the author’s own students. ELTJam, so often so quick off the mark, developed an ELT Bot over the course of a hackathon weekend in March this year. Disappointingly, it wasn’t really a bot – more a case of humans pretending to be a bot pretending to be humans – but it probably served its exploratory purpose. And a few months ago Duolingo began incorporating bots. These are currently only available for French, Spanish and German learners in the iPhone app, so I haven’t been able to try them out and evaluate them. According to an infomercial in TechCrunch, ‘to make talking to the bots a bit more compelling, the company tried to give its different bots a bit of personality. There’s Chef Robert, Renee the Driver and Officer Ada, for example. They will react differently to your answers (and correct you as necessary), but for the most part, the idea here is to mimic a real conversation. These bots also allow for a degree of flexibility in your answers that most language-learning software simply isn’t designed for. There are plenty of ways to greet somebody, for example, but most services will often only accept a single answer. When you’re totally stumped for words, though, Duolingo offers a ‘help my reply’ button with a few suggested answers.’ In the last twelve months or so, Duolingo has considerably improved its ability to recognise multiple correct ways of expressing a particular idea, and its ability to recognise alternative answers to its translation tasks. However, I’m highly sceptical about its ability to mimic a real conversation any better than Cleverbot or Alice the EFL Bot, or its ability to provide systematically useful corrections.

My reasons lie in the current limitations of AI and NLP (Natural Language Processing). In a nutshell, we simply don’t know how to build a machine that can truly understand human language. Limited exchanges in restricted domains can be done pretty well (such as the early chatbot that did a good job of simulating an encounter with an evasive therapist or, more recently, ordering a taco and having a meaningless but flirty conversation with a bot), but despite recent advances in semantic computing, we’re a long way from anything that can mimic a real conversation. As Audrey Watters puts it, we’re not even close.

When it comes to identifying language errors made by language learners, we’re not really much better off. Apps like Grammarly are not bad at identifying grammatical errors (but not good enough to be reliable), but pretty hopeless at dealing with lexical appropriacy. Much more reliable feedback to learners can be offered when the software is trained on particular topics and text types. Write & Improve does this with a relatively small selection of Cambridge English examination tasks, but a free conversation…? Forget it.

So, how might chatbots be incorporated into language teaching / learning? A blog post from December 2015 entitled AI-powered chatbots and the future of language learning suggests one plausible possibility. Using an existing messenger service, such as WhatsApp or Telegram, an adaptive chatbot would send tasks (such as participation in a conversation thread with a predetermined topic, register, etc., or pronunciation practice or translation exercises) to a learner, provide feedback and record the work for later recycling. At the same time, the bot could send out reminders of work that needs to be done or administrative tasks that must be completed.
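A minimal sketch of what such a bot might look like is given below. The send_message / get_reply wrappers, the task bank and the recycling rule are all my own inventions, standing in for whatever messenger platform and syllabus a real system would use; nothing here describes an existing product.

```python
import datetime
import random

# Invented task bank; a real system would draw on a syllabus or the learner's error history.
TASKS = [
    "Describe your weekend in three sentences.",
    "Ask me three questions about my favourite film.",
    "Rewrite this sentence more politely: 'Give me the report now.'",
]

def daily_session(user_id, history, send_message, get_reply):
    """Send one task over a messaging platform, store the reply for later recycling,
    and occasionally resend an earlier task as a reminder."""
    task = random.choice(TASKS)
    send_message(user_id, task)
    history.append({
        "when": datetime.datetime.now().isoformat(),
        "task": task,
        "reply": get_reply(user_id),
    })
    # Crude recycling rule: after a few sessions, invite another attempt at an earlier task.
    if len(history) > 3:
        send_message(user_id, "Have another go at this one: " + history[-4]["task"])

# Demo with stand-ins for the platform wrappers (e.g. around WhatsApp or Telegram).
history = []
for _ in range(5):
    daily_session("learner_42", history,
                  send_message=lambda uid, text: print(f"-> {uid}: {text}"),
                  get_reply=lambda uid: "(learner reply)")
```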

Kat Robb has written a very practical article about using instant messaging in English language classrooms. Her ideas are interesting (although I find the idea of students in a F2F classroom messaging each other slightly bizarre) and it’s easy to imagine ways in which her activities might be augmented with chatbot interventions. The Write & Improve app, mentioned above, could deploy a chatbot interface to give feedback instead of the flat (and, in my opinion, perfectly adequate) pop-up boxes currently in use. Come to think of it, more or less any digital language learning tool could be pimped up with a bot. Countless variations can be envisaged.

But the overwhelming question is: would it be worth it? Bots are not likely, any time soon, to revolutionise language learning. What they might just do, however, is help to further reduce language teaching to a series of ‘mechanical and scripted gestures’. More certain is that a lot of money will be thrown down the post-truth edtech drain. Then, in the not too distant future, this latest piece of edtech will fall into the trough of disillusionment, to be replaced by the latest latest thing.

 

 

In December last year, I posted a wish list for vocabulary (flashcard) apps. At the time, I hadn’t read a couple of key research texts on the subject. It’s time for an update.

First off, there’s an article called ‘Intentional Vocabulary Learning Using Digital Flashcards’ by Hsiu-Ting Hung. It’s available online here. Given the lack of empirical research into the use of digital flashcards, it’s an important article and well worth a read. Its basic conclusion is that digital flashcards are more effective as a learning tool than printed word lists. No great surprises there, but of more interest, perhaps, are the recommendations that (1) ‘students should be educated about the effective use of flashcards (e.g. the amount and timing of practice), and this can be implemented through explicit strategy instruction in regular language courses or additional study skills workshops’ (Hung, 2015: 111), and (2) that digital flashcards can be usefully ‘repurposed for collaborative learning tasks’ (Hung, ibid.).

However, what really grabbed my attention was an article by Tatsuya Nakata. Nakata’s research is of particular interest to anyone interested in vocabulary learning, but especially so to those with an interest in digital possibilities. A number of his research articles can be freely accessed via his page at ResearchGate, but the one I am interested in is called ‘Computer-assisted second language vocabulary learning in a paired-associate paradigm: a critical investigation of flashcard software’. Don’t let the title put you off. It’s a review of a pile of web-based flashcard programs: since the article is already five years old, many of the programs have either changed or disappeared, but the critical approach he takes is more or less as valid now as it was then (whether we’re talking about web-based stuff or apps).

Nakata divides his evaluation criteria into two broad groups (after the list, I’ve included a short sketch of how a couple of the learning criteria might be implemented).

Flashcard creation and editing

(1) Flashcard creation: Can learners create their own flashcards?

(2) Multilingual support: Can the target words and their translations be created in any language?

(3) Multi-word units: Can flashcards be created for multi-word units as well as single words?

(4) Types of information: Can various kinds of information be added to flashcards besides the word meanings (e.g. parts of speech, contexts, or audios)?

(5) Support for data entry: Does the software support data entry by automatically supplying information about lexical items such as meaning, parts of speech, contexts, or frequency information from an internal database or external resources?

(6) Flashcard set: Does the software allow learners to create their own sets of flashcards?

Learning

(1) Presentation mode: Does the software have a presentation mode, where new items are introduced and learners familiarise themselves with them?

(2) Retrieval mode: Does the software have a retrieval mode, which asks learners to recall or choose the L2 word form or its meaning?

(3) Receptive recall: Does the software ask learners to produce the meanings of target words?

(4) Receptive recognition: Does the software ask learners to choose the meanings of target words?

(5) Productive recall: Does the software ask learners to produce the target word forms corresponding to the meanings provided?

(6) Productive recognition: Does the software ask learners to choose the target word forms corresponding to the meanings provided?

(7) Increasing retrieval effort: For a given item, does the software arrange exercises in the order of increasing difficulty?

(8) Generative use: Does the software encourage generative use of words, where learners encounter or use previously met words in novel contexts?

(9) Block size: Can the number of words studied in one learning session be controlled and altered?

(10) Adaptive sequencing: Does the software change the sequencing of items based on learners’ previous performance on individual items?

(11) Expanded rehearsal: Does the software help implement expanded rehearsal, where the intervals between study trials are gradually increased as learning proceeds? (Nakata, T. (2011): ‘Computer-assisted second language vocabulary learning in a paired-associate paradigm: a critical investigation of flashcard software’ Computer Assisted Language Learning, 24:1, 17-38)
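To make criteria (10) and (11) a little more concrete, here is a bare-bones sketch of adaptive sequencing with expanded rehearsal: intervals grow after each successful retrieval and collapse after a failure. The interval values and the class design are arbitrary placeholders of my own, not the schedules recommended in the research.

```python
import datetime

# Placeholder intervals (in days) for expanded rehearsal; real spacing schedules are more nuanced.
INTERVALS = [0, 1, 3, 7, 14, 30]

class Flashcard:
    def __init__(self, word, meaning):
        self.word = word
        self.meaning = meaning
        self.level = 0
        self.due = datetime.date.today()

    def review(self, correct, today=None):
        """Adaptive sequencing: move up a level after a successful retrieval, back to the start after a failure."""
        today = today or datetime.date.today()
        self.level = min(self.level + 1, len(INTERVALS) - 1) if correct else 0
        self.due = today + datetime.timedelta(days=INTERVALS[self.level])

def due_cards(cards, block_size=10, today=None):
    """Block size control: return at most block_size cards that are currently due."""
    today = today or datetime.date.today()
    return sorted((c for c in cards if c.due <= today), key=lambda c: c.due)[:block_size]

cards = [Flashcard("behove", "to be necessary or fitting"),
         Flashcard("aleatory", "depending on chance")]
for card in due_cards(cards):
    card.review(correct=True)
    print(card.word, "next due on", card.due)
```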

It’s a rather different list from my own (there’s nothing I would disagree with here), because mine is more general and his is exclusively oriented towards learning principles. Nakata makes the point towards the end of the article that it would ‘be useful to investigate learners’ reactions to computer-based flashcards to examine whether they accept flashcard programs developed according to learning principles’ (p. 34). It’s far from clear, he points out, that conformity to learning principles is at the top of learners’ agendas. More than just users’ feelings about computer-based flashcards in general, a key concern will be the fact that there are ‘large individual differences in learners’ perceptions of [any flashcard] program’ (Nakata, T. 2008. ‘English vocabulary learning with word lists, word cards and computers: implications from cognitive psychology research for optimal spaced learning’ ReCALL 20(1), p. 18).

I was trying to make a similar point in another post about motivation and vocabulary apps. In the end, as with any language learning material, research-driven language learning principles can only take us so far. User experience is a far more difficult creature to pin down or to make generalisations about. A user’s reactions to graphics, gamification, uploading time and so on are so powerful and so subjective that learning principles will inevitably play second fiddle. That’s not to say, of course, that Nakata’s questions are not important: it’s merely to wonder whether the bigger question is truly answerable.

Nakata’s research identifies plenty of room for improvement in digital flashcards, and although the article is now quite old, not a lot has changed. Key areas to work on are (1) the provision of generative use of target words, (2) the need to increase retrieval effort, (3) the automatic provision of information about meaning, parts of speech, or contexts (in order to facilitate flashcard creation), and (4) the automatic generation of multiple-choice distractors.

In the conclusion of his study, he identifies one flashcard program which is better than all the others. Unsurprisingly, five years down the line, the software he identifies is no longer free, other programs have changed considerably in the intervening period, and who knows what will be out in front next week?

 

About two and a half years ago, when I started writing this blog, there was a lot of hype around adaptive learning and the big data which might drive it. Two and a half years is a long time in technology. A look at Google Trends suggests that interest in adaptive learning has been pretty static for the last couple of years. It’s interesting to note that 3 of the 7 lettered points on the graph below are Knewton-related media events (including the most recent, A, which is Knewton’s latest deal with Hachette) and 2 of them concern McGraw-Hill. It would be interesting to know whether these companies follow both parts of Simon Cowell’s dictum of ‘Create the hype, but don’t ever believe it’.

[Image: Google Trends graph for ‘adaptive learning’]

A look at the Hype Cycle (see here for Wikipedia’s entry on the topic and for criticism of the hype of Hype Cycles) of the IT research and advisory firm, Gartner, indicates that both big data and adaptive learning have now slid into the ‘trough of disillusionment’, which means that the market has started to mature, becoming more realistic about how useful the technologies can be for organizations.

A few years ago, the Gates Foundation, one of the leading cheerleaders and financial promoters of adaptive learning, launched its Adaptive Learning Market Acceleration Program (ALMAP) to ‘advance evidence-based understanding of how adaptive learning technologies could improve opportunities for low-income adults to learn and to complete postsecondary credentials’. It’s striking that the program’s aims referred to how such technologies could lead to learning gains, not whether they would. Now, though, with the publication of a report commissioned by the Gates Foundation to analyze the data coming out of the ALMAP Program, things are looking less rosy. The report is inconclusive. There is no firm evidence that adaptive learning systems are leading to better course grades or course completion. ‘The ultimate goal – better student outcomes at lower cost – remains elusive’, the report concludes. Rahim Rajan, a senior program officer for Gates, is clear: ‘There is no magical silver bullet here.’

The same conclusion is being reached elsewhere. A report for the National Education Policy Center (in Boulder, Colorado) concludes: ‘Personalized Instruction, in all its many forms, does not seem to be the transformational technology that is needed, however. After more than 30 years, Personalized Instruction is still producing incremental change. The outcomes of large-scale studies and meta-analyses, to the extent they tell us anything useful at all, show mixed results ranging from modest impacts to no impact. Additionally, one must remember that the modest impacts we see in these meta-analyses are coming from blended instruction, which raises the cost of education rather than reducing it’ (Enyedy, 2014: 15; see reference at the foot of this post). In the same vein, a recent academic study by Meg Coffin Murray and Jorge Pérez (2015, ‘Informing and Performing: A Study Comparing Adaptive Learning to Traditional Learning’) found that ‘adaptive learning systems have negligible impact on learning outcomes’.

In the latest educational technology plan from the U.S. Department of Education (‘Future Ready Learning: Reimagining the Role of Technology in Education’, 2016), the only mentions of the word ‘adaptive’ are in the context of testing. And the latest OECD report on ‘Students, Computers and Learning: Making the Connection’ (2015) finds, more generally, that information and communication technologies, when they are used in the classroom, have, at best, a mixed impact on student performance.

There is, however, too much money at stake for the earlier hype to disappear completely. Sponsored cheerleading for adaptive systems continues to find its way into blogs and national magazines and newspapers. EdSurge, for example, recently published a report called ‘Decoding Adaptive’ (2016), sponsored by Pearson, that continues to wave the flag. Enthusiastic anecdotes take the place of evidence, but, for all that, it’s a useful read.

In the world of ELT, there are plenty of sales people who want new products which they can call ‘adaptive’ (and gamified, too, please). But it’s striking that three years after I started following the hype, such products are rather thin on the ground. Pearson was the first of the big names in ELT to do a deal with Knewton, and invested heavily in the company. Their relationship remains close. But, to the best of my knowledge, the only truly adaptive ELT product that Pearson offers is the PTE test.

Macmillan signed a contract with Knewton in May 2013 ‘to provide personalized grammar and vocabulary lessons, exam reviews, and supplementary materials for each student’. In December of that year, they talked up their new ‘big tree online learning platform’: ‘Look out for the Big Tree logo over the coming year for more information as to how we are using our partnership with Knewton to move forward in the Language Learning division and create content that is tailored to students’ needs and reactive to their progress.’ I’ve been looking out, but it’s all gone rather quiet on the adaptive / platform front.

In September 2013, it was the turn of Cambridge to sign a deal with Knewton ‘to create personalized learning experiences in its industry-leading ELT digital products for students worldwide’. This year saw the launch of a major new CUP series, ‘Empower’. It has an online workbook with personalized extra practice, but there’s nothing (yet) that anyone would call adaptive. More recently, Cambridge has launched the online version of the 2nd edition of Touchstone. Nothing adaptive there, either.

Earlier this year, Cambridge published The Cambridge Guide to Blended Learning for Language Teaching, edited by Mike McCarthy. It contains a chapter by M.O.Z. San Pedro and R. Baker on ‘Adaptive Learning’. It’s an enthusiastic account of the potential of adaptive learning, but it doesn’t contain a single reference to language learning or ELT!

So, what’s going on? Scepticism is becoming the order of the day. The early hype of people like Knewton’s Jose Ferreira is now understood for what it was. Companies like Macmillan got their fingers badly burnt when they barked up the wrong tree with their ‘Big Tree’ platform.

Noel Enyedy captures a more contemporary understanding when he writes: ‘Personalized Instruction is based on the metaphor of personal desktop computers—the technology of the 80s and 90s. Today’s technology is not just personal but mobile, social, and networked. The flexibility and social nature of how technology infuses other aspects of our lives is not captured by the model of Personalized Instruction, which focuses on the isolated individual’s personal path to a fixed end-point. To truly harness the power of modern technology, we need a new vision for educational technology’ (Enyedy, 2014: 16).

Adaptive solutions aren’t going away, but there is now a much better understanding of what sorts of problems might have adaptive solutions. Testing is certainly one. As the educational technology plan from the U.S. Department of Education (‘Future Ready Learning: Reimagining the Role of Technology in Education’, 2016) puts it: ‘Computer adaptive testing, which uses algorithms to adjust the difficulty of questions throughout an assessment on the basis of a student’s responses, has facilitated the ability of assessments to estimate accurately what students know and can do across the curriculum in a shorter testing session than would otherwise be necessary.’ In ELT, Pearson and EF have adaptive tests that have been well researched and designed.
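The core mechanism is simple enough to sketch, even though operational systems use item response theory rather than the crude step rule below; the rule, the item bank and the simulated candidate are all invented for illustration.

```python
def adaptive_test(item_bank, answer_item, start_difficulty=3, n_items=10):
    """Crude staircase version of computer adaptive testing: difficulty steps up after a correct
    answer and down after an incorrect one. Operational systems estimate ability with IRT instead."""
    difficulty = start_difficulty
    asked = []
    for _ in range(n_items):
        candidates = [item for item in item_bank if item not in asked]
        if not candidates:
            break
        # Pick an unused item closest to the current difficulty estimate.
        item = min(candidates, key=lambda i: abs(i["difficulty"] - difficulty))
        asked.append(item)
        correct = answer_item(item)
        difficulty = min(difficulty + 1, 6) if correct else max(difficulty - 1, 1)
    return difficulty  # rough proxy for the candidate's level

# Toy item bank (difficulty levels 1-6) and a simulated candidate who copes with anything up to level 4.
bank = [{"id": n, "difficulty": 1 + n % 6} for n in range(30)]
print("estimated level:", adaptive_test(bank, lambda item: item["difficulty"] <= 4))
```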

Vocabulary apps which deploy adaptive technology continue to become more sophisticated, although empirical research is lacking. Automated writing tutors with adaptive corrective feedback are also developing fast, and I’ll be writing a post about these soon. Similarly, as speech recognition software improves, we can expect to see better and better automated adaptive pronunciation tutors. But going beyond such applications, there are bigger questions to ask, and answers to these will impact on whatever direction adaptive technologies take. Large platforms (LMSs), with or without adaptive software, are already beginning to look rather dated. Will they be replaced by integrated apps, or are apps themselves going to be replaced by bots (currently riding high in the Hype Cycle)? In language learning and teaching, the future of bots is likely to be shaped by developments in natural language processing (another topic about which I’ll be blogging soon). Nobody really has a clue where the next two and a half years will take us (if anywhere), but it’s becoming increasingly likely that adaptive learning will be only one very small part of it.

 

Enyedy, N. 2014. Personalized Instruction: New Interest, Old Rhetoric, Limited Results, and the Need for a New Direction for Computer-Mediated Learning. Boulder, CO: National Education Policy Center. Retrieved 17.07.16 from http://nepc.colorado.edu/publication/personalized-instruction