Archive for the ‘apps’ Category

Recent years have seen a proliferation of computer-assisted pronunciation trainers (CAPTs), both as stand-alone apps and as part of broader language courses. The typical CAPT records the learner’s voice, compares this to a model of some kind, detects differences between the learner and the model, and suggests ways that the learner may more closely approximate to the model (Agarwal & Chakraborty, 2019). Most commonly, the focus is on individual phonemes, rather than, as in Richard Cauldwell’s ‘Cool Speech’ (2012), on the features of fluent natural speech (Rogerson-Revell, 2021).

The fact that CAPTs are increasingly available and attractive ‘does not of course ensure their pedagogic value or effectiveness’ … ‘many are technology-driven rather than pedagogy-led’ (Rogerson-Revell, 2021). Rogerson-Revell (2021) points to two common criticisms of CAPTs. Firstly, their pedagogic accuracy sometimes falls woefully short. She gives the example of a unit on intonation in one app, where users are told that ‘when asking questions in English, our voice goes up in pitch’ and ‘we lower the pitch of our voice at the end of questions’. Secondly, she observes that CAPTs often adopt a one-size-fits-all approach, despite the fact that we know that issues of pronunciation are extremely context-sensitive: ‘a set of learners in one context will need certain features that learners in another context do not’ (Levis, 2018: 239).

There are, in addition, technical challenges that are not easy to resolve. Many CAPTs rely on automatic speech recognition (ASR), which can be very accurate with some accents, but much less so with other accents (including many non-native-speaker accents) (Korzekwa et al., 2022). Anyone using a CAPT will experience instances of the software identifying pronunciation problems that are not problems, and failing to identify potentially more problematic issues (Agarwal & Chakraborty, 2019).

We should not, therefore, be too surprised if these apps don’t always work terribly well. Some apps, like the English File Pronunciation app, have been shown to be effective in helping the perception and production of certain phonemes by a very unrepresentative group of Spanish learners of English (Fouz-González, 2020), but this tells us next to nothing about the overall effectiveness of the app. Most CAPTs have not been independently reviewed, and, according to a recent meta-analysis of CAPTs (Mahdi & Al Khateeb, 2019), the small number of studies are ‘all of very low quality’. This, unfortunately, renders their meta-analysis useless.

Even if the studies in the meta-analysis had not been of very low quality, we would need to pause before digesting any findings about CAPTs’ effectiveness. Before anything else, we need to develop a good understanding of what they might be effective at. It’s here that we run headlong into the problem of native-speakerism (Holliday, 2006; Kiczkowiak, 2018).

The pronunciation model that CAPTs attempt to push learners towards is a native-speaker model. In the case of ELSA Speak, for example, this is a particular kind of American accent, although ‘British and other accents’ will apparently soon be added. Xavier Anguera, co-founder and CTO of ELSA Speak, in a fascinating interview with Paul Raine of TILTAL, happily describes his product as ‘an app that is for accent reduction’. Accent reduction is certainly a more accurate way of describing CAPTs than accent promotion.

Accent reduction, or the attempt to mimic an imagined native-speaker pronunciation, is now ‘rarely put forward by teachers or researchers as a worthwhile goal’ (Levis, 2018: 33) because it is only rarely achievable and, in many contexts, inappropriate. In addition, accent reduction cannot easily be separated from accent prejudice. Accent reduction courses and products ‘operate on the assumption that some accents are more legitimate than others’ (Ennser-Kananen et al., 2021) and there is evidence that they can ‘reinscribe racial inequalities’ (Ramjattan, 2019). Accent reduction is quintessentially native-speakerist.

Rather than striving towards native-speaker accentedness, there is a growing recognition among teachers, methodologists and researchers that intelligibility may be a more appropriate learning goal (Levis, 2018). It has been over 20 years since Jennifer Jenkins (2000) developed her Lingua Franca Core (LFC), a relatively short list of pronunciation features that she considered central to intelligibility in English as a Lingua Franca contexts (i.e. the majority of contexts in which English is used). Intelligibility as the guiding principle of pronunciation teaching continues to grow in influence, spurred on by the work of Walker (2010), Kiczkowiak & Lowe (2018), Patsko & Simpson (2019) and Hancock (2020), among others.

Unfortunately, intelligibility is a deceptively simple concept. What exactly it is, is ‘not an easy question to answer’ writes John Levis (2018) before attempting his own answer in the next 250 pages. As admirable as the LFC may be as an attempt to offer a digestible and actionable list of key pronunciation features, it ‘remains controversial in many of its recommendations. It lacks robust empirical support, assumes that all NNS contexts are similar, and does not take into account the importance of stigma associated with otherwise intelligible pronunciations’ (Levis, 2018: 47). Other attempts to list features of intelligibility fare no better in Levis’s view: they are ‘a mishmash of incomplete and contradictory recommendations’ (Levis, 2018: 49).

Intelligibility is also complex because of the relationship between intelligibility and comprehensibility, or the listener’s willingness to understand – their attitude or stance towards the speaker. Comprehensibility is a mediation concept (Ennser-Kananen et al., 2021). It is a two-way street, and intelligibility-driven approaches need to take this into account (unlike the accent-reduction approach, which places all the responsibility for comprehensibility on the shoulders of the othered speaker).

The problem of intelligibility becomes even more thorny when it comes to designing a pronunciation app. Intelligibility and comprehensibility cannot easily be measured (if at all!), and an app’s algorithms need a concrete numerically-represented benchmark towards which a user / learner can be nudged. Accentedness can be measured (even if the app has to reify a ‘native-speaker accent’ to do so). Intelligibility / Comprehensibility is simply not something, as Xavier Anguera acknowledges, that technology can deal with. In this sense, CAPTs cannot avoid being native-speakerist.

At this point, we might ride off indignantly into the sunset, but a couple of further observations are in order. First of all, accentedness and comprehensibility are not mutually exclusive categories. Anguera notes that intelligibility can be partly improved by reducing accentedness, and some of the research cited by Levis (2018) backs him up on this. But precisely how much and what kind of accent reduction improves intelligibility is not knowable, so the use of CAPTs is something of an optimistic stab in the dark. Like all stabs in the dark, there are dangers. Secondly, individual language learners may be forgiven for not wanting to wait for accent prejudice to become a thing of the past: if they feel that they will suffer less from prejudice by attempting here and now to reduce their ‘foreign’ accent, it is not for me, I think, to pass judgement. The trouble, of course, is that CAPTs contribute to the perpetuation of the prejudices.

There is, however, one area where the digital evaluation of accentedness is, I think, unambiguously unacceptable. According to Rogerson-Revell (2021), ‘Australia’s immigration department uses the Pearson Test of English (PTE) Academic as one of five tests. The PTE tests speaking ability using voice recognition technology and computer scoring of test-takers’ audio recordings. However, L1 English speakers and highly proficient L2 English speakers have failed the oral fluency section of the English test, and in some cases it appears that L1 speakers achieve much higher scores if they speak unnaturally slowly and carefully’. Human evaluations are not necessarily any better.

References

Agarwal, C. & Chakraborty, P. (2019) A review of tools and techniques for computer aided pronunciation training (CAPT) in English. Education and Information Technologies, 24: 3731–3743. https://doi.org/10.1007/s10639-019-09955-7

Cauldwell, R. (2012) Cool Speech app. Available at: http://www.speechinaction.org/cool-speech-2

Ennser-Kananen, J., Halonen, M. & Saarinen, T. (2021) “Come Join Us and Lose Your Accent!” Accent Modification Courses as Hierarchization of International Students. Journal of International Students, 11 (2): 322–340

Fouz-González, J. (2020) Using apps for pronunciation training: An empirical evaluation of the English File Pronunciation app. Language Learning & Technology, 24 (1): 62–85

Hancock, M. (2020) 50 Tips for Teaching Pronunciation. Cambridge: Cambridge University Press

Holliday, A. (2006) Native-speakerism. ELT Journal, 60 (4): 385–387

Jenkins, J. (2000) The Phonology of English as a Lingua Franca. Oxford: Oxford University Press

Kiczkowiak, M. (2018) Native Speakerism in English Language Teaching: Voices From Poland. Doctoral dissertation.

Kiczkowiak, M. & Lowe, R. J. (2018) Teaching English as a Lingua Franca. Stuttgart: DELTA Publishing

Korzekwa, D., Lorenzo-Trueba, J., Drugman, T. & Kostek, B. (2022) Computer-assisted pronunciation training—Speech synthesis is almost all you need. Speech Communication, 142: 22–33

Levis, J. M. (2018) Intelligibility, Oral Communication, and the Teaching of Pronunciation. Cambridge: Cambridge University Press

Mahdi, H. S. & Al Khateeb, A. A. (2019) The effectiveness of computer-assisted pronunciation training: A meta-analysis. Review of Education, 7 (3): 733–753

Patsko, L. & Simpson, K. (2019) How to Write Pronunciation Activities. ELT Teacher 2 Writer. https://eltteacher2writer.co.uk/our-books/how-to-write-pronunciation-activities/

Ramjattan, V. A. (2019) Racializing the problem of and solution to foreign accent in business. Applied Linguistics Review, 13 (4). https://doi.org/10.1515/applirev-2019-0058

Rogerson-Revell, P. M. (2021) Computer-Assisted Pronunciation Training (CAPT): Current Issues and Future Directions. RELC Journal, 52 (1): 189–205. https://doi.org/10.1177/0033688220977406

Walker, R. (2010) Teaching the Pronunciation of English as a Lingua Franca. Oxford: Oxford University Press

There’s an aspect of language learning which everyone agrees is terribly important, but no one can quite agree on what to call it. I’m talking about combinations of words, including fixed expressions, collocations, phrasal verbs and idioms. These combinations are relatively fixed and cannot always be predicted from their elements or generated by grammar rules (Laufer, 2022). They are sometimes referred to as formulaic sequences, formulaic expressions, lexical bundles or lexical chunks, among other labels for multiword items. They matter to English language learners because a large part of English consists of such combinations. Hill (2001) suggests this may be up to 70%. More conservative estimates report 58.6% of writing and 52.3% of speech (Erman & Warren, 2000). Some of these combinations (e.g. ‘of course’, ‘at least’) are so common that they fall into lists of the 1000 most frequent lexical items in the language.

By virtue of their ubiquity and frequency, they are important both for comprehension of reading and listening texts and for the speed at which texts can be processed. This is because knowledge of these combinations ‘makes discourse relatively predictable’ (Boers, 2020). Similarly, such knowledge can significantly contribute to spoken fluency because combinations ‘can be retrieved from memory as prefabricated units rather than being assembled at the time of speaking’ (Boers, 2020).

So far, so good, but from here on, the waters get a little muddier. Given their importance, what is the best way for a learner to acquire a decent stock of them? Are they best acquired through incidental learning (through meaning-focused reading and listening) or deliberate learning (e.g. with focused exercises or flashcards)? If the former, how on earth can we help learners to make sure that they get exposure to enough combinations enough times? If the latter, what kind of practice works best and, most importantly, which combinations should be selected? With, at the very least, many tens of thousands of such combinations, life is too short to learn them all in a deliberate fashion. Some sort of triage is necessary, but how should we go about this? Frequency of occurrence would be one obvious criterion, but this merely raises the question of what kind of database should be used to calculate frequency – the spoken discourse of children will reveal very different patterns from the written discourse of, say, applied linguists. On top of that, we cannot avoid consideration of the learners’ reasons for learning the language. If, as is statistically most probable, they are learning English to use as a lingua franca, how important or relevant is it to learn combinations that are frequent, idiomatic and comprehensible in native-speaker cultures, but may be rare and opaque in many English as a Lingua Franca contexts?

There are few, if any, answers to these big questions. Research (e.g. Pellicer-Sánchez, 2020) can give us pointers, but the bottom line is that we are left with a series of semi-informed options (see O’Keeffe et al., 2007: 58–99). So, when an approach comes along that claims to use software to facilitate the learning of English formulaic expressions (Lin, 2022), I am intrigued, to say the least.

The program is, slightly misleadingly, called IdiomsTube (https://www.idiomstube.com). A more appropriate title would have been IdiomaticityTube (as it focuses on ‘speech formulae, proverbs, sayings, similes, binomials, collocations, and so on’), but I guess ‘idioms’ is a more idiomatic word than ‘idiomaticity’. IdiomsTube allows learners to choose any English-captioned video from YouTube, which is then automatically analysed to identify from two to six formulaic expressions that are presented to the learner as learning objects. Learners are shown these items; the items are hyperlinked to (good) dictionary entries; learners watch the video and are then presented with a small variety of practice tasks. The system recommends particular videos, based on an automated analysis of their difficulty (speech rate and a frequency count of the lexical items they include) and on recommendations from previous users. The system is gamified and, for class use, teachers can track learner progress.

When an article by the program’s developer, Phoebe Lin, (in my view, more of an advertising piece than an academic one) came out in the ReCALL journal, she tweeted that she’d love feedback. I reached out but didn’t hear back. My response here is partly an evaluation of Dr Lin’s program, partly a reflection on how far technology can go in solving some of the knotty problems of language learning.

Incidental and deliberate learning

Researchers have long been interested in looking for ways of making incidental learning of lexical items more likely to happen (Boers, 2021: 39 ff.), of making it more likely that learners will notice lexical items while focusing on the content of a text. Most obviously, texts can be selected, written or modified so they contain multiple instances of a particular item (‘input flooding’). Alternatively, texts can be typographically enhanced so that particular items are highlighted in some way. But these approaches are not possible when learners are given the freedom to select any video from YouTube and when the written presentations are in the form of YouTube captions. Instead, IdiomsTube presents the items before the learner watches the video. They are, in effect, told to watch out for these items in advance. They are also given practice tasks after viewing.

The distinction between incidental and deliberate vocabulary learning is not always crystal-clear. In this case, it seems fairly clear that the approach is more slanted to deliberate learning, even though the selection of video by the learner is determined by a focus on content. Whether this works or not will depend on (1) the level-appropriacy of the videos that the learner watches, (2) the effectiveness of the program in recommending / identifying appropriate videos, (3) the ability of the program to identify appropriate formulaic expressions as learning targets in each video, and (4) the ability of the program to generate appropriate practice of these items.

Evaluating the level of YouTube videos

What makes a video easy or hard to understand? IdiomsTube attempts this analytical task by calculating (1) the speed of the speech and (2) the difficulty of the lexis as determined by the corpus frequency of these items. This gives a score out of five for each category (speed and difficulty). I looked at fifteen videos, all of them recommended by the program. Most were scored at Speed #3 and Difficulty #1. One, ‘Bruno Mars Carpool Karaoke’, had a speed of #2 and a difficulty of #1 (i.e. one of the easiest). The video is 15 minutes long. Here’s an extract from the first 90 seconds:

Let’s set this party off right, put yo’ pinky rings up to the moon, twenty four karat magic in the air, head to toe soul player, second verse for the hustlas, gangstas, bad bitches and ya ugly ass friends, I gotta show how a pimp get it in, and they waking up the rocket why you mad

Whoa! Without going into details, it’s clear that something has gone seriously wrong. Evaluating the difficulty of language, especially spoken language, is extremely complex (not least because there’s no objective measure of such a thing). It’s not completely dissimilar to the challenge of evaluating the accuracy, appropriacy and level of sophistication of a learner’s spoken language, and we’re a long way from being able to do that with any acceptable level of reliability. At least, we’re a long, long way from being able to do it well when there are no constraints on the kind of text (which is the case when taking the whole of YouTube as a potential source). If we significantly restrict topic and text type, we can train software to do a much better job, but this will require human input: it cannot be fully automated.

The length of these 15 videos ranged from 3.02 to 29.27 minutes, with the mean length being about 10 minutes, and the median 8.32 minutes. Too damn long.
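To make the limitations of this kind of rating concrete, here is a sketch of how a two-factor score might be computed. This is a hypothetical reconstruction, not IdiomsTube’s actual code: the function names, band thresholds and frequency list are all invented for illustration.

```python
# Hypothetical sketch of a two-factor video difficulty rating:
# (1) speech rate, (2) proportion of words outside a high-frequency list.
# All thresholds are invented; IdiomsTube's real cut-offs are not public.

def difficulty_scores(captions, duration_seconds, frequent_words):
    words = [w.lower().strip(".,!?'\"") for w in captions.split()]
    words_per_second = len(words) / duration_seconds
    off_list = sum(w not in frequent_words for w in words) / len(words)

    def band(value, thresholds):
        # Map a raw value onto a 1-5 scale: one point per threshold exceeded.
        return 1 + sum(value > t for t in thresholds)

    speed = band(words_per_second, [1.5, 2.0, 2.5, 3.0])  # 1 (slow) .. 5 (fast)
    lexis = band(off_list, [0.02, 0.05, 0.1, 0.2])        # 1 (easy) .. 5 (hard)
    return speed, lexis
```

Even this toy version makes the Bruno Mars failure easy to imagine: rap lyrics delivered over music can clock a modest words-per-second rate, and a measure like this says nothing about why a text is hard – slang, cultural reference and sung delivery are simply invisible to it.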

Selecting appropriate learning items

The automatic identification of formulaic language in a text presents many challenges: it is, as O’Keeffe et al. (2007: 82) note, only partially possible. A starting point is usually a list, and IdiomsTube begins with a list of 53,635 items compiled by the developer (Lin, 2022) over a number of years. The software has to match word combinations in the text to items in the list, and has to recognise variant forms. Formulaic language cannot always be identified just by matching to lists of forms: a piece of cake may just be a piece of cake, and therefore not a piece of cake to analyse. 53,635 items may sound like a lot, but a common estimate of the number of idioms in English is 25,000, and the number of multiword units is much, much higher. 53,635 items are not going to be enough for any reliable capture.
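To see why list-based matching is only ‘partially possible’, consider a minimal greedy matcher. This sketch is mine, not Lin’s, and it illustrates the problem by construction: it scans for the longest listed phrase at each position, cannot tell a literal piece of cake from the idiomatic one, and, without lemmatisation, misses variant forms like ‘got in the way’.

```python
# Minimal greedy multiword matcher: at each position, take the longest
# phrase (up to max_len tokens) that appears in the phrase list.
# No lemmatisation, so inflected variants are missed; no disambiguation,
# so literal and idiomatic uses are conflated.

def find_formulaic(tokens, phrase_list, max_len=6):
    phrases = {tuple(p.split()) for p in phrase_list}
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(tokens[i:i + n])
            if window in phrases:
                found.append(" ".join(window))
                i += n
                break
        else:
            i += 1  # no phrase starts here; move on one token
    return found
```

Everything a real system needs beyond this – recognising inflected and discontinuous variants, filtering literal uses – is exactly where the hard linguistic problems start.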

Since any given text is likely to contain a lot of formulaic language, the next task is to decide how to select for presentation (i.e. as learning objects) from those identified. The challenge is, as Lin (2022) remarks, both technical and theoretical: how can frequency and learnability be measured? There are no easy answers, and the approach of IdiomsTube is, by its own admission, crude. The algorithm prioritises longer items that contain lower frequency single items, and which have a low frequency of occurrence in a corpus of 40,000 randomly-sampled YouTube videos. The aim is to focus on formulaic language that is ‘more challenging in terms of composition (i.e. longer and made up of more difficult words) and, therefore, may be easier to miss due to their infrequent appearance on YouTube’. My immediate reaction is to question whether this approach prioritises items that are not worth the bother of deliberate learning in the first place.
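Lin’s paper states the principle but not a formula, so the following scoring sketch is my own guess at how such a prioritisation might be operationalised: longer items, made of rarer words, that are themselves rare in the reference corpus, float to the top. All weightings are invented.

```python
# Speculative sketch of IdiomsTube-style prioritisation (the actual
# formula is not published). Higher scores go to longer items with
# rarer component words and a lower corpus frequency for the whole phrase.

def priority(item, word_freq, phrase_freq):
    words = item.split()
    word_rarity = sum(1 / (1 + word_freq.get(w, 0)) for w in words) / len(words)
    return len(words) * word_rarity / (1 + phrase_freq.get(item, 0))

def select_items(candidates, word_freq, phrase_freq, k=6):
    # Return the k highest-priority candidates as learning objects.
    ranked = sorted(candidates,
                    key=lambda c: priority(c, word_freq, phrase_freq),
                    reverse=True)
    return ranked[:k]
```

Note what any formula of this shape does: it systematically demotes short, frequent, transparent items like ‘by the way’ – which is exactly the worry raised above.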

The proof is in the proverbial pudding, so I looked at the learning items that were offered by my sample of 15 recommended videos. Sadly, IdiomsTube does not even begin to cut the mustard. The rest of this section details why the selection was so unsatisfactory: you may want to skip this and rejoin me at the start of the next section.

  • In total, 85 target items were suggested. Of these, 39 (just under half) were not fixed expressions: they were single items. Some of these single items (e.g. ‘blog’ and ‘password’) would be extremely easy for most learners. Of the others, 5 were opaque idioms (the most prototypical kind of idiom); the rest were collocations and fixed (but transparent) phrases and frames.
  • Some items (e.g. ‘I rest my case’) are limited in terms of the contexts in which they can be appropriately used.
  • Some items did not appear to be idiomatic in any way. ‘We need to talk’ and ‘able to do it’, for example, are strange selections, compared to others in their respective lists. They are also very ‘easy’: if you don’t readily understand items like these, you wouldn’t have a hope in hell of understanding the video.
  • There were a number of errors in the recommended target items. Errors included duplication of items within one set (‘get in the way’ + ‘get in the way of something’), misreading of an item (‘the shortest’ misread as ‘the shorts’), mislabelling of an item (‘vend’ instead of ‘vending machine’), linking to the wrong dictionary entry (e.g. ‘mini’ links to ‘miniskirt’, although in the video ‘mini’ = ‘small’, or, in another video, ‘stoke’ links to ‘stoked’, which is rather different!).
  • The selection of fixed expressions is sometimes very odd. In one video, the following items have been selected: get into an argument, vend, from the ground up, shovel, we need to talk, prefecture. The video contains others which would seem to be better candidates, including ‘You can’t tell’ (which appears twice), ‘in charge of’, ‘way too’ (which also appears twice), and ‘by the way’. It would seem, therefore, that some inappropriate items are selected, whilst other more appropriate ones are omitted.
  • There is a wide variation in the kind of target item. One set, for example, included: in order to do, friction, upcoming, run out of steam, able to do it, notification. Cross-checking with Pearson’s Global Scale of English, we have items ranging from A2 to C2+.

The challenges of automation

IdiomsTube comes unstuck on many levels. It fails to recommend appropriate videos to watch. It fails to suggest appropriate language to learn. It fails to provide appropriate practice. You wouldn’t know this from reading the article by Phoebe Lin in the ReCALL journal which does, however, suggest that ‘further improvements in the design and functions of IdiomsTube are needed’. Necessary they certainly are, but the interesting question is how possible they are.

My interest in IdiomsTube comes from my own experience in an app project which attempted to do something not completely dissimilar. We wanted to be able to evaluate the idiomaticity of learner-generated language, and this entailed identifying formulaic patterns in a large corpus. We wanted to develop a recommendation engine for learning objects (i.e. the lexical items) by combining measures of frequency and learnability. We wanted to generate tasks to practise collocational patterns, by trawling the corpus for contexts that lent themselves to gapfills. With some of these challenges, we failed. With others, we found a stopgap solution in human curation, writing and editing.

IdiomsTube is interesting, not because of what it tells us about how technology can facilitate language learning. It’s interesting because it tells us about the limits of technological applications to learning, and about the importance of sorting out theoretical challenges before the technical ones. It’s interesting as a case study in how not to go about developing an app: its ‘special enhancement features such as gamification, idiom-of-the-day posts, the IdiomsTube Teacher’s interface and IdiomsTube Facebook and Instagram pages’ are pointless distractions when the key questions have not been resolved. It’s interesting as a case study of something that should not have been published in an academic journal. It’s interesting as a case study of how techno-enthusiasm can blind you to the possibility that some learning challenges do not have solutions that can be automated.

References

Boers, F. (2020) Factors affecting the learning of multiword items. In Webb, S. (Ed.) The Routledge Handbook of Vocabulary Studies. Abingdon: Routledge. pp. 143–157

Boers, F. (2021) Evaluating Second Language Vocabulary and Grammar Instruction. Abingdon: Routledge

Erman, B. & Warren, B. (2000) The idiom principle and the open choice principle. Text, 20 (1): 29–62

Hill, J. (2001) Revising priorities: from grammatical failure to collocational success. In Lewis, M. (Ed.) Teaching Collocation: Further Developments in the Lexical Approach. Hove: LTP. pp. 47–69

Laufer, B. (2022) Formulaic sequences and second language learning. In Szudarski, P. & Barclay, S. (Eds.) Vocabulary Theory, Patterning and Teaching. Bristol: Multilingual Matters. pp. 89–98

Lin, P. (2022) Developing an intelligent tool for computer-assisted formulaic language learning from YouTube videos. ReCALL, 34 (2): 185–200

O’Keeffe, A., McCarthy, M. & Carter, R. (2007) From Corpus to Classroom. Cambridge: Cambridge University Press

Pellicer-Sánchez, A. (2020) Learning single words vs. multiword items. In Webb, S. (Ed.) The Routledge Handbook of Vocabulary Studies. Abingdon: Routledge. pp. 158–173

In the latest issue of ‘Language Teaching’, there’s a ‘state-of-the-art’ article by Frank Boers entitled ‘Glossing and vocabulary learning’. The effect of glosses (‘a brief definition or synonym, either in L1 or L2, which is provided with [a] text’ (Nation, 2013: 238)) on reading comprehension and vocabulary acquisition has been well researched over the years. See Kim et al. (2020) for just one recent meta-analysis.

It’s a subject I have written about before on this blog (see here), when I focussed on Plonsky and Ziegler (2016), a critical evaluation of a number of CALL meta-analyses, including a few that investigated glosses. Plonsky and Ziegler found that glosses can have a positive effect on language learning, that digital glosses may be more valuable than paper-based ones, and that both L1 and L2 glosses can be beneficial (clearly, the quality / accuracy of the gloss is as important as the language it is written in). Different learners have different preferences. Boers’ article covers similar ground, without, I think, adding any new takeaways. It concludes with a predictable call for further research.

Boers has a short section on the ‘future of glossing’ in which he notes that (1) ‘onscreen reading [is] becoming the default mode’, and (2) that ‘materials developers no longer need to create glosses themselves, but can insert hyperlinks to online resources’. This is not the future, but the present. In my last blog post on glossing (August 2017), I discussed Lingro, a digital dictionary tool that you can have running in the background, allowing you to click on any word on any website and bring up L1 or L2 glosses. My reservation about Lingro was that the quality of the glosses left much to be desired, relying as they did on Wiktionary. Things would be rather different if it used decent content – sourced, for example, from Oxford dictionaries, Robert (for French) or Duden (for German).

And this is where the content for the Google Dictionary for Chrome extension comes from. It’s free, and takes only seconds to install. It allows you to double-click on a word to bring up translations or English definitions. One more click will take you to a more extensive dictionary page. It also allows you to select a phrase or longer passage and bring up translations generated by Google Translate. It allows you to keep track of the items you have looked up, and to download these on a spreadsheet, which can then be converted to flashcards (e.g. Quizlet) if you wish. If you use the Safari browser, a similar tool is already installed. It has similar features to the Google extension, but also offers you the possibility of linking to examples of the targeted word in web sources like Wikipedia.

Boers was thinking of the provision of hyperlinks, but with these browser extensions it is entirely up to the reader of a text to decide how many and which items to look up, what kind of items (single words, phrases or longer passages) they want to look up, how far they want to explore the information available to them, and what they want to do with the information (e.g. store / record it).

It’s extraordinary that a ‘state-of-the-art article’ in an extremely reputable journal should be so out of date. The value of glossing in language learning is in content-focussed reading, and these tools mean that any text on the web can be glossed. I think this means that further research of the kind that Boers calls for would be a waste of time and effort. The availability of free technology does not, of course, solve all our problems. Learners will continue to benefit from guidance, support and motivation in selecting appropriate texts to read. They will likely benefit from training in optimal ways of using these browser extensions. They may need help in finding a balance between content-focussed reading and content-focussed reading with a language learning payoff.

References

Boers, F. (2022) Glossing and vocabulary learning. Language Teaching, 55 (1): 1–23

Kim, H. S., Lee, J. H. & Lee, H. (2020) The relative effects of L1 and L2 glosses on L2 learning: A meta-analysis. Language Teaching Research, December 2020

Nation, I. S. P. (2013) Learning Vocabulary in Another Language. Cambridge: Cambridge University Press

Plonsky, L. & Ziegler, N. (2016) The CALL–SLA interface: insights from a second-order synthesis. Language Learning & Technology, 20 (2): 17–37

NB This is an edited version of the original review.

Words & Monsters is a new vocabulary app that has caught my attention. There are three reasons for this. Firstly, because it’s free. Secondly, because I was led to believe (falsely, as it turns out) that two of the people behind it are Charles Browne and Brent Culligan, eminently respectable linguists, who were also behind the development of the New General Service List (NGSL), based on data from the Cambridge English Corpus. And thirdly, because a lot of thought, effort and investment have clearly gone into the gamification of Words & Monsters (WAM). It’s to the last of these that I’ll turn my attention first.

WAM teaches vocabulary in the context of a battle between a player’s avatar and a variety of monsters. If users can correctly match a set of target items to definitions or translations in the available time, they ‘defeat’ the monster and accumulate points. The more points you have, the higher you advance through a series of levels and ranks. There are bonuses for meeting daily and weekly goals, there are leaderboards, and trophies and medals can be won. In addition to points, players also win ‘crystals’ after successful battles, and these crystals can be used to buy accessories which change the appearance of the avatar and give the player added ‘powers’. I was never able to fully understand precisely how these ‘powers’ affected the number of points I could win in battle. It remained as baffling to me as the whole system of values with Pokemon cards, which is presumably a large part of the inspiration here. Perhaps others, more used to games like Pokemon, would find it all much more transparent.

The system of rewards is all rather complicated, but perhaps this doesn’t matter too much. In fact, it might be the case that working out how reward systems work is part of what motivates people to play games. But there is another aspect to this: the app’s developers refer in their bumf to research by Howard-Jones and Jay (2016), which suggests that when rewards are uncertain, more dopamine is released in the mid-brain and this may lead to reinforcement of learning, and, possibly, enhancement of declarative memory function. Possibly … but Howard-Jones and Jay point out that ‘the science required to inform the manipulation of reward schedules for educational benefit is very incomplete.’ So, WAM’s developers may be jumping the gun a little and overstating the applicability of the neuroscientific research, but they’re not alone in that!

If you don’t understand a reward system, it’s certain that the rewards are uncertain. But WAM takes this further in at least two ways. Firstly, when you win a ‘battle’, you have to click on a plain treasure bag to collect your crystals, and you don’t know whether you’ll get zero, one, two or three crystals. You are given a semblance of agency, but, essentially, the whole thing is random. Secondly, when you want to convert your crystals into accessories for your avatar, random selection determines which accessory you receive, even though, again, there is a semblance of agency. Different accessories have different power values. This extended use of what the developers call ‘the thrill of uncertain rewards’ is certainly interesting, but its effectiveness is another matter. After quite some time spent ‘studying’, my own reaction to getting no crystals, or an avatar accessory that I didn’t want, was primarily frustration, rather than motivation to carry on. I have no idea how typical my reaction (more ‘treadmill’ than ‘thrill’) might be.
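The mechanic described above – collect a bag and hope – is what behavioural psychologists call a variable-ratio reward schedule. As a purely illustrative sketch (WAM’s actual drop rates are not published, so the probabilities below are invented), the crystal draw might look something like this:

```python
import random

# Hypothetical drop table mapping crystal payouts to probabilities.
# WAM's real drop rates are not published; these weights are invented
# purely to illustrate a variable-ratio reward draw.
CRYSTAL_DROPS = {0: 0.25, 1: 0.40, 2: 0.25, 3: 0.10}

def draw_crystals(rng=random):
    """Resolve the payout for a won 'battle' by a weighted random draw."""
    outcomes = list(CRYSTAL_DROPS)
    weights = list(CRYSTAL_DROPS.values())
    return rng.choices(outcomes, weights=weights, k=1)[0]

print(draw_crystals())  # 0, 1, 2 or 3 crystals
```

Whatever the player does with the treasure bag, the outcome is fixed by a draw like this one, which is why the agency is only a semblance.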

Unsurprisingly, for an app that has so obviously thought carefully about gamification, players are encouraged to interact with each other. As part of the early promotion, WAM is running, from 15 November to 19 December, a free ‘team challenge tournament’, allowing teams of up to 8 players to compete against each other. Ingeniously, it would appear to allow teams and players of varying levels of English to play together, with the app’s algorithms determining each individual’s level of lexical knowledge and therefore the items that will be presented / tested. Social interaction is known to be an important component of successful games (Dehghanzadeh et al., 2019), but for vocabulary apps there’s a huge challenge. In order to learn vocabulary from an app, learners need to put in time – on a regular basis. Team challenge tournaments may help with initial onboarding of players, but, in the end, learning from a vocabulary app is inevitably and largely a solitary pursuit. Over time, social interaction is unlikely to be maintained, and it is, in any case, of a very limited nature. The other features of successful games – playful freedom and intrinsically motivating tasks (Driver, 2012) – are also absent from vocabulary apps. Playful freedom is mostly incompatible with points, badges and leaderboards. And flashcard tasks, however intrinsically motivating they may be at the outset, will always become repetitive after a while. In the end, what’s left, for those users who hang around long enough, is the reward system.

It’s also worth noting that this free challenge is of limited duration: it is a marketing device attempting to push you towards the non-free use of the app, once the initial promotion is over.

Gamified motivation tools are only of value, of course, if they motivate learners to spend their time doing things that are of clear learning value. To evaluate the learning potential of WAM, then, we need to look at the content (the ‘learning objects’) and the learning tasks that supposedly lead to acquisition of these items.

When you first use WAM, you need to play for about 20 minutes, at which point algorithms determine ‘how many words [you] know and [you can] see scores for English tests such as; TOEFL, TOEIC, IELTS, EIKEN, Kyotsu Shiken, CEFR, SAT and GRE’. The developers claim that these scores correlate pretty highly with actual test scores: ‘they are about as accurate as the tests themselves’, they say. If Browne and Culligan had been behind the app, I would have been tempted to accept the claim – with reservations: after all, it still allows for one item out of 5 to be wrongly identified. But, what is this CEFR test score that is referred to? There is no CEFR test, although many tests are correlated with CEFR. The two tools that I am most familiar with which allocate CEFR levels to individual words – Cambridge’s English Vocabulary Profile and Pearson’s Global Scale of English – often conflict in their results. I suspect that ‘CEFR’ was just thrown into the list of tests as an attempt to broaden the app’s appeal.
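The developers don’t explain how twenty minutes of matching games yields these estimates, but vocabulary size testing conventionally samples words from successive frequency bands and extrapolates from the proportion known in each band. A minimal sketch of that generic approach (this is emphatically not WAM’s proprietary algorithm; the band width and the sample figures are invented):

```python
# Generic extrapolation of vocabulary size from sampled frequency bands.
# NOT WAM's algorithm: the band width and the sample results below
# are invented for illustration.
BAND_WIDTH = 1000  # words per frequency band

def estimate_vocab_size(band_results):
    """band_results maps a band index (0 = the 1000 most frequent
    words) to (items answered correctly, items sampled)."""
    total = 0.0
    for band, (correct, sampled) in band_results.items():
        if sampled:
            total += (correct / sampled) * BAND_WIDTH
    return round(total)

# A learner who knows nearly all of the first two bands,
# and progressively less thereafter:
print(estimate_vocab_size({0: (10, 10), 1: (9, 10), 2: (6, 10), 3: (2, 10)}))  # 2700
```

The precision of any such estimate depends entirely on how many items are sampled per band, which is one reason to be wary of the claim that the scores are ‘about as accurate as the tests themselves’.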

English target words are presented and practised with their translation ‘equivalents’ in Japanese. For the moment, Japanese is the only language available, which means the app is of little use to learners who don’t know any Japanese. It’s now well-known that bilingual pairings are more effective in deliberate language learning than using definitions in the same language as the target items. This becomes immediately apparent when, for example, a word like ‘something’ is defined (by WAM) as ‘a thing not known or specified’ and ‘anything’ as ‘a thing of whatever kind’. But although I’m in no position to judge the Japanese translations, there are reasons why I would want to check the spreadsheet before recommending the app. ‘Lady’ is defined as ‘polite word for a woman’; ‘missus’ is defined as ‘wife’; and ‘aye’ is defined as ‘yes’. All of these definitions are, at best, problematic; at worst, they are misleading. Are the Japanese translations more helpful? I wonder … Perhaps these are simply words that do not lend themselves to flashcard treatment?

Because I tested into the app at C1 level, I was not able to evaluate the selection of words at lower levels. A pity. Instead, I was presented with words like ‘ablution’, ‘abrade’, ‘anode’, and ‘auspice’. The app claims to be suitable ‘for both second-language learners and native speakers’. For lower levels of the former, this may be true (but without looking at the lexical spreadsheets, I can’t tell). But for higher levels, however much fun this may be for some people, it seems unlikely that you’ll learn very much of any value. Outside of words in, say, the top 8000 frequency band, it is practically impossible to differentiate the ‘surrender value’ of words in any meaningful way. Deliberate learning of vocabulary only makes sense with high frequency words that you have a chance of encountering elsewhere. You’d be better off reading, extensively, rather than learning random words from an app – words which (for reasons I’ll come on to) you probably won’t actually learn anyway.

With very few exceptions, the learning objects in WAM are single words, rather than phrases, even when the item is of little or no value outside its use in a phrase. ‘Betide’ is defined as ‘to happen to; befall’ but this doesn’t tell a learner much that is useful. It’s practically only ever used following ‘woe’ (but what does ‘woe’ mean?!). Learning items can be checked in the ‘study guide’, which will show that ‘betide’ typically follows ‘woe’, but unless you choose to refer to the study guide (and there’s no reason, in a case like this, that you would know that you need to check things out more fully), you’ll be none the wiser. In other words, checking the study guide is unlikely to betide you. ‘Wee’, as another example, is treated as two items: (1) meaning ‘very small’ as in ‘wee baby’, and (2) meaning ‘very early in the morning’ as in ‘in the wee hours’. For the latter, ‘wee’ can only collocate with ‘in the’ and ‘hours’, so it makes little sense to present it as a single word. This is also an example of how, in some cases, different meanings of particular words are treated as separate learning objects, even when the two meanings are very close and, in my view, are hardly worth learning separately. Examples include ‘czar’ and ‘assonance’. Sometimes, cognates are treated as separate learning objects (e.g. ‘adulterate’ and ‘adulteration’ or ‘dolor’ and ‘dolorous’); with other words (e.g. ‘effulgence’), only one grammatical form appears to be given. I could not begin to figure out any rationale behind any of this.

All in all, then, there are reasons to be a little skeptical about some of the content. Up to level B2 – which, in my view, is the highest level at which it makes sense to use vocabulary flashcards – it may be of value, so long as your first language is Japanese. But given the claim that it can help you prepare for the ‘CEFR test’, I have to wonder …

The learning tasks require players to match target items to translations / definitions (in both directions), with the target item sometimes in written form, sometimes spoken. Users do not, as far as I can tell, ever have to produce the target item: they only have to select. The learning relies on spaced repetition, but there is no generation effect (the act of producing an item is known to enhance memorisation). When I was experimenting, there were a few words that I did not know, but I was usually able to get the correct answer by eliminating the distractors (a choice of one from three gives players a reasonable chance of guessing correctly). WAM does not teach users how to produce words; its focus is on receptive knowledge (of a limited kind). I learn, for example, what a word like ‘aye’ or ‘missus’ kind of means, but I learn nothing about how to use it appropriately. Contrary to the claims in WAM’s bumf (that ‘all senses and dimensions of each word are fully acquired’), reading and listening comprehension speeds may be improved, but appropriate and accurate use of these words in speaking and writing is much less likely to follow. Does WAM really ‘strengthen and expand the foundation levels of cognition that support all higher level thinking’, as is claimed?
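The one-in-three figure is worth spelling out. With three options, luck alone yields the right answer a third of the time, and ruling out a single implausible distractor pushes that to a half – which is why a string of correct selections need not indicate much actual knowledge. (The arithmetic below follows from the task format described above, not from any WAM data.)

```python
from fractions import Fraction

def guess_probability(options, eliminated=0):
    """Chance of selecting the right answer by luck alone, after
    ruling out some number of implausible distractors."""
    remaining = options - eliminated
    if remaining < 1:
        raise ValueError("cannot eliminate every option")
    return Fraction(1, remaining)

print(guess_probability(3))                # 1/3: pure guess
print(guess_probability(3, eliminated=1))  # 1/2: one distractor ruled out
```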

Perhaps it’s unfair to mention some of the more dubious claims of WAM’s promotional material, but here is a small selection, anyway: ‘WAM unleashes the full potential of natural motivation’. ‘WAM promotes Flow by carefully managing the ratio of unknown words. Your mind moves freely in the channel below frustration and above boredom’.

WAM is certainly an interesting project, but, like all the vocabulary apps I have ever looked at, there have to be trade-offs between optimal task design and what will fit on a mobile screen, between freedoms and flexibility for the user and the requirements of gamified points systems, between the amount of linguistic information that is desirable and the amount that spaced repetition can deal with, between attempting to make the app suitable for the greatest number of potential users and making it especially appropriate for particular kinds of users. Design considerations are always a mix of the pedagogical and the practical / commercial. And, of course, the financial. And, like most edtech products, the claims for its efficacy need to be treated with a bucket of salt.

References

Dehghanzadeh, H., Fardanesh, H., Hatami, J., Talaee, E. & Noroozi, O. (2019) Using gamification to support learning English as a second language: a systematic review, Computer Assisted Language Learning, DOI: 10.1080/09588221.2019.1648298

Driver, P. (2012) The Irony of Gamification. In English Digital Magazine 3, British Council Portugal, pp. 21 – 24 http://digitaldebris.info/digital-debris/2011/12/31/the-irony-of-gamification-written-for-ied-magazine.html

Howard-Jones, P. & Jay, T. (2016) Reward, learning and games. Current Opinion in Behavioral Sciences, 10: 65 – 72

Take the Cambridge Assessment English website, for example. When you connect to the site, you will see, at the bottom of the screen, a familiar (to people in Europe, at least) notification about the site’s use of cookies: the cookies consent.

You probably trust the site, so ignore the notification and quickly move on to find the resource you are looking for. But if you did click on hyperlinked ‘set cookies’, what would you find? The first link takes you to the ‘Cookie policy’ where you will be told that ‘We use cookies principally because we want to make our websites and mobile applications user-friendly, and we are interested in anonymous user behaviour. Generally our cookies don’t store sensitive or personally identifiable information such as your name and address or credit card details’. Scroll down, and you will find out more about the kind of cookies that are used. Besides the cookies that are necessary to the functioning of the site, you will see that there are also ‘third party cookies’. These are explained as follows: ‘Cambridge Assessment works with third parties who serve advertisements or present offers on our behalf and personalise the content that you see. Cookies may be used by those third parties to build a profile of your interests and show you relevant adverts on other sites. They do not store personal information directly but use a unique identifier in your browser or internet device. If you do not allow these cookies, you will experience less targeted content’.

This is not factually inaccurate: personal information is not stored directly. However, it is extremely easy for this information to be triangulated with other information to identify you personally. In addition to the data that you generate by having cookies on your device, Cambridge Assessment will also directly collect data about you. Depending on your interactions with Cambridge Assessment, this will include ‘your name, date of birth, gender, contact data including your home/work postal address, email address and phone number, transaction data including your credit card number when you make a payment to us, technical data including internet protocol (IP) address, login data, browser type and technology used to access this website’. They say they may share this data ‘with other people and/or businesses who provide services on our behalf or at our request’ and ‘with social media platforms, including but not limited to Facebook, Google, Google Analytics, LinkedIn, in pseudonymised or anonymised forms’.

In short, Cambridge Assessment may hold a huge amount of data about you and they can, basically, do what they like with it.

The cookie and privacy policies are fairly standard, as is the lack of transparency in their phrasing. Rather more transparency would include, for example, information about which particular ad trackers you are giving your consent to. This information can be found with a browser extension tool like Ghostery, and these trackers can be blocked. Ghostery reports 5 ad trackers on this site, which is rather more than on other sites that English language teachers are likely to visit. ETS-TOEFL has 4, Macmillan English and Pearson have 3, CUP ELT and the British Council Teaching English have 1, while OUP ELT, IATEFL, BBC Learning English and Trinity College have none. The only site I found with more was TESOL, with 6 ad trackers. The blogs for all these organisations invariably have more trackers than their websites.

The use of numerous ad trackers is probably a reflection of the importance that Cambridge Assessment gives to social media marketing. There is a research paper, produced by Cambridge Assessment, which outlines the significance of big data and social media analytics. They have far more Facebook followers (and nearly 6 million likes) than any other ELT page, and they are proud of their #1 ranking in the education category of social media. The amount of data that can be collected here is enormous and it can be analysed in myriad ways using tools like Ubervu, Yomego and Hootsuite.

A little more transparency, however, would not go amiss. According to a report in Vox, Apple has announced that some time next year ‘iPhone users will start seeing a new question when they use many of the apps on their devices: Do they want the app to follow them around the internet, tracking their behavior?’ Obviously, Google and Facebook are none too pleased about this and will be fighting back. The implications for ad trackers and online advertising, more generally, are potentially huge. I wrote to Cambridge Assessment about this and was pleased to hear that ‘Cambridge Assessment are currently reviewing the process by which we obtain users consent for the use of cookies with the intention of moving to a much more transparent model in the future’. Let’s hope that other ELT organisations are doing the same.

You may be less bothered than I am by the thought of dozens of ad trackers following you around the net so that you can be served with more personalized ads. But the digital profile about you, to which these cookies contribute, may include information about your ethnicity, disabilities and sexual orientation. This profile is auctioned to advertisers when you visit some sites, allowing them to show you ‘personalized’ adverts based on the categories in your digital profile. Contrary to EU regulations, these categories may include whether you have cancer, a substance-abuse problem, your politics and religion (as reported in Fortune https://fortune.com/2019/01/28/google-iab-sensitive-profiles/ ).

But it’s not these cookies that are the most worrying aspect of our lack of digital privacy. It’s the sheer quantity of personal data that is stored about us. Every time we ask our students to use an app or a platform, we are asking them to divulge huge amounts of data. With ClassDojo, for example, this includes names, usernames, passwords, age, addresses, photographs, videos, documents, drawings and audio files, IP addresses and browser details, clicks, referring URLs, time spent on site, and page views (Manolev et al., 2019; see also Williamson, 2019).

It is now widely recognized that the ‘consent’ that is obtained through cookie policies and other end-user agreements is largely spurious. These consent agreements, as Sadowski (2019) observes, are non-negotiated, and non-negotiable; you either agree or you are denied access. What’s more, he adds, citing one study, it would take 76 days, working for 8 hours a day, to read the privacy policies a person typically encounters in a year. As a result, most of us choose not to choose when we accept online services (Cobo, 2019: 25). We have little, if any, control over how the data that is collected is used (Birch et al., 2020). More importantly, perhaps, when we ask our students to sign up to an educational app, we are asking / telling them to give away their personal data, not just ours. They are unlikely to fully understand the consequences of doing so.

The extent of this ignorance is also now widely recognized. In the UK, for example, two reports (cited by Sander, 2020a) indicate that ‘only a third of people know that data they have not actively chosen to share has been collected’ (Doteveryone, 2018: 5), and that ‘less than half of British adult internet users are aware that apps collect their location and information on their personal preferences’ (Ofcom, 2019: 14).

The main problem with this has been expressed by programmer and activist Richard Stallman, in an interview with New York magazine (Kulwin, 2018): ‘Companies are collecting data about people. The data that is collected will be abused. That’s not an absolute certainty, but it’s a practical, extreme likelihood, which is enough to make collection a problem.’

The abuse that Stallman is referring to can come in a variety of forms. At the relatively trivial end is personalized advertising. Much more serious is the way that data aggregation companies will scrape data from a variety of sources, building up individual data profiles which can be used to make significant life-impacting decisions, such as final academic grades or whether one is offered a job, insurance or credit (Manolev et al., 2019). Cathy O’Neil’s (2016) best-selling ‘Weapons of Math Destruction’ spells out in detail how this abuse of data increases racial, gender and class inequalities. And after the revelations of Edward Snowden, we all know about the routine collection by states of huge amounts of data about, well, everyone. Whether it’s used for predictive policing or straightforward repression or something else, it is simply not possible for younger people, our students, to know what personal data they may regret divulging at a later date.

Digital educational providers may try to reassure us that they will keep data private, and not use it for advertising purposes, but the reassurances are hollow. These companies may change their terms and conditions further down the line, and examples exist of when this has happened (Moore, 2018: 210). But even if this does not happen, the data can never be secure. Illegal data breaches and cyber attacks are relentless, and education ranked worst at cybersecurity out of 17 major industries in one recent analysis (Foresman, 2018). One report suggests that one in five US schools and colleges have fallen victim to cyber-crime. Two weeks ago, I learnt (by chance, as I happened to be looking at my security settings on Chrome) that my passwords for Quizlet, Future Learn, Elsevier and Science Direct had been compromised by a data breach. To get a better understanding of the scale of data breaches, you might like to look at the UK’s IT Governance site, which lists detected and publicly disclosed data breaches and cyber attacks each month (36.6 million records breached in August 2020). If you scroll through the list, you’ll see how many of them are educational sites. You’ll also see a comment about how leaky organisations have been throughout lockdown … because they weren’t prepared for the sudden shift online.

Recent years have seen a growing consensus that ‘it is crucial for language teaching to […] encompass the digital literacies which are increasingly central to learners’ […] lives’ (Dudeney et al., 2013). Most of the focus has been on the skills that are needed to use digital media. There also appears to be growing interest in developing critical thinking skills in the context of digital media (e.g. Peachey, 2016) – identifying fake news and so on. To a much lesser extent, there has been some focus on ‘issues of digital identity, responsibility, safety and ethics when students use these technologies’ (Mavridi, 2020a: 172). Mavridi (2020b: 91) also briefly discusses the personal risks of digital footprints, but she does not have the space to explore more fully the notion of critical data literacy. This literacy involves an understanding of not just the personal risks of using ‘free’ educational apps and platforms, but of why they are ‘free’ in the first place. Sander (2020b) suggests that this literacy entails ‘an understanding of datafication, recognizing the risks and benefits of the growing prevalence of data collection, analytics, automation, and predictive systems, as well as being able to critically reflect upon these developments. This includes, but goes beyond the skills of, for example, changing one’s social media settings, and rather constitutes an altered view on the pervasive, structural, and systemic levels of changing big data systems in our datafied societies’.

In my next two posts, I will, first of all, explore in more detail the idea of critical data literacy, before suggesting a range of classroom resources.

(I posted about privacy in March 2014, when I looked at the connections between big data and personalized / adaptive learning. In another post, in September 2014, I looked at the claims of the CEO of Knewton, who bragged that his company had five orders of magnitude more data about you than Google has: ‘We literally have more data about our students than any company has about anybody else about anything, and it’s not even close.’ You might find both of these posts interesting.)

References

Birch, K., Chiappetta, M. & Artyushina, A. (2020). ‘The problem of innovation in technoscientific capitalism: data rentiership and the policy implications of turning personal digital data into a private asset’ Policy Studies, 41:5, 468-487, DOI: 10.1080/01442872.2020.1748264

Cobo, C. (2019). I Accept the Terms and Conditions. https://adaptivelearninginelt.files.wordpress.com/2020/01/41acf-cd84b5_7a6e74f4592c460b8f34d1f69f2d5068.pdf

Doteveryone. (2018). People, Power and Technology: The 2018 Digital Attitudes Report. https://attitudes.doteveryone.org.uk

Dudeney, G., Hockly, N. & Pegrum, M. (2013). Digital Literacies. Harlow: Pearson Education

Foresman, B. (2018). Education ranked worst at cybersecurity out of 17 major industries. Edscoop, December 17, 2018. https://edscoop.com/education-ranked-worst-at-cybersecurity-out-of-17-major-industries/

Kulwin, K. (2018). ‘F*ck Them. We Need a Law’: A Legendary Programmer Takes on Silicon Valley. New York Intelligencer, 2018. https://nymag.com/intelligencer/2018/04/richard-stallman-rms-on-privacy-data-and-free-software.html

Manolev, J., Sullivan, A. & Slee, R. (2019). ‘Vast amounts of data about our children are being harvested and stored via apps used by schools’ EduResearch Matters, February 18, 2019. https://www.aare.edu.au/blog/?p=3712

Mavridi, S. (2020a). Fostering Students’ Digital Responsibility, Ethics and Safety Skills (Dress). In Mavridi, S. & Saumell, V. (Eds.) Digital Innovations and Research in Language Learning. Faversham, Kent: IATEFL. pp. 170 – 196

Mavridi, S. (2020b). Digital literacies and the new digital divide. In Mavridi, S. & Xerri, D. (Eds.) English for 21st Century Skills. Newbury, Berks.: Express Publishing. pp. 90 – 98

Moore, M. (2018). Democracy Hacked. London: Oneworld

Ofcom. (2019). Adults: Media use and attitudes report [Report]. https://www.ofcom.org.uk/__data/assets/pdf_file/0021/149124/adults-media-use-and-attitudes-report.pdf

O’Neil, C. (2016). Weapons of Math Destruction. London: Allen Lane

Peachey, N. (2016). Thinking Critically through Digital Media. http://peacheypublications.com/

Sadowski, J. (2019). ‘When data is capital: Datafication, accumulation, and extraction’ Big Data and Society 6 (1) https://doi.org/10.1177%2F2053951718820549

Sander, I. (2020a). What is critical big data literacy and how can it be implemented? Internet Policy Review, 9 (2). DOI: 10.14763/2020.2.1479 https://www.econstor.eu/bitstream/10419/218936/1/2020-2-1479.pdf

Sander, I. (2020b). Critical big data literacy tools—Engaging citizens and promoting empowered internet usage. Data & Policy, 2: e5 doi:10.1017/dap.2020.5

Williamson, B. (2019). ‘Killer Apps for the Classroom? Developing Critical Perspectives on ClassDojo and the ‘Ed-tech’ Industry’ Journal of Professional Learning, 2019 (Semester 2) https://cpl.asn.au/journal/semester-2-2019/killer-apps-for-the-classroom-developing-critical-perspectives-on-classdojo

Vocab Victor is a very curious vocab app. It’s not a flashcard system designed to extend vocabulary breadth. Rather, it tests the depth of a user’s vocabulary knowledge.

The app’s website refers to the work of Paul Meara (see, for example, Meara, P. 2009. Connected Words. Amsterdam: John Benjamins). Meara explored the ways in which an analysis of the words that we associate with other words can shed light on the organisation of our mental lexicon. Described as ‘gigantic multidimensional cobwebs’ (Aitchison, J. 1987. Words in the Mind. Oxford: Blackwell, p.86), our mental lexicons do not appear to store lexical items in individual slots, but rather they are distributed across networks of associations.

The size of the web (i.e. the number of words, or the level of vocabulary breadth) is important, but equally important is the strength of the connections within the web (or vocabulary depth), as this determines the robustness of vocabulary knowledge. These connections or associations are between different words and concepts and experiences, and they are developed by repeated, meaningful, contextualised exposure to a word. In other words, the connections are firmed up through extensive opportunities to use language.
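The ‘cobweb’ metaphor can be made concrete. In the toy model below (purely illustrative – not a psycholinguistic model, and certainly not Meara’s), the lexicon is a graph whose edge weights strengthen with each meaningful co-exposure; vocabulary depth, on this picture, is the weight of the web, not the number of nodes:

```python
from collections import defaultdict

# Toy associative lexicon: links between words strengthen with each
# meaningful co-exposure. Purely illustrative; not a real
# psycholinguistic model.
class Lexicon:
    def __init__(self):
        self.links = defaultdict(lambda: defaultdict(int))

    def encounter(self, *words):
        """A contextualised exposure strengthens every pairwise link."""
        for w in words:
            for v in words:
                if w != v:
                    self.links[w][v] += 1

    def associate(self, prompt):
        """The word-association response: the most strongly linked word."""
        responses = self.links[prompt]
        return max(responses, key=responses.get) if responses else None

lex = Lexicon()
lex.encounter("strong", "coffee")
lex.encounter("strong", "coffee")
lex.encounter("strong", "wind")
print(lex.associate("strong"))  # coffee
```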

In word association research, a person is given a prompt word and asked to say the first other word that comes to their mind. For an entertaining example of this process at work, you might enjoy this clip from the comedy show ‘Help’. The research has implications for a wide range of questions, not least second language acquisition. For example, given a particular prompt, native speakers produce a relatively small number of associative responses, and these are reasonably predictable. Learners, on the other hand, typically produce a much greater variety of responses (which might seem surprising, given that they have a smaller vocabulary store to select from).

One way of classifying the different kinds of response is to divide them into two categories: syntagmatic (words that are discoursally connected to the prompt, such as collocations) and paradigmatic (words that are semantically close to the prompt and are the same part of speech). Linguists have noted that learners (both L1 children and L2 learners) show a shift from predominantly syntagmatic responses to more paradigmatic responses as their mental lexicon develops.

The developers of Vocab Victor have set out to build ‘more and stronger associations for the words your students already know’ and to teach ‘new words by associating them with existing, known words, helping students acquire native-like word networks’. Furthermore, ‘Victor teaches different types of knowledge, including synonyms, “type-of” relationships, collocations, derivations, multiple meanings and form-focused knowledge’. Since we know how important vocabulary depth is, this seems like a pretty sensible learning target.

The app attempts to develop this depth in two main ways (see below). The ‘core game’ is called ‘Word Strike’, where learners have to pick the word on the arrow which most closely matches the word on the target. The second is called ‘Word Drop’, where a bird holds a word card and the user has to decide if it relates more to one of two other words below. Significantly, learners carry out these tasks before any kind of association between form and meaning has been established. The meaning of unknown items can be checked in a monolingual dictionary later. There are a couple of other, less important games that I won’t describe now. The graphics are attractive, if a little juvenile. The whole thing is gamified with levels, leaderboards and so on. It’s free and, presumably, still under development.

[Screenshots: the ‘Word Strike’ and ‘Word Drop’ games]

The app claims to be for ‘English language learners of all ages [to] develop a more native-like vocabulary’. It also says that it is appropriate for ‘native speaking primary students [to] build and strengthen vocabulary for better test performance and stronger reading skills’, as well as ‘secondary students [to] prepare for the PSAT and SAT’. It was the scope of these claims that first set my alarm bells ringing. How could one app be appropriate for such diverse users? (Spoiler: it can’t, and attempts to make an edtech product suitable for everyone inevitably end up with a product that is suitable for no one.)

Rich, associative lexical networks are the result of successful vocabulary acquisition, but neither Paul Meara nor anyone else in the word association field has, to the best of my knowledge, ever suggested that deliberate study is the way to develop the networks. It is uncontentious to say that vocabulary depth (as shown by associative networks) is best developed through extensive exposure to input – reading and listening.

It is also reasonably uncontentious to say that deliberate study of vocabulary pays greatest dividends in developing vocabulary breadth (not depth), especially at lower levels, with a focus on the top three to eight thousand words in terms of frequency. It may also be useful at higher levels when a learner needs to acquire a limited number of new words for a particular purpose. An example of this would be someone who is going to study in an EMI context and would benefit from rapid learning of the words of the Academic Word List.

The Vocab Victor website says that the app ‘is uniquely focused on intermediate-level vocabulary. The app helps get students beyond this plateau by selecting intermediate-level vocabulary words for your students’. At B1 and B2 levels, learners typically know words that fall between #2500 and #3750 in the frequency tables. At level C2, they know most of the most frequent 5000 items. The less frequent a word is, the less point there is in studying it deliberately.

For deliberate study of vocabulary to serve any useful function, the target language needs to be carefully selected, with a focus on high-frequency items. It makes little sense to study words that will already be very familiar. And it makes no sense to deliberately study apparently random words that are so infrequent (i.e. outside the top 10,000) that it is unlikely they will be encountered again before the deliberate study has been forgotten. Take a look at the examples below and judge for yourself how well chosen the items are.

[Screenshots: example item sets for ‘year’ and ‘smashed’]

Vocab Victor appears to focus primarily on semantic fields, as in the example above with ‘smashed’ as a key word. ‘Smashed’, ‘fractured’, ‘shattered’ and ‘cracked’ are all very close in meaning. In order to disambiguate them, it would help learners to see which nouns typically collocate with these words. But they don’t get this with the app – all they get are English-language definitions from Merriam-Webster. What this means is that learners are (1) unlikely to develop a sufficient understanding of target items to allow them to incorporate them into their productive lexicon, and (2) likely to get completely confused with a huge number of similar, low-frequency words (that weren’t really appropriate for deliberate study in the first place). What’s more, lexical sets of this kind may not be a terribly good idea, anyway (see my blog post on the topic).

Vocab Victor takes words, as opposed to lexical items, as the target learning objects. Users may be tested on the associations of any of the meanings of polysemous items. In the example below (not perhaps the most appropriate choice for primary students!), there are two main meanings, but with other items, things get decidedly more complex (see the example with ‘toss’). Learners are also asked to do the associative tasks ‘Word Strike’ and ‘Word Drop’ before they have had a chance to check the possible meanings of either the prompt item or the associative options.

[Screenshots: the ‘stripper’ definition and task, and the ‘toss’ definition]

How anyone could learn from any of this is quite beyond me. I often struggled to choose the correct answer myself; there were also a small number of items whose meaning I wasn’t sure of. I could see no clear way in which items were being recycled (there’s no spaced repetition here). The website claims that ‘adaptating [sic] to your student’s level happens automatically from the very first game’, but I could not see this happening. In fact, it’s very hard to adapt target item selection to an individual learner, since right / wrong or multiple-choice answers tell us so little. Does a correct answer tell us that someone knows an item, or just that they made a lucky guess? Does an incorrect answer tell us that an item is unknown, or just that, under game pressure, someone tapped the wrong button? And how do you evaluate a learner’s lexical level (as a starting point), even to a very rough approximation, without first testing knowledge of at least thirty items? All in all, then, a very curious app.
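The spaced repetition that is missing here is, incidentally, not difficult to implement. Here is a minimal sketch of a Leitner-style scheduler; the class names and interval values are my own, purely for illustration, not anything Vocab Victor actually does:

```python
from datetime import date, timedelta

# Review intervals (in days) for each Leitner box: a correct answer
# promotes an item to the next box; a wrong answer demotes it to box 0.
INTERVALS = [1, 2, 4, 8, 16]

class Card:
    def __init__(self, word):
        self.word = word
        self.box = 0
        self.due = date.today()

    def review(self, correct, today=None):
        today = today or date.today()
        if correct:
            self.box = min(self.box + 1, len(INTERVALS) - 1)
        else:
            self.box = 0
        self.due = today + timedelta(days=INTERVALS[self.box])

def due_cards(cards, today=None):
    # Only items whose interval has elapsed are offered for review.
    today = today or date.today()
    return [c for c in cards if c.due <= today]
```

The point of the sketch is that recycling and expanding intervals, which the research on deliberate vocabulary learning consistently supports, take only a few lines of code; their absence from an app of this kind is a design choice, not a technical limitation.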

One of the most powerful associative responses to a word (especially with younger learners) is what is called a ‘klang’ response: another word which rhymes with or sounds like the prompt word. So, if someone says the word ‘app’ to you, what’s the first klang response that comes to mind?

Online teaching is big business. Very big business. Online language teaching is a significant part of it, expected to be worth over $5 billion by 2025. Within this market, the biggest demand is for English and the lion’s share of the demand comes from individual learners. And a sizable number of them are Chinese kids.

There are a number of service providers, and the competition between them is hot. To give you an idea of the scale of this business, here are a few details taken from a report in USA Today. VIPKid is valued at over $3 billion, attracts celebrity investors, and has around 70,000 tutors who live in the US and Canada. 51Talk has 14,800 English teachers from a variety of English-speaking countries. BlingABC gets over 1,000 American applicants a month for its online tutoring jobs. There are many, many others.

Demand for English teachers in China is huge. The Pie News, citing a Chinese state media announcement, reported in September of last year that there were approximately 400,000 foreign citizens working in China as English language teachers, two-thirds of whom were working illegally. Recruitment problems, exacerbated by quotas and more stringent official requirements for qualifications, along with a very restricted desired teacher profile (white native speakers from a few countries like the US and the UK), have led more providers to look towards online solutions. Eric Yang, founder of the Shanghai-based iTutorGroup, which operates under a number of different brands and claims to be the ‘largest English-language learning institution in the world’, said that he had been expecting online tutoring to surpass F2F classes within a few years. With coronavirus, he now thinks it will come ‘much earlier’.

Typically, the work does not require much, if anything, in the way of training (besides familiarity with the platform), although a 40-hour TEFL course is usually preferred. Teachers deliver pre-packaged lessons. According to the USA Today report, Chinese students pay between $49 and $80 an hour for the classes.

It’s a highly profitable business and the biggest cost to the platform providers is the rates they pay the tutors. If you google “Teaching TEFL jobs online”, you’ll quickly find claims that teachers can earn $40 / hour and up. Such claims are invariably found on the sites of recruitment agencies, who are competing for attention. However, although it’s possible that a small number of people might make this kind of money, the reality is that most will get nowhere near it. Scroll down the pages a little and you’ll discover that a more generally quoted and accepted figure is between $14 and $20 / hour. These tutors are, of course, freelancers, so the wages are before tax, and there is no health coverage or pension plan.

VIPKid, for example, considered to be one of the better companies, offers payment in the $14 – $22 / hour range. Others offer considerably less, especially if you are not a white, graduate US citizen. Current rates advertised on OETJobs include work for Ziktalk ($10 – 15 / hour), NiceTalk ($10 – 11 / hour), 247MyTutor ($5 – 8 / hour) and Weblio ($5 – 6 / hour). The number of hours that you get is rarely fixed and tutors need to build up a client base by getting good reviews. They will often need to upload short introductory videos, selling their skills. They are in direct competition with other tutors.

They also need to make themselves available when demand for their services is highest. Peak hours for VIPKid, for example, are between 2 and 8 in the morning, depending on where you live in the US. Weekends, too, are popular. With VIPKid, classes are scheduled in advance, but this is not always the case with other companies, where you log on to show that you are available and hope someone wants you. This is the case with, for example, Cambly (which pays $10.20 / hour … or rather $0.17 / minute) and NiceTalk. According to one review, Cambly has a ‘priority hours system [which] allows teachers who book their teaching slots in advance to feature higher on the teacher list than those who have just logged in, meaning that they will receive more calls’. Teachers have to commit to a set schedule and any changes are heavily penalised. The review states that ‘new tutors on the platform should expect to receive calls for about 50% of the time they’re logged on’.

 

Taking the gig economy to its logical conclusion, there are other companies where tutors can fix their own rates. SkimaTalk, for example, offers a deal where tutors first teach three unpaid lessons (‘to understand how the system works and build up their initial reputation on the platform’), then the system sets $16 / hour as a default rate, but tutors can change this to anything they wish. With another, Palfish, where tutors set their own rate, the typical rate is $10 – 18 / hour, and the company takes a 20% commission. With Preply, here is the deal on offer:

Your earnings depend on the hourly rate you set in your profile and how often you can provide lessons. Preply takes a 100% commission fee of your first lesson payment with every new student. For all subsequent lessons, the commission varies from 33 to 18% and depends on the number of completed lesson hours with students. The more tutoring you do through Preply, the less commission you pay.

Not one to miss a trick, Ziktalk (‘currently focusing on language learning and building global audience’) encourages teachers ‘to upload educational videos in order to attract more students’. Or, to put it another way, teachers provide free content in order to have more chance of earning $10 – 15 / hour. Ah, the joys of digital labour!

And, then, coronavirus came along. With schools shutting down, first in China and then elsewhere, tens of millions of students are migrating online. In Hong Kong, for example, the South China Morning Post reports that schools will remain closed until April 20, at the earliest, but university entrance exams will be going ahead as planned in late March. CNBC reported yesterday that classes are being cancelled across the US, and the same is happening, or is likely to happen, in many other countries.

Shares in the big online providers soared in February, with Forbes reporting that $3.2 billion had been added to the share value of China’s e-Learning leaders. Stock in New Oriental (owners of BlingABC, mentioned above) ‘rose 7.3% last month, adding $190 million to the wealth of its founder Yu Minhong [whose] current net worth is estimated at $3.4 billion’.

DingTalk, a communication and management app owned by Alibaba (and the most downloaded free app in China’s iOS App Store), has been adapted to offer online services for schools, reports Xinhua, the official state-run Chinese news agency. The scale of operations is enormous: more than 10,000 new cloud servers were deployed within just two hours.

Current impacts are likely to be dwarfed by what happens in the future. According to Terry Weng, a Shenzhen-based analyst, ‘The gradual exit of smaller education firms means there are more opportunities for TAL and New Oriental. […] Investors are more keen for their future performance.’ Zhu Hong, CTO of DingTalk, observes ‘the epidemic is like a catalyst for many enterprises and schools to adopt digital technology platforms and products’.

For edtech investors, things look rosy. Smaller, F2F providers are in danger of going under. In an attempt to mop up this market and gain overall market share, many elearning providers are offering weighty discounts and free services. Profits can come later.

For the hundreds of thousands of illegal or semi-legal English language teachers in China, things look doubly bleak. Their situation is likely to become even more precarious, with the online gig economy their obvious fall-back path. But English language teachers everywhere are likely to be affected one way or another, as will the whole world of TEFL.

Now seems like a pretty good time to find out more about precarity (see the Teachers as Workers website) and native-speakerism (see TEFL Equity Advocates).

Digital flashcard systems like Memrise and Quizlet remain among the most popular language learning apps. Their focus is on the deliberate learning of vocabulary, an approach described by Paul Nation (Nation, 2005) as ‘one of the least efficient ways of developing learners’ vocabulary knowledge but nonetheless […] an important part of a well-balanced vocabulary programme’. The deliberate teaching of vocabulary also features prominently in most platform-based language courses.

For both vocabulary apps and bigger courses, the lexical items need to be organised into sets for the purposes of both presentation and practice. A common way of doing this, especially at lower levels, is to group the items into semantic clusters (sets with a classifying superordinate, like body part, and a collection of example hyponyms, like arm, leg, head, chest, etc.).

The problem, as Keith Folse puts it, is that such clusters ‘are not only unhelpful, they actually hinder vocabulary retention’ (Folse, 2004: 52). Evidence for this claim may be found in Higa (1963), Tinkham (1993, 1997), Waring (1997), Erten & Tekin (2008) and Barcroft (2015), to cite just some of the better-known studies. The results, says Folse, ‘are clear and, I think, very conclusive’. The explanation that is usually given draws on interference theory: semantic similarity may lead to confusion (e.g. when learners mix up days of the week, colour words or adjectives to describe personality).

It appears, then, to be long past time to get rid of semantic clusters in language teaching. Well … not so fast. First of all, although most of the research sides with Folse, not all of it does. Nakata and Suzuki (2019) in their survey of more recent research found that results were more mixed. They found one study which suggested that there was no significant difference in learning outcomes between presenting words in semantic clusters and semantically unrelated groups (Ishii, 2015). And they found four studies (Hashemi & Gowdasiaei, 2005; Hoshino, 2010; Schneider, Healy, & Bourne, 1998, 2002) where semantic clusters had a positive effect on learning.

Nakata and Suzuki (2019) offer three reasons why semantic clustering might facilitate vocabulary learning: it (1) ‘reflects how vocabulary is stored in the mental lexicon, (2) introduces desirable difficulty, and (3) leads to extra attention, effort, or engagement from learners’. Finkbeiner and Nicol (2003) make a similar point: ‘although learning semantically related words appears to take longer, it is possible that words learned under these conditions are learned better for the purpose of actual language use (e.g., the retrieval of vocabulary during production and comprehension). That is, the very difficulty associated with learning the new labels may make them easier to process once they are learned’. Both pairs of researchers cited in this paragraph conclude that semantic clusters are best avoided, but their discussion of the possible benefits of this clustering is a recognition that the research (for reasons which I will come on to) cannot lead to categorical conclusions.

The problem, as so often with pedagogical research, is the gap between research conditions and real-world classrooms. Before looking at this in a little more detail, one relatively uncontentious observation can be made. Even those scholars who advise against semantic clustering (e.g. Papathanasiou, 2009), acknowledge that the situation is complicated by other factors, especially the level of proficiency of the learner and whether or not one or more of the hyponyms are known to the learner. At higher levels (when it is more likely that one or more of the hyponyms are already, even partially, known), semantic clustering is not a problem. I would add that, on the whole at higher levels, the deliberate learning of vocabulary is even less efficient than at lower levels and should be an increasingly small part of a well-balanced vocabulary programme.

So, why is there a problem drawing practical conclusions from the research? In order to have any scientific validity at all, researchers need to control a large number of variables. They need, for example, to be sure that learners do not already know any of the items that are being presented. The only practical way of doing this is to present sets of invented words, and this is what most of the research does (Sarioğlu, 2018). These artificial words solve one problem, but create others, the most significant of which is item difficulty. Many factors impact on item difficulty, and these include word frequency (obviously a problem with invented words), word length, pronounceability and the familiarity and length of the corresponding item in L1. None of the studies which support the abandonment of semantic clusters have controlled all of these variables (Nakata and Suzuki, 2019). Indeed, it would be practically impossible to do so. Learning pseudo-words is a very different proposition to learning real words, which a learner may subsequently encounter or want to use.

Take, for example, the days of the week. It’s quite common for learners to muddle up Tuesday and Thursday. The reason for this is not just semantic similarity (Tuesday and Monday are less frequently confused). They are also very similar in terms of both spelling and pronunciation. They are ‘synforms’ (see Laufer, 2009), which, like semantic clusters, can hinder learning of new items. But, now imagine a French-speaking learner of Spanish studying the days of the week. It is much less likely that martes and jueves will be muddled, because of their similarity to the French words mardi and jeudi. There would appear to be no good reason not to teach the complete set of days of the week to a learner like this. All other things being equal, it is probably a good idea to avoid semantic clusters, but all other things are very rarely equal.

Again, in an attempt to control for variables, researchers typically present the target items in isolation (in bilingual pairings). But, again, the real world does not normally conform to this condition. Leo Selivan (2014) suggests that semantic clusters (e.g. colours) are taught as part of collocations. He gives the examples of red dress, green grass and black coffee, and points out that the alliterative patterns can serve as mnemonic devices which will facilitate learning. The suggestion is, I think, a very good one, but, more generally, it’s worth noting that the presentation of lexical items in both digital flashcards and platform courses is rarely context-free. Contexts will inevitably impact on learning and may well obviate the risks of semantic clustering.

Finally, this kind of research typically gives participants very restricted time to memorize the target words (Sarioğlu, 2018) and they are tested in very controlled recall tasks. In the case of language platform courses, practice of target items is usually spread out over a much longer period of time, with a variety of exposure opportunities (in controlled practice tasks, exposure in texts, personalisation tasks, revision exercises, etc.) both within and across learning units. In this light, it is not unreasonable to argue that laboratory-type research offers only limited insights into what should happen in the real world of language learning and teaching. The choice of learning items, the way they are presented and practised, and the variety of activities in the well-balanced vocabulary programme are probably all more significant than the question of whether items are organised into semantic clusters.

Although semantic clusters are quite common in language learning materials, much more common are thematic clusters (i.e. groups of words which are topically related, but include a variety of parts of speech – see below). Researchers, it seems, have no problem with this way of organising lexical sets. By way of conclusion, here’s an extract from a recent book:

‘Introducing new words together that are similar in meaning (synonyms), such as scared and frightened, or forms (synforms), like contain and maintain, can be confusing, and students are less likely to remember them. This problem is known as ‘interference’. One way to avoid this is to choose words that are around the same theme, but which include a mix of different parts of speech. For example, if you want to focus on vocabulary to talk about feelings, instead of picking lots of adjectives (happy, sad, angry, scared, frightened, nervous, etc.) include some verbs (feel, enjoy, complain) and some nouns (fun, feelings, nerves). This also encourages students to use a variety of structures with the vocabulary.’ (Hughes et al., 2019: 25)

 

References

Barcroft, J. 2015. Lexical Input Processing and Vocabulary Learning. Amsterdam: John Benjamins

Erten, I.H., & Tekin, M. 2008. Effects on vocabulary acquisition of presenting new words in semantic sets versus semantically-unrelated sets. System, 36 (3), 407-422

Finkbeiner, M. & Nicol, J. 2003. Semantic category effects in second language word learning. Applied Psycholinguistics 24 (2003), 369–383

Folse, K. S. 2004. Vocabulary Myths. Ann Arbor: University of Michigan Press

Hashemi, M.R., & Gowdasiaei, F. 2005. An attribute-treatment interaction study: Lexical-set versus semantically-unrelated vocabulary instruction. RELC Journal, 36 (3), 341-361

Higa, M. 1963. Interference effects of intralist word relationships in verbal learning. Journal of Verbal Learning and Verbal Behavior, 2, 170-175

Hoshino, Y. 2010. The categorical facilitation effects on L2 vocabulary learning in a classroom setting. RELC Journal, 41, 301–312

Hughes, S. H., Mauchline, F. & Moore, J. 2019. ETpedia Vocabulary. Shoreham-by-Sea: Pavilion Publishing and Media

Ishii, T. 2015. Semantic connection or visual connection: Investigating the true source of confusion. Language Teaching Research, 19, 712–722

Laufer, B. 2009. The concept of ‘synforms’ (similar lexical forms) in vocabulary acquisition. Language and Education, 2 (2): 113 – 132

Nakata, T. & Suzuki, Y. 2019. Effects Of Massing And Spacing On The Learning Of Semantically Related And Unrelated Words. Studies in Second Language Acquisition 41 (2), 287 – 311

Nation, P. 2005. Teaching Vocabulary. Asian EFL Journal. http://www.asian-efl-journal.com/sept_05_pn.pdf

Papathanasiou, E. 2009. An investigation of two ways of presenting vocabulary. ELT Journal 63 (4), 313 – 322

Sarioğlu, M. 2018. A Matter of Controversy: Teaching New L2 Words in Semantic Sets or Unrelated Sets. Journal of Higher Education and Science Vol 8 / 1: 172 – 183

Schneider, V. I., Healy, A. F., & Bourne, L. E. 1998. Contextual interference effects in foreign language vocabulary acquisition and retention. In Healy, A. F. & Bourne, L. E. (Eds.), Foreign language learning: Psycholinguistic studies on training and retention (pp. 77–90). Mahwah, NJ: Erlbaum

Schneider, V. I., Healy, A. F., & Bourne, L. E. 2002. What is learned under difficult conditions is hard to forget: Contextual interference effects in foreign vocabulary acquisition, retention, and transfer. Journal of Memory and Language, 46, 419–440

Selivan, L. 2014. Horizontal alternatives to vertical lists. Blog post: http://leoxicon.blogspot.com/2014/03/horizontal-alternatives-to-vertical.html

Tinkham, T. 1993. The effect of semantic clustering on the learning of second language vocabulary. System 21 (3), 371-380.

Tinkham, T. 1997. The effects of semantic and thematic clustering on the learning of a second language vocabulary. Second Language Research, 13 (2),138-163

Waring, R. 1997. The negative effects of learning words in semantic sets: a replication. System, 25 (2), 261 – 274

Knowble, claim its developers, is a browser extension that will improve English vocabulary and reading comprehension. It also describes itself as an ‘adaptive language learning solution for publishers’. It’s currently in beta and free, and it sounded right up my street, so I decided to give it a run.

[Screenshot: the Knowble reader]

Users are asked to specify a first language (I chose French) and a level (A1 to C2): I chose B1, but this did not seem to impact on anything that subsequently happened. They are then offered a menu of about 30 up-to-date news items, grouped into 5 categories (world, science, business, sport, entertainment). Clicking on one article takes you to the article on the source website. There’s a good selection, including USA Today, CNN, Reuters, the Independent and the Torygraph from Britain, the Times of India, the Independent from Ireland and the Star from Canada. A large number of words are underlined: a single click brings up a translation in the extension box. Double-clicking on all other words will also bring up translations. Apart from that, there is one very short exercise (which has presumably been automatically generated) for each article.

For my trial run, I picked three articles: ‘Woman asks firefighters to help ‘stoned’ raccoon’ (from the BBC, 240 words), ‘Plastic straw and cotton bud ban proposed’ (also from the BBC, 823 words) and ‘London’s first housing market slump since 2009 weighs on UK price growth’ (from the Torygraph, 471 words).

Translations

Research suggests that the use of translations, rather than definitions, may lead to more learning gains, but the problem with Knowble is that it relies entirely on Google Translate. Google Translate is fast improving. Take the first sentence of the ‘plastic straw and cotton bud’ article, for example. It’s not a bad translation, but it gets the word ‘bid’ completely wrong, translating it as ‘offre’ (= offer), where ‘tentative’ (= attempt) is needed. So, we can still expect a few problems with Google Translate …

One of the reasons that Google Translate has improved is that it no longer treats individual words as individual lexical items. It analyses groups of words and translates chunks or phrases (see, for example, the way it translates ‘as part of’). It doesn’t do word-for-word translation. Knowble, however, have set their software to ask Google for translations of each word as individual items, so the phrase ‘as part of’ is translated ‘comme’ + ‘partie’ + ‘de’. Whilst this example is comprehensible, problems arise very quickly. ‘Cotton buds’ (‘cotons-tiges’) become ‘coton’ + ‘bourgeon’ (= botanical shoots of cotton). Phrases like ‘in time’, ‘run into’, ‘sleep it off’, ‘take its course’, ‘fire station’ or ‘going on’ (all from the stoned raccoon text) all cause problems. In addition, Knowble are not using any parsing tools, so the system does not identify parts of speech, and further translation errors inevitably appear. In the short article of 240 words, about 10% are wrongly translated. Knowble claim to be using NLP tools, but there’s no sign of it here. They’re just using Google Translate rather badly.
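The failure mode is easy to reproduce in a toy sketch. The dictionaries below are invented stand-ins (Google Translate itself obviously can’t be embedded here), but the mechanism is the same: querying word by word destroys multi-word units.

```python
# Hypothetical stand-ins for two ways of querying a translation engine:
# one entry per word, versus one entry per known phrase.
WORD_DICT = {'as': 'comme', 'part': 'partie', 'of': 'de',
             'cotton': 'coton', 'buds': 'bourgeons'}
PHRASE_DICT = {'as part of': 'dans le cadre de',
               'cotton buds': 'cotons-tiges'}

def word_by_word(text):
    # What Knowble does: translate each token in isolation.
    return ' '.join(WORD_DICT.get(w, w) for w in text.split())

def phrase_first(text):
    # What a chunk-aware system does: match a known phrase before
    # falling back to single-word lookup.
    return PHRASE_DICT.get(text, word_by_word(text))

print(word_by_word('as part of'))   # comme partie de
print(phrase_first('as part of'))   # dans le cadre de
print(phrase_first('cotton buds'))  # cotons-tiges
```

A real system needs phrase tables (or a neural model that sees whole sentences), but even this toy version shows why asking for translations token by token guarantees errors with any multi-word expression.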

Highlighted items

NLP tools of some kind are presumably being used to select the words that get underlined. Exactly how this works is unclear. On the whole, it seems that very high frequency words are ignored and that lower frequency words are underlined. Here, for example, is the list of words that were underlined in the stoned raccoon text. I’ve compared them with (1) the CEFR levels for these words in the English Profile Text Inspector, and (2) the frequency information from the Macmillan dictionary (more stars = more frequent). In the other articles, some extremely high frequency words were underlined (e.g. price, cost, year) while much lower frequency items were not.

It is, of course, extremely difficult to predict which items of vocabulary a learner will know, even if we have a fairly accurate idea of their level. Personal interests play a significant part, so, for example, some people at even a low level will have no problem with ‘cannabis’, ‘stoned’ and ‘high’, even if these are low frequency. First language, however, is a reasonably reliable indicator as cognates can be expected to be easy. A French speaker will have no problem with ‘appreciate’, ‘unique’ and ‘symptom’. A recommendation engine that can meaningfully personalize vocabulary suggestions will, at the very least, need to consider cognates.
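Crude cognate detection of the kind a recommendation engine would need is not hard to sketch. The following uses simple string similarity; the threshold and the word pairs are my own, for illustration only:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Ratio of matching characters between the two forms (0.0 - 1.0).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def likely_cognate(en_word, l1_translation, threshold=0.7):
    # Treat an item as probably easy for this L1 if the English form
    # closely resembles its translation.
    return similarity(en_word, l1_translation) >= threshold

# For a French-speaking learner:
print(likely_cognate('unique', 'unique'))     # True
print(likely_cognate('symptom', 'symptôme'))  # True
print(likely_cognate('stoned', 'défoncé'))    # False
```

Real cognate identification would also have to handle false friends and regular sound correspondences, but even a surface-similarity filter like this would remove some of the pointless underlining.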

In short, the selection and underlining of vocabulary items, as it currently stands in Knowble, appears to serve no clear or useful function.

Vocabulary learning

Knowble offers a very short exercise for each article. They are of three types: word completion, dictation and drag and drop (see the example). The rationale for the selection of the target items is unclear, but, in any case, these exercises are tokenistic in the extreme and are unlikely to lead to any significant learning gains. More valuable would be the possibility of exporting items into a spaced repetition flash card system.

The claim that Knowble’s ‘learning effect is proven scientifically’ seems to me to be without any foundation. If there has been any proper research, it’s not signposted anywhere. Sure, reading lots of news articles (with a look-up function – if it works reliably) can only be beneficial for language learners, but they can do that with any decent dictionary running in the background.

Similar in many ways to en.news, which I reviewed in my last post, Knowble is another example of a technology-driven product that shows little understanding of language learning.

Last month, I wrote a post about the automated generation of vocabulary learning materials. Yesterday, I got an email from Mike Elchik, inviting me to take a look at the product that his company, WeSpeke, has developed in partnership with CNN. Called en.news, it’s a very regularly updated and wide selection of video clips and texts from CNN, which are then used to ‘automatically create a pedagogically structured, leveled and game-ified English lesson’. Available in the App Store and on Google Play, as well as in a desktop version, it’s free. Revenues will presumably be generated through advertising and later sales to corporate clients.

With 6.2 million dollars in funding so far, WeSpeke can leverage some state-of-the-art NLP and AI tools. Co-founder and chief technical adviser of the company is Jaime Carbonell, Director of the Language Technologies Institute at Carnegie Mellon University, described in Wikipedia as one of the gurus of machine learning. I decided to have a closer look.

[Screenshot: en.news home page]

Users are presented with a menu of CNN content (there were 38 items from yesterday alone). Items are tagged with broad categories (Politics, Opinions, Money, Technology, Entertainment, etc.) and given a level, ranging from 1 to 5, although the vast majority of the material is at the two highest levels.

[Screenshot: lesson menu]

I picked two lessons: a reading text about Mark Zuckerberg’s Congressional hearing (level 5) and a 9-minute news programme of mixed items (level 2 – illustrated above). In both cases, the lesson begins with the text. With the reading, you can click on words to bring up dictionary entries from the Collins dictionary. With the video, you can activate captions and, again, click on words for definitions. You can also slow down the speed. So far, so good.

There then follows a series of exercises which focus primarily on a set of words that have been automatically selected. This is where the problems began.

Level

It’s far from clear what the levels (1 – 5) refer to. The Zuckerberg text is 930 words long and is rated as B2 by one readability tool. But, using the English Profile Text Inspector, there are 19 types at C1 level, 14 at C2, and 98 which are unlisted. That suggests something substantially higher than B2. The CNN10 video is delivered at breakneck speed (as is often the case with US news shows). Yes, it can be slowed down, but that still won’t help with some passages, such as the one below:

A squirrel recently fell out of a tree in Western New York. Why would that make news? Because she bwoke her widdle leg and needed a widdle cast! Yes, there are casts for squirrels, as you can see in this video from the Orphaned Wildlife Center. A windstorm knocked the animal’s nest out of a tree, and when a woman saw that the baby squirrel was injured, she took her to a local vet. Doctors say she’s going to be just fine in a couple of weeks. Well, why ‘rodent’ she be? She’s been ‘whiskered’ away and cast in both a video and a plaster. And as long as she doesn’t get too ‘squirrelly’ before she heals, she’ll have quite a ‘tail’ to tell.

It’s hard to understand how a text like this got through the algorithms. But, as materials writers know, it is extremely hard to find authentic text that lends itself to language learning at anything below C1. On the evidence here, there is still some way to go before the process of selection can be automated. It may well be the case that CNN simply isn’t a particularly appropriate source.
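To make the point concrete, here is a minimal sketch of the kind of check an automated selection pipeline might run: flag a text as too hard when too many of its word types fall outside a graded wordlist. The wordlist here is a tiny invented stand-in, not real CEFR data, and real profilers (like the Text Inspector) are far more sophisticated; the sketch only illustrates the general principle.

```python
# Crude text-level check: what share of a text's word types are not on
# a graded wordlist? The wordlist below is a toy stand-in, not real data.

def unlisted_ratio(text, known_words):
    """Return the share of word types not found in the graded wordlist."""
    types = {w.strip(".,!?'\"").lower() for w in text.split() if w.strip(".,!?'\"")}
    unlisted = {w for w in types if w not in known_words}
    return len(unlisted) / len(types)

KNOWN = {"a", "squirrel", "fell", "out", "of", "tree", "in", "new", "york"}  # stand-in list
sample = "A squirrel recently fell out of a tree in Western New York."
print(round(unlisted_ratio(sample, KNOWN), 2))  # prints 0.18
```

Even a check this crude would presumably have flagged the squirrel passage; whatever the app’s algorithms are doing, the output suggests they are not doing enough of it.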

Target learning items

The primary focus of these lessons is vocabulary learning, and it’s vocabulary learning of a very deliberate kind. Applied linguists are in general agreement that it makes sense for learners to approach the building of their L2 lexicon in a deliberate way (i.e. by studying individual words) for high-frequency items or items that can be identified as having a high surrender value (e.g. items from the AWL for students studying in an EMI context). Once you get to items that are less frequent than, say, the top 8,000 most frequent words, the effort expended in studying new words needs to be offset against their usefulness. Why spend a lot of time studying low-frequency words when you’re unlikely to come across them again for some time … and will probably forget them before you do? Vocabulary development at higher levels is better served by extensive reading (and listening), possibly accompanied by glosses.

The target items in the Zuckerberg text were: advocacy, grilled, handicapping, sparked, diagnose, testified, hefty, imminent, deliberative and hesitant. One of these, ‘grilled’, is listed as A2 by the English Vocabulary Profile, but that is with its literal, not metaphorical, meaning. Four of them are listed as C2 and the remaining five are off-list. In the CNN10 video, the target items were: strive, humble (verb), amplify, trafficked, enslaved, enacted, algae, trafficking, ink and squirrels. Of these, one is B1, two are C2 and the rest are unlisted. What is the point of studying these essentially random words? Why spend time going through a series of exercises that practise these items? Wouldn’t your time be better spent just doing some more reading? I have no idea how the automated selection of these items takes place, but it’s clear that it’s not working very well.
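A frequency cut-off of the kind described above is simple to automate, which makes the app’s selections all the more puzzling. The sketch below shows one hedged way it might be done: keep only candidates whose corpus frequency rank falls within a cut-off (roughly the top 8,000, per the argument above). The rank table is invented for illustration, not taken from a real frequency list.

```python
# Frequency-based filter for deliberate vocabulary study.
# Ranks below are invented stand-ins, not real corpus data.

FREQ_RANK = {"strive": 4200, "humble": 3900, "amplify": 7100,
             "algae": 9500, "squirrels": 12000, "enacted": 6800}

def study_candidates(words, rank_table, cutoff=8000):
    """Return words worth deliberate study: ranked, and within the cut-off."""
    return [w for w in words if rank_table.get(w, float("inf")) <= cutoff]

targets = ["strive", "humble", "amplify", "algae", "squirrels", "enacted"]
print(study_candidates(targets, FREQ_RANK))
# prints ['strive', 'humble', 'amplify', 'enacted']
```

Under a filter like this, words such as ‘algae’ and ‘squirrels’ (and any off-list item) would be left to extensive reading rather than served up for deliberate practice.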

Practice exercises

There is plenty of variety of task type, but there are, I think, two reasons to query the claim that these lessons are ‘pedagogically structured’. The first is the nature of the practice exercises; the second is the sequencing of the exercises. I’ll restrict my observations to a selection of the tasks.

1. Users are presented with a dictionary definition and an anagrammed target item which they must unscramble. For example:

existing for the purpose of discussing or planning something     VLREDBETEIIA

If you can’t solve the problem, you can always scroll through the text to find the answer. But the problem is in the task design. Dictionary definitions have been written to help language users decode a word. They simply don’t work very well when they are used for another purpose (as prompts for encoding).

2. Users are presented with a dictionary definition for which they must choose one of four words. There are many potential problems here, not the least of which is that definitions are often more complex than the word they define, or present other challenges. As an example: ‘cause to be unpretentious’ for to humble. On top of that, lexicographers often need, or choose, to embed the target item in the definition. For example:

a hefty amount of something, especially money, is very large

an event that is imminent, especially an unpleasant one, will happen very soon

When this is the case, it makes no sense to present these definitions and ask learners to find the target item from a list of four.

The two key pieces of content in this product – the CNN texts and the Collins dictionaries – are both less than ideal for their purposes.

3. Users are presented with a box of jumbled words which they must unscramble to form sentences that appeared in the text.

[Image: ‘Rearrange the words to make sentences’ task]

The sentences are usually long and hard to reconstruct. You can scroll through the text to find the answer, but I’m unclear what the point of this would be. The example above contains a mistake (vie instead of vice), but this was one of only two glitches I encountered.

4. Users are asked to select the word that they hear on an audio recording. For example:

squirreling     squirrel     squirreled     squirrels

Given the high level of challenge of both the text and the target items, this was a rather strange exercise to kick off the practice. The meaning has not yet been presented (in a matching / definition task), so what exactly is the point of this exercise?

5. Users are presented with gapped sentences from the text and asked to choose the correct grammatical form of the missing word. Some of these were hard (e.g. adjective order), others were very easy (e.g. some vs any). The example below struck me as plain weird for a lesson at this level.

________ have zero expectation that this Congress is going to make adequate changes. (I or Me ?)

6. At the end of both lessons, there were a small number of questions that tested your memory of the text. If, like me, you can’t remember all that much about the text after twenty minutes of vocabulary activities, you can scroll through the text to find the answers. This is not a task type that will develop reading skills: I am unclear what it could possibly develop.

Overall?

Using the lessons on offer here wouldn’t do a learner (as long as they already had a high level of proficiency) any harm, but it wouldn’t be the most productive use of their time, either. If a learner is motivated to read the text about Zuckerberg, rather than do lots of ‘busy’ work on a very odd set of words with gap-fills and matching tasks, they’d be better advised just to read the text again once or twice. They could use a look-up tool for words they want to understand and import these into a flashcard system with spaced repetition (en.news does have flashcards, but there’s no sign of spaced practice yet). Better still, they could check out another news website and read or watch other articles on the same subject (perhaps choosing sites with a different slant from CNN’s) and get valuable narrow-reading practice in this way.
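For readers unfamiliar with what ‘spaced practice’ would add to the flashcards, here is a minimal Leitner-style sketch (one of the simplest spaced-repetition schemes, and not necessarily what en.news would implement): each correct answer moves a card to a box reviewed at a longer interval, while a miss sends it back to the first box. The interval lengths are illustrative.

```python
# Leitner-style spaced repetition sketch. Interval lengths are illustrative.

INTERVALS_DAYS = [1, 3, 7, 21]  # review gap (in days) for each box

def review(card, correct):
    """Update a card's box and next-review interval after one attempt."""
    if correct:
        card["box"] = min(card["box"] + 1, len(INTERVALS_DAYS) - 1)
    else:
        card["box"] = 0  # missed: back to the most frequent box
    card["next_in_days"] = INTERVALS_DAYS[card["box"]]
    return card

card = {"word": "hefty", "box": 0}
review(card, correct=True)   # promoted to box 1: see again in 3 days
review(card, correct=False)  # missed: back to box 0, review in 1 day
print(card)
# prints {'word': 'hefty', 'box': 0, 'next_in_days': 1}
```

The pedagogic point is simply that review timing, not just the existence of flashcards, is what makes deliberate study efficient.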

My guess is that the technology has driven the product here, but without answering the fundamental questions about which words it’s appropriate for individual learners to study in a deliberate way and how this is best tackled, it doesn’t take learners very far.