Posts Tagged ‘English Profile’

NB This is an edited version of the original review.

Words & Monsters is a new vocabulary app that has caught my attention. There are three reasons for this. Firstly, because it’s free. Secondly, because I was led to believe (falsely, as it turns out) that two of the people behind it are Charles Browne and Brent Culligan, eminently respectable linguists, who were also behind the development of the New General Service List (NGSL), based on data from the Cambridge English Corpus. And thirdly, because a lot of thought, effort and investment have clearly gone into the gamification of Words & Monsters (WAM). It’s to the last of these that I’ll turn my attention first.

WAM teaches vocabulary in the context of a battle between a player’s avatar and a variety of monsters. If users can correctly match a set of target items to definitions or translations in the available time, they ‘defeat’ the monster and accumulate points. The more points you have, the higher you advance through a series of levels and ranks. There are bonuses for meeting daily and weekly goals, there are leaderboards, and trophies and medals can be won. In addition to points, players also win ‘crystals’ after successful battles, and these crystals can be used to buy accessories which change the appearance of the avatar and give the player added ‘powers’. I was never able to fully understand precisely how these ‘powers’ affected the number of points I could win in battle. It remained as baffling to me as the whole system of values with Pokemon cards, which is presumably a large part of the inspiration here. Perhaps others, more used to games like Pokemon, would find it all much more transparent.

The system of rewards is all rather complicated, but perhaps this doesn’t matter too much. In fact, it might be the case that working out how reward systems work is part of what motivates people to play games. But there is another aspect to this: the app’s developers refer in their bumf to research by Howard-Jones and Jay (2016), which suggests that when rewards are uncertain, more dopamine is released in the mid-brain and this may lead to reinforcement of learning, and, possibly, enhancement of declarative memory function. Possibly … but Howard-Jones and Jay point out that ‘the science required to inform the manipulation of reward schedules for educational benefit is very incomplete.’ So, WAM’s developers may be jumping the gun a little and overstating the applicability of the neuroscientific research, but they’re not alone in that!

If you don’t understand a reward system, it’s certain that the rewards are uncertain. But WAM takes this further in at least two ways. Firstly, when you win a ‘battle’, you have to click on a plain treasure bag to collect your crystals, and you don’t know whether you’ll get one, two, three, or zero, crystals. You are given a semblance of agency, but, essentially, the whole thing is random. Secondly, when you want to convert your crystals into accessories for your avatar, random selection determines which accessory you receive, even though, again, there is a semblance of agency. Different accessories have different power values. This extended use of what the developers call ‘the thrill of uncertain rewards’ is certainly interesting, but how effective it is is another matter. My own reaction, after quite some time spent ‘studying’, to getting no crystals or an avatar accessory that I didn’t want was primarily frustration, rather than motivation to carry on. I have no idea how typical my reaction (more ‘treadmill’ than ‘thrill’) might be.

Unsurprisingly, for an app that has so obviously thought carefully about gamification, players are encouraged to interact with each other. As part of the early promotion, WAM is running, from 15 November to 19 December, a free ‘team challenge tournament’, allowing teams of up to 8 players to compete against each other. Ingeniously, it would appear to allow teams and players of varying levels of English to play together, with the app’s algorithms determining each individual’s level of lexical knowledge and therefore the items that will be presented / tested. Social interaction is known to be an important component of successful games (Dehghanzadeh et al., 2019), but for vocabulary apps there’s a huge challenge. In order to learn vocabulary from an app, learners need to put in time – on a regular basis. Team challenge tournaments may help with initial on-boarding of players, but, in the end, learning from a vocabulary app is inevitably and largely a solitary pursuit. Over time, social interaction is unlikely to be maintained, and it is, in any case, of a very limited nature. The other features of successful games – playful freedom and intrinsically motivating tasks (Driver, 2012) – are also absent from vocabulary apps. Playful freedom is mostly incompatible with points, badges and leaderboards. And flashcard tasks, however intrinsically motivating they may be at the outset, will always become repetitive after a while. In the end, what’s left, for those users who hang around long enough, is the reward system.

It’s also worth noting that this free challenge is of limited duration: it is a marketing device attempting to push you towards the non-free use of the app, once the initial promotion is over.

Gamified motivation tools are only of value, of course, if they motivate learners to spend their time doing things that are of clear learning value. To evaluate the learning potential of WAM, then, we need to look at the content (the ‘learning objects’) and the learning tasks that supposedly lead to acquisition of these items.

When you first use WAM, you need to play for about 20 minutes, at which point algorithms determine ‘how many words [you] know and [you can] see scores for English tests such as; TOEFL, TOEIC, IELTS, EIKEN, Kyotsu Shiken, CEFR, SAT and GRE’. The developers claim that these scores correlate pretty highly with actual test scores: ‘they are about as accurate as the tests themselves’, they say. If Browne and Culligan had been behind the app, I would have been tempted to accept the claim – with reservations: after all, it still allows for one item out of 5 to be wrongly identified. But, what is this CEFR test score that is referred to? There is no CEFR test, although many tests are correlated with CEFR. The two tools that I am most familiar with which allocate CEFR levels to individual words – Cambridge’s English Vocabulary Profile and Pearson’s Global Scale of English – often conflict in their results. I suspect that ‘CEFR’ was just thrown into the list of tests as an attempt to broaden the app’s appeal.

English target words are presented and practised with their translation ‘equivalents’ in Japanese. For the moment, Japanese is the only language available, which means the app is of little use to learners who don’t know any Japanese. It’s now well-known that bilingual pairings are more effective in deliberate language learning than using definitions in the same language as the target items. This becomes immediately apparent when, for example, a word like ‘something’ is defined (by WAM) as ‘a thing not known or specified’ and ‘anything’ as ‘a thing of whatever kind’. But although I’m in no position to judge the Japanese translations, there are reasons why I would want to check the spreadsheet before recommending the app. ‘Lady’ is defined as ‘polite word for a woman’; ‘missus’ is defined as ‘wife’; and ‘aye’ is defined as ‘yes’. All of these definitions are, at best, problematic; at worst, they are misleading. Are the Japanese translations more helpful? I wonder … Perhaps these are simply words that do not lend themselves to flashcard treatment?

Because I tested into the app at C1 level, I was not able to evaluate the selection of words at lower levels. A pity. Instead, I was presented with words like ‘ablution’, ‘abrade’, ‘anode’, and ‘auspice’. The app claims to be suitable ‘for both second-language learners and native speakers’. For lower levels of the former, this may be true (but without looking at the lexical spreadsheets, I can’t tell). But for higher levels, however much fun this may be for some people, it seems unlikely that you’ll learn very much of any value. Outside of words in, say, the top 8000 frequency band, it is practically impossible to differentiate the ‘surrender value’ of words in any meaningful way. Deliberate learning of vocabulary only makes sense with high frequency words that you have a chance of encountering elsewhere. You’d be better off reading, extensively, rather than learning random words from an app. Words which (for reasons I’ll come on to) you probably won’t actually learn anyway.

With very few exceptions, the learning objects in WAM are single words, rather than phrases, even when the item is of little or no value outside its use in a phrase. ‘Betide’ is defined as ‘to happen to; befall’ but this doesn’t tell a learner much that is useful. It’s practically only ever used following ‘woe’ (but what does ‘woe’ mean?!). Learning items can be checked in the ‘study guide’, which will show that ‘betide’ typically follows ‘woe’, but unless you choose to refer to the study guide (and there’s no reason, in a case like this, that you would know that you need to check things out more fully), you’ll be none the wiser. In other words, checking the study guide is unlikely to betide you. ‘Wee’, as another example, is treated as two items: (1) meaning ‘very small’ as in ‘wee baby’, and (2) meaning ‘very early in the morning’ as in ‘in the wee hours’. For the latter, ‘wee’ can only collocate with ‘in the’ and ‘hours’, so it makes little sense to present it as a single word. This is also an example of how, in some cases, different meanings of particular words are treated as separate learning objects, even when the two meanings are very close and, in my view, are hardly worth learning separately. Examples include ‘czar’ and ‘assonance’. Sometimes, cognates are treated as separate learning objects (e.g. ‘adulterate’ and ‘adulteration’ or ‘dolor’ and ‘dolorous’); with other words (e.g. ‘effulgence’), only one grammatical form appears to be given. I could not begin to figure out any rationale behind any of this.

All in all, then, there are reasons to be a little skeptical about some of the content. Up to level B2 – which, in my view, is the highest level at which it makes sense to use vocabulary flashcards – it may be of value, so long as your first language is Japanese. But given the claim that it can help you prepare for the ‘CEFR test’, I have to wonder …

The learning tasks require players to match target items to translations / definitions (in both directions), with the target item sometimes in written form, sometimes spoken. Users do not, as far as I can tell, ever have to produce the target item: they only have to select. The learning relies on spaced repetition, but there is no generative effect (known to enhance memorisation). When I was experimenting, there were a few words that I did not know, but I was usually able to get the correct answer by eliminating the distractors (a choice of one from three gives players a reasonable chance of guessing correctly). WAM does not teach users how to produce words; its focus is on receptive knowledge (of a limited kind). I learn, for example, what a word like ‘aye’ or ‘missus’ kind of means, but I learn nothing about how to use it appropriately. Contrary to the claims in WAM’s bumf (that ‘all senses and dimensions of each word are fully acquired’), reading and listening comprehension speeds may be improved, but appropriate and accurate use of these words in speaking and writing is much less likely to follow. Does WAM really ‘strengthen and expand the foundation levels of cognition that support all higher level thinking’, as is claimed?
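To make the point about guessing concrete, here is a minimal sketch of the arithmetic. It assumes three options per question, as in my own experience of the app, and treats every remaining option as equally likely; the function name is my own invention and has nothing to do with WAM’s code.

```python
from fractions import Fraction


def chance_of_correct_guess(options: int, eliminated: int = 0) -> Fraction:
    """Probability of picking the right answer at random
    once some distractors have been ruled out."""
    remaining = options - eliminated
    if eliminated < 0 or remaining < 1:
        raise ValueError("the correct option must always remain")
    return Fraction(1, remaining)


print(chance_of_correct_guess(3))     # 1/3: a blind guess
print(chance_of_correct_guess(3, 1))  # 1/2: one distractor ruled out
print(chance_of_correct_guess(3, 2))  # 1: both distractors ruled out
```

Even a blind guess succeeds a third of the time, and ruling out a single distractor raises that to a half, so a correct selection is weak evidence that the word is actually known.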

Perhaps it’s unfair to mention some of the more dubious claims of WAM’s promotional material, but here is a small selection, anyway: ‘WAM unleashes the full potential of natural motivation’. ‘WAM promotes Flow by carefully managing the ratio of unknown words. Your mind moves freely in the channel below frustration and above boredom’.

WAM is certainly an interesting project, but, like all the vocabulary apps I have ever looked at, there have to be trade-offs between optimal task design and what will fit on a mobile screen, between freedoms and flexibility for the user and the requirements of gamified points systems, between the amount of linguistic information that is desirable and the amount that spaced repetition can deal with, between attempting to make the app suitable for the greatest number of potential users and making it especially appropriate for particular kinds of users. Design considerations are always a mix of the pedagogical and the practical / commercial. And, of course, the financial. And, like most edtech products, the claims for its efficacy need to be treated with a bucket of salt.

References

Dehghanzadeh, H., Fardanesh, H., Hatami, J., Talaee, E. & Noroozi, O. (2019) Using gamification to support learning English as a second language: a systematic review, Computer Assisted Language Learning, DOI: 10.1080/09588221.2019.1648298

Driver, P. (2012) The Irony of Gamification. In English Digital Magazine 3, British Council Portugal, pp. 21 – 24 http://digitaldebris.info/digital-debris/2011/12/31/the-irony-of-gamification-written-for-ied-magazine.html

Howard-Jones, P. & Jay, T. (2016) Reward, learning and games. Current Opinion in Behavioral Sciences, 10: 65 – 72

I was intrigued to learn earlier this year that Oxford University Press had launched a new online test of English language proficiency, called the Oxford Test of English (OTE). At the conference where I first heard about it, I was struck by the fact that the presentation of the OUP sponsored plenary speaker was entitled ‘The Power of Assessment’ and dealt with formative assessment / assessment for learning. Oxford clearly want to position themselves as serious competitors to Pearson and Cambridge English in the testing business.

The brochure for the exam kicks off with a gem of a marketing slogan, ‘Smart. Smarter. SmarTest’ (geddit?), and the next few pages give us all the key information.

Faster and more flexible

‘Traditional language proficiency tests’ is presumably intended to refer to the main competition (Pearson and Cambridge English). Cambridge First takes, in total, 3½ hours; the Pearson Test of English Academic takes 3 hours. The OTE takes, in total, 2 hours and 5 minutes. It can be taken, in theory, on any day of the year, although this depends on the individual Approved Test Centres, and, again, in theory, it can be booked as little as 14 days in advance. Results should take only two weeks to arrive. Further flexibility is offered in the way that candidates can pick ’n’ choose which of the four skills they want to have tested, just one or all four, although, as an incentive to go the whole hog, they will only get a ‘Certificate of Proficiency’ if they do all four.

A further incentive to do all four skills at the same time can be found in the price structure. One centre in Spain is currently offering the test for one single skill at €41.50, but do the whole lot, and it will only set you back €89. For a high-stakes test, this is cheap. In the UK right now, both Cambridge First and Pearson Academic cost in the region of £150, and IELTS a bit more than that. So, faster, more flexible and cheaper … Oxford means business.

Individual experience

The ‘individual experience’ on the next page of the brochure is pure marketing guff. This is, after all, a high-stakes, standardised test. It may be true that ‘the Speaking and Writing modules provide randomly generated tasks, making the overall test different each time’, but there can only be a certain number of permutations. What’s more, in ‘traditional tests’, like Cambridge First, where there is a live examiner or two, an individualised experience is unavoidable.

More interesting to me is the reference to adaptive technology. According to the brochure, ‘The Listening and Reading modules are adaptive, which means the test difficulty adjusts in response to your answers, quickly finding the right level for each test taker. This means that the questions are at just the right level of challenge, making the test shorter and less stressful than traditional proficiency tests’.

My curiosity piqued, I decided to look more closely at the Reading module. I found one practice test online which is the same as the demo that is available at the OTE website. Unfortunately, this example is not adaptive: it is at B1 level. The actual test records scores between 51 and 140, corresponding to levels A2, B1 and B2.

Test scores

The tasks in the Reading module are familiar from coursebooks and other exams: multiple choice, multiple matching and gapped texts.

Reading tasks

According to the exam specifications, these tasks are designed to measure the following skills:

  • Reading to identify main message, purpose, detail
  • Expeditious reading to identify specific information, opinion and attitude
  • Reading to identify text structure, organizational features of a text
  • Reading to identify attitude / opinion, purpose, reference, the meanings of words in context, global meaning

The ability to perform these skills depends, ultimately, on the candidate’s knowledge of vocabulary and grammar, as can be seen in the examples below.

Task 1 Task 2

How exactly, I wonder, does the test difficulty adjust in response to the candidate’s answers? The algorithm that is used depends on measures of the difficulty of the test items. If these items are to be made harder or easier, the only significant way that I can see of doing this is by making the key vocabulary lower- or higher-frequency. This, in turn, is only possible if vocabulary and grammar have been tagged as being at a particular level. The most well-known tools for doing this have been developed by Pearson (with the GSE Teacher Toolkit) and Cambridge English Profile. To the best of my knowledge, Oxford does not yet have a tool of this kind (at least, none that is publicly available). However, the data that OUP will accumulate from OTE scripts and recordings will be invaluable in building a database which their lexicographers can use in developing such a tool.
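The brochure does not say how the adjustment is made, so what follows is only a sketch of the simplest rule I can imagine: a ‘staircase’ that moves up one difficulty band after a correct answer and down one after a wrong answer. The five bands, the starting band and the function names are all assumptions of mine; operational adaptive tests generally rely on more sophisticated statistical models, and nothing here should be read as a description of the OTE.

```python
def next_difficulty(current: int, was_correct: bool,
                    lowest: int = 1, highest: int = 5) -> int:
    """Move one band up after a correct answer, one band down after
    a wrong one, staying inside the (hypothetical) range of bands."""
    step = 1 if was_correct else -1
    return max(lowest, min(highest, current + step))


def run_session(answers: list[bool], start: int = 3) -> list[int]:
    """Return the difficulty band of each item presented, given a
    sequence of right/wrong answers."""
    bands = [start]
    for was_correct in answers:
        bands.append(next_difficulty(bands[-1], was_correct))
    return bands


print(run_session([True, True, False, True]))  # [3, 4, 5, 4, 5]
```

Whatever the real rule is, it can only work if every item in the bank carries a difficulty value, which is why the tagging of vocabulary and grammar to levels matters so much.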

Even when a data-driven (and numerically precise) tool is available for modifying the difficulty of test items, I still find it hard to understand how the adaptivity will impact on the length or the stress of the reading test. The Reading module is only 35 minutes long and contains only 22 items. Anything that is significantly shorter must surely impact on the reliability of the test.

My conclusion from this is that the adaptive element of the Reading and Listening modules in the OTE is less important to the test itself than it is to building a sophisticated database (not dissimilar to the GSE Teacher Toolkit or Cambridge English Profile). The value of this will be found, in due course, in calibrating all OUP materials. The OTE has already been aligned to the Oxford Online Placement Test (OOPT) and, presumably, coursebooks will soon follow. This, in turn, will facilitate a vertically integrated business model, like Pearson and CUP, where everything from placement test, to coursework, to formative assessment, to final proficiency testing can be on offer.

Every now and then, someone recommends that I take a look at a flashcard app. It’s often interesting to see what developers have done with design, gamification and UX features, but the content is almost invariably awful. Most recently, I was encouraged to look at Word Pash. The screenshots below are from their promotional video.

word-pash-1 word-pash-2 word-pash-3 word-pash-4

The content problems are immediately apparent: an apparently random selection of target items, an apparently random mix of high and low frequency items, unidiomatic language examples, along with definitions and distractors that are less frequent than the target item. I don’t know if these are representative of the rest of the content. The examples seem to come from ‘Stage 1 Level 3’, whatever that means. (My confidence in the product was also damaged by the fact that the Word Pash website includes one testimonial from a certain ‘Janet Reed – Proud Mom’, whose son ‘was able to increase his score and qualify for academic scholarships at major universities’ after using the app. The picture accompanying ‘Janet Reed’ is a free stock image from Pexels and ‘Janet Reed’ is presumably fictional.)

According to the website, ‘WordPash is a free-to-play mobile app game for everyone in the global audience whether you are a 3rd grader or PhD, wordbuff or a student studying for their SATs, foreign student or international business person, you will become addicted to this fast paced word game’. On the basis of the promotional video, the app couldn’t be less appropriate for English language learners. It seems unlikely that it would help anyone improve their ACT or SAT test scores. The suggestion that the vocabulary development needs of 9-year-olds and doctoral students are comparable is pure chutzpah.

The deliberate study of more or less random words may be entertaining, but it’s unlikely to lead to very much in practical terms. For general purposes, the deliberate learning of the highest frequency words, up to about a frequency ranking of #7500, makes sense, because there’s a reasonably high probability that you’ll come across these items again before you’ve forgotten them. Beyond that frequency level, the value of the acquisition of an additional 1000 words tails off very quickly. Adding 1000 words from frequency ranking #8000 to #9000 is likely to result in an increase in lexical understanding of general purpose texts of about 0.2%. When we get to frequency ranks #19,000 to #20,000, the gain in understanding decreases to 0.01%[1]. In other words, deliberate vocabulary learning needs to be targeted. The data is relatively recent, but the principle goes back to at least the middle of the last century when Michael West argued that a principled approach to vocabulary development should be driven by a comparison of the usefulness of a word and its ‘learning cost’[2]. Three hundred years before that, Comenius had articulated something very similar: ‘in compiling vocabularies, my […] concern was to select the words in most frequent use’[3].
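It may help to translate those percentages into running words. The sketch below takes the two coverage figures quoted above and applies them to a text of 10,000 running words; the text length is an arbitrary figure of my own, chosen only to keep the arithmetic easy, and the function name is hypothetical.

```python
def extra_tokens_understood(text_length: int, coverage_gain_percent: float) -> float:
    """Running words newly understood in a text of the given length
    when text coverage rises by the given percentage."""
    return text_length * coverage_gain_percent / 100


# Learning the 1000 words ranked #8000 to #9000 (about 0.2% coverage)
print(f"{extra_tokens_understood(10_000, 0.2):.1f}")   # 20.0
# Learning the 1000 words ranked #19,000 to #20,000 (about 0.01% coverage)
print(f"{extra_tokens_understood(10_000, 0.01):.1f}")  # 1.0
```

On these figures, a thousand words of deliberate study buys about twenty extra running words in a 10,000-word text in the first case, and about one in the second.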

I’ll return to ‘general purposes’ later in this post, but, for now, we should remember that very few language learners actually study a language for general purposes. Globally, the vast majority of English language learners study English in an academic (school) context and their immediate needs are usually exam-specific. For them, general purpose frequency lists are unlikely to be adequate. If they are studying with a coursebook and are going to be tested on the lexical content of that book, they will need to use the wordlist that matches the book. Increasingly, publishers make such lists available and content producers for vocabulary apps like Quizlet and Memrise often use them. Many examinations, both national and international, also have accompanying wordlists. Examples of such lists produced by examination boards include those for the Cambridge English young learners’ exams (Starters, Movers and Flyers) and Cambridge English Preliminary. Other exams do not have official word lists, but reasonably reliable lists have been produced by third parties. Examples include Cambridge First, IELTS and SAT. There are, in addition, well-researched wordlists for academic English, including the Academic Word List (AWL) and the Academic Vocabulary List (AVL). All of these make sensible starting points for deliberate vocabulary learning.

When we turn to other, out-of-school learners, the number of reasons for studying English is huge. Different learners have different lexical needs, and working with a general purpose frequency list may be, at least in part, a waste of time. EFL and ESL learners are likely to have very different needs, as will EFL and ESP learners, as will older and younger learners, learners in different parts of the world, learners who will find themselves in English-speaking countries and those who won’t, etc., etc. For some of these demographics, specialised corpora (from which frequency-based wordlists can be drawn) exist. For most learners, though, the ideal list simply does not exist. Either it will have to be created (requiring a significant amount of time and expertise[4]) or an available best-fit will have to suffice. Paul Nation, in his recent ‘Making and Using Word Lists for Language Learning and Testing’ (John Benjamins, 2016), includes a useful chapter on critiquing wordlists. For anyone interested in better understanding the issues surrounding the development and use of wordlists, three good articles are freely available online. These are:

Lessard-Clouston, M. 2012 / 2013. ‘Word Lists for Vocabulary Learning and Teaching’ The CATESOL Journal 24.1: 287-304

Lessard-Clouston, M. 2016. ‘Word lists and vocabulary teaching: options and suggestions’ Cornerstone ESL Conference 2016

Sorell, C. J. 2013. A study of issues and techniques for creating core vocabulary lists for English as an International Language. Doctoral thesis.

But, back to ‘general purposes’ …. Frequency lists are the obvious starting point for preparing a wordlist for deliberate learning, but they are very problematic. Frequency rankings depend on the corpus on which they are based and, since these are different, rankings vary from one list to another. Even drawing on just one corpus, rankings can be a little strange. In the British National Corpus, for example, ‘May’ (the month) is about twice as frequent as ‘August’[5], but we would be foolish to infer from this that the learning of ‘May’ should be prioritised over the learning of ‘August’. An even more striking example from the same corpus is the fact that ‘he’ is about twice as frequent as ‘she’[6]: should, therefore, ‘he’ be learnt before ‘she’?
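The dependence of rankings on the corpus is easy to demonstrate in miniature. In the sketch below, the two ‘corpora’ are tiny word lists that I have invented purely for illustration (they are not samples from the British National Corpus or any other real corpus), and the ranking function is my own, with ties broken alphabetically.

```python
from collections import Counter


def frequency_ranks(tokens: list[str]) -> dict[str, int]:
    """Rank words from most to least frequent (1 = most frequent),
    breaking ties alphabetically."""
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda word: (-counts[word], word))
    return {word: rank for rank, word in enumerate(ordered, start=1)}


# Two invented toy corpora containing the same four words
corpus_a = ["he", "he", "he", "she", "may", "may", "august"]
corpus_b = ["she", "she", "she", "he", "august", "august", "may"]

print(frequency_ranks(corpus_a))  # {'he': 1, 'may': 2, 'august': 3, 'she': 4}
print(frequency_ranks(corpus_b))  # {'she': 1, 'august': 2, 'he': 3, 'may': 4}
```

The same four words come out in a different order depending on which texts happen to have been collected, which is all that a frequency ranking can ever tell us.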

List compilers have to make a number of judgement calls in their work. There is not space here to consider these in detail, but two particularly tricky questions concerning the way that words are chosen may be mentioned: Is a verb like ‘list’, with two different and unrelated meanings, one word or two? Should inflected forms be considered as separate words? The judgements are not usually informed by considerations of learners’ needs. Learners will probably best approach vocabulary development by building their store of word senses: attempting to learn all the meanings and related forms of any given word is unlikely to be either useful or successful.

Frequency lists, in other words, are not statements of scientific ‘fact’: they are interpretative documents. They have been compiled for descriptive purposes, not as ways of structuring vocabulary learning, and it cannot be assumed they will necessarily be appropriate for a purpose for which they were not designed.

A further major problem concerns the corpus on which the frequency list is based. Large databases, such as the British National Corpus or the Corpus of Contemporary American English, are collections of language used by native speakers in certain parts of the world, usually of a restricted social class. As such, they are of relatively little value to learners who will be using English in contexts that are not covered by the corpus. A context where English is a lingua franca is one such example.

A different kind of corpus is the Cambridge Learner Corpus (CLC), a collection of exam scripts produced by candidates in Cambridge exams. This has led to the development of the English Vocabulary Profile (EVP), where word senses are tagged as corresponding to particular levels in the Common European Framework scale. At first glance, this looks like a good alternative to frequency lists based on native-speaker corpora. But closer consideration reveals many problems. The design of examination tasks inevitably results in the production of language of a very different kind from that produced in other contexts. Many high frequency words simply do not appear in the CLC because it is unlikely that a candidate would use them in an exam. Other items are very frequent in this corpus just because they are likely to be produced in examination tasks. Unsurprisingly, frequency rankings in EVP do not correlate very well with frequency rankings from other corpora. The EVP, then, like other frequency lists, can only serve, at best, as a rough guide for the drawing up of target item vocabulary lists in general purpose apps or coursebooks[7].

There is no easy solution to the problems involved in devising suitable lexical content for the ‘global audience’. Tagging words to levels (i.e. grouping them into frequency bands) will always be problematic, unless very specific user groups are identified. Writers, like myself, of general purpose English language teaching materials are justifiably irritated by some publishers’ insistence on allocating words to levels with numerical values. The policy, taken to extremes (as is increasingly the case), has little to recommend it in linguistic terms. But it’s still a whole lot better than the aleatory content of apps like Word Pash.

[1] See Nation, I.S.P. 2013. Learning Vocabulary in Another Language 2nd edition. (Cambridge: Cambridge University Press) p. 21 for statistical tables. See also Nation, P. & R. Waring 1997. ‘Vocabulary size, text coverage and word lists’ in Schmitt & McCarthy (eds.) 1997. Vocabulary: Description, Acquisition and Pedagogy. (Cambridge: Cambridge University Press) pp. 6-19

[2] See Kelly, L.G. 1969. 25 Centuries of Language Teaching. (Rowley, Mass.: Rowley House) p.206 for a discussion of West’s ideas.

[3] Kelly, L.G. 1969. 25 Centuries of Language Teaching. (Rowley, Mass.: Rowley House) p. 184

[4] See Timmis, I. 2015. Corpus Linguistics for ELT (Abingdon: Routledge) for practical advice on doing this.

[5] Nation, I.S.P. 2016. Making and Using Word Lists for Language Learning and Testing. (Amsterdam: John Benjamins) p.58

[6] Taylor, J.R. 2012. The Mental Corpus. (Oxford: Oxford University Press) p.151

[7] For a detailed critique of the limitations of using the CLC as a guide to syllabus design and textbook development, see Swan, M. 2014. ‘A Review of English Profile Studies’ ELTJ 68/1: 89-96

In a recent interesting post on eltjam, Cleve Miller wrote the following:

Knewton asks its publishing partners to organize their courses into a “knowledge graph” where content is mapped to an analyzable form that consists of the smallest meaningful chunks (called “concepts”), organized as prerequisites to specific learning goals. You can see here the influence of general learning theory and not SLA/ELT, but let’s not concern ourselves with nomenclature and just call their “knowledge graph” an “acquisition graph”, and call “concepts” anything else at all, say…“items”. Basically our acquisition graph could be something like the CEFR, and the items are the specifications in a completed English Profile project that detail the grammar, lexis, and functions necessary for each of the can-do’s in the CEFR. Now, even though this is a somewhat plausible scenario, it opens Knewton up to several objections, foremost the degree of granularity and linearity.

In this post, Cleve acknowledges that, for the time being, adaptive learning may be best suited to ‘certain self-study material, some online homework, and exam prep – anywhere the language is fairly defined and the content more amenable to algorithmic micro-adaptation.’ I would agree, but its value / usefulness will depend on getting the knowledge graph right.
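For readers who find the idea of a ‘knowledge graph’ rather abstract, here is a minimal sketch of what items organized as prerequisites might look like. The four grammar items and the dependencies between them are invented for the purposes of illustration; they are not taken from Knewton, the CEFR or English Profile, and I am certainly not suggesting that this is how any real product is built.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical items, each mapped to the items assumed to be its prerequisites
prerequisites = {
    "present simple of 'be'": set(),
    "present simple": {"present simple of 'be'"},
    "past simple": {"present simple"},
    "present continuous": {"present simple of 'be'"},
    "present perfect": {"past simple"},
}

# One possible order of study that never presents an item
# before all of its prerequisites have been covered
study_order = list(TopologicalSorter(prerequisites).static_order())
print(study_order)
```

Even a toy example like this makes the objections visible: someone has to decide how small an ‘item’ is, and someone has to decide that one item really must be learnt before another.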

Which knowledge graph, then? Cleve suggests that it could be something like the CEFR, but it couldn’t be the CEFR itself because it is, quite simply, too vague. This was recognized by Pearson when they developed their Global Scale of English (GSE), an instrument which, they claim, can provide ‘for more granular and detailed measurements of learners’ levels than is possible with the CEFR itself, with its limited number of wide levels’. This Global Scale of English will serve as ‘the metric underlying all Pearson English learning, teaching and assessment products’, including, therefore, the adaptive products under development.


‘As part of the GSE project, Pearson is creating an associated set of Pearson Syllabuses […]. These will help to link instructional content with assessments and to create a reference for authoring, instruction and testing.’ These syllabuses will contain grammar and vocabulary inventories which ‘will be expressed in the form of can-do statements with suggested sample exponents rather than as the prescriptive lists found in more traditional syllabuses.’ I haven’t been able to get my hands on one of these syllabuses yet: perhaps someone could help me out?

Informal feedback from writer colleagues working for Pearson suggests that, in practice, these inventories are much more prescriptive than Pearson claim, but this is hardly surprising, as the value of an inventory is precisely its more-or-less finite nature.

Until I see more, I will have to limit my observations to two documents in the public domain which are the closest we have to what might become knowledge graphs. The first of these is the British Council / EAQUALS Core Inventory for General English. Scott Thornbury, back in 2011, very clearly set out the problems with this document and, to my knowledge, the reservations he expressed have not yet been adequately answered. To be fair, this inventory was never meant to be used as a knowledge graph: ‘It is a description, not a prescription’, wrote the author (North, 2010). But presumably a knowledge graph would look much like this, and it would have the same problems. The second place where we can find what a knowledge graph might look like is English Profile and this is mentioned by Cleve. Would English Profile work any better? Possibly not. Michael Swan’s critique of English Profile (ELTJ 68/1 January 2014 pp.89-96) asks some big questions that have yet, to my knowledge, to be answered.

Knewton’s Sally Searby has said that, for ELT, knowledge graphing needs to be ‘much more nuanced’. Her comment suggests a belief that knowledge graphing can be much more nuanced, but this is open to debate. Michael Swan quotes Prodeau, Lopez and Véronique (2012): ‘the sum of pragmatic and linguistic skills needed to achieve communicative success at each level makes it difficult, if not impossible, to find lexical and grammatical means that would characterize only one level’. He observes that ‘the problem may, in fact, simply not be soluble’.

So, what kind of knowledge graph are we likely to see? My best bet is that it would look a bit like a Headway syllabus.

Given what we know, it is possible to make some predictions about what the next generation of adult ELT materials will be like when they emerge a few years from now. Making predictions is always a hazardous game, but there are a number of reasonable certainties that can be identified, based on the statements and claims of the major publishers and software providers.

1 Major publishers will move gradually away from traditional coursebooks (whether in print or ebook format) towards the delivery of learning content on learning platforms. At its most limited, this will be in the form of workbook-style material with an adaptive element. At its most developed, this will be in the form of courses that can be delivered entirely without traditional coursebooks. These will allow teachers or institutions to decide the extent to which they wish to blend online and face-to-face instruction.

2 The adaptive elements of these courses will focus primarily or exclusively on discrete item grammar, vocabulary, functional language and phonology, since these lend themselves most readily to the software. These courses will be targeted mainly at lower level (B1 and below) learners.

3 The methodological approach of these courses will be significantly influenced by the expectations of the markets where they are predicted to be most popular and most profitable: South and Central America, the Arabian Gulf and Asia.

4 These courses will permit multiple modifications to suit local requirements. They will also allow additional content to be uploaded.

5 Assessment will play an important role in the design of all these courses. Things like discrete item grammar, vocabulary, functional language and phonology, which lend themselves most readily to assessment, will be prioritized over language skills, which are harder to assess.

6 The discrete items of language that are presented will be tagged to level descriptors, using scales like the Common European Framework or English Profile.

7 Language skills work will be included, but only in the more sophisticated (and better-funded) projects will these components be closely tied to the adaptive software.

8 Because of technological differences between different parts of the world, adaptive courses will co-exist with closely related, more traditional print (or ebook) courses.

9 Training for teachers (especially concerning blended learning) will become an increasingly important part of the package sold by the major publishers.

10 These courses will be more than ever driven by the publishers’ perceptions of what the market wants. There will be a concomitant decrease in the extent to which individual authors, or author teams, influence the material.
