Chatbots

Posted: December 1, 2016 in ed tech

 

Here’s a conversation I had recently with Cleverbot.


User (me): Hi Cleverbot. What are your views on the future of chatbots in language learning?

Cleverbot: Are you talking to yourself?

User: No, I’m asking you a question.

Cleverbot: Then what’s the question again? I forgot.

User: What are your views on the future of chatbots in language learning?

Cleverbot: Don’t you want me to finish my story?

User: No, I want you to answer my question.

Cleverbot: What is your question?

User: Do you think there is a future for chatbots in language education?

Cleverbot: Sure, if you are talking about the moon.

I also tried to have a conversation with Alice, the avatar from EFL Classroom Bot, listed by Larry Ferlazzo as one of ‘the best online chatbots for practicing English’. I didn’t get any more sense out of her than out of Cleverbot.

Chatbots, apparently, are the next big thing. Again. David Mattin, head of trends and insights at trendwatching.com, writes (in the September 2016 issue of ‘Business Life’) that ‘the chatbot revolution is coming’ and that chatbots are a step towards the dream of an interface between user and technology that is so intuitive that the interface ‘simply fades away’. Chatbots have been around for some time. Remember Clippy – the Microsoft Office bot in the late 1990s – which you had to disable in order to stop yourself punching your computer screen? Since then, bots have become ubiquitous. There have been problems, such as Microsoft’s Tay bot that had to be taken down after sixteen hours earlier this year, when, after interacting with other Twitter users, it developed into an abusive Nazi. But chatbots aren’t going away and you’ve probably interacted with one to book a taxi, order food or attempt to talk to your bank. In September this year, the Guardian described them as ‘the talk of the town’ and ‘hot property in Silicon Valley’.

The real interest in chatbots is not, however, in the ‘exciting interface’ possibilities (both user interface and user experience remain pretty crude), but in the fact that they are leaner, sit comfortably with the things we actually do on a phone and offer a way of cutting out the high fees that developers have to pay to app stores. After so many start-up failures, chatbots offer a glimmer of financial hope to developers.

It’s no surprise, of course, to find the world of English language teaching beginning to sit up and take notice of this technology. A 2012 article by Ben Lehtinen in PeerSpectives enthuses about the possibilities in English language learning and reports the positive feedback of the author’s own students. ELTJam, so often so quick off the mark, developed an ELT Bot over the course of a hackathon weekend in March this year. Disappointingly, it wasn’t really a bot – more a case of humans pretending to be a bot pretending to be humans – but it probably served its exploratory purpose.

And a few months ago Duolingo began incorporating bots. These are currently only available for French, Spanish and German learners in the iPhone app, so I haven’t been able to try them out and evaluate them. According to an infomercial in TechCrunch, ‘to make talking to the bots a bit more compelling, the company tried to give its different bots a bit of personality. There’s Chef Robert, Renee the Driver and Officer Ada, for example. They will react differently to your answers (and correct you as necessary), but for the most part, the idea here is to mimic a real conversation. These bots also allow for a degree of flexibility in your answers that most language-learning software simply isn’t designed for. There are plenty of ways to greet somebody, for example, but most services will often only accept a single answer. When you’re totally stumped for words, though, Duolingo offers a ‘help my reply’ button with a few suggested answers.’ In the last twelve months or so, Duolingo has considerably improved its ability to recognise multiple correct ways of expressing a particular idea, and its ability to recognise alternative answers to its translation tasks. However, I’m highly sceptical about its ability to mimic a real conversation any better than Cleverbot or Alice the EFL Bot, or its ability to provide systematically useful corrections.

My reasons lie in the current limitations of AI and NLP (Natural Language Processing). In a nutshell, we simply don’t know how to build a machine that can truly understand human language. Limited exchanges in restricted domains can be done pretty well (such as the early chatbot that did a good job of simulating an encounter with an evasive therapist, or, more recently ordering a taco and having a meaningless, but flirty conversation with a bot), but despite recent advances in semantic computing, we’re a long way from anything that can mimic a real conversation. As Audrey Watters puts it, we’re not even close.
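To see why restricted exchanges are so much easier than real conversation, here is a minimal sketch of the pattern-matching approach that early therapist-style bots are usually described as using. The rules and wording are invented for illustration and are not taken from any real chatbot.

```python
import re

# Hypothetical rules for illustration only: (pattern, response template).
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi want (.+)", re.IGNORECASE), "What would it mean to you to get {0}?"),
    (re.compile(r"\bmy (mother|father|teacher)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."


def reply(utterance: str) -> str:
    """Return the first matching canned response, echoing part of the input."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            fragment = match.group(1).rstrip(".!?")
            return template.format(fragment)
    # Nothing matched: the bot has no understanding to fall back on.
    return FALLBACK


print(reply("I feel tired today."))   # Why do you feel tired today?
print(reply("Do you think there is a future for chatbots in language education?"))  # Please go on.
```

The bot never processes meaning: it echoes fragments back, and anything outside its handful of patterns gets a stock evasion, which is roughly the experience of the conversation at the top of this post.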

When it comes to identifying language errors made by language learners, we’re not really much better off. Apps like Grammarly are not bad at identifying grammatical errors (but not good enough to be reliable), but pretty hopeless at dealing with lexical appropriacy. Much more reliable feedback to learners can be offered when the software is trained on particular topics and text types. Write & Improve does this with a relatively small selection of Cambridge English examination tasks, but a free conversation ….? Forget it.

So, how might chatbots be incorporated into language teaching / learning? A blog post from December 2015 entitled AI-powered chatbots and the future of language learning suggests one plausible possibility. Using an existing messenger service, such as WhatsApp or Telegram, an adaptive chatbot would send tasks (such as participation in a conversation thread with a predetermined topic, register, etc., or pronunciation practice or translation exercises) to a learner, provide feedback and record the work for later recycling. At the same time, the bot could send out reminders of work that needs to be done or administrative tasks that must be completed.

Kat Robb has written a very practical article about using instant messaging in English language classrooms. Her ideas are interesting (although I find the idea of students in a F2F classroom messaging each other slightly bizarre) and it’s easy to imagine ways in which her activities might be augmented with chatbot interventions. The Write & Improve app, mentioned above, could deploy a chatbot interface to give feedback instead of the flat (and, in my opinion, perfectly adequate) pop-up boxes currently in use. Come to think of it, more or less any digital language learning tool could be pimped up with a bot. Countless revisions can be envisioned.

But the overwhelming question is: would it be worth it? Bots are not likely, any time soon, to revolutionise language learning. What they might just do, however, is help to further reduce language teaching to a series of ‘mechanical and scripted gestures’. More certain is that a lot of money will be thrown down the post-truth edtech drain. Then, in the not too distant future, this latest piece of edtech will fall into the trough of disillusionment, to be replaced by the latest latest thing.

 

 

In December last year, I posted a wish list for vocabulary (flashcard) apps. At the time, I hadn’t read a couple of key research texts on the subject. It’s time for an update.

First off, there’s an article called ‘Intentional Vocabulary Learning Using Digital Flashcards’ by Hsiu-Ting Hung. It’s available online here. Given the lack of empirical research into the use of digital flashcards, it’s an important article and well worth a read. Its basic conclusion is that digital flashcards are more effective as a learning tool than printed word lists. No great surprises there, but of more interest, perhaps, are the recommendations that (1) ‘students should be educated about the effective use of flashcards (e.g. the amount and timing of practice), and this can be implemented through explicit strategy instruction in regular language courses or additional study skills workshops’ (Hung, 2015: 111), and (2) that digital flashcards can be usefully ‘repurposed for collaborative learning tasks’ (Hung, ibid.).

However, what really grabbed my attention was an article by Tatsuya Nakata. Nakata’s research is of particular interest to anyone interested in vocabulary learning, but especially so to those with an interest in digital possibilities. A number of his research articles can be freely accessed via his page at ResearchGate, but the one I am interested in is called ‘Computer-assisted second language vocabulary learning in a paired-associate paradigm: a critical investigation of flashcard software’. Don’t let the title put you off. It’s a review of a pile of web-based flashcard programs: since the article is already five years old, many of the programs have either changed or disappeared, but the critical approach he takes is more or less as valid now as it was then (whether we’re talking about web-based stuff or apps).

Nakata divides his evaluation criteria into two broad groups.

Flashcard creation and editing

(1) Flashcard creation: Can learners create their own flashcards?

(2) Multilingual support: Can the target words and their translations be created in any language?

(3) Multi-word units: Can flashcards be created for multi-word units as well as single words?

(4) Types of information: Can various kinds of information be added to flashcards besides the word meanings (e.g. parts of speech, contexts, or audios)?

(5) Support for data entry: Does the software support data entry by automatically supplying information about lexical items such as meaning, parts of speech, contexts, or frequency information from an internal database or external resources?

(6) Flashcard set: Does the software allow learners to create their own sets of flashcards?

Learning

(1) Presentation mode: Does the software have a presentation mode, where new items are introduced and learners familiarise themselves with them?

(2) Retrieval mode: Does the software have a retrieval mode, which asks learners to recall or choose the L2 word form or its meaning?

(3) Receptive recall: Does the software ask learners to produce the meanings of target words?

(4) Receptive recognition: Does the software ask learners to choose the meanings of target words?

(5) Productive recall: Does the software ask learners to produce the target word forms corresponding to the meanings provided?

(6) Productive recognition: Does the software ask learners to choose the target word forms corresponding to the meanings provided?

(7) Increasing retrieval effort: For a given item, does the software arrange exercises in the order of increasing difficulty?

(8) Generative use: Does the software encourage generative use of words, where learners encounter or use previously met words in novel contexts?

(9) Block size: Can the number of words studied in one learning session be controlled and altered?

(10) Adaptive sequencing: Does the software change the sequencing of items based on learners’ previous performance on individual items?

(11) Expanded rehearsal: Does the software help implement expanded rehearsal, where the intervals between study trials are gradually increased as learning proceeds? (Nakata, T. (2011): ‘Computer-assisted second language vocabulary learning in a paired-associate paradigm: a critical investigation of flashcard software’ Computer Assisted Language Learning, 24:1, 17-38)
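Criteria (9) to (11) are the most algorithmic items on the list, and a short sketch may make them concrete. The interval ladder, the field names and the ‘weakest first’ ordering below are my own assumptions for illustration; they are not taken from Nakata’s article or from any particular program.

```python
from dataclasses import dataclass

# Assumed interval ladder in days; real programs tune these differently.
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    prompt: str
    answer: str
    step: int = 0      # position on the interval ladder
    due_day: int = 0   # day on which the card should next be reviewed


def review(card: Card, correct: bool, today: int) -> None:
    """Expanded rehearsal: widen the gap after success, reset it after failure."""
    if correct:
        card.step = min(card.step + 1, len(INTERVALS) - 1)
    else:
        card.step = 0
    card.due_day = today + INTERVALS[card.step]


def next_session(cards: list[Card], today: int, block_size: int) -> list[Card]:
    """Adaptive sequencing: only due cards, weakest first, capped at block_size."""
    due = [c for c in cards if c.due_day <= today]
    due.sort(key=lambda c: (c.step, c.due_day))
    return due[:block_size]
```

Here `block_size` corresponds to criterion (9), the ordering in `next_session` to criterion (10), and the growing gaps set in `review` to criterion (11).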

It’s a rather different list from my own (there’s nothing I would disagree with here), because mine is more general and his is exclusively oriented towards learning principles. Nakata makes the point towards the end of the article that it would ‘be useful to investigate learners’ reactions to computer-based flashcards to examine whether they accept flashcard programs developed according to learning principles’ (p. 34). It’s far from clear, he points out, that conformity to learning principles is at the top of learners’ agendas. More than just users’ feelings about computer-based flashcards in general, a key concern will be the fact that there are ‘large individual differences in learners’ perceptions of [any flashcard] program’ (Nakata, T. 2008. ‘English vocabulary learning with word lists, word cards and computers: implications from cognitive psychology research for optimal spaced learning’ ReCALL 20(1), p. 18).

I was trying to make a similar point in another post about motivation and vocabulary apps. In the end, as with any language learning material, research-driven language learning principles can only take us so far. User experience is a far more difficult creature to pin down or to make generalisations about. A user’s reaction to graphics, gamification, uploading time and so on are so powerful and so subjective that learning principles will inevitably play second fiddle. That’s not to say, of course, that Nakata’s questions are not important: it’s merely to wonder whether the bigger question is truly answerable.

Nakata’s research identifies plenty of room for improvement in digital flashcards, and although the article is now quite old, not a lot has changed. Key areas to work on are (1) the provision of generative use of target words, (2) the need to increase retrieval effort, (3) the automatic provision of information about meaning, parts of speech, or contexts (in order to facilitate flashcard creation), and (4) the automatic generation of multiple-choice distractors.
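Point (4) is the easiest of these to illustrate. The sketch below draws wrong answers from other items with the same part of speech, which is one plausible heuristic and my own assumption, not a method described by Nakata. The `lexicon` structure and function name are hypothetical.

```python
import random


def make_distractors(target, lexicon, n=3, seed=None):
    """Pick up to n wrong answers that share the target's part of speech.

    lexicon maps each word to a (part_of_speech, meaning) pair.
    """
    pos, meaning = lexicon[target]
    pool = [
        m for word, (p, m) in lexicon.items()
        if word != target and p == pos and m != meaning
    ]
    rng = random.Random(seed)  # a seed makes the choice repeatable
    return rng.sample(pool, min(n, len(pool)))


# Hypothetical mini-lexicon for a German course.
lexicon = {
    "Hund": ("noun", "dog"),
    "Katze": ("noun", "cat"),
    "Haus": ("noun", "house"),
    "Baum": ("noun", "tree"),
    "laufen": ("verb", "run"),
}
options = make_distractors("Hund", lexicon) + ["dog"]
```

A serious implementation would also want distractors of similar frequency and difficulty, which is part of why this remains an area to work on.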

In the conclusion of his study, he identifies one flashcard program which is better than all the others. Unsurprisingly, five years down the line, the software he identifies is no longer free, others have changed more rapidly in the intervening period, and who knows which will be out in front next week?

 

About two and a half years ago when I started writing this blog, there was a lot of hype around adaptive learning and the big data which might drive it. Two and a half years is a long time in technology. A look at Google Trends suggests that interest in adaptive learning has been pretty static for the last couple of years. It’s interesting to note that 3 of the 7 lettered points on this graph are Knewton-related media events (including the most recent, A, which is Knewton’s latest deal with Hachette) and 2 of them concern McGraw-Hill. It would be interesting to know whether these companies follow both parts of Simon Cowell’s dictum of ‘Create the hype, but don’t ever believe it’.

[Figure: Google Trends graph of interest in adaptive learning]

A look at the Hype Cycle (see here for Wikipedia’s entry on the topic and for criticism of the hype of Hype Cycles) of the IT research and advisory firm, Gartner, indicates that both big data and adaptive learning have now slid into the ‘trough of disillusionment’, which means that the market has started to mature, becoming more realistic about how useful the technologies can be for organizations.

A few years ago, the Gates Foundation, one of the leading cheerleaders and financial promoters of adaptive learning, launched its Adaptive Learning Market Acceleration Program (ALMAP) to ‘advance evidence-based understanding of how adaptive learning technologies could improve opportunities for low-income adults to learn and to complete postsecondary credentials’. It’s striking that the program’s aims referred to how such technologies could lead to learning gains, not whether they would. Now, though, with the publication of a report commissioned by the Gates Foundation to analyze the data coming out of the ALMAP Program, things are looking less rosy. The report is inconclusive. There is no firm evidence that adaptive learning systems are leading to better course grades or course completion. ‘The ultimate goal – better student outcomes at lower cost – remains elusive’, the report concludes. Rahim Rajan, a senior program officer for Gates, is clear: ‘There is no magical silver bullet here.’

The same conclusion is being reached elsewhere. A report for the National Education Policy Center (in Boulder, Colorado) concludes: Personalized Instruction, in all its many forms, does not seem to be the transformational technology that is needed, however. After more than 30 years, Personalized Instruction is still producing incremental change. The outcomes of large-scale studies and meta-analyses, to the extent they tell us anything useful at all, show mixed results ranging from modest impacts to no impact. Additionally, one must remember that the modest impacts we see in these meta-analyses are coming from blended instruction, which raises the cost of education rather than reducing it (Enyedy, 2014: 15 -see reference at the foot of this post). In the same vein, a recent academic study by Meg Coffin Murray and Jorge Pérez (2015, ‘Informing and Performing: A Study Comparing Adaptive Learning to Traditional Learning’) found that ‘adaptive learning systems have negligible impact on learning outcomes’.

In the latest educational technology plan from the U.S. Department of Education (‘Future Ready Learning: Reimagining the Role of Technology in Education’, 2016) the only mentions of the word ‘adaptive’ are in the context of testing. And the latest OECD report on ‘Students, Computers and Learning: Making the Connection’ (2015), finds, more generally, that information and communication technologies, when they are used in the classroom, have, at best, a mixed impact on student performance.

There is, however, too much money at stake for the earlier hype to disappear completely. Sponsored cheerleading for adaptive systems continues to find its way into blogs and national magazines and newspapers. EdSurge, for example, recently published a report called ‘Decoding Adaptive’ (2016), sponsored by Pearson, that continues to wave the flag. Enthusiastic anecdotes take the place of evidence, but, for all that, it’s a useful read.

In the world of ELT, there are plenty of sales people who want new products which they can call ‘adaptive’ (and gamified, too, please). But it’s striking that three years after I started following the hype, such products are rather thin on the ground. Pearson was the first of the big names in ELT to do a deal with Knewton, and invested heavily in the company. Their relationship remains close. But, to the best of my knowledge, the only truly adaptive ELT product that Pearson offers is the PTE test.

Macmillan signed a contract with Knewton in May 2013 ‘to provide personalized grammar and vocabulary lessons, exam reviews, and supplementary materials for each student’. In December of that year, they talked up their new ‘big tree online learning platform’: ‘Look out for the Big Tree logo over the coming year for more information as to how we are using our partnership with Knewton to move forward in the Language Learning division and create content that is tailored to students’ needs and reactive to their progress.’ I’ve been looking out, but it’s all gone rather quiet on the adaptive / platform front.

In September 2013, it was the turn of Cambridge to sign a deal with Knewton ‘to create personalized learning experiences in its industry-leading ELT digital products for students worldwide’. This year saw the launch of a major new CUP series, ‘Empower’. It has an online workbook with personalized extra practice, but there’s nothing (yet) that anyone would call adaptive. More recently, Cambridge has launched the online version of the 2nd edition of Touchstone. Nothing adaptive there, either.

Earlier this year, Cambridge published The Cambridge Guide to Blended Learning for Language Teaching, edited by Mike McCarthy. It contains a chapter by M.O.Z. San Pedro and R. Baker on ‘Adaptive Learning’. It’s an enthusiastic account of the potential of adaptive learning, but it doesn’t contain a single reference to language learning or ELT!

So, what’s going on? Skepticism is becoming the order of the day. The early hype of people like Knewton’s Jose Ferreira is now understood for what it was. Companies like Macmillan got their fingers badly burnt when they barked up the wrong tree with their ‘Big Tree’ platform.

Noel Enyedy captures a more contemporary understanding when he writes: Personalized Instruction is based on the metaphor of personal desktop computers—the technology of the 80s and 90s. Today’s technology is not just personal but mobile, social, and networked. The flexibility and social nature of how technology infuses other aspects of our lives is not captured by the model of Personalized Instruction, which focuses on the isolated individual’s personal path to a fixed end-point. To truly harness the power of modern technology, we need a new vision for educational technology (Enyedy, 2014: 16).

Adaptive solutions aren’t going away, but there is now a much better understanding of what sorts of problems might have adaptive solutions. Testing is certainly one. As the educational technology plan from the U.S. Department of Education (‘Future Ready Learning: Re-imagining the Role of Technology in Education’, 2016) puts it: Computer adaptive testing, which uses algorithms to adjust the difficulty of questions throughout an assessment on the basis of a student’s responses, has facilitated the ability of assessments to estimate accurately what students know and can do across the curriculum in a shorter testing session than would otherwise be necessary. In ELT, Pearson and EF have adaptive tests that have been well researched and designed.
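The idea of adjusting difficulty on the basis of responses can be sketched in a few lines. What follows is a deliberately crude ‘staircase’ procedure of my own, with invented level numbers; operational adaptive tests typically rely on far more sophisticated statistical models, so treat this only as an illustration of the principle.

```python
def run_adaptive_test(answer_fn, levels=10, start=5, max_items=8):
    """Simple staircase: step up after a correct answer, down after a wrong one.

    answer_fn(level) -> bool reports whether the test taker gets an item
    at that difficulty level right. Returns the final level estimate and
    the list of levels actually administered.
    """
    level = start
    step = max(levels // 4, 1)
    administered = []
    for _ in range(max_items):
        administered.append(level)
        correct = answer_fn(level)
        level = level + step if correct else level - step
        level = max(1, min(levels, level))   # stay within the scale
        step = max(step // 2, 1)             # smaller moves as evidence accumulates
    return level, administered


# Hypothetical test taker who can handle items up to level 7.
estimate, path = run_adaptive_test(lambda lvl: lvl <= 7)
```

With this simulated test taker, the procedure moves quickly from the starting level to the boundary between levels 7 and 8 and then hovers there, which is why an adaptive test can be shorter than a fixed one: it wastes few items on questions that are far too easy or far too hard.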

Vocabulary apps which deploy adaptive technology continue to become more sophisticated, although empirical research is lacking. Automated writing tutors with adaptive corrective feedback are also developing fast, and I’ll be writing a post about these soon. Similarly, as speech recognition software improves, we can expect to see better and better automated adaptive pronunciation tutors. But going beyond such applications, there are bigger questions to ask, and answers to these will impact on whatever direction adaptive technologies take. Large platforms (LMSs), with or without adaptive software, are already beginning to look rather dated. Will they be replaced by integrated apps, or are apps themselves going to be replaced by bots (currently riding high in the Hype Cycle)? In language learning and teaching, the future of bots is likely to be shaped by developments in natural language processing (another topic about which I’ll be blogging soon). Nobody really has a clue where the next two and a half years will take us (if anywhere), but it’s becoming increasingly likely that adaptive learning will be only one very small part of it.

 

Enyedy, N. 2014. Personalized Instruction: New Interest, Old Rhetoric, Limited Results, and the Need for a New Direction for Computer-Mediated Learning. Boulder, CO: National Education Policy Center. Retrieved 17.07.16 from http://nepc.colorado.edu/publication/personalized-instruction

Ok, let’s be honest here. This post is about teacher training, but ‘development’ sounds more respectful, more humane, more modern. Teacher development (self-initiated, self-evaluated, collaborative and holistic) could be adaptive, but it’s unlikely that anyone will want to spend the money on developing an adaptive teacher development platform any time soon. Teacher training (top-down, pre-determined syllabus and externally evaluated) is another matter. If you’re not too clear about this distinction, see Penny Ur’s article in The Language Teacher.

The main point of adaptive learning tools is to facilitate differentiated instruction. They are, as Pearson’s latest infomercial booklet describes them, ‘educational technologies that can respond to a student’s interactions in real-time by automatically providing the student with individual support’. Differentiation or personalization (or whatever you call it) is, as I’ve written before, the declared goal of almost everyone in educational power these days. What exactly it is may be open to question (see Michael Feldstein’s excellent article), as may be the question of whether or not it is actually such a desideratum (see, for example, this article). But, for the sake of argument, let’s agree that it’s mostly better than one-size-fits-all.

Teachers around the world are being encouraged to adopt a differentiated approach with their students, and they are being encouraged to use technology to do so. It is technology that can help create ‘robust personalized learning environments’ (says the White House). Differentiation for language learners could be facilitated by ‘social networking systems, podcasts, wikis, blogs, encyclopedias, online dictionaries, webinars, online English courses,’ etc. (see Alexandra Chistyakova’s post on eltdiary).

But here’s the crux. If we want teachers to adopt a differentiated approach, they really need to have experienced it themselves in their training. An interesting post on edweek sums this up: If professional development is supposed to lead to better pedagogy that will improve student learning AND we are all in agreement that modeling behaviors is the best way to show people how to do something, THEN why not ensure all professional learning opportunities exhibit the qualities we want classroom teachers to have?

Differentiated teacher development / training is rare. According to the Center for Public Education’s Teaching the Teachers report, almost all teachers participate in ‘professional development’ (PD) throughout the year. However, a majority of those teachers find the PD in which they participate ineffective. Typically, the development is characterised by ‘drive-by’ workshops, one-size-fits-all presentations, ‘been there, done that’ topics, little or no modelling of what is being taught, a focus on rotating fads and a lack of follow-up. This report is not specifically about English language teachers, but it will resonate with many who are working in English language teaching around the world.

The promotion of differentiated teacher development is gaining traction: see here or here, for example, or read Cindy A. Strickland’s ‘Professional Development for Differentiating Instruction’.

Remember, though, that it’s really training, rather than development, that we’re talking about. After all, if one of the objectives is to equip teachers with a skills set that will enable them to become more effective instructors of differentiated learning, this is most definitely ‘training’ (notice the transitivity of the verbs ‘enable’ and ‘equip’!). In this context, a necessary starting point will be some sort of ‘knowledge graph’ (which I’ve written about here). For language teachers, these already exist, including the European Profiling Grid, the Eaquals Framework for Language Teacher Training and Development, the Cambridge English Teaching Framework and the British Council’s Continuing Professional Development Framework (CPD) for Teachers. We can expect these to become more refined and more granularised, and a partial move in this direction is the Cambridge English Digital Framework for Teachers. Once a knowledge graph is in place, the next step will be to tag particular pieces of teacher training content (e.g. webinars, tasks, readings, etc.) to locations in the framework that is being used. It would not be too complicated to engineer dynamic frameworks which could be adapted to individual or institutional needs.
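The tagging step is simple enough to sketch. The framework locations and content items below are invented for illustration and do not come from any of the frameworks named above; the function names are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical content items tagged to hypothetical framework locations.
content_tags = {
    "webinar: giving feedback on writing": ["assessment/feedback"],
    "reading: lesson staging": ["planning/lesson-structure"],
    "task: write concept-check questions": [
        "language/clarifying-meaning",
        "planning/lesson-structure",
    ],
}


def build_index(tags):
    """Invert content -> locations into location -> content."""
    index = defaultdict(list)
    for item, locations in tags.items():
        for location in locations:
            index[location].append(item)
    return dict(index)


def recommend(index, teacher_gaps):
    """List content tagged to the locations where a teacher needs work, without repeats."""
    seen, result = set(), []
    for gap in teacher_gaps:
        for item in index.get(gap, []):
            if item not in seen:
                seen.add(item)
                result.append(item)
    return result


index = build_index(content_tags)
print(recommend(index, ["planning/lesson-structure"]))
```

The hard part is not the lookup but deciding what the locations are and who determines a teacher’s ‘gaps’, which is where the distinction between training and development starts to bite.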

This process will be facilitated by the fact that teacher training content is already being increasingly granularised. Whether it’s an MA in TESOL or a shorter, more practically oriented course, things are getting more and more bite-sized, with credits being awarded to these short bites, as course providers face stiffer competition and respond to market demands.

Classroom practice could also form part of such an adaptive system. One tool that could be deployed would be Visible Classroom, an automated system for providing real-time evaluative feedback for teachers. There is an ‘online dashboard providing teachers with visual information about their teaching for each lesson in real-time. This includes proportion of teacher talk to student talk, number and type of questions, and their talking speed.’ John Hattie, who is behind this project, says that teachers ‘account for about 30% of the variance in student achievement and [are] the largest influence outside of individual student effort.’ Teacher development with a tool like Visible Classroom is ultimately all about measuring teacher performance (against a set of best-practice benchmarks identified by Hattie’s research) in order to improve the learning outcomes of the students.

You may have noticed the direction in which this part of this blog post is going. I began by talking about social networking systems, podcasts, wikis, blogs and so on, and just now I’ve mentioned the summative, credit-bearing possibilities of an adaptive teacher development training programme. It’s a tension that is difficult to resolve. There’s always a paradox in telling anyone that they are going to embark on a self-directed course of professional development. Whoever pays the piper calls the tune and, if an institution decides that it is worth investing significant amounts of money in teacher development, they will want a return for their money. The need for truly personalised teacher development is likely to be overridden by the more pressing need for accountability, which, in turn, typically presupposes pre-determined course outcomes, which can be measured in some way … so that quality (and cost-effectiveness and so on) can be evaluated.

Finally, it’s worth asking if language teaching (any more than language learning) can be broken down into small parts that can be synthesized later into a meaningful and valuable whole. Certainly, there are some aspects of language teaching (such as the ability to use a dashboard on an LMS) which lend themselves to granularisation. But there’s a real danger of losing sight of the forest of teaching if we focus on the individual trees that can be studied and measured.

I have been putting in a lot of time studying German vocabulary with Memrise lately, but this is not a review of the Memrise app. For that, I recommend you read Marek Kiczkowiak’s second post on this app. Like me, he’s largely positive, although I am less enthusiastic about Memrise’s USP, the use of mnemonics. It’s not that mnemonics don’t work – there’s a lot of evidence that they do: it’s just that there is little or no evidence that they’re worth the investment of time.

Time … as I say, I have been putting in the hours. Every day, for over a month, averaging a couple of hours a day, it’s enough to get me very near the top of the leader board (which I keep a very close eye on) and it means that I am doing more work than 99% of other users. And, yes, my German is improving.

Putting in the time is the sine qua non of any language learning and a well-designed app must motivate users to do this. Relevant content will be crucial, as will satisfactory design, both visual and interactive. But here I’d like to focus on the two other key elements: task design / variety and gamification.

Memrise offers a limited range of task types: presentation cards (with word, phrase or sentence with translation and audio recording), multiple choice (target item with four choices), unscrambling letters or words, and dictation (see below).

[Screenshots of Memrise task types]

As Marek writes, it does get a bit repetitive after a while (although less so than thumbing through a pack of cardboard flashcards). The real problem, though, is that there are only so many things an app designer can do with standard flashcards, if they are to contribute to learning. True, there could be a few more game-like tasks (as with Quizlet), races against the clock as you pop word balloons or something of the sort, but, while these might, just might, help with motivation, these games rarely, if ever, contribute much to learning.

What’s more, you’ll get fed up with the games sooner or later if you’re putting in serious study hours. Even if Memrise were to double the number of activity types, I’d have got bored with them by now, in the same way I got bored with the Quizlet games. Bear in mind, too, that I’ve only done a month: I have at least another two months to go before I finish the level I’m working on. There’s another issue with ‘fun’ activities / games which I’ll come on to later.

The options for task variety in vocabulary / memory apps are therefore limited. Let’s look at gamification. Memrise has leader boards (weekly, monthly, ‘all time’), streak badges, daily goals, email reminders and (in the laptop and premium versions) a variety of graphs that allow you to analyse your study patterns. Your degree of mastery of learning items is represented by a growing flower that grows leaves, flowers and withers. None of this is especially original or different from similar apps.

The trouble with all of this is that it can only work for a certain time and, for some people, never. There’s always going to be someone like me who can put in a couple of hours a day more than you can. Or someone, in my case, like ‘Nguyenduyha’, who must be doing about four hours a day, and who, I know, is out of my league. I can’t compete and the realisation slowly dawns that my life would be immeasurably sadder if I tried to.

Having said that, I have tried to compete and the way to do so is by putting in the time on the ‘speed review’. This is the closest that Memrise comes to a game. One hundred items are flashed up with four multiple choices and these are against the clock. The quicker you are, the more points you get, and if you’re too slow, or you make a mistake, you lose a life. That’s how you gain lots of points with Memrise. The problem is that, at best, this task only promotes receptive knowledge of the items, which is not what I need by this stage. At worst, it serves no useful learning function at all because I have learnt ways of doing this well which do not really involve me processing meaning at all. As Marek says in his post (in reference to Quizlet), ‘I had the feeling that sometimes I was paying more attention to ‘winning’ the game and scoring points, rather than to the words on the screen.’ In my case, it is not just a feeling: it’s an absolute certainty.


Sadly, the gamification is working against me. The more time I spend on the U-Bahn doing Memrise, the less time I spend reading the free German-language newspapers, the less time I spend eavesdropping on conversations. Two hours a day is all the time I have for my German study, and Memrise is eating it all up. I know that there are other, and better, ways of learning. In order to do what I know I should be doing, I need to ignore the gamification. For those more reasonable students who can regularly do their fifteen minutes a day, day in, day out, the points and leader boards serve no real function at all.

Cheating at gamification, or gaming the system, is common in app-land. A few years ago, Memrise had to take down their leader board when they realised that cheating was taking place. There’s an inexorable logic to this: gamification is an attempt to motivate by rewarding through points, rather than the reward coming from the learning experience. The logic of the game overtakes itself. Is ‘Nguyenduyha’ cheating, or do they simply have nothing else to do all day? Am I cheating by finding time to do pointless ‘speed reviews’ that earn me lots of points?

For users like myself, then, gamification design needs to be a delicate balancing act. For others, it may be largely an irrelevance. I’ve been working recently on a general model of vocabulary app design that looks at two very different kinds of user. On the one hand, there are the self-motivated learners like myself or the millions of others who have chosen to use self-study apps. On the other, there are the millions of students in schools and colleges, studying English among other subjects, some of whom are now being told to use the vocabulary apps that are beginning to appear packaged with their coursebooks (or other learning material). We’ve never found entirely satisfactory ways of making these students do their homework, and the fact that this homework is now digital will change nothing (except, perhaps, in the very, very short term). The incorporation of games and gamification is unlikely to change much either: there will always be something more interesting and motivating (and unconnected with language learning) elsewhere.

Teachers and college principals may like the idea of gamification (without having really experienced it themselves) for their students. But more important for most of them is likely to be the teacher dashboard: the means by which they can check that their students are putting the time in. Likewise, they will see the utility of automated email reminders that a student is not working hard enough to meet their learning objectives, more and more regular tests that contribute to overall course evaluation, comparisons with college, regional or national benchmarks. Technology won’t solve the motivation issue, but it does offer efficient means of control.

If you’re going to teach vocabulary, you need to organise it in some way. Almost invariably, this organisation is topical, with words grouped into what are called semantic sets. In coursebooks, the example below (from Rogers, M., Taylore-Knowles, J. & S. Taylore-Knowles. 2010. Open Mind Level 1. London: Macmillan, p.68) is fairly typical.

[Image: vocabulary section from Open Mind Level 1, p.68]

Coursebooks are almost always organised in a topical way. The example above comes in a unit (of 10 pages), entitled ‘You have talent!’, which contains two main vocabulary sections. It’s unsurprising to find a section called ‘personality adjectives’ in such a unit. What’s more, such an approach lends itself to the requisite, but largely spurious, ‘can-do’ statement in the self-evaluation section: I can talk about people’s positive qualities. We must have clearly identifiable learning outcomes, after all.

There is, undeniably, a certain intuitive logic in this approach. An alternative might entail a radical overhaul of coursebook architecture – this might not be such a bad thing, but might not go down too well in the markets. How else, after all, could the vocabulary strand of the syllabus be organised?

Well, there are a number of ways in which a vocabulary syllabus could be organised. Including the standard approach described above, here are four possibilities:

1 semantic sets (e.g. bee, butterfly, fly, mosquito, etc.)

2 thematic sets (e.g. ‘pets’: cat, hate, flea, feed, scratch, etc.)

3 unrelated sets

4 sets determined by a group of words’ occurrence in a particular text

Before reading further, you might like to guess what research has to say about the relative effectiveness of these four approaches.

The answer depends, to some extent, on the level of the learner. For advanced learners, it appears to make no, or little, difference (Al-Jabri, 2005, cited by Ellis & Shintani, 2014: 106). But, for the vast majority of English language learners (i.e. those at or below B2 level), the research is clear: the most effective way of organising vocabulary items to be learnt is by grouping them into thematic sets (2) or by mixing words together in a semantically unrelated way (3) – not by teaching sets like ‘personality adjectives’. It is surprising how surprising this finding is to so many teachers and materials writers. It goes back at least to 1988 and West’s article on ‘Catenizing’ in ELTJ, which argued that semantic grouping made little sense from a psycho-linguistic perspective. Since then, a large amount of research has taken place. This is succinctly summarised by Paul Nation (2013: 128) in the following terms: ‘Avoid interference from related words. Words which are similar in form (Laufer, 1989) or meaning (Higa, 1963; Nation, 2000; Tinkham, 1993; Tinkham, 1997; Waring, 1997) are more difficult to learn together than they are to learn separately.’ For anyone who is interested, the most up-to-date review of this research that I can find is in chapter 11 of Barcroft (2015).

The message is clear. So clear that you have to wonder how it is not getting through to materials designers. Perhaps, coursebooks are different. They regularly eschew research findings for commercial reasons. But vocabulary apps? There is rarely, if ever, any pressure on the content-creation side of vocabulary apps (except those that are tied to coursebooks) to follow the popular misconceptions that characterise so many coursebooks. It wouldn’t be too hard to organise vocabulary into thematic sets (like, for example, the approach in the A2 level of Memrise German that I’m currently using). Is it simply because the developers of so many vocabulary apps just don’t know much about language learning?

References

Barcroft, J. 2015. Lexical Input Processing and Vocabulary Learning. Amsterdam: John Benjamins

Nation, I. S. P. 2013. Learning Vocabulary in Another Language 2nd edition. Cambridge: Cambridge University Press

Ellis, R. & N. Shintani. 2014. Exploring Language Pedagogy through Second Language Acquisition Research. Abingdon, Oxon: Routledge

West, M. 1988. ‘Catenizing’ English Language Teaching Journal 6: 147 – 151

It’s practically impossible to keep up to date with all the new language learning tools that appear, even with the help of curated lists like Nik Peachey’s Scoop.it! (which is one of the most useful I know of). The trouble with such lists is that they are invariably positive, but when you actually find the time to look at the product, you often wish you hadn’t. I decided to save time for people like me by occasionally writing short posts about things that you can safely forget about. This is the first.

Nik’s take on Vocabulist was this:

[Screenshot: Nik Peachey’s description of Vocabulist]

It sounds useful, but for anyone involved in language teaching or learning, there is, unfortunately, nothing remotely useful about this tool.

Here’s how it works:

Vocabulist is super easy to use!

Here’s how:

1. Upload a Word, PDF, or Text document. You could also copy and paste text.

2. Wait a minute. Feel free to check Facebook while Vocabulist does some thinking.

3. Select the words that you want, confirm spelling, and confirm the correct definition.

4. All Done! Now print it, export it, and study it.

To try it out, I copied and pasted the text above. This is what you get for the first two lines:

[Screenshot: Vocabulist output for the first two lines]

The definitions are taken from Merriam-Webster. You scroll down until you find the definition for the best fit, and you can then save the list as a pdf or export it to Quizlet.


For language learners, there are far too many definitions to choose from. For ‘super’, for example, there are 24 definitions and, because they are from Merriam-Webster, they are all harder than the word being defined.

The idea behind Vocabulist could be adapted for language learners if there was a selection of dictionary resources that users could choose from (a selection of good bilingual or semi-bilingual dictionaries and a good monolingual learner’s dictionary). But, as it stands, here’s an app you can forget.

I call Lern Deutsch a vocabulary app, although it’s more of a game than anything else. Developed by the Goethe Institute, the free app was probably designed primarily as a marketing tool rather than a serious attempt to develop an educational language app. It’s available for speakers of Arabic, English, Spanish, Italian, French, Portuguese and Russian. It’s aimed at A1 learners.

Users of the app create an avatar and roam around a virtual city, learning new vocabulary and practising situational language. They can interact in language challenges with other players. As they explore, they earn Goethe coins, collect accessories for their avatars and progress up a leader board.

As they explore the virtual city, populated by other avatars, they find objects that can be clicked on to add to their vocabulary list. They hear a recording of an example sentence containing the target word, with the word gapped and three multiple choice possibilities. They are then required to type the missing word (see the image below). After collecting a certain number of words, they complete exercises which include the following task types:

  • Jumbled sentences
  • Audio recording of individual words and multiple choice selection
  • Gapped sentences with multiple choice answers
  • Dictation
  • Example sentences containing target item and multiple choice pictures
  • Typing sentences which are buried in a string of random letters

[Screenshots: Lern Deutsch exercise types]

The developers have focused their attention on providing variety: engagement and ‘fun’ override other considerations. But how does the app stand up as a language learning tool? Surprisingly, for something developed by the Goethe Institute, it’s less than impressive.

The words that you collect as you navigate the virtual city are all nouns (Hotel, Auto, Mann, Banane, etc), but some (e.g. Sehenswürdigkeit) seem out of level. Any app that uses illustrations as the basic means of conveying meaning runs into problems when it moves away from concrete nouns, but a diet of nouns only (as here) is of necessarily limited value. Other parts of speech are introduced via the example sentences, but no help with meaning is provided so when you come across the word for ‘egg’, for example, your example sentence is ‘Ich möchte das Frühstück mit Ei.’ It’s all very well embedding the target vocabulary in example sentences that have a functional value, but example sentences are only of value if they are understandable: the app badly needs a look-up function for the surrounding language.

The practice exercises are varied, too, but they also vary in their level of difficulty. It makes sense to do receptive / recognition tasks before productive ones, but there is no evidence that I could see of pedagogical considerations of this kind. Neither does there seem to be any spaced repetition at work: the app is driven by the needs of the game design rather than any learning principles.

It’s unclear to me who the app is for. The functional language that is presented is adult: the situations are adult situations (buying a bed, booking a hotel room, ordering a beer). However, the graphic design and the gamification features are juvenile (adding a pirate patch to your avatar, for example).

The lack of attention to the business of learning is especially striking in the English of the English language version that I used. The number of examples of dodgy English that I came across does not inspire confidence.

  • Quite alright! You win your first Goethe coin.
  • What sightseeings do you spot in the city center and the train station?
  • Have a picknick in the park. You now have a picnic in the park with the musician.
  • You still search for your teacher. Whom do you meet in the park? What do they work?

All in all, it’s an interesting example of a gamified approach to language, and other app developers may find ideas here that they could do something with. It’s of less interest, though, to anyone who wants to learn a bit of German.

Having spent a lot of time recently looking at vocabulary apps, I decided to put together a Christmas wish list of the features of my ideal vocabulary app. The list is not exhaustive and I’ve given more attention to some features than others. What (apart from testing) have I missed out?

1 Spaced repetition

Since the point of a vocabulary app is to help learners memorise vocabulary items, it is hard to imagine a decent system that does not incorporate spaced repetition. Spaced repetition algorithms offer one well-researched way of improving the brain’s ‘forgetting curve’. These algorithms come in different shapes and sizes, and I am not technically competent to judge which is the most efficient. However, as Peter Ellis Jones, the developer of a flashcard system called CardFlash, points out, efficiency is only one half of the rote memorisation problem. If you are not motivated to learn, the cleverness of the algorithm is moot. Fundamentally, learning software needs to be fun, rewarding, and give a solid sense of progression.
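For readers who would like to see the idea in concrete terms, here is a minimal sketch of one of the simplest spaced repetition schemes, a Leitner-style box system. It is not the algorithm used by Memrise, CardFlash or any other app mentioned here: the interval values and the names `Card`, `review` and `due_cards` are assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical review intervals in days for boxes 0 to 4; real apps tune these.
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    item: str
    due: date
    box: int = 0  # box 0 holds new or recently forgotten items


def review(card: Card, correct: bool, today: date) -> Card:
    """Promote the card one box after a correct answer, reset it to box 0 otherwise."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    # The higher the box, the longer the wait before the next review.
    card.due = today + timedelta(days=INTERVALS[card.box])
    return card


def due_cards(cards, today: date):
    """Return the cards whose review date has arrived."""
    return [c for c in cards if c.due <= today]
```

Items that are answered correctly drift towards longer intervals, while items that are forgotten come back the next day. More sophisticated algorithms adjust the intervals per item and per learner, but the principle is the same.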

2 Quantity, balance and timing of new and ‘old’ items

A spaced repetition algorithm determines the optimum interval between repetitions, but further algorithms will be needed to determine when and with what frequency new items will be added to the deck. Once a system knows how many items a learner needs to learn and the time in which they have to do it, it is possible to determine the timing and frequency of the presentation of new items. But the system cannot know in advance how well an individual learner will learn the items (for any individual, some items will be more readily learnable than others) nor the extent to which learners will live up to their own positive expectations of time spent on-app. As most users of flashcard systems know, it is easy to fall behind, feel swamped and, ultimately, give up. An intelligent system needs to be able to respond to individual variables in order to ensure that the learning load is realistic.
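As a rough illustration of what ‘responding to individual variables’ might involve, the sketch below works out how many new items to introduce on a given day. It assumes the system knows how many items are still to be introduced, how many days remain, how many reviews are already due, and how many items the learner has recently managed per day. The function name and all of its parameters are hypothetical.

```python
import math


def new_items_today(remaining_new: int, days_left: int,
                    reviews_due: int, daily_capacity: int) -> int:
    """Return how many new items to introduce today.

    remaining_new:  items not yet introduced
    days_left:      days until the learner's deadline, including today
    reviews_due:    reviews already scheduled for today
    daily_capacity: items the learner has recently managed per day (an estimate)
    """
    if remaining_new <= 0 or days_left <= 0:
        return 0
    # The even pace needed to finish on time.
    target = math.ceil(remaining_new / days_left)
    # Room left after today's reviews; never negative.
    room = max(daily_capacity - reviews_due, 0)
    # Never add more than the learner can realistically handle.
    return min(target, room, remaining_new)
```

With 300 items to cover in 60 days, a learner who manages about 40 items a day and has 20 reviews due would get 5 new items. If 38 reviews were due, they would get only 2, and the shortfall would push up the target pace on later days. A real system would also need to decide what to do when the deadline is no longer achievable.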

3 Task variety

A standard flashcard system which simply asks learners to indicate whether they ‘know’ a target item before they flip over the card rapidly becomes extremely boring. A system which tests this knowledge soon becomes equally dull. There needs to be a variety of ways in which learners interact with an app, both for reasons of motivation and learning efficiency. It may be the case that, for an individual user, certain task types lead to more rapid gains in learning. An intelligent, adaptive system should be able to capture this information and modify the selection of task types.

Most younger learners and some adult learners will respond well to the inclusion of games within the range of task types. Examples of such games include the puzzles developed by Oliver Rose in his Phrase Maze app to accompany Quizlet practice.

4 Generative use

Memory researchers have long known about the ‘Generation Effect’ (see for example this piece of research from the Journal of Verbal Learning and Verbal Behavior, 1978). Items are better learnt when the learner has to generate, in some (even small) way, the target item, rather than simply reading it. In vocabulary learning, this could be, for example, typing in the target word or, more simply, inserting some missing letters. Systems which incorporate task types that require generative use are likely to result in greater learning gains than simple, static flashcards with target items on one side and definitions or translations on the other.
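The ‘missing letters’ task is simple to build. The following sketch (the function names and the default number of gaps are assumptions, not taken from any particular app) blanks out a few letters of the target word and checks what the learner types in.

```python
import random


def gap_word(word: str, n_gaps: int = 2, seed=None) -> str:
    """Replace n_gaps randomly chosen letters (never the first) with underscores."""
    rng = random.Random(seed)
    # Candidate positions: every alphabetic character after the first one.
    positions = [i for i, ch in enumerate(word) if i > 0 and ch.isalpha()]
    chosen = set(rng.sample(positions, min(n_gaps, len(positions))))
    return "".join("_" if i in chosen else ch for i, ch in enumerate(word))


def check(answer: str, target: str) -> bool:
    """Accept the answer regardless of case and surrounding spaces."""
    return answer.strip().lower() == target.lower()
```

Calling `gap_word("Banane")` produces a prompt with two letters blanked out and the first letter intact. Increasing `n_gaps` as the learner’s mastery grows would be one way of moving gradually from recognition towards full production.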

5 Receptive and productive practice

The most basic digital flashcard systems require learners to understand a target item, or to generate it from a definition or translation prompt. Valuable as this may be, it won’t help learners much to use these items productively, since these systems focus exclusively on meaning. In order to do this, information must be provided about collocation, colligation, register, etc and these aspects of word knowledge will need to be focused on within the range of task types. At the same time, most vocabulary apps that I have seen focus primarily on the written word. Although any good system will offer an audio recording of the target item, and many will offer the learner the option of recording themselves, learners are invariably asked to type in their answers, rather than say them. For the latter, speech recognition technology will be needed. Ideally, too, an intelligent system will compare learner recordings with the audio models and provide feedback in such a way that the learner is guided towards a closer reproduction of the model.

6 Scaffolding and feedback

Most flashcard systems are basically low-stakes, practice self-testing. Research (see, for example, Dunlosky et al’s metastudy ‘Improving Students’ Learning With Effective Learning Techniques: Promising Directions From Cognitive and Educational Psychology’) suggests that, as a learning strategy, practice testing has high utility – indeed, higher utility than other strategies like keyword mnemonics or highlighting. However, an element of tutoring is likely to enhance practice testing, and, for this, scaffolding and feedback will be needed. If, for example, a learner is unable to produce a correct answer, they will probably benefit from being guided towards it through hints, in the same way as a teacher would elicit in a classroom. Likewise, feedback on why an answer is wrong (as opposed to simply being told that you are wrong), followed by encouragement to try again, is likely to enhance learning. Such feedback might, for example, point out that there is perhaps a spelling problem in the learner’s attempted answer, that the attempted answer is in the wrong part of speech, or that it is semantically close to the correct answer but does not collocate with other words in the text. The incorporation of intelligent feedback of this kind will require a number of NLP tools, since it will never be possible for a human item-writer to anticipate all the possible incorrect answers. A current example of intelligent feedback of this kind can be found in the Oxford English Vocabulary Trainer app.
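To give a flavour of the first two kinds of feedback, here is a deliberately crude sketch. It is not how the Oxford English Vocabulary Trainer works. The tiny `POS` lexicon, the 0.8 similarity threshold and the wording of the messages are all assumptions, and the collocation check, which needs corpus data, is left out entirely.

```python
from difflib import SequenceMatcher

# Hypothetical mini-lexicon mapping words to parts of speech.
# A real app would license this information from a dictionary publisher.
POS = {
    "danger": "noun",
    "dangerous": "adjective",
    "wild": "adjective",
    "endanger": "verb",
}


def feedback(attempt: str, target: str) -> str:
    """Return a short feedback message for a learner's typed answer."""
    attempt, target = attempt.strip().lower(), target.strip().lower()
    if attempt == target:
        return "correct"
    # A known word in the wrong word class is not a spelling slip.
    a_pos, t_pos = POS.get(attempt), POS.get(target)
    if a_pos and t_pos and a_pos != t_pos:
        return f"wrong part of speech: {t_pos} needed, not {a_pos}"
    # High character overlap with the target suggests a spelling problem.
    if SequenceMatcher(None, attempt, target).ratio() >= 0.8:
        return "check your spelling"
    return "try again"
```

For the target ‘dangerous’, the attempt ‘dangerus’ gets a spelling hint, ‘danger’ gets a part-of-speech hint, and ‘wild’ simply gets ‘try again’. That last case is exactly where the more sophisticated feedback about meaning and collocation would be needed.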

7 Content

At the very least, a decent vocabulary app will need good definitions and translations (how many different languages?), and these will need to be tagged to the senses of the target items. These will need to be supplemented with all the other information that you find in a good learner’s dictionary: syntactic patterns, collocations, cognates, an indication of frequency, etc. The only way of getting this kind of high-quality content is by paying to license it from a company with expertise in lexicography. It doesn’t come cheap.

There will also need to be example sentences, both to illustrate meaning / use and for deployment in tasks. Dictionary databases can provide some of these, but they cannot be relied on as a source. This is because the example sentences in dictionaries have been selected and edited to accompany the other information provided in the dictionary, and not as items in practice exercises, which have rather different requirements. Once more, the solution doesn’t come cheap: experienced item writers will be needed.

Dictionaries describe and illustrate how words are typically used. But examples of typical usage tend to be as dull as they are forgettable. Learning is likely to be enhanced if examples are cognitively salient: weird examples with odd collocations, for example. Another thing for the item writers to think about.

A further challenge for an app which is not level-specific is that both the definitions and example sentences need to be level-specific. An A1 / A2 learner will need the kind of content that is found in, say, the Oxford Essential dictionary; B2 learners and above will need content from, say, the OALD.

8 Artwork and design

It’s easy enough to find artwork or photos of concrete nouns, but try to find or commission a pair of pictures that differentiate, for example, the adjectives ‘wild’ and ‘dangerous’ … What kind of pictures might illustrate simple verbs like ‘learn’ or ‘remember’? Will such illustrations be clear enough when squeezed into a part of a phone screen? Animations or very short video clips might provide a solution in some cases, but these are more expensive to produce and video files are much heavier.

With a few notable exceptions, such as the British Council’s MyWordBook 2, design in vocabulary apps has been largely forgotten.

9 Importable and personalisable lists

Many learners will want to use a vocabulary app in association with other course material (e.g. coursebooks). Teachers, however, will inevitably want to edit these lists, deleting some items, adding others. Learners will want to do the same. This is a huge headache for app designers. If new items are going to be added to word lists, how will the definitions, example sentences and illustrations be generated? Will the database contain audio recordings of these words? How will these items be added to the practice tasks (if these include task types that go beyond simple double-sided flashcards)? NLP tools are not yet good enough to trawl a large corpus in order to select (and possibly edit) sentences that illustrate the right meaning and which are appropriate for interactive practice exercises. We can personalise the speed of learning and even the types of learning tasks, so long as the target language is predetermined. But as soon as we allow for personalisation of content, we run into difficulties.

10 Gamification

Maintaining motivation to use a vocabulary app is not easy. Gamification may help. Measuring progress against objectives will be a start. Stars and badges and leaderboards may help some users. Rewards may help others. But gamification features need to be built into the heart of the system, into the design and selection of tasks, rather than simply tacked on as an afterthought. They need to be trialled and tweaked, so analytics will be needed.

11 Teacher support

Although the use of vocabulary flashcards is beginning to catch on with English language teachers, teachers need help with ways to incorporate them in the work they do with their students. What can teachers do in class to encourage use of the app? In what ways does app use require teachers to change their approach to vocabulary work in the classroom? Reporting functions can help teachers know about the progress their students are making and provide very detailed information about words that are causing problems. But, as anyone involved in platform-based course materials knows, teachers need a lot of help.

12 And, of course, …

Apps need to be usable with different operating systems. Ideally, they should be (partially) usable offline. Loading times need to be short. They need to be easy and intuitive to use.

It’s unlikely that I’ll be seeing a vocabulary app with all of these features any time soon. Or, possibly, ever. The cost of developing something that could do all this would be extremely high, and there is no indication that there is a market that would be ready to pay the sort of prices that would be needed to cover the costs of development and turn a profit. We need to bear in mind, too, the fact that vocabulary apps can only ever assist in the initial acquisition of vocabulary: apps alone can’t solve the vocabulary learning problem (despite the silly claims of some app developers). The need for meaningful communicative use, extensive reading and listening, will not go away because a learner has been using an app. So, how far can we go in developing better and better vocabulary apps before users decide that a cheap / free app, with all its shortcomings, is actually good enough?

I posted a follow-up to this post in October 2016.

Decent research into adaptive learning remains very thin on the ground. Disappointingly, the Journal of Learning Analytics has only managed one issue so far in 2015, compared to three in 2014. But I recently came across an article in Vol. 18 (pp. 111 – 125) of Informing Science: the International Journal of an Emerging Transdiscipline entitled Informing and performing: A study comparing adaptive learning to traditional learning by Murray, M. C., & Pérez, J. of Kennesaw State University.

The article is worth reading, not least because of the authors’ digestible review of adaptive learning theory and their discussion of levels of adaptation, including a handy diagram (see below) which they have reproduced from a white paper by Tyton Partners ‘Learning to Adapt: Understanding the Adaptive Learning Supplier Landscape’. Murray and Pérez make clear that adaptive learning theory is closely connected to the belief that learning is improved when instruction is personalized — adapted to individual learning styles, but their approach is surprisingly uncritical. They write, for example, that the general acceptance of learning styles is evidenced in recommended teaching strategies in nearly every discipline, and learning styles continue to inform the evolution of adaptive learning systems, and quote from the much-quoted Pashler, H., McDaniel, M., Rohrer, D., & Bjork, R. (2008) Learning styles: concepts and evidence, Psychological Science in the Public Interest, 9, 105–119. But Pashler et al concluded that the current evidence supporting the use of learning style-matched approaches is virtually non-existent (see here for a review of Pashler et al). And, in the world of ELT, an article in the latest edition of ELTJ by Carol Lethaby and Patricia Harries disses learning styles and other neuromyths. Given the close connection between adaptive learning theory and learning styles, one might reasonably predict that a comparative study of adaptive learning and traditional learning would not come out with much evidence in support of the former.

[Diagram: levels of adaptation, reproduced from the Tyton Partners white paper]

Murray and Pérez set out, anyway, to explore the hypothesis that adapting instruction to an individual’s learning style results in better learning outcomes. Their study compared adaptive and traditional methods in a university-level digital literacy course. Their conclusion? This study and a few others like it indicate that today’s adaptive learning systems have negligible impact on learning outcomes.

I was, however, more interested in the comments which followed this general conclusion. They point out that learning outcomes are only one measure of quality. Others, such as student persistence and engagement, they claim, can be positively affected by the employment of adaptive systems. I am not convinced. I think it’s simply far too soon to be able to judge this, and we need to wait quite some time for novelty effects to wear off. Murray and Pérez provide two references in support of their claim. One is an article by Josh Jarrett, Bigfoot, Goldilocks, and Moonshots: A Report from the Frontiers of Personalized Learning in Educause. Jarrett is Deputy Director for Postsecondary Success at the Bill & Melinda Gates Foundation and Educause is significantly funded by the Gates Foundation. Not, therefore, an entirely unbiased and trustworthy source. The other is a journalistic piece in Forbes. It’s by Tim Zimmer, entitled Rethinking higher ed: A case for adaptive learning and it reads like an advert. Zimmer is a ‘CCAP contributor’. CCAP is the Centre for College Affordability and Productivity, a libertarian, conservative foundation with a strong privatization agenda. Not, therefore, a particularly reliable source, either.

Despite their own findings, Murray and Pérez follow up their claim about student persistence and engagement with what they describe as a more compelling still argument for adaptive learning. This, they say, is the intuitively appealing case for adaptive learning systems as engines with which institutions can increase access and reduce costs. Ah, now we’re getting to the point!