Posts Tagged ‘Duolingo’

Introduction

Allowing learners to determine the amount of time they spend studying, and, therefore (in theory at least), the speed of their progress is a key feature of most personalized learning programs. In cases where learners follow a linear path of pre-determined learning items, it is often the only element of personalization that the programs offer. In the Duolingo program that I am using, there are basically only two things that can be personalized: the amount of time I spend studying each day, and the possibility of skipping a number of learning items by ‘testing out’.

Self-regulated learning, or self-pacing as it is commonly referred to, has enormous intuitive appeal. It is clear that different people learn different things at different rates. We’ve known for a long time that ‘the developmental stages of child growth and the individual differences among learners make it impossible to impose a single and “correct” sequence on all curricula’ (Stern, 1983: 439). It therefore follows that it makes even less sense for a group of students (typically determined by age) to be obliged to follow the same curriculum at the same pace in a one-size-fits-all approach. We have probably all experienced, as students, the frustration of being behind, or ahead of, the rest of our classmates. One student who suffered from the lockstep approach was Sal Khan, founder of the Khan Academy. He has described how he was fed up with having to follow an educational path dictated by his age and how, as a result, individual pacing became an important element in his educational approach (Ferster, 2014: 132-133). As teachers, we have all experienced the challenges of teaching material that is too hard or too easy for many of the students in the class.

Historical attempts to facilitate self-paced learning

An interest in self-paced learning can be traced back to the growth of mass schooling and age-graded classes in the 19th century. In fact, the ‘factory model’ of education has never existed without critics who saw the inherent problems of imposing uniformity on groups of individuals. These critics were not marginal characters. Charles Eliot (president of Harvard from 1869 to 1909), for example, described uniformity as ‘the curse of American schools’ and argued that ‘the process of instructing students in large groups is a quite sufficient school evil without clinging to its twin evil, an inflexible program of studies’ (Grittner, 1975: 324).

Attempts to develop practical solutions were not uncommon and these are reasonably well-documented. One of the earliest, which ran from 1884 to 1894, was launched in Pueblo, Colorado and was ‘a self-paced plan that required each student to complete a sequence of lessons on an individual basis’ (Januszewski, 2001: 58-59). More ambitious was the Burk Plan (at its peak between 1912 and 1915), named after Frederick Burk of the San Francisco State Normal School, which aimed to allow students to progress through materials (including language instruction materials) at their own pace with only a limited number of teacher presentations (Januszewski, ibid.). Then, there was the Winnetka Plan (1920s), developed by Carlton Washburne, an associate of Frederick Burk and the superintendent of public schools in Winnetka, Illinois, which ‘allowed learners to proceed at different rates, but also recognised that learners proceed at different rates in different subjects’ (Saettler, 1990: 65). The Winnetka Plan is especially interesting in the way it presaged contemporary attempts to facilitate individualized, self-paced learning. It was described by its developers in the following terms:

A general technique [consisting] of (a) breaking up the common essentials curriculum into very definite units of achievement, (b) using complete diagnostic tests to determine whether a child has mastered each of these units, and, if not, just where his difficulties lie and, (c) the full use of self-instructive, self corrective practice materials. (Washburne, C., Vogel, M. & W.S. Gray. 1926. A Survey of the Winnetka Public Schools. Bloomington, IL: Public School Press)

Not dissimilar was the Dalton (Massachusetts) Plan in the 1920s which also used a self-paced program to accommodate the different ability levels of the children and deployed contractual agreements between students and teachers (something that remains common educational practice around the world). There were many others, both in the U.S. and other parts of the world.

The personalization of learning through self-pacing was not, therefore, a minor interest. Between 1910 and 1924, nearly 500 articles on the subject of individualization can be documented (Grittner, 1975: 328). In just three years (1929 – 1932) of one publication, The Education Digest, there were fifty-one articles dealing with individual instruction and sixty-three entries treating individual differences (Chastain, 1975: 334). Foreign language teaching did not feature significantly in these early attempts to facilitate self-pacing (although the Burk Plan, described above, included language instruction materials): only a handful of references to language learning and self-pacing appeared in articles between 1916 and 1924 (Grittner, 1975: 328).

Disappointingly, none of these initiatives lasted long. Both costs and management issues had been significantly underestimated. Plans such as those described above were seen as progress, but not the hoped-for solution. Problems included the fact that the materials themselves were not individualized and instructional methods were too rigid (Pendleton, 1930: 199). However, concomitant with the interest in individualization (mostly, self-pacing), came the advent of educational technology.

Sidney L. Pressey, the inventor of what was arguably the first teaching machine, was inspired by his experiences with schoolchildren in rural Indiana in the 1920s where he ‘was struck by the tremendous variation in their academic abilities and how they were forced to progress together at a slow, lockstep pace that did not serve all students well’ (Ferster, 2014: 52). Although Pressey failed in his attempts to promote his teaching machines, he laid the foundation stones for the synthesis of individualization and technology.

Pressey may be seen as the direct precursor of programmed instruction, now closely associated with B. F. Skinner (see my post on Behaviourism and Adaptive Learning). It is a quintessentially self-paced approach and is described by John Hattie as follows:

Programmed instruction is a teaching method of presenting new subject matter to students in graded sequence of controlled steps. A book version, for example, presents a problem or issue, then, depending on the student’s answer to a question about the material, the student chooses from optional answers which refers them to particular pages of the book to find out why they were correct or incorrect – and then proceed to the next part of the problem or issue. (Hattie, 2009: 231)

Programmed instruction was mostly used for the teaching of mathematics, but it is estimated that 4% of programmed instruction programs were for foreign languages (Saettler, 1990: 297). It flourished in the 1960s and 1970s, but even by 1968 foreign language instructors were sceptical (Valdman, 1968). A survey carried out at the time by the Center for Applied Linguistics revealed that only about 10% of foreign language teachers at college and university reported the use of programmed materials in their departments (Valdman, 1968: 1).

Research studies had failed to demonstrate the effectiveness of programmed instruction (Saettler, 1990: 303). Teachers were often resistant and students were often bored, finding ‘ingenious ways to circumvent the program, including the destruction of their teaching machines!’ (Saettler, ibid.).

In the case of language learning, there were other problems. For programmed instruction to have any chance of working, it was necessary to specify rigorously the initial and terminal behaviours of the learner so that the intermediate steps leading from the former to the latter could be programmed. As Valdman (1968: 4) pointed out, this is highly problematic when it comes to languages (a point that I have made repeatedly in this blog). In addition, students missed the personal interaction that conventional instruction offered, got bored and lacked motivation (Valdman, 1968: 10).

Programmed instruction worked best when teachers were very enthusiastic, but perhaps the most significant lesson to be learned from the experiments was that it was ‘a difficult, time-consuming task to introduce programmed instruction’ (Saettler, 1990: 299). It entailed changes to well-established practices and attitudes, and for such changes to succeed there must be consideration of the social, political, and economic contexts. As Saettler (1990: 306) notes, ‘without the support of the community and the entire teaching staff, sustained innovation is unlikely’. In this light, Hattie’s research finding that ‘when comparisons are made between many methods, programmed instruction often comes near the bottom’ (Hattie, 2009: 231) comes as no great surprise.

Just as programmed instruction was in its death throes, the world of language teaching discovered individualization. Launched as a deliberate movement in the early 1970s at the Stanford Conference (Altman & Politzer, 1971), it was a ‘systematic attempt to allow for individual differences in language learning’ (Stern, 1983: 387). Inspired, in part, by the work of Carl Rogers, this ‘humanistic turn’ was a recognition that ‘each learner is unique in personality, abilities, and needs. Education must be personalized to fit the individual; the individual must not be dehumanized in order to meet the needs of an impersonal school system’ (Disick, 1975:38). In ELT, this movement found many adherents and remains extremely influential to this day.

In language teaching more generally, the movement lost impetus after a few years, ‘probably because its advocates had underestimated the magnitude of the task they had set themselves in trying to match individual learner characteristics with appropriate teaching techniques’ (Stern, 1983: 387). What precisely was meant by individualization was never adequately defined or agreed (a problem that remains to the present time). What was left was self-pacing. In 1975, it was reported that ‘to date the majority of the programs in second-language education have been characterized by a self-pacing format […]. Practice seems to indicate that “individualized” instruction is being defined in the classroom as students studying individually’ (Chastain, 1975: 344).

Lessons to be learned

This brief account shows that historical attempts to facilitate self-pacing have largely been characterised by failure. The starting point of all these attempts remains as valid as ever, but it is clear that practical solutions are less than simple. To avoid the insanity of doing the same thing over and over again and expecting different results, we should perhaps try to learn from the past.

One of the greatest challenges that teachers face is dealing with different levels of ability in their classes. In any blended scenario where the online component has an element of self-pacing, the challenge will be magnified, as ability differentials are likely to grow rather than decrease as a result of the self-pacing. Bart Simpson hit the nail on the head in a memorable line: ‘Let me get this straight. We’re behind the rest of the class and we’re going to catch up to them by going slower than they are? Coo coo!’ Self-pacing runs into immediate difficulties when it comes up against standardised tests and national or state curriculum requirements. As Ferster observes, ‘the notion of individual pacing [remains] antithetical to […] a graded classroom system, which has been the model of schools for the past century. Schools are just not equipped to deal with students who do not learn in age-processed groups, even if this system is clearly one that consistently fails its students’ (Ferster, 2014: 90-91).

Ability differences are less problematic if the teacher focusses primarily on communicative tasks in F2F time (as opposed to more teaching of language items), but this is a big ‘if’. Many teachers are unsure of how to move towards a more communicative style of teaching, not least in large classes in compulsory schooling. Since there are strong arguments that students would benefit from a more communicative, less transmission-oriented approach anyway, it makes sense to focus institutional resources on equipping teachers with the necessary skills, as well as providing support, before a shift to a blended, more self-paced approach is implemented.

Such issues are less important in private institutions, which are not age-graded, and in self-study contexts. However, even here there may be reasons to proceed cautiously before buying into self-paced approaches. Self-pacing is closely tied to autonomous goal-setting (which I will look at in more detail in another post). Both require a degree of self-awareness at a cognitive and emotional level (McMahon & Oliver, 2001), but not all students have such self-awareness (Magill, 2008). If students do not have the appropriate self-regulatory strategies and are simply left to pace themselves, there is a chance that they will ‘misregulate their learning, exerting control in a misguided or counterproductive fashion and not achieving the desired result’ (Kirschner & van Merriënboer, 2013: 177). Before launching students on a path of self-paced language study, ‘thought needs to be given to the process involved in users becoming aware of themselves and their own understandings’ (McMahon & Oliver, 2001: 1304). Without training and support provided both before and during the self-paced study, the chances of dropping out are high (as we see from the very high attrition rate in language apps).

However well-intentioned, many past attempts to facilitate self-pacing have also suffered from the poor quality of the learning materials. The focus was more on the technology of delivery, and this remains the case today, as many posts on this blog illustrate. Contemporary companies offering language learning programmes show relatively little interest in the content of the learning (take Duolingo as an example). Few app developers show signs of investing in experienced curriculum specialists or materials writers. Glossy photos, contemporary videos, good UX and clever gamification, all of which become dull and repetitive after a while, do not compensate for poorly designed materials.

Over forty years ago, a review of self-paced learning concluded that the evidence on its benefits was inconclusive (Allison, 1975: 5). Nothing has changed since. For some people, in some contexts, for some of the time, self-paced learning may work. Claims that go beyond that cannot be substantiated.

References

Allison, E. 1975. ‘Self-Paced Instruction: A Review’ The Journal of Economic Education 7 / 1: 5 – 12

Altman, H.B. & Politzer, R.L. (eds.) 1971. Individualizing Foreign Language Instruction: Proceedings of the Stanford Conference, May 6 – 8, 1971. Washington, D.C.: Office of Education, U.S. Department of Health, Education, and Welfare

Chastain, K. 1975. ‘An Examination of the Basic Assumptions of “Individualized” Instruction’ The Modern Language Journal 59 / 7: 334 – 344

Disick, R.S. 1975. Individualizing Language Instruction: Strategies and Methods. New York: Harcourt Brace Jovanovich

Ferster, B. 2014. Teaching Machines. Baltimore: Johns Hopkins University Press

Grittner, F. M. 1975. ‘Individualized Instruction: An Historical Perspective’ The Modern Language Journal 59 / 7: 323 – 333

Hattie, J. 2009. Visible Learning. Abingdon, Oxon.: Routledge

Januszewski, A. 2001. Educational Technology: The Development of a Concept. Englewood, Colorado: Libraries Unlimited

Kirschner, P. A. & van Merriënboer, J. J. G. 2013. ‘Do Learners Really Know Best? Urban Legends in Education’ Educational Psychologist 48 / 3: 169 – 183

Magill, D. S. 2008. ‘What Part of Self-Paced Don’t You Understand?’ University of Wisconsin 24th Annual Conference on Distance Teaching & Learning Conference Proceedings.

McMahon, M. & Oliver, R. 2001. ‘Promoting self-regulated learning in an on-line environment’ in C. Montgomerie & J. Viteli (eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2001 (pp. 1299-1305). Chesapeake, VA: AACE

Pendleton, C. S. 1930. ‘Personalizing English Teaching’ Peabody Journal of Education 7 / 4: 195 – 200

Saettler, P. 1990. The Evolution of American Educational Technology. Denver: Libraries Unlimited

Stern, H.H. 1983. Fundamental Concepts of Language Teaching. Oxford: Oxford University Press

Valdman, A. 1968. ‘Programmed Instruction versus Guided Learning in Foreign Language Acquisition’ Die Unterrichtspraxis / Teaching German 1 / 2: 1 – 14

 


Chatbots

Posted: December 1, 2016 in ed tech

Here’s a conversation I had recently with Cleverbot.


User (me): Hi Cleverbot. What are your views on the future of chatbots in language learning?

Cleverbot: Are you talking to yourself?

User: No, I’m asking you a question.

Cleverbot: Then what’s the question again? I forgot.

User: What are your views on the future of chatbots in language learning?

Cleverbot: Don’t you want me to finish my story?

User: No, I want you to answer my question.

Cleverbot: What is your question?

User: Do you think there is a future for chatbots in language education?

Cleverbot: Sure, if you are talking about the moon.

I also tried to have a conversation with Alice, the avatar from EFL Classroom Bot, listed by Larry Ferlazzo as one of ‘the best online chatbots for practicing English’. I didn’t get any more sense out of her than out of Cleverbot.

Chatbots, apparently, are the next big thing. Again. David Mattin, head of trends and insights at trendwatching.com, writes (in the September 2016 issue of ‘Business Life’) that ‘the chatbot revolution is coming’ and that chatbots are a step towards the dream of an interface between user and technology that is so intuitive that the interface ‘simply fades away’. Chatbots have been around for some time. Remember Clippy – the Microsoft Office bot in the late 1990s – which you had to disable in order to stop yourself punching your computer screen? Since then, bots have become ubiquitous. There have been problems, such as Microsoft’s Tay bot that had to be taken down after sixteen hours earlier this year, when, after interacting with other Twitter users, it developed into an abusive Nazi. But chatbots aren’t going away and you’ve probably interacted with one to book a taxi, order food or attempt to talk to your bank. In September this year, the Guardian described them as ‘the talk of the town’ and ‘hot property in Silicon Valley’.

The real interest in chatbots is not, however, in the ‘exciting interface’ possibilities (both user interface and user experience remain pretty crude), but in the way that they are leaner, sit comfortably with the things we actually do on a phone, and offer a way of cutting out the high fees that developers have to pay to app stores. After so many start-up failures, chatbots offer a glimmer of financial hope to developers.

It’s no surprise, of course, to find the world of English language teaching beginning to sit up and take notice of this technology. A 2012 article by Ben Lehtinen in PeerSpectives enthuses about the possibilities in English language learning and reports the positive feedback of the author’s own students. ELTJam, so often so quick off the mark, developed an ELT Bot over the course of a hackathon weekend in March this year. Disappointingly, it wasn’t really a bot – more a case of humans pretending to be a bot pretending to be humans – but it probably served its exploratory purpose. And a few months ago Duolingo began incorporating bots. These are currently only available for French, Spanish and German learners in the iPhone app, so I haven’t been able to try them out and evaluate them. According to an infomercial in TechCrunch, ‘to make talking to the bots a bit more compelling, the company tried to give its different bots a bit of personality. There’s Chef Robert, Renee the Driver and Officer Ada, for example. They will react differently to your answers (and correct you as necessary), but for the most part, the idea here is to mimic a real conversation. These bots also allow for a degree of flexibility in your answers that most language-learning software simply isn’t designed for. There are plenty of ways to greet somebody, for example, but most services will often only accept a single answer. When you’re totally stumped for words, though, Duolingo offers a “help my reply” button with a few suggested answers.’ In the last twelve months or so, Duolingo has considerably improved its ability to recognize multiple correct ways of expressing a particular idea, and its ability to recognise alternative answers to its translation tasks. However, I’m highly sceptical about its ability to mimic a real conversation any better than Cleverbot or Alice the EFL Bot, or its ability to provide systematically useful corrections.

My reasons lie in the current limitations of AI and NLP (Natural Language Processing). In a nutshell, we simply don’t know how to build a machine that can truly understand human language. Limited exchanges in restricted domains can be done pretty well (such as the early chatbot that did a good job of simulating an encounter with an evasive therapist, or, more recently, ordering a taco and having a meaningless, but flirty, conversation with a bot), but despite recent advances in semantic computing, we’re a long way from anything that can mimic a real conversation. As Audrey Watters puts it, we’re not even close.

When it comes to identifying language errors made by language learners, we’re not really much better off. Apps like Grammarly are not bad at identifying grammatical errors (though not good enough to be reliable), but pretty hopeless at dealing with lexical appropriacy. Much more reliable feedback to learners can be offered when the software is trained on particular topics and text types. Write & Improve does this with a relatively small selection of Cambridge English examination tasks, but a free conversation…? Forget it.

So, how might chatbots be incorporated into language teaching / learning? A blog post from December 2015 entitled AI-powered chatbots and the future of language learning suggests one plausible possibility. Using an existing messenger service, such as WhatsApp or Telegram, an adaptive chatbot would send tasks (such as participation in a conversation thread with a predetermined topic, register, etc., or pronunciation practice or translation exercises) to a learner, provide feedback and record the work for later recycling. At the same time, the bot could send out reminders of work that needs to be done or administrative tasks that must be completed.
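
To make this concrete, here is a minimal sketch (in Python) of how such a bot’s dispatch-and-record loop might be organised. Everything in it is an assumption for illustration: the `send_message` function stands in for a real messenger API (WhatsApp, Telegram, etc.), and the task bank, feedback and reminder logic are placeholders.

```python
import datetime
import random

# Hypothetical task bank: each task has a topic and a prompt.
TASK_BANK = [
    {"topic": "travel", "prompt": "Describe your last holiday in three sentences."},
    {"topic": "food", "prompt": "Translate: 'I would like to order the soup.'"},
]

def send_message(user_id: str, text: str) -> None:
    """Stand-in for a real messenger API call (WhatsApp, Telegram, etc.)."""
    print(f"[to {user_id}] {text}")

class TaskBot:
    """Sends tasks, stores responses for later recycling, issues reminders."""

    def __init__(self):
        self.records = {}   # user_id -> list of (task, response, timestamp)
        self.pending = {}   # user_id -> task awaiting a response

    def assign_task(self, user_id: str) -> None:
        task = random.choice(TASK_BANK)
        self.pending[user_id] = task
        send_message(user_id, task["prompt"])

    def receive_response(self, user_id: str, response: str) -> None:
        task = self.pending.pop(user_id, None)
        if task is None:
            send_message(user_id, "No task is open. Type 'task' for a new one.")
            return
        # Record the work so it can be recycled in later review sessions.
        self.records.setdefault(user_id, []).append(
            (task, response, datetime.datetime.now())
        )
        send_message(user_id, "Thanks! Feedback: ...")  # feedback logic would go here

    def remind(self) -> None:
        for user_id in self.pending:
            send_message(user_id, "Reminder: you have an unfinished task.")

bot = TaskBot()
bot.assign_task("maria")
bot.receive_response("maria", "Last summer I went to Lisbon...")
bot.remind()
```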

Kat Robb has written a very practical article about using instant messaging in English language classrooms. Her ideas are interesting (although I find the idea of students in a F2F classroom messaging each other slightly bizarre) and it’s easy to imagine ways in which her activities might be augmented with chatbot interventions. The Write & Improve app, mentioned above, could deploy a chatbot interface to give feedback instead of the flat (and, in my opinion, perfectly adequate) pop-up boxes currently in use. Come to think of it, more or less any digital language learning tool could be pimped up with a bot. Countless revisions can be envisioned.

But the overwhelming question is: would it be worth it? Bots are not likely, any time soon, to revolutionise language learning. What they might just do, however, is help to further reduce language teaching to a series of ‘mechanical and scripted gestures’. More certain is that a lot of money will be thrown down the post-truth edtech drain. Then, in the not too distant future, this latest piece of edtech will fall into the trough of disillusionment, to be replaced by the latest latest thing.

 

 

Adaptive learning providers make much of their ability to provide learners with personalised feedback and to provide teachers with dashboard feedback on the performance of both individuals and groups. All well and good, but my interest here is in the automated feedback that software could provide on very specific learning tasks. Scott Thornbury, in a recent talk, ‘Ed Tech: The Mouse that Roared?’, listed six ‘problems’ of language acquisition that educational technology for language learning needs to address. One of these he framed as follows: ‘The feedback problem, i.e. how does the learner get optimal feedback at the point of need?’, and suggested that technological applications ‘have some way to go.’ He was referring, not to the kind of feedback that dashboards can provide, but to the kind of feedback that characterises a good language teacher: corrective feedback (CF) – the way that teachers respond to learner utterances (typically those containing errors, but not necessarily restricted to these) in what Ellis and Shintani call ‘form-focused episodes’[1]. These responses may include a direct indication that there is an error, a reformulation, a request for repetition, a request for clarification, an echo with questioning intonation, etc. Basically, they are correction techniques.

These days, there isn’t really any debate about the value of CF. There is a clear research consensus that it can aid language acquisition. Discussing learning in more general terms, Hattie[2] claims that ‘the most powerful single influence enhancing achievement is feedback’. The debate now centres around the kind of feedback, and when it is given. Interestingly, evidence[3] has been found that CF is more effective in the learning of discrete items (e.g. some grammatical structures) than in communicative activities. Since it is precisely this kind of approach to language learning that we are more likely to find in adaptive learning programs, it is worth exploring further.

What do we know about CF in the learning of discrete items? First of all, it works better when it is explicit than when it is implicit (Li, 2010), although this needs to be nuanced. In immediate post-tests, explicit CF is better than implicit variations. But over a longer period of time, implicit CF provides better results. Secondly, formative feedback (as opposed to right / wrong testing-style feedback) strengthens retention of the learning items: this typically involves the learner repairing their error, rather than simply noticing that an error has been made. This is part of what cognitive scientists[4] sometimes describe as the ‘generation effect’. Whilst learners may benefit from formative feedback without repairing their errors, Ellis and Shintani (2014: 273) argue that the repair may result in ‘deeper processing’ and, therefore, assist learning. Thirdly, there is evidence that some delay in receiving feedback aids subsequent recall, especially over the longer term. Ellis and Shintani (2014: 276) suggest that immediate CF may ‘benefit the development of learners’ procedural knowledge’, while delayed CF is ‘perhaps more likely to foster metalinguistic understanding’. You can read a useful summary of a meta-analysis of feedback effects in online learning here, or you can buy the whole article here.

I have yet to see an online language learning program which can do CF well, but I think it’s a matter of time before things improve significantly. First of all, at the moment, feedback is usually immediate, or almost immediate. This is unlikely to change, for a number of reasons – foremost among them being the pride that ed tech takes in providing immediate feedback, and the fact that online learning is increasingly being conceptualised and consumed in bite-sized chunks, something you do on your phone between doing other things. What will change in better programs, however, is that feedback will become more formative. As things stand, tasks are usually of a very closed variety, with drag-and-drop being one of the most popular. Only one answer is possible and feedback is usually of the right / wrong-and-here’s-the-correct-answer kind. But tasks of this kind are limited in their value, and, at some point, tasks are needed where more than one answer is possible.

Here’s an example of a translation task from Duolingo, where a simple sentence could be translated into English in quite a large number of ways.

Decontextualised as it is, the sentence could be translated in the way that I have done it, although it’s unlikely. The feedback, however, is of relatively little help to the learner, who would benefit from guidance of some sort. The simple reason that Duolingo doesn’t offer useful feedback is that the programme is static. It has been programmed to accept certain answers (e.g. in this case both the present simple and the present continuous are acceptable), but everything else will be rejected. Why? Because it would take too long and cost too much to anticipate and enter in all the possible answers. Why doesn’t it offer formative feedback? Because in order to do so, it would need to identify the kind of error that has been made. If we can identify the kind of error, we can make a reasonable guess about the cause of the error, and select appropriate CF … this is what good teachers do all the time.

Analysing the kind of error that has been made is the first step in providing appropriate CF, and it can be done, with increasing accuracy, by current technology, but it requires a lot of computing. Let’s take spelling as a simple place to start. If you enter ‘I am makeing a basket for my mother’ in the Duolingo translation above, the program tells you ‘Nice try … there’s a typo in your answer’. Given the configuration of keyboards, it is highly unlikely that this is a typo. It’s a simple spelling mistake and teachers recognise it as such because they see it so often. For software to achieve the same insight, it would need, as a start, to trawl a large English dictionary database and a large tagged database of learner English. The process is quite complicated, but it’s perfectly doable, and learners could be provided with CF in the form of a ‘spelling hint’.
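
As a rough illustration of the first step, here is a sketch of a ‘spelling hint’ based on edit distance against a word list. The tiny `DICTIONARY` is a stand-in for the large dictionary database mentioned above; a real system would also consult keyboard-adjacency data and a learner corpus to separate typos from spelling errors.

```python
# A toy word list stands in for the large dictionary database mentioned above.
DICTIONARY = {"make", "making", "makes", "made", "basket", "mother", "for", "my", "a", "i", "am"}

def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def spelling_hint(word: str):
    """Return a hint if the word is one edit away from a dictionary word."""
    if word.lower() in DICTIONARY:
        return None
    candidates = [w for w in DICTIONARY if edit_distance(word.lower(), w) == 1]
    if candidates:
        return f"Spelling hint: did you mean '{candidates[0]}'?"
    return "This word is not in the dictionary."

print(spelling_hint("makeing"))  # -> Spelling hint: did you mean 'making'?
```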

Rather more difficult is the error illustrated in my first screen shot. What’s the cause of this ‘error’? Teachers know immediately that this is probably a classic confusion of ‘do’ and ‘make’. They know that the French verb ‘faire’ can be translated into English as ‘make’ or ‘do’ (among other possibilities), and the error is a common language transfer problem. Software could do the same thing. It would need a large corpus (to establish that ‘make’ collocates with ‘a basket’ more often than ‘do’), a good bilingualised dictionary (plenty of these now exist), and a tagged database of learner English. Again, appropriate automated feedback could be provided in the form of some sort of indication that ‘faire’ is only sometimes translated as ‘make’.
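
A sketch of the collocation check might look like the following. The frequency counts and the `TRANSLATIONS` mapping are invented placeholders; in a real system they would come from a large corpus and a bilingualised dictionary.

```python
# Invented placeholder counts standing in for real corpus frequencies.
COLLOCATION_COUNTS = {
    ("make", "basket"): 412,
    ("do", "basket"): 3,
    ("do", "homework"): 1250,
    ("make", "homework"): 17,
}

# Hypothetical bilingual mapping: French 'faire' covers both 'make' and 'do'.
TRANSLATIONS = {"faire": ["make", "do"]}

def collocation_feedback(verb: str, noun: str, source_verb: str = "faire"):
    """Flag a likely transfer error when a sibling translation collocates better."""
    observed = COLLOCATION_COUNTS.get((verb, noun), 0)
    for alt in TRANSLATIONS.get(source_verb, []):
        if alt == verb:
            continue
        if COLLOCATION_COUNTS.get((alt, noun), 0) > observed * 10:
            return (f"'{source_verb}' is only sometimes translated as '{verb}'. "
                    f"With '{noun}', English speakers usually say '{alt}'.")
    return None

print(collocation_feedback("do", "basket"))  # suggests 'make'
```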

These are both relatively simple examples, but it’s easy to think of others that are much more difficult to analyse automatically. Duolingo rejects ‘I am making one basket for my mother’: it’s not very plausible, but it’s not wrong. Teachers know why learners do this (again, it’s probably a transfer problem) and know how to respond (perhaps by saying something like ‘Only one?’). Duolingo also rejects ‘I making a basket for my mother’ (a common enough error), but is unable to provide any help beyond the correct answer. Automated CF could, however, be provided in both cases if more tools are brought into play. Multiple parsing machines (one is rarely accurate enough on its own) and semantic analysis will be needed. Both the range and the complexity of the available tools are increasing so rapidly (see here for the sort of research that Google is doing and here for an insight into current applications of this research in language learning) that Duolingo-style right / wrong feedback will very soon seem positively antediluvian.
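
For what an ensemble of parsers might look like in miniature, consider the following sketch. The three stub ‘parsers’ are hypothetical toys that diagnose the ‘I making …’ sentence above; the point is only the majority-vote mechanism, not the analyses themselves.

```python
from collections import Counter

# Three stub analysers stand in for real parsing engines; each returns a
# diagnosis label for the learner sentence, or None if it finds no error.
def parser_a(sentence: str):
    return "missing auxiliary" if " am " not in f" {sentence} " else None

def parser_b(sentence: str):
    return "missing auxiliary" if sentence.startswith("I making") else None

def parser_c(sentence: str):
    return "word order" if sentence.startswith("I making") else None

def ensemble_diagnosis(sentence: str):
    """Majority vote across parsers; no majority yields no diagnosis."""
    votes = Counter(
        d for d in (p(sentence) for p in (parser_a, parser_b, parser_c)) if d
    )
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count >= 2 else None

print(ensemble_diagnosis("I making a basket for my mother"))  # -> 'missing auxiliary'
```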

One further development is worth mentioning here, and it concerns feedback and gamification. Teachers know from the way that most learners respond to written CF that they are usually much more interested in knowing what they got right or wrong, rather than the reasons for this. Most students are more likely to spend more time looking at the score at the bottom of a corrected piece of written work than at the laborious annotations of the teacher throughout the text. Getting students to pay close attention to the feedback we provide is not easy. Online language learning systems with gamification elements, like Duolingo, typically reward learners for getting things right, and getting things right in the fewest attempts possible. They encourage learners to look for the shortest or cheapest route to finding the correct answers: learning becomes a sexed-up form of test. If, however, the automated feedback is good, this sort of gamification encourages the wrong sort of learning behaviour. Gamification designers will need to shift their attention away from the current concern with right / wrong, and towards ways of motivating learners to look at and respond to feedback. It’s tricky, because you want to encourage learners to take more risks (and reward them for doing so), but it makes no sense to penalise them for getting things right. The probable solution is to have a dual points system: one set of points for getting things right, another for employing positive learning strategies.
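
A dual points system of this kind could be as simple as the following sketch; the point values and the behaviours rewarded (‘risky’ attempts, reading the feedback) are illustrative assumptions.

```python
class DualScore:
    """Separate tallies: one for accuracy, one for positive learning behaviour."""

    def __init__(self):
        self.accuracy_points = 0
        self.strategy_points = 0

    def record_attempt(self, correct: bool, risky: bool, read_feedback: bool):
        if correct:
            self.accuracy_points += 10
        if risky:
            # Reward attempting harder, open-ended answers even when wrong.
            self.strategy_points += 5
        if read_feedback:
            # Reward engaging with the feedback rather than skipping it.
            self.strategy_points += 3

score = DualScore()
score.record_attempt(correct=False, risky=True, read_feedback=True)
score.record_attempt(correct=True, risky=False, read_feedback=False)
print(score.accuracy_points, score.strategy_points)  # 10 8
```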

The provision of automated ‘optimal feedback at the point of need’ may not be quite there yet, but it seems we’re on the way for some tasks in discrete-item learning. There will probably always be some teachers who can outperform computers in providing appropriate feedback, in the same way that a few top chess players can beat ‘Deep Blue’ and its scions. But the rest of us had better watch our backs: in the provision of some kinds of feedback, computers are catching up with us fast.

[1] Ellis, R. & N. Shintani (2014) Exploring Language Pedagogy through Second Language Acquisition Research. Abingdon: Routledge p. 249

[2] Hattie, J. (2009) Visible Learning. Abingdon: Routledge p.12

[3] Li, S. (2010) ‘The effectiveness of corrective feedback in SLA: a meta-analysis’ Language Learning 60 / 2: 309 -365

[4] Brown, P.C., Roediger, H.L. & McDaniel, M. A. Make It Stick (Cambridge, Mass.: Belknap Press, 2014)

There are a number of reasons why we sometimes need to describe a person’s language competence using a single number. Most of these are connected to the need for a shorthand to differentiate people, in summative testing or in job selection, for example. Numerical (or grade) allocation of this kind is so common (and especially in times when accountability is greatly valued) that it is easy to believe that this number is an objective description of a concrete entity, rather than a shorthand description of an abstract concept. In the process, the abstract concept (language competence) becomes reified and there is a tendency to stop thinking about what it actually is.

Language is messy. It’s a complex, adaptive system of communication which has a fundamentally social function. As Diane Larsen-Freeman and others have argued, ‘patterns of use strongly affect how language is acquired, is used, and changes. These processes are not independent of one another but are facets of the same complex adaptive system. […] The system consists of multiple agents (the speakers in the speech community) interacting with one another [and] the structures of language emerge from interrelated patterns of experience, social interaction, and cognitive mechanisms.’

As such, competence in language use is difficult to measure. There are ways of capturing some of it: think of the pages and pages of competency statements in the Common European Framework. But there has always been something deeply unsatisfactory about documents of this kind. How, for example, are we supposed to differentiate, exactly and objectively, between, say, can participate fully in an interview (C1) and can carry out an effective, fluent interview (B2)? The short answer is that we can’t. There are too many of these descriptors anyway and, even if we did attempt to use such a detailed tool to describe language competence, we would still be left with a very incomplete picture. There is at least one whole book devoted to attempts to test the untestable in language education (edited by Amos Paran and Lies Sercu, Multilingual Matters, 2010).

So, here is another reason why we are tempted to use shorthand numerical descriptors (such as A1, A2, B1, etc.) to describe something which is very complex and abstract (‘overall language competence’) and to reify this abstraction in the process. From there, it is a very short step to making things even more numerical, more scientific-sounding. Number-creep in recent years has brought us the Pearson Global Scale of English which can place you at a precise point on a scale from 10 to 90. Not to be outdone, Cambridge English Language Assessment now has a scale that runs from 80 points to 230, although Cambridge does, at least, allocate individual scores for four language skills.

As the title of this post suggests (in its reference to Stephen Jay Gould’s The Mismeasure of Man), I am suggesting that there are parallels between attempts to measure language competence and the sad history of attempts to measure ‘general intelligence’. Both are guilty of the twin fallacies of reification and ranking – the ordering of complex information as a gradual ascending scale. These conceptual fallacies then lead us, through the way that they push us to think about language, into making further conceptual errors about language learning. We start to confuse language testing with the ways that language learning can be structured.

We begin to granularise language. We move inexorably away from difficult-to-measure hazy notions of language skills towards what, on the surface at least, seem more readily measurable entities: words and structures. We allocate to them numerical values on our testing scales, so that an individual word can be deemed to be higher or lower on the scale than another word. And then we have a syllabus, a synthetic syllabus, that lends itself to digital delivery and adaptive manipulation. We find ourselves in a situation where materials writers for Pearson, writing for a particular ‘level’, are only allowed to use vocabulary items and grammatical structures that correspond to that ‘level’. We find ourselves, in short, in a situation where the acquisition of a complex and messy system is described as a linear, additive process. Here’s an example from the Pearson website: If you score 29 on the scale, you should be able to identify and order common food and drink from a menu; at 62, you should be able to write a structured review of a film, book or play. And because the GSE is so granular in nature, you can conquer smaller steps more often; and you are more likely to stay motivated as you work towards your goal. It’s a nonsense, a nonsense that is dictated by the needs of testing and adaptive software, but the sciency-sounding numbers help to hide the conceptual fallacies that lie beneath.

Perhaps, though, this doesn’t matter too much for most language learners. In the early stages of language learning (where most language learners are to be found), there are countless millions of people who don’t seem to mind the granularised programmes of Duolingo or Rosetta Stone, or the Grammar McNuggets of coursebooks. In these early stages, anything seems to be better than nothing, and the testing is relatively low-stakes. But as a learner’s interlanguage becomes more complex, and as the language she needs to acquire becomes more complex, attempts to granularise it and to present it in a linearly additive way become more problematic. It is for this reason, I suspect, that the appeal of granularised syllabuses declines so rapidly the more progress a learner makes. It comes as no surprise that, the further up the scale you get, the more that both teachers and learners want to get away from pre-determined syllabuses in coursebooks and software.

Adaptive language learning software is continuing to gain traction in the early stages of learning, in the initial acquisition of basic vocabulary and structures and in coming to grips with a new phonological system. It will almost certainly gain even more. But the challenge for the developers and publishers will be to find ways of making adaptive learning work for more advanced learners. Can it be done? Or will the mismeasure of language make it impossible?

The cheer-leading for big data in education continues unabated. Almost everything you read online on the subject is an advertisement, usually disguised as a piece of news or a blog post, but which can invariably be traced back to an organisation with a vested interest in digital disruption.  A typical example is this advergraphic which comes under a banner that reads ‘Big Data Improves Education’. The site, Datafloq, is selling itself as ‘the one-stop-shop around Big Data.’ Their ‘vision’ is ‘Connecting Data and People and [they] aim to achieve that by spurring the understanding, acceptance and application of Big Data in order to drive innovation and economic growth.’

Critical voices are rare, but growing. There’s a very useful bibliography of recent critiques here. And in the world of English language teaching, I was pleased to see that there’s a version of Gavin Dudeney’s talk, ‘Of Big Data & Little Data’, now up on YouTube. The slides which accompany his talk can be accessed here.

His main interest is in reclaiming the discourse of edtech in ELT, in moving away from the current obsession with numbers, and in returning the focus to what he calls ‘old edtech’ – the everyday technological practices of the vast majority of ELT practitioners.

It’s a stimulating and deadpan-entertaining talk and well worth 40 minutes of your time. Just fast-forward the bit when he talks about me.

If you’re interested in hearing more critical voices, you may also like to listen to a series of podcasts, put together by the IATEFL Learning Technologies and Global Issues Special Interest Groups. In the first of these, I interview Neil Selwyn and, in the second, Lindsay Clandfield interviews Audrey Watters of Hack Education.

 

Duolingo testing

Posted: September 6, 2014 in testing

After a break of two years, I recently returned to Duolingo in an attempt to build my German vocabulary. The attempt lasted a week. A few small things had changed, but the essentials had not, and my amusement at translating sentences like The duck eats oranges, A red dog wears white clothes or The fly is important soon turned to boredom and irritation. There are better, free ways of building vocabulary in another language.

Whilst little is new in the learning experience of Duolingo, there are significant developments at the company. The first of these is a new funding round in which they raised a further $20 million, bringing total investment to close to $40 million. Duolingo now has more than 25 million users, half of whom are described as ‘active’, and, according to Luis von Ahn, the company’s founder, their ambition is to dominate the language learning market. Approaching their third anniversary, though, Duolingo will need, before long, to turn a profit or, at least, to break even. The original plan, to use the language data generated by users of the site to power a paying translation service, is beginning to bear fruit, with contracts with CNN and BuzzFeed. But Duolingo is going to need other income streams. This may well be part of the reason behind their decision to develop and launch their own test.

Duolingo’s marketing people, however, are trying to get another message across: Every year, over 30 million job seekers and students around the world are forced to take a test to prove that they know English in order to apply for a job or school. For some, these tests can cost their family an entire month’s salary. And not only that, taking them typically requires traveling to distant examination facilities and waiting weeks for the results. We believe there should be a better way. This is why today I’m proud to announce the beta release of the Duolingo Test Center, which was created to give everyone equal access to jobs and educational opportunities. Now anyone can conveniently certify their English skills from home, on their mobile device, and for only $20. That’s 1/10th the cost of existing tests. Talking the creative disruption talk, Duolingo wants to break into the “archaic” industry of language proficiency tests. Basically, then, they want to make the world a better place. I seem to have heard this kind of thing before.

The tests will cost $20. Gina Gotthilf, Duolingo’s head of marketing, explains the pricing strategy: We came up with the smallest value that works for us and that a lot of people can pay. Duolingo’s main markets are now the BRICS countries. In China, for example, 1.5 million people signed up with Duolingo in just one week in April of this year, according to @TECHINASIA. Besides China, Duolingo has expanded into India, Japan, Korea, Taiwan, Hong Kong, Vietnam and Indonesia this year. (Brazil already has 2.4 million users, and there are 1.5 million in Mexico.) That’s a lot of potential customers.

So, what do you get for your twenty bucks? Not a lot, is the short answer. The test lasts about 18 minutes. There are four sections, and adaptive software analyses the testee’s responses to determine the level of difficulty of subsequent questions. The first section requires users to select real English words from a list which includes invented words. The second is a short dictation, the third is a gapfill, and the fourth is a read-aloud task which is recorded and compared to a native-speaker norm. That’s it.
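
Duolingo has not published the details of its adaptive algorithm, but the general principle can be sketched with one of the simplest adaptive designs, an up/down ‘staircase’: correct answers raise the difficulty of the next item, incorrect answers lower it, and the level at which the testee settles estimates their ability. This is an illustrative assumption only; real tests typically use something more sophisticated, such as item response theory.

```python
def run_adaptive_section(items_by_level, answer_fn, start_level=3, n_items=10):
    """Simple up/down staircase: right answers raise the level, wrong ones lower it.

    items_by_level: dict mapping difficulty level (1-5) to a list of items.
    answer_fn: callable(item) -> bool, True if the testee answers correctly.
    Returns the trajectory of levels, whose end-point estimates ability.
    """
    level, trajectory = start_level, []
    for _ in range(n_items):
        item = items_by_level[level].pop(0)
        trajectory.append(level)
        if answer_fn(item):
            level = min(level + 1, 5)
        else:
            level = max(level - 1, 1)
    return trajectory

# Toy run: a testee who can handle everything up to level 4.
items = {lvl: [f"item-{lvl}-{i}" for i in range(12)] for lvl in range(1, 6)}
print(run_adaptive_section(items, lambda item: int(item.split("-")[1]) <= 4))
```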

Duolingo claims that the test scores correlate very well with TOEFL, but the claim is based on a single study by a University of Pittsburgh professor that was sponsored by Duolingo. Will further studies replicate the findings? I, for one, wouldn’t bet on it, but I won’t insult your intelligence by explaining my reasons. Test validity and reliability, then, remain to be proved, but even John Lehoczky, interim executive vice president of Carnegie Mellon University (Duolingo was developed by researchers from Carnegie Mellon’s computer science department) acknowledges that at this point [the test] is not a fit vehicle for undergraduate admissions.

Even more of a problem than validity and reliability, however, is the question of security. The test is delivered via the web or smartphone apps (Android and iOS). Testees have to provide photo ID and a photo taken on the device they are using. There are various rules (they must be alone, no headphones, etc.) and a human proctor reviews the test after it has been completed. This is unlikely to impress bodies like the British immigration authorities, which recently refused to recognise online TOEFL and TOEIC qualifications after a BBC documentary revealed ‘systematic fraud’ in the taking of these tests.

There will always be a market of sorts for valueless qualifications (think, for example, of all the cheap TEFL courses that can be taken online), but to break into the monopoly of TOEFL and IELTS (and soon perhaps Pearson), Duolingo will need to deal with the issues of validity, reliability and security. If they don’t, few – if any – institutions of higher education will recognise the test. But if they do, they’ll need to spend more money: a team of applied linguists with expertise in testing would be a good start, and serious proctoring doesn’t come cheap. Will they be able to do this and keep the price down to $20?

 

 

Personalization is one of the key leitmotifs in current educational discourse. The message is clear: personalization is good, one-size-fits-all is bad. ‘How to personalize learning and how to differentiate instruction for diverse classrooms are two of the great educational challenges of the 21st century,’ write Trilling and Fadel, leading lights in the Partnership for 21st Century Skills (P21)[1]. Barack Obama has repeatedly sung the praises of, and the need for, personalized learning and his policies are fleshed out by his Secretary of Education, Arne Duncan, in speeches and on the White House blog: ‘President Obama described the promise of personalized learning when he launched the ConnectED initiative last June. Technology is a powerful tool that helps create robust personalized learning environments.’ In the UK, personalized learning has been government mantra for over 10 years. The EU, UNESCO, OECD, the Gates Foundation – everyone, it seems, is singing the same tune.

Personalization, we might all agree, is a good thing. How could it be otherwise? No one these days is going to promote depersonalization or impersonalization in education. What exactly it means, however, is less clear. According to a UNESCO Policy Brief[2], the term was first used in the context of education in the 1970s by Víctor García Hoz, a senior Spanish educationalist and member of Opus Dei at the University of Madrid. This UNESCO document then points out that ‘unfortunately, up to this date there is no single definition of this concept’.

In ELT, the term has been used in a very wide variety of ways. These range from the far-reaching ideas of people like Gertrude Moskowitz, who advocated a fundamentally learner-centred form of instruction, to the much more banal practice of getting students to produce a few personalized examples of an item of grammar they have just studied. See Scott Thornbury’s A-Z blog for an interesting discussion of personalization in ELT.

As with education in general, and ELT in particular, ‘personalization’ is also bandied around the adaptive learning table. Duolingo advertises itself as the opposite of one-size-fits-all, and as an online equivalent of the ‘personalized education you can get from a small classroom teacher or private tutor’. Babbel offers a ‘personalized review manager’ and Rosetta Stone’s Classroom online solution allows educational institutions ‘to shift their language program away from a ‘one-size-fits-all-curriculum’ to a more individualized approach’. As far as I can tell, the personalization in these examples is extremely restricted. The language syllabus is fixed and although users can take different routes up the ‘skills tree’ or ‘knowledge graph’, they are totally confined by the pre-determination of those trees and graphs. This is no more personalized learning than asking students to make five true sentences using the present perfect. Arguably, it is even less!

This is not, in any case, the kind of personalization that Obama, the Gates Foundation, Knewton, et al have in mind when they conflate adaptive learning with personalization. Their definition is much broader and summarised in the US National Education Technology Plan of 2010: ‘Personalized learning means instruction is paced to learning needs, tailored to learning preferences, and tailored to the specific interests of different learners. In an environment that is fully personalized, the learning objectives and content as well as the method and pace may all vary (so personalization encompasses differentiation and individualization).’ What drives this is the big data generated by the students’ interactions with the technology (see ‘Part 4: big data and analytics’ of ‘The Guide’ on this blog).

What remains unclear is exactly how this might work in English language learning. Adaptive software can only personalize to the extent that the content of an English language learning programme allows it to do so. It may be true that each student using adaptive software ‘gets a more personalised experience no matter whose content the student is consuming’, as Knewton’s David Liu puts it. But the potential for any really meaningful personalization depends crucially on the nature and extent of this content, along with the possibility of variable learning outcomes. For this reason, we are not likely to see any truly personalized large-scale adaptive learning programs for English any time soon.

Nevertheless, technology is now central to personalized language learning. A good learning platform, which allows learners to connect to ‘social networking systems, podcasts, wikis, blogs, encyclopedias, online dictionaries, webinars, online English courses, various apps’, etc (see Alexandra Chistyakova’s eltdiary), means that personalization could be more easily achieved.

For the time being, at least, adaptive learning systems would seem to work best for ‘those things that can be easily digitized and tested like math problems and reading passages’, writes Barbara Bray. Or low-level vocabulary and grammar McNuggets, we might add. Ideal for, say, ‘English Grammar in Use’. But meaningfully personalized language learning?


‘Personalized learning’ sounds very progressive, a utopian educational horizon, and it sounds like it ought to be the future of ELT (as Cleve Miller argues). It also sounds like a pretty good slogan on which to hitch the adaptive bandwagon. But somehow, just somehow, I suspect that when it comes to adaptive learning we’re more likely to see more testing, more data collection and more depersonalization.

[1] Trilling, B. & Fadel, C. 2009 21st Century Skills (San Francisco: Wiley) p.33

[2] Personalized learning: a new ICT-enabled education approach, UNESCO Institute for Information Technologies in Education, Policy Brief March 2012 iite.unesco.org/pics/publications/en/files/3214716.pdf

 

busuu is an online language learning service. I did not refer to it in the ‘guide’ because it does not seem to use any adaptive learning software yet, but this is set to change. According to founder Bernhard Niesner, the company is already working on incorporation of adaptive software.

A few statistics will show the significance of busuu. The site currently has over 40 million users (El Pais, 8 February 2014) and is growing by 40,000 a day. The basic service is free, but the premium service costs €69.99 a year. The company will not give detailed user statistics, but say that ‘hundreds of thousands’ are paying for the premium service, that turnover was a 7-figure number last year, and that it will rise to 8 figures this year.

It is easy to understand why traditional publishers might be worried about competition like busuu and why they are turning away from print-based courses.

Busuu offers 12 languages, but, as a translation-based service, any one of these languages can only be studied if you speak one of the other languages on offer. The levels of the different courses are tagged to the CEFR.


In some ways, busuu is not so different from competitors like Duolingo. Students are presented with bilingual vocabulary sets, accompanied by pictures, which are tested in a variety of ways. As with Duolingo, some of this is a little strange. For German at level A1, I did a vocabulary set on ‘pets’ which presented the German words for a ferret, a tortoise and a guinea-pig, among others. There are dialogues, which are both written and recorded, that are sometimes surreal.

Child: Mum, look over there, there’s a dog without a collar, can we take it?

Mother: No, darling, our house is too small to have a dog.

Child: Mum your bedroom is very big, it can sleep with dad and you.

Mother: Come on, I’ll buy you a toy dog.

The dialogues are followed up by multiple choice questions which test your memory of the dialogue. There are also writing exercises where you are given a picture from National Geographic and asked to write about it. It’s not always clear what one is supposed to write. What would you say about a photo that showed a large number of parachutes in the sky, beyond ‘I can see a lot of parachutes’?

There are also many gamification elements. There is a learning carrot where you can set your own learning targets and users can earn ‘busuuberries’ which can then be traded in for animations in a ‘language garden’.


But in one significant respect, busuu differs from its competitors. It combines the usual vocabulary, grammar and dialogue work with social networking. Users can interact with text or video, and feedback on written work comes from other users. My own experience with this was mixed, but the potential is clear. Feedback on other learners’ work is encouraged by the awarding of ‘busuuberries’.

We will have to wait and see what busuu does with adaptive software and what it will do with the big data it is generating. For the moment, its interest lies in illustrating what could be done with a learning platform and adaptive software. The big ELT publishers know they have a new kind of competition and, with a lot more money to invest than busuu, we have to assume that what they will launch a few years from now will do everything that busuu does, and more. Meanwhile, busuu are working on site redesign and adaptivity. They would do well, too, to sort out their syllabus!

‘Adaptive learning’ can mean slightly different things to different people. According to one provider of adaptive learning software (Smart Sparrow https://www.smartsparrow.com/adaptive-elearning), it is ‘an online learning and teaching medium that uses an Intelligent Tutoring System to adapt online learning to the student’s level of knowledge. Adaptive eLearning provides students with customised educational content and the unique feedback that they need, when they need it.’ Essentially, it is software that analyzes the work that a student is doing online, and tailors further learning tasks to the individual learner’s needs (as analyzed by the software).

A relatively simple example of adaptive language learning is Duolingo, a free online service that currently offers seven languages, including English (www.duolingo.com/), with over 10 million users in November 2013. Learners progress through a series of translation, dictation and multiple choice exercises that are organised into a ‘skill tree’ of vocabulary and grammar areas. Because translation plays such a central role, the program is only suitable for speakers of one of the languages on offer in combination with one of the other languages on offer. Duolingo’s own blog describes the approach in the following terms: ‘Every time you finish a Duolingo lesson, translation, test, or practice session, you provide valuable data about what you know and what you’re struggling with. Our system uses this info to plan future lessons and select translation tasks specifically for your skills and needs. Similar to how an online store uses your previous purchases to customize your shopping experience, Duolingo uses your learning history to customize your learning experience’ (http://blog.duolingo.com/post/41960192602/duolingos-data-driven-approach-to-education).

Example of a ‘skill tree’ from http://www.duolingo.com

For anyone with a background in communicative language teaching, the experience can be slightly surreal. Examples of sentences that need to be translated include: The dog eats the bird, the boy has a cow, and the fly is eating bread. The system allows you to compete and communicate with other learners, and to win points and rewards (see ‘Gamification’ next post).

Duolingo describes its crowd-sourced, free, adaptive approach as ‘pretty unique’, but uniquely unique it is not. It is essentially a kind of memory trainer, and there are a number available on the market. One of the most well-known is Cerego’s cloud-based iKnow!, which describes itself as a ‘memory management platform’. Particularly strong in Japan, corporate and individual customers pay a monthly subscription to access its English, Chinese and Japanese language programs. A free trial of some of the products is available at http://iknow.jp/  and I experimented with their ‘Erudite English’ program. This presented a series of words which included ‘defalcate’, ‘fleer’ and ‘kvetch’ through English-only definitions, followed by multiple choice and dictated gap-fill exercises. As with Duolingo, there seemed to be no obvious principle behind the choice of items, and example sentences included things like ‘Michael arrogates a slice of carrot cake, unbeknownst to his sister,’ or ‘She found a place in which to posit the flowerpot.’ Based on a user’s performance, Cerego’s algorithms decide which items will be presented, and select the frequency and timing of opportunities for review. The program can be accessed through ordinary computers, as well as iPhone and Android apps. The platform has been designed in such a way as to allow other content to be imported, and then presented and practised in a similar way.
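
The scheduling logic behind memory trainers of this kind (and behind Rosetta Stone’s ‘Adaptive Recall’, described below) can be sketched in a few lines: successful recall pushes the next review further out, while failure pulls it back in. The multiplier and reset rule here are illustrative assumptions, loosely in the spirit of SuperMemo-style algorithms, not Cerego’s actual implementation.

```python
import datetime

class ReviewScheduler:
    """Toy spaced-repetition scheduler: intervals grow with each success
    and reset after a failure, in the spirit of the systems described."""

    def __init__(self):
        self.intervals = {}  # item -> current interval in days

    def next_review(self, item: str, recalled: bool) -> datetime.date:
        interval = self.intervals.get(item, 1)
        if recalled:
            interval = interval * 2.5   # exceed expectations: push the review out
        else:
            interval = 1                # fall short: see it again tomorrow
        self.intervals[item] = interval
        return datetime.date.today() + datetime.timedelta(days=round(interval))

sched = ReviewScheduler()
print(sched.next_review("defalcate", recalled=True))   # a couple of days out
print(sched.next_review("defalcate", recalled=True))   # roughly six days out
print(sched.next_review("kvetch", recalled=False))     # tomorrow
```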

In a similar vein, the Rosetta Stone software also uses spaced repetition to teach grammar and vocabulary. It describes its adaptive learning as ‘Adaptive Recall™’. According to their website, this provides review activities for each lesson ‘at intervals that are determined by your performance in that review. Exceed the program’s expectations for you and the review gets pushed out further. Fall short and you’ll see it sooner. The program gives you a likely date and automatically notifies you when it’s time to take the review again’. Rosetta Stone has won numerous awards and claims that over 20,000 educational institutions around the world have formed partnerships with them. These include the US military, the University of Barcelona and Harrogate Grammar School in the UK (http://www.rosettastone.co.uk/faq).

Slightly more sophisticated than the memory-trainers described above is the GRE (the Graduate Record Examinations, a test for admission into many graduate schools in the US) online preparation program that is produced by Barron’s (www.barronstestprep.com/gre). Although this is not an English language course, it provides a useful example of how simple adaptive learning programs can be taken a few steps further. At the time of writing, it is possible to do a free trial, and this gives a good taste of adaptive learning. Barron’s highlights the way that their software delivers individualized study programs: it is not, they say, a case of ‘one size fits all’. After entering the intended test date, the intended number of hours of study, and a simple self-evaluation of different reasoning skills, a diagnostic test completes the information required to set up a personalized ‘prep plan’. This determines the lessons you will be given. As you progress through the course, the ‘prep plan’ adapts to the work that you do, comparing your performance to other students who have taken the course. Measuring your progress and modifying your ‘skill profile’, the program can change the order of the lessons and the selection of the 1000+ practice questions.
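
A toy version of such a ‘skill profile’ update might look like the following; the skills, weights and lesson names are invented for illustration, and the real Barron’s system presumably also folds in comparisons with other students’ performance.

```python
# Hypothetical skill profile: running accuracy estimates per reasoning skill.
profile = {"verbal": 0.50, "quantitative": 0.50, "analytical": 0.50}

def update_profile(skill: str, correct: bool, weight: float = 0.2) -> None:
    """Exponentially weighted update of the estimated accuracy for a skill."""
    profile[skill] = (1 - weight) * profile[skill] + weight * (1.0 if correct else 0.0)

def next_lessons(lessons: dict) -> list:
    """Order lessons so the weakest skills are practised first."""
    return sorted(lessons, key=lambda name: profile[lessons[name]])

lessons = {"Reading sets": "verbal", "Data sufficiency": "quantitative",
           "Logic games": "analytical"}

update_profile("quantitative", correct=False)
update_profile("verbal", correct=True)
print(next_lessons(lessons))  # the weakest skill (quantitative) comes first
```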