Posts Tagged ‘AI’

Colloquium

At the beginning of March, I’ll be going to Cambridge to take part in a Digital Learning Colloquium (for more information about the event, see here). One of the questions that will be explored is how research might contribute to the development of digital language learning. In this, the first of two posts on the subject, I’ll be taking a broad overview of the current state of play in edtech research.

I try my best to keep up to date with research. Of the main journals, there are Language Learning and Technology, which is open access; CALICO, which offers quite a lot of open access material; and ReCALL, which is the most restricted in terms of access of the three. But there is something deeply frustrating about most of this research, and this is what I want to explore in these posts. More often than not, research articles end with a call for more research. And more often than not, I find myself saying ‘Please, no, not more research like this!’

First, though, I would like to turn to a more reader-friendly source of research findings. Systematic reviews are, basically, literature reviews which can save people like me from having to plough through endless papers on similar subjects, all of which contain the same (or similar) literature review in the opening sections. If only there were more of them. Others agree with me: the conclusion of one systematic review of learning and teaching with technology in higher education (Lillejord et al., 2018) was that more systematic reviews were needed.

Last year saw the publication of a systematic review of research on artificial intelligence applications in higher education (Zawacki-Richter, et al., 2019) which caught my eye. The first thing that struck me about this review was that ‘out of 2656 initially identified publications for the period between 2007 and 2018, 146 articles were included for final synthesis’. In other words, only just over 5% of the research was considered worthy of inclusion.

The review did not paint a very pretty picture of the current state of AIEd research. As the second part of the title of this review (‘Where are the educators?’) makes clear, the research, taken as a whole, showed a ‘weak connection to theoretical pedagogical perspectives’. This is not entirely surprising. As Bates (2019) has noted: ‘since AI tends to be developed by computer scientists, they tend to use models of learning based on how computers or computer networks work (since of course it will be a computer that has to operate the AI). As a result, such AI applications tend to adopt a very behaviourist model of learning: present / test / feedback.’ More generally, it is clear that technology adoption (and research) is being driven by technology enthusiasts, with insufficient expertise in education. The danger is that edtech developers ‘will simply ‘discover’ new ways to teach poorly and perpetuate erroneous ideas about teaching and learning’ (Lynch, 2017).

This, then, is the first of my checklist of things that, collectively, researchers need to do to improve the value of their work. The rest of this list is drawn from observations mostly, but not exclusively, from the authors of systematic reviews, and mostly comes from reviews of general edtech research. In the next blog post, I’ll look more closely at a recent collection of ELT edtech research (Mavridi & Saumell, 2020) to see how it measures up.

1 Make sure your research is adequately informed by educational research outside the field of edtech

Unproblematised behaviourist assumptions about the nature of learning are all too frequent. References to learning styles are still fairly common. The most frequently investigated skill that is considered in the context of edtech is critical thinking (Sosa Neira, et al., 2017), but this is rarely defined and almost never problematised, despite a broad literature that questions the construct.

2 Adopt a sceptical attitude from the outset

Know your history. Decades of technological innovation in education have shown precious little in the way of educational gains and, more than anything else, have taught us that we need to be sceptical from the outset. ‘Enthusiasm and praise that are directed towards ‘virtual education’, ‘school 2.0’, ‘e-learning’ and the like’ (Selwyn, 2014: vii) are indications that the lessons of the past have not been sufficiently absorbed (Levy, 2016: 102). The phrase ‘exciting potential’, for example, should be banned from all edtech research. See, for example, a ‘state-of-the-art analysis of chatbots in education’ (Winkler & Söllner, 2018), which has nothing to conclude but ‘exciting potential’. Potential is fine (indeed, it is perhaps the only thing that research can unambiguously demonstrate – see section 3 below), but can we try to be a little more grown-up about things?

3 Know what you are measuring

Measuring learning outcomes is tricky, to say the least, but it’s understandable that researchers should try to focus on them. Unfortunately, ‘the vast array of literature involving learning technology evaluation makes it challenging to acquire an accurate sense of the different aspects of learning that are evaluated, and the possible approaches that can be used to evaluate them’ (Lai & Bower, 2019). Metrics such as student grades are hard to interpret, not least because of the large number of variables and the danger of many things being conflated in one score. Equally, or possibly even more, problematic are self-reporting measures, which are rarely robust. It seems that surveys are the most widely used instrument in qualitative research (Sosa Neira, et al., 2017), but these will tell us little or nothing when used for short-term interventions (see point 5 below).

4 Ensure that the sample size is big enough to mean something

In most of the research into digital technology in education that was analysed in a literature review carried out for the Scottish government (ICF Consulting Services Ltd, 2015), there were only ‘small numbers of learners or teachers or schools’.

5 Privilege longitudinal studies over short-term projects

The Scottish government literature review (ICF Consulting Services Ltd, 2015) also noted that ‘most studies that attempt to measure any outcomes focus on short and medium term outcomes’. The fact that the use of a particular technology has some sort of impact over the short or medium term tells us very little of value. Unless there is very good reason to suspect the contrary, we should assume that it is a novelty effect that has been captured (Levy, 2016: 102).

6 Don’t forget the content

The starting point of much edtech research is the technology, but most edtech, whether it’s a flashcard app or a full-blown Moodle course, has content. Research reports rarely give details of this content, assuming perhaps that it’s just fine, and all that’s needed is a little tech to ‘present learners with the ‘right’ content at the ‘right’ time’ (Lynch, 2017). It’s a foolish assumption. Take a random educational app from the Play Store, a random MOOC or whatever, and the chances are you’ll find it’s crap.

7 Avoid anecdotal accounts of technology use in quasi-experiments as the basis of a ‘research article’

Control (i.e. technology-free) groups may not always be possible, but without them we’re unlikely to learn much from a single study. What would, however, be extremely useful would be a large, collated collection of such action-research projects, using the same or similar technology, in a variety of settings. There is a marked absence of this kind of work.

8 Enough already of higher education contexts

Researchers typically work in universities where they have captive students who they can carry out research on. But we have a problem here. The systematic review of Lundin et al. (2018), for example, found that ‘studies on flipped classrooms are dominated by studies in the higher education sector’ (besides lacking anchors in learning theory or instructional design). With some urgency, primary and secondary contexts need to be investigated in more detail, not just regarding flipped learning.

9 Be critical

Very little edtech research considers the downsides of edtech adoption. Online safety, privacy and data security are hardly peripheral issues, especially with younger learners. Ignoring them won’t make them go away.

More research?

So do we need more research? For me, two things stand out. We might benefit more from, firstly, a different kind of research, and, secondly, more syntheses of the work that has already been done. Although I will probably continue to dip into the pot-pourri of articles published in the main CALL journals, I’m looking forward to a change at the CALICO journal. From September of this year, one issue a year will be thematic, with a lead article written by established researchers which will ‘first discuss in broad terms what has been accomplished in the relevant subfield of CALL. It should then outline which questions have been answered to our satisfaction and what evidence there is to support these conclusions. Finally, this article should pose a “soft” research agenda that can guide researchers interested in pursuing empirical work in this area’. This will be followed by two or three empirical pieces that ‘specifically reflect the research agenda, methodologies, and other suggestions laid out in the lead article’.

But I think I’ll still have a soft spot for some of the other journals that are coyer about their impact factor and that can be freely accessed. How else would I discover (it would be too mean to give the references here) that ‘the effective use of new technologies improves learners’ language learning skills’? Presumably, the ineffective use of new technologies has the opposite effect? Or that ‘the application of modern technology represents a significant advance in contemporary English language teaching methods’?

References

Bates, A. W. (2019). Teaching in a Digital Age Second Edition. Vancouver, B.C.: Tony Bates Associates Ltd. Retrieved from https://pressbooks.bccampus.ca/teachinginadigitalagev2/

ICF Consulting Services Ltd (2015). Literature Review on the Impact of Digital Technology on Learning and Teaching. Edinburgh: The Scottish Government. https://dera.ioe.ac.uk/24843/1/00489224.pdf

Lai, J.W.M. & Bower, M. (2019). How is the use of technology in education evaluated? A systematic review. Computers & Education, 133(1), 27-42. Elsevier Ltd. Retrieved January 14, 2020 from https://www.learntechlib.org/p/207137/

Levy, M. (2016). Researching in language learning and technology. In Farr, F. & Murray, L. (Eds.) The Routledge Handbook of Language Learning and Technology. Abingdon, Oxon.: Routledge. pp.101 – 114

Lillejord S., Børte K., Nesje K. & Ruud E. (2018). Learning and teaching with technology in higher education – a systematic review. Oslo: Knowledge Centre for Education https://www.forskningsradet.no/siteassets/publikasjoner/1254035532334.pdf

Lundin, M., Bergviken Rensfeldt, A., Hillman, T. et al. (2018). Higher education dominance and siloed knowledge: a systematic review of flipped classroom research. International Journal of Educational Technology in Higher Education 15, 20 (2018) doi:10.1186/s41239-018-0101-6

Lynch, J. (2017). How AI Will Destroy Education. Medium, November 13, 2017. https://buzzrobot.com/how-ai-will-destroy-education-20053b7b88a6

Mavridi, S. & Saumell, V. (Eds.) (2020). Digital Innovations and Research in Language Learning. Faversham, Kent: IATEFL

Selwyn, N. (2014). Distrusting Educational Technology. New York: Routledge

Sosa Neira, E. A., Salinas, J. and de Benito Crosetti, B. (2017). Emerging Technologies (ETs) in Education: A Systematic Review of the Literature Published between 2006 and 2016. International Journal of Emerging Technologies in Learning, 12 (5). https://online-journals.org/index.php/i-jet/article/view/6939

Winkler, R. & Söllner, M. (2018). Unleashing the Potential of Chatbots in Education: A State-Of-The-Art Analysis. In: Academy of Management Annual Meeting (AOM). Chicago, USA. https://www.alexandria.unisg.ch/254848/1/JML_699.pdf

Zawacki-Richter, O., Bond, M., Marin, V. I. and Gouverneur, F. (2019). Systematic review of research on artificial intelligence applications in higher education – where are the educators? International Journal of Educational Technology in Higher Education 2019

At the start of the last decade, ELT publishers were worried, Macmillan among them. The financial crash of 2008 led to serious difficulties, not least in their key Spanish market. In 2011, Macmillan’s parent company was fined £11.3 million for corruption. Under new ownership, restructuring was a constant. At the same time, Macmillan ELT was getting ready to move from its Oxford headquarters to new premises in London, a move which would inevitably lead to the loss of a sizable proportion of its staff. On top of that, Macmillan, like the other ELT publishers, was aware that changes in the digital landscape (the first 3G iPhone had appeared in June 2008 and wifi access was spreading rapidly around the world) meant that they needed to shift away from the old print-based model. With her finger on the pulse, Caroline Moore wrote an article in October 2010 entitled ‘No Future? The English Language Teaching Coursebook in the Digital Age’. The publication (at the start of the decade) and runaway success of the online ‘Touchstone’ course, from arch-rivals, Cambridge University Press, meant that Macmillan needed to change fast if they were to avoid being left behind.

Macmillan already had a platform, Campus, but it was generally recognised as being clunky and outdated, and something new was needed. In the summer of 2012, Macmillan brought in two new executives – people who could talk the ‘creative-disruption’ talk and who believed in the power of big data to shake up English language teaching and publishing. At the time, the idea of big data was beginning to reach public consciousness and ‘Big Data: A Revolution that Will Transform how We Live, Work, and Think’ by Viktor Mayer-Schönberger and Kenneth Cukier, was a major bestseller in 2013 and 2014. ‘Big data’ was the ‘hottest trend’ in technology and peaked in Google Trends in October 2014. See the graph below.

[Image: Google Trends graph for ‘big data’]

Not long after taking up their positions, the two executives began negotiations with Knewton, an American adaptive learning company. Knewton’s technology promised to gather colossal amounts of data on students using Knewton-enabled platforms. Its founder, Jose Ferreira, bragged that Knewton had ‘more data about our students than any company has about anybody else about anything […] We literally know everything about what you know and how you learn best, everything’. This data would, it was claimed, enable publishers to multiply, by orders of magnitude, the efficacy of learning materials, allowing publishers, like Macmillan, to provide a truly personalized and optimal offering to learners using their platform.

The contract between Macmillan and Knewton was agreed in May 2013 ‘to build next-generation English Language Learning and Teaching materials’. Perhaps fearful of being left behind in what was seen to be a winner-takes-all market (Pearson already had a financial stake in Knewton), Cambridge University Press duly followed suit, signing a contract with Knewton in September of the same year, in order ‘to create personalized learning experiences in [their] industry-leading ELT digital products’. Things moved fast because, by the start of 2014 when Macmillan’s new catalogue appeared, customers were told to ‘watch out for the ‘Big Tree’’, Macmillan’s new platform, which would be powered by Knewton. ‘The power that will come from this world of adaptive learning takes my breath away’, wrote the international marketing director.

Not a lot happened next, at least outwardly. In the following year, 2015, the Macmillan catalogue again told customers to ‘look out for the Big Tree’ which would offer ‘flexible blended learning models’ which could ‘give teachers much more freedom to choose what they want to do in the class and what they want the students to do online outside of the classroom’.

[Image: Macmillan catalogue, 2015]

But behind the scenes, everything was going wrong. It had become clear that a linear model of language learning, which was a necessary prerequisite of the Knewton system, simply did not lend itself to anything which would be vaguely marketable in established markets. Skills development, not least the development of so-called 21st century skills, which Macmillan was pushing at the time, would not be facilitated by collecting huge amounts of data and algorithms offering personalized pathways. Even if it could, teachers weren’t ready for it, and the projections for platform adoptions were beginning to seem very over-optimistic. Costs were spiralling. Pushed to meet unrealistic deadlines for a product that was totally ill-conceived in the first place, in-house staff were suffering, and this was made worse by what many staffers thought was a toxic work environment. By the end of 2014 (so, before the copy for the 2015 catalogue had been written), the two executives had gone.

For some time previously, skeptics had been joking that Macmillan had been barking up the wrong tree, and by the time that the 2016 catalogue came out, the ‘Big Tree’ had disappeared without trace. The problem was that so much time and money had been thrown at this particular tree that not enough had been left to develop new course materials (for adults). The whole thing had been a huge cock-up of an extraordinary kind.

Cambridge, too, lost interest in their Knewton connection, but were fortunate (or wise) not to have invested so much energy in it. Language learning was only ever a small part of Knewton’s portfolio, and the company had raised over $180 million in venture capital. Its founder, Jose Ferreira, had been a master of marketing hype, but the business model was not delivering any better than the educational side of things. Pearson pulled out. In December 2016, Ferreira stepped down and was replaced as CEO. The company shifted to ‘selling digital courseware directly to higher-ed institutions and students’ but this could not stop the decline. In September of 2019, Knewton was sold for something under $17 million, with investors taking a hit of over $160 million. My heart bleeds.

It was clear, from very early on (see, for example, my posts from 2014 here and here) that Knewton’s product was little more than what Michael Feldstein called ‘snake oil’. Why and how could so many people fall for it for so long? Why and how will so many people fall for it again in the coming decade, although this time it won’t be ‘big data’ that does the seduction, but AI (which kind of boils down to the same thing)? The former Macmillan executives are still at the game, albeit in new companies and talking a slightly modified talk, and Jose Ferreira (whose new venture has already raised $3.7 million) is promising to revolutionize education with a new start-up which ‘will harness the power of technology to improve both access and quality of education’ (thanks to Audrey Watters for the tip). Investors may be desperate to find places to spread their portfolio, but why do the rest of us lap up the hype? It’s a question to which I will return.

It’s hype time again. Spurred on, no doubt, by the current spate of books and articles about AIED (artificial intelligence in education), the IATEFL Learning Technologies SIG is organising an online event on the topic in November of this year. Currently, the most visible online references to AI in language learning are related to Glossika, basically a language learning system that uses spaced repetition, whose marketing department has realised that references to AI might help sell the product. They’re not alone – see, for example, Knowble, which I reviewed earlier this year.

In the wider world of education, where AI has made greater inroads than in language teaching, every day brings more stuff: How artificial intelligence is changing teaching, 32 Ways AI is Improving Education, How artificial intelligence could help teachers do a better job, etc., etc. There’s a full-length book by Anthony Seldon, The Fourth Education Revolution: will artificial intelligence liberate or infantilise humanity? (2018, University of Buckingham Press) – one of the most poorly researched and badly edited books on education I’ve ever read, although that won’t stop it selling – and, no surprises here, there’s a Pearson commissioned report called Intelligence Unleashed: An argument for AI in Education (2016) which is available free.

Common to all these publications is the claim that AI will radically change education. When it comes to language teaching, a similar claim has been made by Donald Clark (described by Anthony Seldon as an education guru but perhaps best-known to many in ELT for his demolition of Sugata Mitra). In 2017, Clark wrote a blog post for Cambridge English (now unavailable) entitled How AI will reboot language learning, and a more recent version of this post, called AI has and will change language learning forever (sic) is available on Clark’s own blog. Given the history of the failure of education predictions, Clark is making bold claims. Thomas Edison (1922) believed that movies would revolutionize education. Radios were similarly hyped in the 1940s and in the 1960s it was the turn of TV. In the 1980s, Seymour Papert predicted the end of schools – ‘the computer will blow up the school’, he wrote. Twenty years later, we had the interactive possibilities of Web 2.0. As each technology failed to deliver on the hype, a new generation of enthusiasts found something else to make predictions about.

But is Donald Clark onto something? Developments in AI and computational linguistics have recently resulted in enormous progress in machine translation. Impressive advances in automatic speech recognition and generation, coupled with the power that can be packed into a handheld device, mean that we can expect some re-evaluation of the value of learning another language. Stephen Heppell, a specialist at Bournemouth University in the use of ICT in Education, has said: ‘Simultaneous translation is coming, making language teachers redundant. Modern languages teaching in future may be more about navigating cultural differences’ (quoted by Seldon, p.263). Well, maybe, but this is not Clark’s main interest.

Less a matter of opinion and much closer to the present day is the issue of assessment. AI is becoming ubiquitous in language testing. Cambridge, Pearson, TELC, Babbel and Duolingo are all using or exploring AI in their testing software, and we can expect to see this increase. Current, paper-based systems of testing subject knowledge are, according to Rosemary Luckin and Kristen Weatherby, outdated, ineffective, time-consuming, the cause of great anxiety and can easily be automated (Luckin, R. & Weatherby, K. 2018. ‘Learning analytics, artificial intelligence and the process of assessment’ in Luckin, R. (ed.) Enhancing Learning and Teaching with Technology, 2018. UCL Institute of Education Press, p.253). By capturing data of various kinds throughout a language learner’s course of study and by using AI to analyse learning development, continuous formative assessment becomes possible in ways that were previously unimaginable. ‘Assessment for Learning (AfL)’ or ‘Learning Oriented Assessment (LOA)’ are two terms used by Cambridge English to refer to the potential that AI offers which is described by Luckin (who is also one of the authors of the Pearson paper mentioned earlier). In practical terms, albeit in a still very limited way, this can be seen in the CUP course ‘Empower’, which combines CUP course content with validated LOA from Cambridge Assessment English.

Will this reboot or revolutionise language teaching? Probably not and here’s why. AIED systems need to operate with what is called a ‘domain knowledge model’. This specifies what is to be learnt and includes an analysis of the steps that must be taken to reach that learning goal. Some subjects (especially STEM subjects) ‘lend themselves much more readily to having their domains represented in ways that can be automatically reasoned about’ (du Boulay, D. et al., 2018. ‘Artificial intelligences and big data technologies to close the achievement gap’ in Luckin, R. (ed.) Enhancing Learning and Teaching with Technology, 2018. UCL Institute of Education Press, p.258). This is why most AIED systems have been built to teach these areas. Languages are rather different. We simply do not have a domain knowledge model, except perhaps for the very lowest levels of language learning (and even that is highly questionable). Language learning is probably not, or not primarily, about acquiring subject knowledge. Debate still rages about the relationship between explicit language knowledge and language competence. AI-driven formative assessment will likely focus most on explicit language knowledge, as does most current language teaching. This will not reboot or revolutionise anything. It will more likely reinforce what is already happening: a model of language learning that assumes there is a strong interface between explicit knowledge and language competence. It is not a model that is shared by most SLA researchers.
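To make the idea of a ‘domain knowledge model’ a little more concrete, here is a minimal sketch of what one might look like for a small corner of school maths. Everything in it is my own illustration: the skill names, the `PREREQUISITES` table and the two helper functions are hypothetical and are not taken from any AIED product. The point is simply that, for a subject like this, the steps can be written down as a graph that software can reason about.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical fragment of a domain model: each skill maps to the
# skills that must be mastered before it. Illustrative only.
PREREQUISITES = {
    "counting": set(),
    "addition": {"counting"},
    "multiplication": {"addition"},
    "fractions": {"multiplication"},
    "linear equations": {"fractions", "addition"},
}


def learning_path(prereqs):
    """Return one ordering of the skills that respects every prerequisite."""
    return list(TopologicalSorter(prereqs).static_order())


def next_steps(prereqs, mastered):
    """Return the skills a learner is ready for, given what is already mastered."""
    return sorted(
        skill
        for skill, needs in prereqs.items()
        if skill not in mastered and needs <= mastered
    )


if __name__ == "__main__":
    print(learning_path(PREREQUISITES))
    print(next_steps(PREREQUISITES, {"counting", "addition"}))
```

The difficulty for language teaching is that nobody can fill in a table like `PREREQUISITES` with any confidence: we do not know which items depend on which, or whether the whole thing is a graph at all.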

So, one thing that AI can do (and is doing) for language learning is to improve the algorithms that determine the way that grammar and vocabulary are presented to individual learners in online programs. AI-optimised delivery of ‘English Grammar in Use’ may lead to some learning gains, but they are unlikely to be significant. It is not, in any case, what language learners need.

AI, Donald Clark suggests, can offer personalised learning. Precisely what kind of personalised learning this might be, and whether or not this is a good thing, remains unclear. A 2015 report funded by the Gates Foundation found that we currently lack evidence about the effectiveness of personalised learning. We do not know which aspects of personalised learning (learner autonomy, individualised learning pathways and instructional approaches, etc.) or which combinations of these will lead to gains in language learning. The complexity of the issues means that we may never have a satisfactory explanation. You can read my own exploration of the problems of personalised learning starting here.

What’s left? Clark suggests that chatbots are one area with ‘huge potential’. I beg to differ and I explained my reasons eighteen months ago. Chatbots work fine in very specific domains. As Clark says, they can be used for ‘controlled practice’, but ‘controlled practice’ means practice of specific language knowledge, the practice of limited conversational routines, for example. It could certainly be useful, but more than that? Taking things a stage further, Clark then suggests more holistic speaking and listening practice with Amazon Echo, Alexa or Google Home. If and when the day comes that we have general, as opposed to domain-specific, AI, chatting with one of these tools would open up vast new possibilities. Unfortunately, general AI does not exist, and until then Alexa and co will remain a poor substitute for human-human interaction (which is readily available online, anyway). Incidentally, AI could be used to form groups of online language learners to carry out communicative tasks – ‘the aim might be to design a grouping of students all at a similar cognitive level and of similar interests, or one where the participants bring different but complementary knowledge and skills’ (Luckin, R., Holmes, W., Griffiths, M. & Forcier, L.B. 2016. Intelligence Unleashed: An argument for AI in Education. London: Pearson, p.26).

Predictions about the impact of technology on education have a tendency to be made by people with a vested interest in the technologies. Edison was a businessman who had invested heavily in motion pictures. Donald Clark is an edtech entrepreneur whose company, Wildfire, uses AI in online learning programs. Stephen Heppell is executive chairman of LP+ who are currently developing a Chinese language learning community for 20 million Chinese school students. The reporting of AIED is almost invariably in websites that are paid for, in one way or another, by edtech companies. Predictions need, therefore, to be treated sceptically. Indeed, the safest prediction we can make about hyped educational technologies is that inflated expectations will be followed by disillusionment, before the technology finds a smaller niche.

Knowble, claim its developers, is a browser extension that will improve English vocabulary and reading comprehension. It also describes itself as an ‘adaptive language learning solution for publishers’. It’s currently beta and free, and sounds right up my street so I decided to give it a run.

[Image: Knowble reader]

Users are asked to specify a first language (I chose French) and a level (A1 to C2): I chose B1, but this did not seem to impact on anything that subsequently happened. They are then offered a menu of about 30 up-to-date news items, grouped into 5 categories (world, science, business, sport, entertainment). Clicking on one article takes you to the article on the source website. There’s a good selection, including USA Today, CNN, Reuters, the Independent and the Torygraph from Britain, the Times of India, the Independent from Ireland and the Star from Canada. A large number of words are underlined: a single click brings up a translation in the extension box. Double-clicking on all other words will also bring up translations. Apart from that, there is one very short exercise (which has presumably been automatically generated) for each article.

For my trial run, I picked three articles: ‘Woman asks firefighters to help ‘stoned’ raccoon’ (from the BBC, 240 words), ‘Plastic straw and cotton bud ban proposed’ (also from the BBC, 823 words) and ‘London’s first housing market slump since 2009 weighs on UK price growth’ (from the Torygraph, 471 words).

Translations

Research suggests that the use of translations, rather than definitions, may lead to more learning gains, but the problem with Knowble is that it relies entirely on Google Translate. Google Translate is fast improving. Take the first sentence of the ‘plastic straw and cotton bud’ article, for example. It’s not a bad translation, but it gets the word ‘bid’ completely wrong, translating it as ‘offre’ (= offer), where ‘tentative’ (= attempt) is needed. So, we can still expect a few problems with Google Translate …

[Image: Google Translate]

One of the reasons that Google Translate has improved is that it no longer treats individual words as individual lexical items. It analyses groups of words and translates chunks or phrases (see, for example, the way it translates ‘as part of’). It doesn’t do word-for-word translation. Knowble, however, have set their software to ask Google for translations of each word as individual items, so the phrase ‘as part of’ is translated ‘comme’ + ‘partie’ + ‘de’. Whilst this example is comprehensible, problems arise very quickly. ‘Cotton buds’ (‘cotons-tiges’) become ‘coton’ + ‘bourgeon’ (= botanical shoots of cotton). Phrases like ‘in time’, ‘run into’, ‘sleep it off’, ‘take its course’, ‘fire station’ or ‘going on’ (all from the stoned raccoon text) all cause problems. In addition, Knowble are not using any parsing tools, so the system does not identify parts of speech, and further translation errors inevitably appear. In the short article of 240 words, about 10% are wrongly translated. Knowble claim to be using NLP tools, but there’s no sign of it here. They’re just using Google Translate rather badly.
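The difference between the two approaches is easy to show with a toy example. The sketch below is mine, not Knowble’s or Google’s: `GLOSSARY` is a tiny hand-made English-to-French table (its entries are illustrative, not Google Translate output), and the two functions are hypothetical names for a word-by-word lookup and a lookup that tries the longest known phrase first.

```python
# Toy English-to-French glossary. Entries are illustrative only and
# are not taken from Google Translate.
GLOSSARY = {
    "as part of": "dans le cadre de",
    "cotton buds": "cotons-tiges",
    "as": "comme",
    "part": "partie",
    "of": "de",
    "cotton": "coton",
    "buds": "bourgeons",
}


def word_by_word(text):
    """Look up every word in isolation; unknown words pass through unchanged."""
    return [GLOSSARY.get(word, word) for word in text.lower().split()]


def longest_match(text, max_len=3):
    """At each position, prefer the longest phrase found in the glossary."""
    words = text.lower().split()
    out, i = [], 0
    while i < len(words):
        # Try the longest candidate chunk first, down to a single word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = " ".join(words[i:i + n])
            if chunk in GLOSSARY or n == 1:
                out.append(GLOSSARY.get(chunk, chunk))
                i += n
                break
    return out


if __name__ == "__main__":
    print(word_by_word("cotton buds"))   # two separate, misleading glosses
    print(longest_match("cotton buds"))  # one gloss for the whole phrase
```

Real machine translation is far more sophisticated than a greedy phrase lookup, of course, but even this crude version avoids the botanical cotton shoots.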

Highlighted items

[Image: word list]

NLP tools of some kind are presumably being used to select the words that get underlined. Exactly how this works is unclear. On the whole, it seems that very high frequency words are ignored and that lower frequency words are underlined. Here, for example, is the list of words that were underlined in the stoned raccoon text. I’ve compared them with (1) the CEFR levels for these words in the English Profile Text Inspector, and (2) the frequency information from the Macmillan dictionary (more stars = more frequent). In the other articles, some extremely high frequency words were underlined (e.g. price, cost, year) while much lower frequency items were not.

It is, of course, extremely difficult to predict which items of vocabulary a learner will know, even if we have a fairly accurate idea of their level. Personal interests play a significant part, so, for example, some people at even a low level will have no problem with ‘cannabis’, ‘stoned’ and ‘high’, even if these are low frequency. First language, however, is a reasonably reliable indicator as cognates can be expected to be easy. A French speaker will have no problem with ‘appreciate’, ‘unique’ and ‘symptom’. A recommendation engine that can meaningfully personalize vocabulary suggestions will, at the very least, need to consider cognates.
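To make the point about cognates concrete, here is a minimal sketch of how a selection routine might skip words that look very similar to their first-language equivalents. Everything in it is an assumption for illustration: the function names, the 0.75 threshold and the tiny English-French glossary are hypothetical, and surface similarity is a crude proxy that will be fooled by false friends.

```python
import unicodedata
from difflib import SequenceMatcher


def strip_accents(text):
    """Remove diacritics so that 'symptôme' compares as 'symptome'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_likely_cognate(l2_word, l1_word, threshold=0.75):
    """Treat two words as likely cognates if their spellings are very similar.

    The threshold is an arbitrary illustrative value, not an empirical one.
    """
    a = strip_accents(l2_word.lower())
    b = strip_accents(l1_word.lower())
    return SequenceMatcher(None, a, b).ratio() >= threshold


def words_worth_highlighting(candidates, l1_glossary, threshold=0.75):
    """Keep only candidates that are NOT likely cognates for this learner."""
    keep = []
    for word in candidates:
        l1_word = l1_glossary.get(word)
        if l1_word is None or not is_likely_cognate(word, l1_word, threshold):
            keep.append(word)
    return keep


if __name__ == "__main__":
    # Hypothetical glossary for a French-speaking learner.
    french = {
        "appreciate": "apprécier",
        "unique": "unique",
        "symptom": "symptôme",
        "straw": "paille",
    }
    print(words_worth_highlighting(list(french), french))
```

A real system would need something better than spelling similarity, but even a filter this simple would stop ‘unique’ being flagged for a French speaker.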

In short, the selection and underlining of vocabulary items, as it currently stands in Knowble, appears to serve no clear or useful function.

Vocabulary learning

Knowble offers a very short exercise for each article. They are of three types: word completion, dictation and drag and drop (see the example). The rationale for the selection of the target items is unclear, but, in any case, these exercises are tokenistic in the extreme and are unlikely to lead to any significant learning gains. More valuable would be the possibility of exporting items into a spaced repetition flash card system.
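For readers unfamiliar with spaced repetition, the idea can be sketched with a simple Leitner-style scheduler: items answered correctly move to a box with a longer review interval, and items answered wrongly go back to the start. This is a minimal sketch under my own assumptions (the `Card` class, the `review` function and the interval values are all hypothetical); it is not how any particular flashcard product works.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Days to wait before the next review, per box (illustrative values only).
INTERVALS = (1, 2, 4, 8, 16)


@dataclass
class Card:
    word: str
    box: int = 0
    due: date = date.min  # a new card is due immediately


def review(card, correct, today):
    """Update a card after a review: promote on success, reset on failure."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    card.due = today + timedelta(days=INTERVALS[card.box])
    return card


def due_cards(cards, today):
    """Return the cards that should be reviewed on the given day."""
    return [card for card in cards if card.due <= today]


if __name__ == "__main__":
    deck = [Card("hefty"), Card("imminent")]
    today = date(2018, 4, 16)
    review(deck[0], correct=True, today=today)
    review(deck[1], correct=False, today=today)
    for card in deck:
        print(card.word, card.box, card.due)
```

An export function that produced items in this kind of shape would let learners carry words from an article into a system that schedules the recycling for them.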

The claim that Knowble’s ‘learning effect is proven scientifically’ seems to me to be without any foundation. If there has been any proper research, it’s not signposted anywhere. Sure, reading lots of news articles (with a look-up function – if it works reliably) can only be beneficial for language learners, but they can do that with any decent dictionary running in the background.

Similar in many ways to en.news, which I reviewed in my last post, Knowble is another example of a technology-driven product that shows little understanding of language learning.

Last month, I wrote a post about the automated generation of vocabulary learning materials. Yesterday, I got an email from Mike Elchik, inviting me to take a look at the product that his company, WeSpeke, has developed in partnership with CNN. Called en.news, it’s a very regularly updated and wide selection of video clips and texts from CNN, which are then used to ‘automatically create a pedagogically structured, leveled and game-ified English lesson’. Available at the App Store and Google Play, as well as in a desktop version, it’s free. Revenues will presumably be generated through advertising and later sales to corporate clients.

With 6.2 million dollars in funding so far, WeSpeke can leverage some state-of-the-art NLP and AI tools. Co-founder and chief technical adviser of the company is Jaime Carbonell, Director of the Language Technologies Institute at Carnegie Mellon University, described in Wikipedia as one of the gurus of machine learning. I decided to have a closer look.

Users are presented with a menu of CNN content (there were 38 items from yesterday alone). The items are tagged with broad categories (Politics, Opinions, Money, Technology, Entertainment, etc.) and given a level, ranging from 1 to 5, although the vast majority of the material is at the two highest levels.

[Image: menu]

I picked two lessons: a reading text about Mark Zuckerberg’s Congressional hearing (level 5) and a 9-minute news programme of mixed items (level 2 – illustrated above). In both cases, the lesson begins with the text. With the reading, you can click on words to bring up dictionary entries from the Collins dictionary. With the video, you can activate captions and again click on words for definitions. You can also slow down the speed. So far, so good.

There then follows a series of exercises which focus primarily on a set of words that have been automatically selected. This is where the problems began.

Level

It’s far from clear what the levels (1 – 5) refer to. The Zuckerberg text is 930 words long and is rated as B2 by one readability tool. But, using the English Profile Text Inspector, there are 19 types at C1 level, 14 at C2, and 98 which are unlisted. That suggests something substantially higher than B2. The CNN10 video is delivered at breakneck speed (as is often the case with US news shows). Yes, it can be slowed down, but that still won’t help with some passages, such as the one below:

A squirrel recently fell out of a tree in Western New York. Why would that make news? Because she bwoke her widdle leg and needed a widdle cast! Yes, there are casts for squirrels, as you can see in this video from the Orphaned Wildlife Center. A windstorm knocked the animal’s nest out of a tree, and when a woman saw that the baby squirrel was injured, she took her to a local vet. Doctors say she’s going to be just fine in a couple of weeks. Well, why ‘rodent’ she be? She’s been ‘whiskered’ away and cast in both a video and a plaster. And as long as she doesn’t get too ‘squirrelly’ before she heals, she’ll have quite a ‘tail’ to tell.

It’s hard to understand how a text like this got through the algorithms. But, as materials writers know, it is extremely hard to find authentic text that lends itself to language learning at anything below C1. On the evidence here, there is still some way to go before the process of selection can be automated. It may well be the case that CNN simply isn’t a particularly appropriate source.

Target learning items

The primary focus of these lessons is vocabulary learning, and it’s vocabulary learning of a very deliberate kind. Applied linguists are in general agreement that it makes sense for learners to approach the building of their L2 lexicon in a deliberate way (i.e. by studying individual words) for high-frequency items or items that can be identified as having a high surrender value (e.g. items from the AWL for students studying in an EMI context). Once you get to items that are less frequent than, say, the top 8,000 most frequent words, the effort expended in studying new words needs to be offset against their usefulness. Why spend a lot of time studying low frequency words when you’re unlikely to come across them again for some time … and will probably forget them before you do? Vocabulary development at higher levels is better served by extensive reading (and listening), possibly accompanied by glosses.
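The principle described above can be expressed as a simple rule: unknown words inside a frequency cut-off are candidates for deliberate study, and everything else is better left to a gloss or look-up. The sketch below is only an illustration of that rule. The function name is hypothetical, and the frequency ranks in the usage example are made up for the purpose of the demonstration; they are not taken from any corpus or word list.

```python
# Rank cut-off taken from the 'top 8,000 most frequent words' figure above.
DELIBERATE_STUDY_CUTOFF = 8000


def split_targets(unknown_words, rank_of, cutoff=DELIBERATE_STUDY_CUTOFF):
    """Split unknown words into deliberate-study items and gloss-only items.

    rank_of maps a word to its frequency rank (1 = most frequent).
    Words with no known rank are treated as low frequency.
    """
    study, gloss_only = [], []
    for word in unknown_words:
        rank = rank_of.get(word)
        if rank is not None and rank <= cutoff:
            study.append(word)
        else:
            gloss_only.append(word)
    return study, gloss_only


if __name__ == "__main__":
    # Invented ranks, for illustration only.
    made_up_ranks = {"wordA": 3500, "wordB": 12000, "wordC": 7999}
    study, gloss_only = split_targets(["wordA", "wordB", "wordC", "wordD"], made_up_ranks)
    print("study deliberately:", study)
    print("gloss only:", gloss_only)
```

A rule like this says nothing about surrender value for a particular learner (the AWL example above), which would need a separate list of priority items.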

The target items in the Zuckerberg text were: advocacy, grilled, handicapping, sparked, diagnose, testified, hefty, imminent, deliberative and hesitant. One of these, ‘grilled’, is listed as A2 by English Vocabulary Profile, but that is with its literal, not metaphorical, meaning. Four of them are listed as C2 and the remaining five are off-list. In the CNN10 video, the target items were: strive, humble (verb), amplify, trafficked, enslaved, enacted, algae, trafficking, ink and squirrels. Of these, one is B1, two are C2 and the rest are unlisted. What is the point of studying these essentially random words? Why spend time going through a series of exercises that practise these items? Wouldn’t your time be better spent just doing some more reading? I have no idea how the automated selection of these items takes place, but it’s clear that it’s not working very well.

Practice exercises

There is plenty of variety of task-type, but there are, I think, two reasons to query the claim that these lessons are ‘pedagogically structured’. The first is the nature of the practice exercises; the second is the sequencing of the exercises. I’ll restrict my observations to a selection of the tasks.

1. Users are presented with a dictionary definition and an anagrammed target item which they must unscramble. For example:

existing for the purpose of discussing or planning something     VLREDBETEIIA

If you can’t solve the problem, you can always scroll through the text to find the answer. But the problem is in the task design. Dictionary definitions have been written to help language users decode a word. They simply don’t work very well when they are used for another purpose (as prompts for encoding).

2. Users are presented with a dictionary definition for which they must choose one of four words. There are many potential problems here, not the least of which is that definitions are often more complex than the word they are defining, or they present other challenges. As an example: ‘cause to be unpretentious’ for ‘to humble’. On top of that, lexicographers often need or choose to embed the target item in the definition. For example:

a hefty amount of something, especially money, is very large

an event that is imminent, especially an unpleasant one, will happen very soon

When this is the case, it makes no sense to present these definitions and ask learners to find the target item from a list of four.

The two key pieces of content in this product – the CNN texts and the Collins dictionaries – are both less than ideal for their purposes.

3. Users are presented with a box of jumbled words which they must unscramble to form sentences that appeared in the text.

[Image: rearrange words to make sentences]

The sentences are usually long and hard to reconstruct. You can scroll through the text to find the answer, but I’m unclear what the point of this would be. The example above contains a mistake (vie instead of vice), but this was one of only two glitches I encountered.

4. Users are asked to select the word that they hear on an audio recording. For example:

squirreling     squirrel     squirreled     squirrels

Given the high level of challenge of both the text and the target items, this was a rather strange exercise to kick off the practice. The meaning has not yet been presented (in a matching / definition task), so what exactly is the point of this exercise?

5. Users are presented with gapped sentences from the text and asked to choose the correct grammatical form of the missing word. Some of these were hard (e.g. adjective order), others were very easy (e.g. some vs any). The example below struck me as plain weird for a lesson at this level.

________ have zero expectation that this Congress is going to make adequate changes. (I or Me ?)

6. At the end of both lessons, there were a small number of questions that tested your memory of the text. If, like me, you couldn’t remember all that much about the text after twenty minutes of vocabulary activities, you can scroll through the text to find the answers. This is not a task type that will develop reading skills: I am unclear what it could possibly develop.

Overall?

Using the lessons on offer here wouldn’t do a learner (as long as they already had a high level of proficiency) any harm, but it wouldn’t be the most productive use of their time, either. If a learner is motivated to read the text about Zuckerberg, rather than do lots of ‘busy’ work on a very odd set of words with gap-fills and matching tasks, they’d be better advised just to read the text again once or twice. They could use a look-up for words they want to understand and import them into a flashcard system with spaced repetition (en.news does have flashcards, but there’s no sign of spaced practice yet). What’s more, they could check out another news website and read / watch other articles on the same subject (perhaps choosing websites with a different slant to CNN) and get valuable narrow-reading practice in this way.

My guess is that the technology has driven the product here, but without answering the fundamental questions about which words it’s appropriate for individual learners to study in a deliberate way and how this is best tackled, it doesn’t take learners very far.

Chatbots

Posted: December 1, 2016 in ed tech

Here’s a conversation I had recently with Cleverbot.

User (me): Hi Cleverbot. What are your views on the future of chatbots in language learning?

Cleverbot: Are you talking to yourself?

User: No, I’m asking you a question.

Cleverbot: Then what’s the question again? I forgot.

User: What are your views on the future of chatbots in language learning?

Cleverbot: Don’t you want me to finish my story?

User: No, I want you to answer my question.

Cleverbot: What is your question?

User: Do you think there is a future for chatbots in language education?

Cleverbot: Sure, if you are talking about the moon.

I also tried to have a conversation with Alice, the avatar from EFL Classroom Bot, listed by Larry Ferlazzo as one of ‘the best online chatbots for practicing English’. I didn’t get any more sense out of her than out of Cleverbot.

Chatbots, apparently, are the next big thing. Again. David Mattin, head of trends and insights at trendwatching.com, writes (in the September 2016 issue of ‘Business Life’) that ‘the chatbot revolution is coming’ and that chatbots are a step towards the dream of an interface between user and technology that is so intuitive that the interface ‘simply fades away’. Chatbots have been around for some time. Remember Clippy – the Microsoft Office bot in the late 1990s – which you had to disable in order to stop yourself punching your computer screen? Since then, bots have become ubiquitous. There have been problems, such as Microsoft’s Tay bot that had to be taken down after sixteen hours earlier this year, when, after interacting with other Twitter users, it developed into an abusive Nazi. But chatbots aren’t going away and you’ve probably interacted with one to book a taxi, order food or attempt to talk to your bank. In September this year, the Guardian described them as ‘the talk of the town’ and ‘hot property in Silicon Valley’.

The real interest in chatbots is not, however, in the ‘exciting interface’ possibilities (both user interface and user experience remain pretty crude), but in the fact that they are leaner, sit comfortably with the things we actually do on a phone and offer a way of cutting out the high fees that developers have to pay to app stores. After so many start-up failures, chatbots offer a glimmer of financial hope to developers.

It’s no surprise, of course, to find the world of English language teaching beginning to sit up and take notice of this technology. A 2012 article by Ben Lehtinen in PeerSpectives enthuses about the possibilities in English language learning and reports the positive feedback of the author’s own students. ELTJam, so often so quick off the mark, developed an ELT Bot over the course of a hackathon weekend in March this year. Disappointingly, it wasn’t really a bot – more a case of humans pretending to be a bot pretending to be humans – but it probably served its exploratory purpose. And a few months ago Duolingo began incorporating bots. These are currently only available for French, Spanish and German learners in the iPhone app, so I haven’t been able to try them out and evaluate them. According to an infomercial in TechCrunch, ‘to make talking to the bots a bit more compelling, the company tried to give its different bots a bit of personality. There’s Chef Robert, Renee the Driver and Officer Ada, for example. They will react differently to your answers (and correct you as necessary), but for the most part, the idea here is to mimic a real conversation. These bots also allow for a degree of flexibility in your answers that most language-learning software simply isn’t designed for. There are plenty of ways to greet somebody, for example, but most services will often only accept a single answer. When you’re totally stumped for words, though, Duolingo offers a ‘help my reply’ button with a few suggested answers.’ In the last twelve months or so, Duolingo has considerably improved its ability to recognise multiple correct ways of expressing a particular idea, and its ability to recognise alternative answers to its translation tasks. However, I’m highly sceptical about its ability to mimic a real conversation any better than Cleverbot or Alice the EFL Bot, or its ability to provide systematically useful corrections.

My reasons lie in the current limitations of AI and NLP (Natural Language Processing). In a nutshell, we simply don’t know how to build a machine that can truly understand human language. Limited exchanges in restricted domains can be done pretty well (such as the early chatbot that did a good job of simulating an encounter with an evasive therapist, or, more recently, ordering a taco and having a meaningless, but flirty conversation with a bot), but despite recent advances in semantic computing, we’re a long way from anything that can mimic a real conversation. As Audrey Watters puts it, we’re not even close.

When it comes to identifying language errors made by language learners, we’re not really much better off. Apps like Grammarly are not bad at identifying grammatical errors (although not good enough to be reliable), but pretty hopeless at dealing with lexical appropriacy. Much more reliable feedback to learners can be offered when the software is trained on particular topics and text types. Write & Improve does this with a relatively small selection of Cambridge English examination tasks, but a free conversation …? Forget it.

So, how might chatbots be incorporated into language teaching / learning? A blog post from December 2015 entitled AI-powered chatbots and the future of language learning suggests one plausible possibility. Using an existing messenger service, such as WhatsApp or Telegram, an adaptive chatbot would send tasks (such as participation in a conversation thread with a predetermined topic, register, etc., or pronunciation practice or translation exercises) to a learner, provide feedback and record the work for later recycling. At the same time, the bot could send out reminders of work that needs to be done or administrative tasks that must be completed.

Kat Robb has written a very practical article about using instant messaging in English language classrooms. Her ideas are interesting (although I find the idea of students in a F2F classroom messaging each other slightly bizarre) and it’s easy to imagine ways in which her activities might be augmented with chatbot interventions. The Write & Improve app, mentioned above, could deploy a chatbot interface to give feedback instead of the flat (and, in my opinion, perfectly adequate) pop-up boxes currently in use. Come to think of it, more or less any digital language learning tool could be pimped up with a bot. Countless revisions can be envisioned.

But the overwhelming question is: would it be worth it? Bots are not likely, any time soon, to revolutionise language learning. What they might just do, however, is help to further reduce language teaching to a series of ‘mechanical and scripted gestures’. More certain is that a lot of money will be thrown down the post-truth edtech drain. Then, in the not too distant future, this latest piece of edtech will fall into the trough of disillusionment, to be replaced by the latest latest thing.