Posts Tagged ‘AI’

When the internet arrived on our desktops in the 1990s, language teachers found themselves able to access huge amounts of authentic texts of all kinds. It was a true game-changer. But when it came to ELT dedicated websites, the pickings were much slimmer. There was a very small number of good ELT resource sites (onestopenglish stood out from the crowd), but more ubiquitous and more enduring were the sites offering downloadable material shared by teachers. One of these, ESLprintables.com, currently has 1,082,522 registered users, compared to the 700,000+ of onestopenglish.

The resources on offer at sites such as these range from texts and scripted dialogues, along with accompanying comprehension questions, to grammar explanations and gapfills, vocabulary matching tasks and gapfills, to lists of prompts for discussions. Almost all of it is unremittingly awful, a terrible waste of the internet’s potential.

Ten years later, interactive online possibilities began to appear. Before long, language teachers found themselves able to use things like blogs, wikis and Google Docs. It was another true game changer. But when it came to ELT dedicated things, the pickings were much slimmer. There is some useful stuff (flashcard apps, for example) out there, but more ubiquitous are interactive versions of the downloadable dross that already existed. Learning platforms, which have such rich possibilities, are mostly loaded with gapfills, drag-and-drop, multiple choice, and so on. Again, it seems such a terrible waste of the technology’s potential. And all of this runs counter to what we know about how people learn another language. It’s as if decades of research into second language acquisition had never taken place.

And now we have AI and large language models like GPT. The possibilities are rich and quite a few people, like Sam Gravell and Svetlana Kandybovich, have already started suggesting interesting and creative ways of using the technology for language teaching. Sadly, though, technology has a tendency to bring out the worst in approaches to language teaching, since there’s always a bandwagon to be jumped on. Welcome to Twee, A.I. powered tools for English teachers, where you can generate your own dross in a matter of seconds. You can generate texts and dialogues, pitched at one of three levels, with or without target vocabulary, and produce comprehension questions (open questions, T / F, or M / C), exercises where vocabulary has to be matched to definitions, word-formation exercises, gapfills. The name of the site has been carefully chosen (Cambridge dictionary defines ‘twee’ as ‘artificially attractive’).

I decided to give it a try. Twee uses the same technology as ChatGPT and the results were unsurprising. I won’t comment in any detail on the intrinsic interest or the accuracy of factual information in the texts. They are what you might expect if you have experimented with ChatGPT. For the same reason, I won’t go into details about the credibility or naturalness of the dialogues. Similarly, the ability of Twee to gauge the appropriacy of texts for particular levels is poor: it hasn’t been trained on a tagged learner corpus. In any case, having only three level bands (A1/A2, B1/B2 and C1/C2) means that levelling is far too approximate. Suffice to say that the comprehension questions, vocabulary-item selection, vocabulary practice activities would all require very heavy editing.

Twee is still in beta, and, no doubt, improvements will come as the large language models on which it draws get bigger and better. Bilingual functionality is a necessary addition, and is doable. More reliable level-matching would be nice, but it’s a huge technological challenge, besides being theoretically problematic. But bigger problems remain and these have nothing to do with technology. Take a look at the examples below of how Twee suggests its reading comprehension tasks (open questions, M / C, T / F) could be used with some Beatles songs.

Is there any point getting learners to look at a ‘dialogue’ (on the topic of yellow submarines) like the one below? Is there any point getting learners to write essays using prompts such as those below?

What possible learning value could tasks such as these have? Is there any credible theory of language learning behind any of this, or is it just stuff that would while away some classroom time? AI meets ESLprintables – what a waste of the technology’s potential!

Edtech vendors like to describe their products as ‘solutions’, but the educational challenges, which these products are supposedly solutions to, often remain unexamined. Poor use of technology can exacerbate these challenges by making inappropriate learning materials more easily available.

Last September, Cambridge published a ‘Sustainability Framework for ELT’, which attempts to bring together environmental, social and economic sustainability. It’s a kind of 21st century skills framework and is designed to help teachers ‘to integrate sustainability skills development’ into their lessons. Among the sub-skills that are listed, a handful grabbed my attention:

  • Identifying and understanding obstacles to sustainability
  • Broadening discussion and including underrepresented voices
  • Understanding observable and hidden consequences
  • Critically evaluating sustainability claims
  • Understanding the bigger picture

Hoping to brush up my skills in these areas, I decided to take a look at the upcoming BETT show in London, which describes itself as ‘the biggest Education Technology exhibition in the world’. BETT and its parent company, Hyve, ‘are committed to redefining sustainability within the event industry and within education’. They are doing this by reducing their ‘onsite printing and collateral’. (‘Event collateral’ is an interesting event-industry term that refers to all the crap that is put into delegate bags, intended to ‘enhance their experience of the event’.) BETT and Hyve are encouraging all sponsors to go paperless, too, ‘switching from seat-drop collateral to QR codes’, and delegate bags will no longer be offered. They are partnering with various charities to donate ‘surplus food and furniture’ to local community projects, they are donating to other food charities that support families in need, and they are recycling all of the aisle banners into tote bags. Keynote speakers will include people like Sally Uren, CEO of ‘Forum for the Future’, who will talk about ‘Transforming carbon neutral education for a just and regenerative future’.

BETT and Hyve want us to take their corporate and social responsibility very seriously. All of these initiatives are very commendable, even though I wouldn’t go so far as to say that they will redefine sustainability within the event industry and education. But there is a problem – and it’s not that the world is already over-saturated with recycled tote bags. As the biggest jamboree of this kind in the world, the show attracts over 600 vendors and over 30,000 visitors, with over 120 countries represented. Quite apart from all the collateral and surplus furniture, the carbon and material footprint of the event cannot be negligible. Think of all those start-up solution-providers flying and driving into town, AirB’n’B-ing for the duration, and Ubering around town after hours, for a start.

But this is not really the problem, either. Much as the event likes to talk about ‘driving impact and improving outcomes for teachers and learners’, the clear and only purpose of the event is to sell stuff. It is to enable the investors in the 600+ edtech solution-providers in the exhibition area to move towards making a return on their investment. If we wanted to talk seriously about sustainability, the question that needs to be asked is: to what extent does all the hardware and software on sale contribute in any positive and sustainable way to education? Is there any meaningful social benefit to be derived from all this hardware and software, or is it all primarily just a part of a speculative, financial game? Is the corporate social responsibility of BETT / Hyve a form of green-washing to disguise the stimulation of more production and consumption? Is it all just a kind of environmentalism of the rich’ (Dauvergne, 2016).

Edtech is not the most pressing of environmental problems – indeed, there are examples of edtech that are likely more sustainable than the non-tech alternatives – but the sustainability question remains. There are at least four environmental costs to edtech:

  • The energy-greedy data infrastructures that lie behind digital transactions
  • The raw ingredients of digital devices
  • The environmentally destructive manufacture and production of digital devices
  • The environmental cost of dismantling and disposing digital hardware (Selwyn, 2018)

Some forms of edtech are more environmentally costly than others. First, we might consider the material costs. Going back to pre-internet days, think of the countless tonnes of audio cassettes, VCR tapes, DVDs and CD-ROMs. Think of the discarded playback devices, language laboratories and IWBs. None of these are easily recyclable and most have ended up in landfill, mostly in countries that never used these products. These days the hardware that is used for edtech is more often a device that serves other non-educational purposes, but the planned obsolescence of our phones, tablets and laptops is a huge problem for sustainability.

More important now are probably the energy costs of edtech. Audio and video streaming might seem more environmentally friendly than CDs and DVDs, but, depending on how often the CD or DVD is used, the energy cost of streaming (especially high quality video) can be much higher than using the physical format. AI ups the ante significantly (Brevini, 2022). Five years ago, a standard ‘AI training model in linguistics emit more than 284 tonnes of carbon dioxide equivalent’ (Strubell et al., 2019). With exponentially greater volumes of data now being used, the environmental cost is much, much higher. Whilst VR vendors will tout the environmental benefits of cutting down on travel, getting learners together in a physical room may well have a much lower carbon footprint than meeting in the Metaverse.

When doing the calculus of edtech, we need to evaluate the use-value of the technology. Does the tech actually have any clear educational (or other social) benefit, or is its value primarily in terms of its exchange-value?

To illustrate the difference between use-value and exchange-value, I’d like to return again to the beginnings of modern edtech in ELT. As the global market for ELT materials mushroomed in the 1990s, coursebook publishers realised that, for a relatively small investment, they could boost their sales by bringing out ‘new editions’ of best-selling titles. This meant a new cover, replacing a few texts and topics, making minor modifications to other content, and, crucially, adding extra features. As the years went by, these extra features became digital: CD-ROMs, DVDs, online workbooks and downloadables of various kinds. The publishers knew that sales depended on the existence of these shiny new things, even if many buyers made minimal use or zero use of them. But they gave the marketing departments and sales reps a pitch, and justified an increase in unit price. Did these enhanced coursebooks actually represent any increase in use-value? Did learners make better or faster progress in English as a result? On the whole, the answer has to be an unsurprising and resounding no. We should not be surprised if hundreds of megabytes of drag-and-drop grammar practice fail to have much positive impact on learning outcomes. From the start, it was the impact on the exchange-value (sales and profits) of these products that was the driving force.

Edtech vendors have always wanted to position themselves to potential buyers as ‘solution providers’, trumpeting the use-value of what they are selling. When it comes to attracting investors, it’s a different story, one that is all about minimum viable products, scalability and return on investment.

There are plenty of technologies that have undisputed educational use-value in language learning and teaching. Google Docs, Word, Zoom and YouTube come immediately to mind. Not coincidentally, they are not technologies that were designed for educational purposes. But when you look at specifically educational technology, It becomes much harder (though not impossible) to identify unambiguous gains in use-value. Most commonly, the technology holds out the promise of improved learning, but evidence that it has actually achieved this is extremely rare. Sure, a bells-and-whistles LMS offers exciting possibilities for flipped or blended learning, but research that demonstrates the effectiveness of these approaches in the real world is sadly lacking. Sure, VR might seem to offer a glimpse of motivated learners interacting meaningfully in the Metaverse, but I wouldn’t advise you to bet on it.

And betting is what most edtech is all about. An eye-watering $16.1 billion of venture capital was invested in global edtech in 2020. What matters is not that any of these products or services have any use-value, but that they are perceived to have a use-value. Central to this investment is the further commercialisation and privatisation of education (William & Hogan 2020). BETT is a part of this.

Returning to the development of my sustainability skills, I still need to consider the bigger picture. I’ve suggested that it is difficult to separate edtech from a consideration of capitalism, a system that needs to manufacture consumption, to expand production and markets in order to survive (Dauvergne, 2016: 48). Economic growth is the sine qua non of this system, and it is this that makes the British government (and others) so keen on BETT. Education and edtech in particular are rapidly growing markets. But growth is only sustainable, in environmental terms, if it is premised on things that we actually need, rather than things which are less necessary and ecologically destructive (Hickel, 2020). At the very least, as Selwyn (2021) noted, we need more diverse thinking: ‘What if environmental instability cannot be ‘solved’ simply through the expanded application of digital technologies but is actually exacerbated through increased technology use?

References

Brevini, B. (2022) Is AI Good for the Planet? Cambridge: Polity Press

Dauvergne, P. (2016) Environmentalism of the Rich. Cambridge, Mass.: MIT Press

Hickel, J. (2020) Less Is More. London: William Heinemann

Selwyn, N. (2018) EdTech is killing us all: facing up to the environmental consequences of digital education. EduResearch Matters 22 October, 2018. https://www.aare.edu.au/blog/?p=3293

Selwyn, N. (2021) Ed-Tech Within Limits: Anticipating educational technology in times of environmental crisis. E-Learning and Digital Media, 18 (5): 496 – 510. https://journals.sagepub.com/doi/pdf/10.1177/20427530211022951

Strubell, E., Ganesh, A. & McCallum, A. (2019) Energy and Policy Considerations for Deep Learning in NLP. Cornell University: https://arxiv.org/pdf/1906.02243.pdf

Williamson, B. & Hogan, A. (2020) Commercialisation and privatisation in / of education in the context of Covid-19. Education International

Who can tell where a blog post might lead? Over six years ago I wrote about adaptive professional development for teachers. I imagined the possibility of bite-sized, personalized CPD material. Now my vision is becoming real.

For the last two years, I have been working with a start-up that has been using AI to generate text using GPT-3 large language models. GPT-3 has recently been in the news because of the phenomenal success of the newly released ChatGPT. The technology certainly has a wow factor, but it has been around for a while now. ChatGPT can generate texts of various genres on any topic (with a few exceptions like current affairs) and the results are impressive. Imagine, then, how much more impressive the results can be when the kind of text is limited by genre and topic, allowing the software to be trained much more reliably.

This is what we have been working on. We took as our training corpus a huge collection of English language teaching teacher development texts that we could access online: blogs from all the major publishers, personal blogs, transcriptions from recorded conference presentations and webinars, magazine articles directed at teachers, along with books from publishers such as DELTA and Pavilion ELT, etc. We identified topics that seemed to be of current interest and asked our AI to generate blog posts. Later, we were able to get suggestions of topics from the software itself.

We then contacted a number of teachers and trainers who contribute to the publishers’ blogs and contracted them, first, to act as human trainers for the software, and, second, to agree to their names being used as the ‘authors’ of the blog posts we generated. In one or two cases, the authors thought that they had actually written the posts themselves! Next we submitted these posts to the marketing departments of the publishers (who run the blogs). Over twenty were submitted in this way, including:

  • What do teachers need to know about teaching 21st century skills in the English classroom?
  • 5 top ways of improving the well-being of English teachers
  • Teaching leadership skills in the primary English classroom
  • How can we promote eco-literacy in the English classroom?
  • My 10 favourite apps for English language learners

We couldn’t, of course, tell the companies that AI had been used to write the copy, but once we were sure that nobody had ever spotted the true authorship of this material, we were ready to move to the next stage of the project. We approached the marketing executives of two publishers and showed how we could generate teacher development material at a fraction of the current cost and in a fraction of the time. Partnerships were quickly signed.

Blog posts were just the beginning. We knew that we could use the same technology to produce webinar scripts, using learning design insights to optimise the webinars. The challenge we faced was that webinars need a presenter. We experimented with using animations, but feedback indicated that participants like to see a face. This is eminently doable, using our contracted authors and deep fake technology, but costs are still prohibitive. It remains cheaper and easier to use our authors delivering the scripts we have generated. This will no doubt change before too long.

The next obvious step was to personalize the development material. Large publishers collect huge amounts of data about visitors to their sites using embedded pixels. It is also relatively cheap and easy to triangulate this data with information from the customer databases and from activity on social media (especially Facebook). We know what kinds of classes people teach, and we know which aspects of teacher development they are interested in.

Publishers have long been interested in personalizing marketing material, and the possibility of extending this to the delivery of real development content is clearly exciting. (See below an email I received this week from the good folks at OUP marketing.)

Earlier this year one of our publishing partners began sending links to personalized materials of the kind we were able to produce with AI. The experiment was such a success that we have already taken it one stage further.

One of the most important clients of our main publishing partner employs hundreds of teachers to deliver online English classes using courseware that has been tailored to the needs of the institution. With so many freelance teachers working for them, along with high turnover of staff, there is inevitably a pressing need for teacher training to ensure optimal delivery. Since the classes are all online, it is possible to capture precisely what is going on. Using an AI-driven tool that was inspired by the Visible Classroom app (informed by the work of John Hattie), we can identify the developmental needs of the teachers. What kinds of activities are they using? How well do they exploit the functionalities of the platform? What can be said about the quality of their teacher talk? We combine this data with everything else and our proprietary algorithms determine what kinds of training materials each teacher receives. It doesn’t stop there. We can also now evaluate the effectiveness of these materials by analysing the learning outcomes of the students.

Teaching efficacy can by massively increased, whilst the training budget of the institution can be slashed. If all goes well, there will be no further need for teacher trainers at all. We won’t be stopping there. If results such as these can be achieved in teacher training, there’s no reason why the same technology cannot be leveraged for the teaching itself. Most of our partner’s teaching and testing materials are now quickly and very cheaply generated using GPT-3.5. If you want to see how this is done, check out the work of edugo.AI (a free trial is available) which can generate gapfills and comprehension test questions in a flash. As for replacing the teachers, we’re getting there. For the time being, though, it’s more cost-effective to use freelancers and to train them up.

The paragraph above was written by an AI-powered text generator called neuroflash https://app.neuro-flash.com/home which I told to produce a text on the topic ‘AI and education’. As texts on this topic go, it is both remarkable (in that it was not written by a human) and entirely unremarkable (in that it is practically indistinguishable from hundreds of human-written texts on the same subject). Neuroflash uses a neural network technology called GPT-3 – ‘a large language model’ – and ‘one of the most interesting and important AI systems ever produced’ (Chalmers, 2020). Basically, it generates text by predicting sequences of words based on huge databases. The nature of the paragraph above tells you all you need to know about the kinds of content that are usually found in texts about AI and education.

Not dissimilar from the neuroflash paragraph, educational commentary on uses of AI is characterised by (1) descriptions of AI tools already in use (e.g. speech recognition and machine translation) and (2) vague predictions which invariably refer to ‘the promise of personalised learning, adjusting what we give learners according to what they need to learn and keeping them motivated by giving them content that is of interest to them’ (Hughes, 2022). The question of what precisely will be personalised is unanswered: providing learners with optimal sets of resources (but which ones?), providing counselling services, recommendations or feedback for learners and teachers (but of what kind?) (Luckin, 2022). Nearly four years ago, I wrote https://adaptivelearninginelt.wordpress.com/2018/08/13/ai-and-language-teaching/ about the reasons why these questions remain unanswered. The short answer is that AI in language learning requires a ‘domain knowledge model’. This specifies what is to be learnt and includes an analysis of the steps that must be taken to reach that learning goal. This is lacking in SLA, or, at least, there is no general agreement on what it is. Worse, the models that are most commonly adopted in AI-driven programs (e.g. the deliberate learning of discrete items of grammar and vocabulary) are not supported by either current theory or research (see, for example, VanPatten & Smith, 2022).

In 2021, the IATEFL Learning Technologies SIG organised an event dedicated to AI in education. Unsurprisingly, there was a fair amount of input on AI in assessment, but my interest is in how AI might revolutionize how we learn and teach, not how we assess. What concrete examples did speakers provide?

Rose Luckin, the most well-known British expert on AI in education, kicked things off by mentioning three tools. One of these, Carnegie Learning, is a digital language course that looks very much like any of the ELT courses on offer from the big publishers – a fully blendable, multimedia (e.g. flashcards and videos) synthetic syllabus. This ‘blended learning solution’ is personalizable, since ‘no two students learn alike’, and, it claims, will develop a ‘lifelong love of language’. It appears to be premised on the idea of language learning as optimizing the delivery of ‘content’, of this content consisting primarily of discrete items, and of equating input with uptake. Been there, done that.

A second was Alelo Enskill https://www.alelo.com/about-us/ a chatbot / avatar roleplay program, first developed by the US military to teach Iraqi Arabic and aspects of Iraqi culture to Marines. I looked at the limitations of chatbot technology for language learning here https://adaptivelearninginelt.wordpress.com/2016/12/01/chatbots/ . The third tool mentioned by Luckin was Duolingo. Enough said.

Another speaker at this event was the founder and CEO of Edugo.AI https://www.edugo.ai/ , an AI-powered LMS which uses GPT-3. It allows schools to ‘create and upload on the platform any kind of language material (audio, video, text…). Our AI algorithms process and convert it in gamified exercises, which engage different parts of the brain, and gets students eager to practice’. Does this speaker know anything about gamification (for a quick read, I’d recommend Paul Driver (2012)) or neuroscience, I wonder. What, for that matter, does he know about language learning? Apparently, ‘language is not just about words, language is about sentences’ (Tomasello, 2022). Hmm, this doesn’t inspire confidence.

When you look at current uses of AI in language learning, there is very little (outside of testing, translation and speech ↔ text applications) that could justify enthusiastic claims that AI has any great educational potential. Skepticism seems to me a more reasonable and scientific response: de omnibus dubitandum.

Education is not the only field where AI has been talked up. When Covid hit us, AI was seen as the game-changing technology. It ‘could be deployed to make predictions, enhance efficiencies, and free up staff through automation; it could help rapidly process vast amounts of information and make lifesaving decisions’ (Chakravorti, 2022). The contribution of AI to the development of vaccines has been huge, but its role in diagnosing and triaging patients has been another matter altogether. Hundreds of predictive tools were developed: ‘none of them made a real difference, and some were potentially harmful’ (Heaven, 2021). Expectations were unrealistic and led to the deployment of tools before they were properly trialled. Thirty months down the line, a much more sober understanding of the potential of AI has emerged. Here, then, are the main lessons that have been learnt (I draw particularly on Engler, 2020, and Chakravorti, 2022) that are also relevant to education and language learning.

  • Anticipate what could go wrong before anticipating what might go right. Engler (2020) writes that ‘a poorly kept secret of AI practitioners is that 96% accuracy is suspiciously high for any machine learning problem’. In language learning, it is highly unlikely that personalized recommendations will ever reach anything even approaching this level of reliability. What are the implications for individual learners whose learning is inappropriately personalised?
  • We also know that a significant problem with AI systems is bias (O’Neil, 2016). There is a well-documented history of discriminatory outcomes because of people’s race, gender, social class or disability profile. Bias needs to be addressed proactively, not reactively.
  • Acknowledge from the outset that, for AI to work, huge amounts of data related to prior outcomes will be needed. In the cases of both Covid and language learning, much of this data will be personal. This raises immediate questions of privacy and consent, especially for learners who are children. Don’t minimise this issue.
  • Common by-products of AI use include insecure data storage, digital surveillance and consolidation of market power (Engler, 2020). Since such issues are so pervasive, assume they will be a problem, and need to be considered in evaluating the potential of any piece of AI edtech.
  • Find out what the subject matter experts have to say: ‘AI is only helpful when applied judiciously by subject-matter experts—people with long-standing experience with the problem that they are trying to solve […] Effectively predicting a badly defined problem is worse than doing nothing at all’ (Engler, 2020). It may be that the problem the AI is trying to help solve is the wrong problem and not worth solving at all (e.g. more efficient ‘delivery’ of explicit grammatical knowledge, gamifying gapfills).

To bring this post to an end, I returned to the AI / GTP-3 driven neuroflash program. This time, I asked it to write a longer piece entitled ‘The dangers of artificial intelligence in education: what could go wrong?’. Here’s what it came up with:

There are a few potential dangers of artificial intelligence in education. One is that AI could be used to replace human teachers. This could lead to a decline in the quality of education, as machines are not yet capable of replicating the full range of human emotions and social interactions that are necessary for teaching. Additionally, AI could be used to create biased or “racist” educational materials, as has already happened with some commercial software products. Finally, AI could be used to track and monitor students in ways that invade their privacy, or that could be used to manipulate their behavior.

All of these dangers are real and should be taken seriously. However, it’s important to remember that AI is still a new technology, and its impact on education is not yet fully understood. As such, we should tread cautiously when implementing AI into our educational systems.

When it comes to education, artificial intelligence could be a double-edged sword. While AI has the potential to revolutionize the field of education and help students learn in more efficient ways, there are also dangers that need to be considered.

One of the dangers of AI in education is that it could lead to a more standardized and homogenized form of learning. If all students are relying on AI-based learning tools, they may all end up learning the same things in the same way. This could lead to a loss of creativity and individuality in education.

Another danger is that AI-based learning tools could become too reliant on data and algorithms. This could lead to a situation where students are simply being taught what the machines think they should know, rather than being allowed to explore and discover new things for themselves.

Finally, there is the danger that AI could eventually replace human teachers altogether. While this may seem like a far-fetched scenario, it is not outside the realm of possibility. If AI-based learning tools become sophisticated enough, they may one day be able to do the job of teaching just as well as humans can.

These are just some of the dangers that need to be considered when it comes to artificial intelligence in education. While AI has the potential to revolutionize education, we need to be careful about how we implement it so that we don’t end up doing more harm than good.

References

Chakravorti, B. (2022) Why AI Failed to Live Up to Its Potential During the Pandemic. Harvard Business Review March 17,2022. https://hbr.org/2022/03/why-ai-failed-to-live-up-to-its-potential-during-the-pandemic

Chalmers, D. (2020) Weinberg, Justin (ed.). “GPT-3 and General Intelligence”. Daily Nous. Philosophers On GPT-3 (updated with replies by GPT-3) July 30, 2020. https://dailynous.com/2020/07/30/philosophers-gpt-3/#chalmers

Driver, P. (2012) The Irony of Gamification. In English Digital Magazine 3, British Council Portugal, pp. 21 – 24 http://digitaldebris.info/digital-debris/2011/12/31/the-irony-of-gamification-written-for-ied-magazine.html

Engler, A. (2020) A guide to healthy skepticism of artificial intelligence and coronavirus. Washington D.C.: Brookings Institution https://www.brookings.edu/research/a-guide-to-healthy-skepticism-of-artificial-intelligence-and-coronavirus/

Heaven, W. D. (2021) Hundreds of AI tools have been built to catch covid. None of them helped. MIT Technology Review, July 30, 2021. https://www.technologyreview.com/2021/07/30/1030329/machine-learning-ai-failed-covid-hospital-diagnosis-pandemic/

Hughes, G. (2022) What lies at the end of the AI rainbow? IATEFL LTSIG Newsletter Issue April 2022

Luckin, R. (2022) The implications of AI for language learning and teaching. IATEFL LTSIG Newsletter Issue April 2022

O’Neil, C. (2016) Weapons of Math Destruction. London: Allen Lane

Tomasello, G. (2022) Next Generation of AI-Language Education Software:NLP & Language Modules (GPT3). IATEFL LTSIG Newsletter Issue April 2022

VanPatten, B. & Smith, M. (2022) Explicit and Implicit Learning in Second Language Acquisition. Cambridge: Cambridge University Press

Colloquium

At the beginning of March, I’ll be going to Cambridge to take part in a Digital Learning Colloquium (for more information about the event, see here ). One of the questions that will be explored is how research might contribute to the development of digital language learning. In this, the first of two posts on the subject, I’ll be taking a broad overview of the current state of play in edtech research.

I try my best to keep up to date with research. Of the main journals, there are Language Learning and Technology, which is open access; CALICO, which offers quite a lot of open access material; and reCALL, which is the most restricted in terms of access of the three. But there is something deeply frustrating about most of this research, and this is what I want to explore in these posts. More often than not, research articles end with a call for more research. And more often than not, I find myself saying ‘Please, no, not more research like this!’

First, though, I would like to turn to a more reader-friendly source of research findings. Systematic reviews are, basically literature reviews which can save people like me from having to plough through endless papers on similar subjects, all of which contain the same (or similar) literature review in the opening sections. If only there were more of them. Others agree with me: the conclusion of one systematic review of learning and teaching with technology in higher education (Lillejord et al., 2018) was that more systematic reviews were needed.

Last year saw the publication of a systematic review of research on artificial intelligence applications in higher education (Zawacki-Richter, et al., 2019) which caught my eye. The first thing that struck me about this review was that ‘out of 2656 initially identified publications for the period between 2007 and 2018, 146 articles were included for final synthesis’. In other words, only just over 5% of the research was considered worthy of inclusion.

The review did not paint a very pretty picture of the current state of AIEd research. As the second part of the title of this review (‘Where are the educators?’) makes clear, the research, taken as a whole, showed a ‘weak connection to theoretical pedagogical perspectives’. This is not entirely surprising. As Bates (2019) has noted: ‘since AI tends to be developed by computer scientists, they tend to use models of learning based on how computers or computer networks work (since of course it will be a computer that has to operate the AI). As a result, such AI applications tend to adopt a very behaviourist model of learning: present / test / feedback.’ More generally, it is clear that technology adoption (and research) is being driven by technology enthusiasts, with insufficient expertise in education. The danger is that edtech developers ‘will simply ‘discover’ new ways to teach poorly and perpetuate erroneous ideas about teaching and learning’ (Lynch, 2017).

This, then, is the first of my checklist of things that, collectively, researchers need to do to improve the value of their work. The rest of this list is drawn from observations mostly, but not exclusively, from the authors of systematic reviews, and mostly come from reviews of general edtech research. In the next blog post, I’ll look more closely at a recent collection of ELT edtech research (Mavridi & Saumell, 2020) to see how it measures up.

1 Make sure your research is adequately informed by educational research outside the field of edtech

Unproblematised behaviourist assumptions about the nature of learning are all too frequent. References to learning styles are still fairly common. The most frequently investigated skill that is considered in the context of edtech is critical thinking (Sosa Neira, et al., 2017), but this is rarely defined and almost never problematized, despite a broad literature that questions the construct.

2 Adopt a sceptical attitude from the outset

Know your history. Decades of technological innovation in education have shown precious little in the way of educational gains and, more than anything else, have taught us that we need to be sceptical from the outset. ‘Enthusiasm and praise that are directed towards ‘virtual education, ‘school 2.0’, ‘e-learning and the like’ (Selwyn, 2014: vii) are indications that the lessons of the past have not been sufficiently absorbed (Levy, 2016: 102). The phrase ‘exciting potential’, for example, should be banned from all edtech research. See, for example, a ‘state-of-the-art analysis of chatbots in education’ (Winkler & Söllner, 2018), which has nothing to conclude but ‘exciting potential’. Potential is fine (indeed, it is perhaps the only thing that research can unambiguously demonstrate – see section 3 below), but can we try to be a little more grown-up about things?

3 Know what you are measuring

Measuring learning outcomes is tricky, to say the least, but it’s understandable that researchers should try to focus on them. Unfortunately, ‘the vast array of literature involving learning technology evaluation makes it challenging to acquire an accurate sense of the different aspects of learning that are evaluated, and the possible approaches that can be used to evaluate them’ (Lai & Bower, 2019). Metrics such as student grades are hard to interpret, not least because of the large number of variables and the danger of many things being conflated in one score. Equally, or possibly even more, problematic, are self-reporting measures which are rarely robust. It seems that surveys are the most widely used instrument in qualitative research (Sosa Neira, et al., 2017), but these will tell us little or nothing when used for short-term interventions (see point 5 below).

4 Ensure that the sample size is big enough to mean something

In most of the research into digital technology in education that was analysed in a literature review carried out for the Scottish government (ICF Consulting Services Ltd, 2015), there were only ‘small numbers of learners or teachers or schools’.

5 Privilege longitudinal studies over short-term projects

The Scottish government literature review (ICF Consulting Services Ltd, 2015), also noted that ‘most studies that attempt to measure any outcomes focus on short and medium term outcomes’. The fact that the use of a particular technology has some sort of impact over the short or medium term tells us very little of value. Unless there is very good reason to suspect the contrary, we should assume that it is a novelty effect that has been captured (Levy, 2016: 102).

6 Don’t forget the content

The starting point of much edtech research is the technology, but most edtech, whether it’s a flashcard app or a full-blown Moodle course, has content. Research reports rarely give details of this content, assuming perhaps that it’s just fine, and all that’s needed is a little tech to ‘present learners with the ‘right’ content at the ‘right’ time’ (Lynch, 2017). It’s a foolish assumption. Take a random educational app from the Play Store, a random MOOC or whatever, and the chances are you’ll find it’s crap.

7 Avoid anecdotal accounts of technology use in quasi-experiments as the basis of a ‘research article’

Control (i.e technology-free) groups may not always be possible but without them, we’re unlikely to learn much from a single study. What would, however, be extremely useful would be a large, collated collection of such action-research projects, using the same or similar technology, in a variety of settings. There is a marked absence of this kind of work.

8 Enough already of higher education contexts

Researchers typically work in universities where they have captive students who they can carry out research on. But we have a problem here. The systematic review of Lundin et al (2018), for example, found that ‘studies on flipped classrooms are dominated by studies in the higher education sector’ (besides lacking anchors in learning theory or instructional design). With some urgency, primary and secondary contexts need to be investigated in more detail, not just regarding flipped learning.

9 Be critical

Very little edtech research considers the downsides of edtech adoption. Online safety, privacy and data security are hardly peripheral issues, especially with younger learners. Ignoring them won’t make them go away.

More research?

So do we need more research? For me, two things stand out. We might benefit more from, firstly, a different kind of research, and, secondly, more syntheses of the work that has already been done. Although I will probably continue to dip into the pot-pourri of articles published in the main CALL journals, I’m looking forward to a change at the CALICO journal. From September of this year, one issue a year will be thematic, with a lead article written by established researchers which will ‘first discuss in broad terms what has been accomplished in the relevant subfield of CALL. It should then outline which questions have been answered to our satisfaction and what evidence there is to support these conclusions. Finally, this article should pose a “soft” research agenda that can guide researchers interested in pursuing empirical work in this area’. This will be followed by two or three empirical pieces that ‘specifically reflect the research agenda, methodologies, and other suggestions laid out in the lead article’.

But I think I’ll still have a soft spot for some of the other journals that are coyer about their impact factor and that can be freely accessed. How else would I discover (it would be too mean to give the references here) that ‘the effective use of new technologies improves learners’ language learning skills’? Presumably, the ineffective use of new technologies has the opposite effect? Or that ‘the application of modern technology represents a significant advance in contemporary English language teaching methods’?

References

Bates, A. W. (2019). Teaching in a Digital Age Second Edition. Vancouver, B.C.: Tony Bates Associates Ltd. Retrieved from https://pressbooks.bccampus.ca/teachinginadigitalagev2/

ICF Consulting Services Ltd (2015). Literature Review on the Impact of Digital Technology on Learning and Teaching. Edinburgh: The Scottish Government. https://dera.ioe.ac.uk/24843/1/00489224.pdf

Lai, J.W.M. & Bower, M. (2019). How is the use of technology in education evaluated? A systematic review. Computers & Education, 133(1), 27-42. Elsevier Ltd. Retrieved January 14, 2020 from https://www.learntechlib.org/p/207137/

Levy, M. 2016. Researching in language learning and technology. In Farr, F. & Murray, L. (Eds.) The Routledge Handbook of Language Learning and Technology. Abingdon, Oxon.: Routledge. pp.101 – 114

Lillejord S., Børte K., Nesje K. & Ruud E. (2018). Learning and teaching with technology in higher education – a systematic review. Oslo: Knowledge Centre for Education https://www.forskningsradet.no/siteassets/publikasjoner/1254035532334.pdf

Lundin, M., Bergviken Rensfeldt, A., Hillman, T. et al. (2018). Higher education dominance and siloed knowledge: a systematic review of flipped classroom research. International Journal of Educational Technology in Higher Education 15, 20 (2018) doi:10.1186/s41239-018-0101-6

Lynch, J. (2017). How AI Will Destroy Education. Medium, November 13, 2017. https://buzzrobot.com/how-ai-will-destroy-education-20053b7b88a6

Mavridi, S. & Saumell, V. (Eds.) (2020). Digital Innovations and Research in Language Learning. Faversham, Kent: IATEFL

Selwyn, N. (2014). Distrusting Educational Technology. New York: Routledge

Sosa Neira, E. A., Salinas, J. and de Benito Crosetti, B. (2017). Emerging Technologies (ETs) in Education: A Systematic Review of the Literature Published between 2006 and 2016. International Journal of Emerging Technologies in Education, 12 (5). https://online-journals.org/index.php/i-jet/article/view/6939

Winkler, R. & Söllner, M. (2018): Unleashing the Potential of Chatbots in Education: A State-Of-The-Art Analysis. In: Academy of Management Annual Meeting (AOM). Chicago, USA. https://www.alexandria.unisg.ch/254848/1/JML_699.pdf

Zawacki-Richter, O., Bond, M., Marin, V. I. And Gouverneur, F. (2019). Systematic review of research on artificial intelligence applications in higher education – where are the educators? International Journal of Educational Technology in Higher Education 2019

At the start of the last decade, ELT publishers were worried, Macmillan among them. The financial crash of 2008 led to serious difficulties, not least in their key Spanish market. In 2011, Macmillan’s parent company was fined ₤11.3 million for corruption. Under new ownership, restructuring was a constant. At the same time, Macmillan ELT was getting ready to move from its Oxford headquarters to new premises in London, a move which would inevitably lead to the loss of a sizable proportion of its staff. On top of that, Macmillan, like the other ELT publishers, was aware that changes in the digital landscape (the first 3G iPhone had appeared in June 2008 and wifi access was spreading rapidly around the world) meant that they needed to shift away from the old print-based model. With her finger on the pulse, Caroline Moore, wrote an article in October 2010 entitled ‘No Future? The English Language Teaching Coursebook in the Digital Age’ . The publication (at the start of the decade) and runaway success of the online ‘Touchstone’ course, from arch-rivals, Cambridge University Press, meant that Macmillan needed to change fast if they were to avoid being left behind.

Macmillan already had a platform, Campus, but it was generally recognised as being clunky and outdated, and something new was needed. In the summer of 2012, Macmillan brought in two new executives – people who could talk the ‘creative-disruption’ talk and who believed in the power of big data to shake up English language teaching and publishing. At the time, the idea of big data was beginning to reach public consciousness and ‘Big Data: A Revolution that Will Transform how We Live, Work, and Think’ by Viktor Mayer-Schönberger and Kenneth Cukier, was a major bestseller in 2013 and 2014. ‘Big data’ was the ‘hottest trend’ in technology and peaked in Google Trends in October 2014. See the graph below.

Big_data_Google_Trend

Not long after taking up their positions, the two executives began negotiations with Knewton, an American adaptive learning company. Knewton’s technology promised to gather colossal amounts of data on students using Knewton-enabled platforms. Its founder, Jose Ferreira, bragged that Knewton had ‘more data about our students than any company has about anybody else about anything […] We literally know everything about what you know and how you learn best, everything’. This data would, it was claimed, enable publishers to multiply, by orders of magnitude, the efficacy of learning materials, allowing publishers, like Macmillan, to provide a truly personalized and optimal offering to learners using their platform.

The contract between Macmillan and Knewton was agreed in May 2013 ‘to build next-generation English Language Learning and Teaching materials’. Perhaps fearful of being left behind in what was seen to be a winner-takes-all market (Pearson already had a financial stake in Knewton), Cambridge University Press duly followed suit, signing a contract with Knewton in September of the same year, in order ‘to create personalized learning experiences in [their] industry-leading ELT digital products’. Things moved fast because, by the start of 2014 when Macmillan’s new catalogue appeared, customers were told to ‘watch out for the ‘Big Tree’’, Macmillans’ new platform, which would be powered by Knewton. ‘The power that will come from this world of adaptive learning takes my breath away’, wrote the international marketing director.

Not a lot happened next, at least outwardly. In the following year, 2015, the Macmillan catalogue again told customers to ‘look out for the Big Tree’ which would offer ‘flexible blended learning models’ which could ‘give teachers much more freedom to choose what they want to do in the class and what they want the students to do online outside of the classroom’.

Macmillan_catalogue_2015

But behind the scenes, everything was going wrong. It had become clear that a linear model of language learning, which was a necessary prerequisite of the Knewton system, simply did not lend itself to anything which would be vaguely marketable in established markets. Skills development, not least the development of so-called 21st century skills, which Macmillan was pushing at the time, would not be facilitated by collecting huge amounts of data and algorithms offering personalized pathways. Even if it could, teachers weren’t ready for it, and the projections for platform adoptions were beginning to seem very over-optimistic. Costs were spiralling. Pushed to meet unrealistic deadlines for a product that was totally ill-conceived in the first place, in-house staff were suffering, and this was made worse by what many staffers thought was a toxic work environment. By the end of 2014 (so, before the copy for the 2015 catalogue had been written), the two executives had gone.

For some time previously, skeptics had been joking that Macmillan had been barking up the wrong tree, and by the time that the 2016 catalogue came out, the ‘Big Tree’ had disappeared without trace. The problem was that so much time and money had been thrown at this particular tree that not enough had been left to develop new course materials (for adults). The whole thing had been a huge cock-up of an extraordinary kind.

Cambridge, too, lost interest in their Knewton connection, but were fortunate (or wise) not to have invested so much energy in it. Language learning was only ever a small part of Knewton’s portfolio, and the company had raised over $180 million in venture capital. Its founder, Jose Ferreira, had been a master of marketing hype, but the business model was not delivering any better than the educational side of things. Pearson pulled out. In December 2016, Ferreira stepped down and was replaced as CEO. The company shifted to ‘selling digital courseware directly to higher-ed institutions and students’ but this could not stop the decline. In September of 2019, Knewton was sold for something under $17 million dollars, with investors taking a hit of over $160 million. My heart bleeds.

It was clear, from very early on (see, for example, my posts from 2014 here and here) that Knewton’s product was little more than what Michael Feldstein called ‘snake oil’. Why and how could so many people fall for it for so long? Why and how will so many people fall for it again in the coming decade, although this time it won’t be ‘big data’ that does the seduction, but AI (which kind of boils down to the same thing)? The former Macmillan executives are still at the game, albeit in new companies and talking a slightly modified talk, and Jose Ferreira (whose new venture has already raised $3.7 million) is promising to revolutionize education with a new start-up which ‘will harness the power of technology to improve both access and quality of education’ (thanks to Audrey Watters for the tip). Investors may be desperate to find places to spread their portfolio, but why do the rest of us lap up the hype? It’s a question to which I will return.

 

 

 

 

ltsigIt’s hype time again. Spurred on, no doubt, by the current spate of books and articles  about AIED (artificial intelligence in education), the IATEFL Learning Technologies SIG is organising an online event on the topic in November of this year. Currently, the most visible online references to AI in language learning are related to Glossika , basically a language learning system that uses spaced repetition, whose marketing department has realised that references to AI might help sell the product. GlossikaThey’re not alone – see, for example, Knowble which I reviewed earlier this year .

In the wider world of education, where AI has made greater inroads than in language teaching, every day brings more stuff: How artificial intelligence is changing teaching , 32 Ways AI is Improving Education , How artificial intelligence could help teachers do a better job , etc., etc. There’s a full-length book by Anthony Seldon, The Fourth Education Revolution: will artificial intelligence liberate or infantilise humanity? (2018, University of Buckingham Press) – one of the most poorly researched and badly edited books on education I’ve ever read, although that won’t stop it selling – and, no surprises here, there’s a Pearson commissioned report called Intelligence Unleashed: An argument for AI in Education (2016) which is available free.

Common to all these publications is the claim that AI will radically change education. When it comes to language teaching, a similar claim has been made by Donald Clark (described by Anthony Seldon as an education guru but perhaps best-known to many in ELT for his demolition of Sugata Mitra). In 2017, Clark wrote a blog post for Cambridge English (now unavailable) entitled How AI will reboot language learning, and a more recent version of this post, called AI has and will change language learning forever (sic) is available on Clark’s own blog. Given the history of the failure of education predictions, Clark is making bold claims. Thomas Edison (1922) believed that movies would revolutionize education. Radios were similarly hyped in the 1940s and in the 1960s it was the turn of TV. In the 1980s, Seymour Papert predicted the end of schools – ‘the computer will blow up the school’, he wrote. Twenty years later, we had the interactive possibilities of Web 2.0. As each technology failed to deliver on the hype, a new generation of enthusiasts found something else to make predictions about.

But is Donald Clark onto something? Developments in AI and computational linguistics have recently resulted in enormous progress in machine translation. Impressive advances in automatic speech recognition and generation, coupled with the power that can be packed into a handheld device, mean that we can expect some re-evaluation of the value of learning another language. Stephen Heppell, a specialist at Bournemouth University in the use of ICT in Education, has said: ‘Simultaneous translation is coming, making language teachers redundant. Modern languages teaching in future may be more about navigating cultural differences’ (quoted by Seldon, p.263). Well, maybe, but this is not Clark’s main interest.

Less a matter of opinion and much closer to the present day is the issue of assessment. AI is becoming ubiquitous in language testing. Cambridge, Pearson, TELC, Babbel and Duolingo are all using or exploring AI in their testing software, and we can expect to see this increase. Current, paper-based systems of testing subject knowledge are, according to Rosemary Luckin and Kristen Weatherby, outdated, ineffective, time-consuming, the cause of great anxiety and can easily be automated (Luckin, R. & Weatherby, K. 2018. ‘Learning analytics, artificial intelligence and the process of assessment’ in Luckin, R. (ed.) Enhancing Learning and Teaching with Technology, 2018. UCL Institute of Education Press, p.253). By capturing data of various kinds throughout a language learner’s course of study and by using AI to analyse learning development, continuous formative assessment becomes possible in ways that were previously unimaginable. ‘Assessment for Learning (AfL)’ or ‘Learning Oriented Assessment (LOA)’ are two terms used by Cambridge English to refer to the potential that AI offers which is described by Luckin (who is also one of the authors of the Pearson paper mentioned earlier). In practical terms, albeit in a still very limited way, this can be seen in the CUP course ‘Empower’, which combines CUP course content with validated LOA from Cambridge Assessment English.

Will this reboot or revolutionise language teaching? Probably not and here’s why. AIED systems need to operate with what is called a ‘domain knowledge model’. This specifies what is to be learnt and includes an analysis of the steps that must be taken to reach that learning goal. Some subjects (especially STEM subjects) ‘lend themselves much more readily to having their domains represented in ways that can be automatically reasoned about’ (du Boulay, D. et al., 2018. ‘Artificial intelligences and big data technologies to close the achievement gap’ in Luckin, R. (ed.) Enhancing Learning and Teaching with Technology, 2018. UCL Institute of Education Press, p.258). This is why most AIED systems have been built to teach these areas. Language are rather different. We simply do not have a domain knowledge model, except perhaps for the very lowest levels of language learning (and even that is highly questionable). Language learning is probably not, or not primarily, about acquiring subject knowledge. Debate still rages about the relationship between explicit language knowledge and language competence. AI-driven formative assessment will likely focus most on explicit language knowledge, as does most current language teaching. This will not reboot or revolutionise anything. It will more likely reinforce what is already happening: a model of language learning that assumes there is a strong interface between explicit knowledge and language competence. It is not a model that is shared by most SLA researchers.

So, one thing that AI can do (and is doing) for language learning is to improve the algorithms that determine the way that grammar and vocabulary are presented to individual learners in online programs. AI-optimised delivery of ‘English Grammar in Use’ may lead to some learning gains, but they are unlikely to be significant. It is not, in any case, what language learners need.

AI, Donald Clark suggests, can offer personalised learning. Precisely what kind of personalised learning this might be, and whether or not this is a good thing, remains unclear. A 2015 report funded by the Gates Foundation found that we currently lack evidence about the effectiveness of personalised learning. We do not know which aspects of personalised learning (learner autonomy, individualised learning pathways and instructional approaches, etc.) or which combinations of these will lead to gains in language learning. The complexity of the issues means that we may never have a satisfactory explanation. You can read my own exploration of the problems of personalised learning starting here .

What’s left? Clark suggests that chatbots are one area with ‘huge potential’. I beg to differ and I explained my reasons eighteen months ago . Chatbots work fine in very specific domains. As Clark says, they can be used for ‘controlled practice’, but ‘controlled practice’ means practice of specific language knowledge, the practice of limited conversational routines, for example. It could certainly be useful, but more than that? Taking things a stage further, Clark then suggests more holistic speaking and listening practice with Amazon Echo, Alexa or Google Home. If and when the day comes that we have general, as opposed to domain-specific, AI, chatting with one of these tools would open up vast new possibilities. Unfortunately, general AI does not exist, and until then Alexa and co will remain a poor substitute for human-human interaction (which is readily available online, anyway). Incidentally, AI could be used to form groups of online language learners to carry out communicative tasks – ‘the aim might be to design a grouping of students all at a similar cognitive level and of similar interests, or one where the participants bring different but complementary knowledge and skills’ (Luckin, R., Holmes, W., Griffiths, M. & Forceir, L.B. 2016. Intelligence Unleashed: An argument for AI in Education. London: Pearson, p.26).

Predictions about the impact of technology on education have a tendency to be made by people with a vested interest in the technologies. Edison was a businessman who had invested heavily in motion pictures. Donald Clark is an edtech entrepreneur whose company, Wildfire, uses AI in online learning programs. Stephen Heppell is executive chairman of LP+ who are currently developing a Chinese language learning community for 20 million Chinese school students. The reporting of AIED is almost invariably in websites that are paid for, in one way or another, by edtech companies. Predictions need, therefore, to be treated sceptically. Indeed, the safest prediction we can make about hyped educational technologies is that inflated expectations will be followed by disillusionment, before the technology finds a smaller niche.

 

Knowble, claims its developers, is a browser extension that will improve English vocabulary and reading comprehension. It also describes itself as an ‘adaptive language learning solution for publishers’. It’s currently beta and free, and sounds right up my street so I decided to give it a run.

Knowble reader

Users are asked to specify a first language (I chose French) and a level (A1 to C2): I chose B1, but this did not seem to impact on anything that subsequently happened. They are then offered a menu of about 30 up-to-date news items, grouped into 5 categories (world, science, business, sport, entertainment). Clicking on one article takes you to the article on the source website. There’s a good selection, including USA Today, CNN, Reuters, the Independent and the Torygraph from Britain, the Times of India, the Independent from Ireland and the Star from Canada. A large number of words are underlined: a single click brings up a translation in the extension box. Double-clicking on all other words will also bring up translations. Apart from that, there is one very short exercise (which has presumably been automatically generated) for each article.

For my trial run, I picked three articles: ‘Woman asks firefighters to help ‘stoned’ raccoon’ (from the BBC, 240 words), ‘Plastic straw and cotton bud ban proposed’ (also from the BBC, 823 words) and ‘London’s first housing market slump since 2009 weighs on UK price growth’ (from the Torygraph, 471 words).

Translations

Research suggests that the use of translations, rather than definitions, may lead to more learning gains, but the problem with Knowble is that it relies entirely on Google Translate. Google Translate is fast improving. Take the first sentence of the ‘plastic straw and cotton bud’ article, for example. It’s not a bad translation, but it gets the word ‘bid’ completely wrong, translating it as ‘offre’ (= offer), where ‘tentative’ (= attempt) is needed. So, we can still expect a few problems with Google Translate …

google_translateOne of the reasons that Google Translate has improved is that it no longer treats individual words as individual lexical items. It analyses groups of words and translates chunks or phrases (see, for example, the way it translates ‘as part of’). It doesn’t do word-for-word translation. Knowble, however, have set their software to ask Google for translations of each word as individual items, so the phrase ‘as part of’ is translated ‘comme’ + ‘partie’ + ‘de’. Whilst this example is comprehensible, problems arise very quickly. ‘Cotton buds’ (‘cotons-tiges’) become ‘coton’ + ‘bourgeon’ (= botanical shoots of cotton). Phrases like ‘in time’, ‘run into’, ‘sleep it off’ ‘take its course’, ‘fire station’ or ‘going on’ (all from the stoned raccoon text) all cause problems. In addition, Knowble are not using any parsing tools, so the system does not identify parts of speech, and further translation errors inevitably appear. In the short article of 240 words, about 10% are wrongly translated. Knowble claim to be using NLP tools, but there’s no sign of it here. They’re just using Google Translate rather badly.

Highlighted items

word_listNLP tools of some kind are presumably being used to select the words that get underlined. Exactly how this works is unclear. On the whole, it seems that very high frequency words are ignored and that lower frequency words are underlined. Here, for example, is the list of words that were underlined in the stoned raccoon text. I’ve compared them with (1) the CEFR levels for these words in the English Profile Text Inspector, and (2) the frequency information from the Macmillan dictionary (more stars = more frequent). In the other articles, some extremely high frequency words were underlined (e.g. price, cost, year) while much lower frequency items were not.

It is, of course, extremely difficult to predict which items of vocabulary a learner will know, even if we have a fairly accurate idea of their level. Personal interests play a significant part, so, for example, some people at even a low level will have no problem with ‘cannabis’, ‘stoned’ and ‘high’, even if these are low frequency. First language, however, is a reasonably reliable indicator as cognates can be expected to be easy. A French speaker will have no problem with ‘appreciate’, ‘unique’ and ‘symptom’. A recommendation engine that can meaningfully personalize vocabulary suggestions will, at the very least, need to consider cognates.

In short, the selection and underlining of vocabulary items, as it currently stands in Knowble, appears to serve no clear or useful function.

taskVocabulary learning

Knowble offers a very short exercise for each article. They are of three types: word completion, dictation and drag and drop (see the example). The rationale for the selection of the target items is unclear, but, in any case, these exercises are tokenistic in the extreme and are unlikely to lead to any significant learning gains. More valuable would be the possibility of exporting items into a spaced repetition flash card system.

effectiveThe claim that Knowble’s ‘learning effect is proven scientifically’ seems to me to be without any foundation. If there has been any proper research, it’s not signposted anywhere. Sure, reading lots of news articles (with a look-up function – if it works reliably) can only be beneficial for language learners, but they can do that with any decent dictionary running in the background.

Similar in many ways to en.news, which I reviewed in my last post, Knowble is another example of a technology-driven product that shows little understanding of language learning.

Last month, I wrote a post about the automated generation of vocabulary learning materials. Yesterday, I got an email from Mike Elchik, inviting me to take a look at the product that his company, WeSpeke, has developed in partnership with CNN. Called en.news, it’s a very regularly updated and wide selection of video clips and texts from CNN, which are then used to ‘automatically create a pedagogically structured, leveled and game-ified English lesson‘. Available at the AppStore and Google Play, as well as a desktop version, it’s free. Revenues will presumably be generated through advertising and later sales to corporate clients.

With 6.2 million dollars in funding so far, WeSpeke can leverage some state-of-the-art NLP and AI tools. Co-founder and chief technical adviser of the company is Jaime Carbonell, Director of the Language Technologies Institute at Carnegie Mellon University, described in Wikipedia as one of the gurus of machine learning. I decided to have a closer look.

home_page

Users are presented with a menu of CNN content (there were 38 items from yesterday alone), these are tagged with broad categories (Politics, Opinions, Money, Technology, Entertainment, etc.) and given a level, ranging from 1 to 5, although the vast majority of the material is at the two highest levels.

menu.jpg

I picked two lessons: a reading text about Mark Zuckerberg’s Congressional hearing (level 5) and a 9 minute news programme of mixed items (level 2 – illustrated above). In both cases, the lesson begins with the text. With the reading, you can click on words to bring up dictionary entries from the Collins dictionary. With the video, you can activate captions and again click on words for definitions. You can also slow down the speed. So far, so good.

There then follows a series of exercises which focus primarily on a set of words that have been automatically selected. This is where the problems began.

Level

It’s far from clear what the levels (1 – 5) refer to. The Zuckerberg text is 930 words long and is rated as B2 by one readability tool. But, using the English Profile Text Inspector, there are 19 types at C1 level, 14 at C2, and 98 which are unlisted. That suggests something substantially higher than B2. The CNN10 video is delivered at breakneck speed (as is often the case with US news shows). Yes, it can be slowed down, but that still won’t help with some passages, such as the one below:

A squirrel recently fell out of a tree in Western New York. Why would that make news?Because she bwoke her widdle leg and needed a widdle cast! Yes, there are casts for squirrels, as you can see in this video from the Orphaned Wildlife Center. A windstorm knocked the animal’s nest out of a tree, and when a woman saw that the baby squirrel was injured, she took her to a local vet. Doctors say she’s going to be just fine in a couple of weeks. Well, why ‘rodent’ she be? She’s been ‘whiskered’ away and cast in both a video and a plaster. And as long as she doesn’t get too ‘squirrelly’ before she heals, she’ll have quite a ‘tail’ to tell.

It’s hard to understand how a text like this got through the algorithms. But, as materials writers know, it is extremely hard to find authentic text that lends itself to language learning at anything below C1. On the evidence here, there is still some way to go before the process of selection can be automated. It may well be the case that CNN simply isn’t a particularly appropriate source.

Target learning items

The primary focus of these lessons is vocabulary learning, and it’s vocabulary learning of a very deliberate kind. Applied linguists are in general agreement that it makes sense for learners to approach the building of their L2 lexicon in a deliberate way (i.e. by studying individual words) for high-frequency items or items that can be identified as having a high surrender value (e.g. items from the AWL for students studying in an EMI context). Once you get to items that are less frequent than, say, the top 8,000 most frequent words, the effort expended in studying new words needs to be offset against their usefulness. Why spend a lot of time studying low frequency words when you’re unlikely to come across them again for some time … and will probably forget them before you do? Vocabulary development at higher levels is better served by extensive reading (and listening), possibly accompanied by glosses.

The target items in the Zuckerberg text were: advocacy, grilled, handicapping, sparked, diagnose, testified, hefty, imminent, deliberative and hesitant. One of these ‘grilled‘ is listed as A2 by English Vocabulary Profile, but that is with its literal, not metaphorical, meaning. Four of them are listed as C2 and the remaining five are off-list. In the CNN10 video, the target items were: strive, humble (verb), amplify, trafficked, enslaved, enacted, algae, trafficking, ink and squirrels. Of these, one is B1, two are C2 and the rest are unlisted. What is the point of studying these essentially random words? Why spend time going through a series of exercises that practise these items? Wouldn’t your time be better spent just doing some more reading? I have no idea how the automated selection of these items takes place, but it’s clear that it’s not working very well.

Practice exercises

There is plenty of variety of task-type but there are,  I think, two reasons to query the claim that these lessons are ‘pedagogically structured’. The first is the nature of the practice exercises; the second is the sequencing of the exercises. I’ll restrict my observations to a selection of the tasks.

1. Users are presented with a dictionary definition and an anagrammed target item which they must unscramble. For example:

existing for the purpose of discussing or planning something     VLREDBETEIIA

If you can’t solve the problem, you can always scroll through the text to find the answer. Burt the problem is in the task design. Dictionary definitions have been written to help language users decode a word. They simply don’t work very well when they are used for another purpose (as prompts for encoding).

2. Users are presented with a dictionary definition for which they must choose one of four words. There are many potential problems here, not the least of which is that definitions are often more complex than the word they are defining, or they present other challenges. As an example: cause to be unpretentious for to humble. On top of that, lexicographers often need or choose to embed the target item in the definition. For example:

a hefty amount of something, especially money, is very large

an event that is imminent, especially an unpleasant one, will happen very soon

When this is the case, it makes no sense to present these definitions and ask learners to find the target item from a list of four.

The two key pieces of content in this product – the CNN texts and the Collins dictionaries – are both less than ideal for their purposes.

3. Users are presented with a box of jumbled words which they must unscramble to form sentences that appeared in the text.

Rearrange_words_to_make_sentences

The sentences are usually long and hard to reconstruct. You can scroll through the text to find the answer, but I’m unclear what the point of this would be. The example above contains a mistake (vie instead of vice), but this was one of only two glitches I encountered.

4. Users are asked to select the word that they hear on an audio recording. For example:

squirreling     squirrel     squirreled     squirrels

Given the high level of challenge of both the text and the target items, this was a rather strange exercise to kick off the practice. The meaning has not yet been presented (in a matching / definition task), so what exactly is the point of this exercise?

5. Users are presented with gapped sentences from the text and asked to choose the correct grammatical form of the missing word. Some of these were hard (e.g. adjective order), others were very easy (e.g. some vs any). The example below struck me as plain weird for a lesson at this level.

________ have zero expectation that this Congress is going to make adequate changes. (I or Me ?)

6. At the end of both lessons, there were a small number of questions that tested your memory of the text. If, like me, you couldn’t remember all that much about the text after twenty minutes of vocabulary activities, you can scroll through the text to find the answers. This is not a task type that will develop reading skills: I am unclear what it could possibly develop.

Overall?

Using the lessons on offer here wouldn’t do a learner (as long as they already had a high level of proficiency) any harm, but it wouldn’t be the most productive use of their time, either. If a learner is motivated to read the text about Zuckerberg, rather than do lots of ‘busy’ work on a very odd set of words with gap-fills and matching tasks, they’d be better advised just to read the text again once or twice. They could use a look-up for words they want to understand and import them into a flashcard system with spaced repetition (en.news does have flashcards, but there’s no sign of spaced practice yet). More, they could check out another news website and read / watch other articles on the same subject (perhaps choosing websites with a different slant to CNN) and get valuable narrow-reading practice in this way.

My guess is that the technology has driven the product here, but without answering the fundamental questions about which words it’s appropriate for individual learners to study in a deliberate way and how this is best tackled, it doesn’t take learners very far.

 

 

 

 

Chatbots

Posted: December 1, 2016 in ed tech
Tags: , , , , , , , ,

 

Here’s a conversation I had recently with Cleverbot.

cleverbot

User (me): Hi Cleverbot. What are your views on the future of chatbots in language learning?

Cleverbot: Are you talking to yourself?

User: No, I’m asking you a question.

Cleverbot: Then what’s the question again? I forgot.

User: What are your views on the future of chatbots in language learning?

Cleverbot: Don’t you want me to finish my story?

User: No, I want you to answer my question.

Cleverbot: What is your question?

User: Do you think there is a future for chatbots in language education?

Cleverbot: Sure, if you are talking about the moon.

aliceI also tried to have a conversation with Alice, the avatar from EFL Classroom Bot, listed by Larry Ferlazzo as one of ‘the best online chatbots for practicing English’. I didn’t get any more sense out of her than out of Cleverbot.

Chatbots, apparently, are the next big thing. Again. David Mattin, head of trends and insights at trendwatching.com, writes (in the September 2016 issue of ‘Business Life’) that ‘the chatbot revolution is coming’ and that chatbots are a step towards the dream of an interface between user and technology that is so intuitive that the interface ‘simply fades away’. Chatbots have been around for some time. Remember Clippy – the Microsoft Office bot in the late 1990s – which you had to disable in order to stop yourself punching your computer screen? Since then, bots have become ubiquitous. There have been problems, such as Microsoft’s Tay bot that had to be taken down after sixteen hours earlier this year, when, after interacting with other Twitter users, it developed into an abusive Nazi. But chatbots aren’t going away and you’ve probably interacted with one to book a taxi, order food or attempt to talk to your bank. In September this year, the Guardian described them as ‘the talk of the town’ and ‘hot property in Silicon Valley’.

The real interest in chatbots is not, however, in the ‘exciting interface’ possibilities (both user interface and user experience remain pretty crude), but in the way that they are leaner, sit comfortably with the things we actually do on a phone and the fact that they offer a way of cutting out the high fees that developers have to pay to app stores . After so many start-up failures, chatbots offer a glimmer of financial hope to developers.

It’s no surprise, of course, to find the world of English language teaching beginning to sit up and take notice of this technology. A 2012 article by Ben Lehtinen in PeerSpectives enthuses about the possibilities in English language learning and reports the positive feedback of the author’s own students. ELTJam, so often so quick off the mark, developed an ELT Bot over the course of a hackathon weekend in March this year. Disappointingly, it wasn’t really a bot – more a case of humans pretending to be a bot pretending to be humans – but it probably served its exploratory purpose. duolingoAnd a few months ago Duolingo began incorporating bots. These are currently only available for French, Spanish and German learners in the iPhone app, so I haven’t been able to try it out and evaluate it. According to an infomercial in TechCrunch, ‘to make talking to the bots a bit more compelling, the company tried to give its different bots a bit of personality. There’s Chef Robert, Renee the Driver and Officer Ada, for example. They will react differently to your answers (and correct you as necessary), but for the most part, the idea here is to mimic a real conversation. These bots also allow for a degree of flexibility in your answers that most language-learning software simply isn’t designed for. There are plenty of ways to greet somebody, for example, but most services will often only accept a single answer. When you’re totally stumped for words, though, Duolingo offers a ‘help my reply’ button with a few suggested answers.’ In the last twelve months or so, Duolingo has considerably improved its ability to recognize multiple correct ways of expressing a particular idea, and its ability to recognise alternative answers to its translation tasks. However, I’m highly sceptical about its ability to mimic a real conversation any better than Cleverbot or Alice the EFL Bot, or its ability to provide systematically useful corrections.

My reasons lie in the current limitations of AI and NLP (Natural Language Processing). In a nutshell, we simply don’t know how to build a machine that can truly understand human language. Limited exchanges in restricted domains can be done pretty well (such as the early chatbot that did a good job of simulating an encounter with an evasive therapist, or, more recently ordering a taco and having a meaningless, but flirty conversation with a bot), but despite recent advances in semantic computing, we’re a long way from anything that can mimic a real conversation. As Audrey Watters puts it, we’re not even close.

When it comes to identifying language errors made by language learners, we’re not really much better off. Apps like Grammarly are not bad at identifying grammatical errors (but not good enough to be reliable), but pretty hopeless at dealing with lexical appropriacy. Much more reliable feedback to learners can be offered when the software is trained on particular topics and text types. Write & Improve does this with a relatively small selection of Cambridge English examination tasks, but a free conversation ….? Forget it.

So, how might chatbots be incorporated into language teaching / learning? A blog post from December 2015 entitled AI-powered chatbots and the future of language learning suggests one plausible possibility. Using an existing messenger service, such as WhatsApp or Telegram, an adaptive chatbot would send tasks (such as participation in a conversation thread with a predetermined topic, register, etc., or pronunciation practice or translation exercises) to a learner, provide feedback and record the work for later recycling. At the same time, the bot could send out reminders of work that needs to be done or administrative tasks that must be completed.

Kat Robb has written a very practical article about using instant messaging in English language classrooms. Her ideas are interesting (although I find the idea of students in a F2F classroom messaging each other slightly bizarre) and it’s easy to imagine ways in which her activities might be augmented with chatbot interventions. The Write & Improve app, mentioned above, could deploy a chatbot interface to give feedback instead of the flat (and, in my opinion, perfectly adequate) pop-up boxes currently in use. Come to think of it, more or less any digital language learning tool could be pimped up with a bot. Countless revisions can be envisioned.

But the overwhelming question is: would it be worth it? Bots are not likely, any time soon, to revolutionise language learning. What they might just do, however, is help to further reduce language teaching to a series of ‘mechanical and scripted gestures’. More certain is that a lot of money will be thrown down the post-truth edtech drain. Then, in the not too distant future, this latest piece of edtech will fall into the trough of disillusionment, to be replaced by the latest latest thing.