Archive for the ‘video’ Category

My attention was recently drawn (thanks to Grzegorz Śpiewak) to a free publication from OUP. It’s called ‘Multimodality in ELT: Communication skills for today’s generation’ (Donaghy et al., 2023) and it’s what OUP likes to call a ‘position paper’: it offers ‘evidence-based recommendations to support educators and learners in their future success’. Its topic is multimodal (or multimedia) literacy, a term used to describe the importance for learners of being able ‘not just to understand but to create multimedia messages, integrating text with images, sounds and video to suit a variety of communicative purposes and reach a range of target audiences’ (Dudeney et al., 2013: 13).

Grzegorz noted the paper’s ‘positively charged, unhedged language to describe what is arguably a most complex problem area’. As an example, he takes the summary of the first section and circles questionable and / or unsubstantiated claims. It’s just one example from a text that reads more like a ‘manifesto’ than a balanced piece of evidence-reporting. The verb ‘need’ (in the sense of ‘must’, as in ‘teachers / learners / students need to …’) appears no fewer than 57 times. The modal ‘should’ (as in ‘teachers / learners / students should …’) clocks up 27 appearances.

What is it then that we all need to do? Essentially, the argument is that English language teachers need to develop their students’ multimodal literacy by incorporating more multimodal texts and tasks (videos and images) in all their lessons. The main reason for this appears to be that, in today’s digital age, communication is more often multimodal than monomodal (i.e. written or spoken text alone). As an addendum, we are told that multimodal classroom practices are a ‘fundamental part of inclusive teaching’ in classes with ‘learners with learning difficulties and disabilities’. In case you thought it was ironic that such an argument would be put forward in a flat monomodal pdf, OUP also offers the same content through a multimodal ‘course’ with text, video and interactive tasks.

It might all be pretty persuasive, if it weren’t so overstated. Here are a few of the complex problem areas.

What exactly is multimodal literacy?

We are told in the paper that there are five modes of communication: linguistic, visual, aural, gestural and spatial. Multimodal literacy consists, apparently, of the ability

  • to ‘view’ multimodal texts (noticing the different modes, and, for basic literacy, responding to the text on an emotional level, and, for more advanced literacy, responding to it critically)
  • to ‘represent’ ideas and information in a multimodal way (posters, storyboards, memes, etc.)

I find this frustratingly imprecise. First: ‘viewing’. Noticing modes and reacting emotionally to a multimedia artefact do not take anyone very far on the path towards multimodal literacy, even if they are necessary first steps. It is only when we move towards a critical response (understanding the relative significance of different modes and problematizing our initial emotional response) that we can really talk about literacy (see the ‘critical literacy’ of Pegrum et al., 2018). We’re basically talking about critical thinking, a concept as vague and contested as any out there. Responding to a multimedia artefact ‘critically’ can mean more or less anything and everything.

Next: ‘representing’. What is the relative importance of ‘viewing’ and ‘representing’? What kinds of representations (artefacts) are important, and which are not? Presumably, they are not all of equal importance. And, whichever artefact is chosen as the focus, a whole range of technical skills will be needed to produce the artefact in question. So, precisely what kind of representing are we talking about?

Priorities in the ELT classroom

The Oxford authors write that ‘the main focus as English language teachers should obviously be on language’. I take this to mean that the ‘linguistic mode’ of communication should be our priority. This seems reasonable, since it’s hard to imagine any kind of digital literacy without some reading skills preceding it. But, again, the question of relative importance rears its ugly head. The time available for language learning and teaching is always limited. Time that is devoted to the visual, aural, gestural or spatial modes of communication is time that is not devoted to the linguistic mode.

There are also, presumably, some language teaching contexts (I’m thinking in particular of some adult, professional contexts) where the teaching of multimodal literacy would be completely inappropriate.

Multimodal literacy is a form of digital literacy. Writers about digital literacies like to say things like ‘digital literacies are as important to language learning as […] reading and writing skills’ or that it is ‘crucial for language teaching to […] encompass the digital literacies which are increasingly central to learners’ […] lives’ (Pegrum et al., 2022). The question then arises: how important, in relative terms, are the various digital literacies? Where does multimodal literacy stand?

The Oxford authors summarise their view as follows:

There is a need for a greater presence of images, videos, and other multimodal texts in ELT coursebooks and a greater focus on using them as a starting point for analysis, evaluation, debate, and discussion.

My question to them is: greater than what? Typical contemporary courseware is already a whizzbang multimodal jamboree. There seem to me to be more pressing concerns with most courseware than supplementing it with visuals or clickables.

Evidence

The Oxford authors’ main interest is unquestionably in the use of video. They recommend extensive video viewing outside the classroom and digital story-telling activities inside. I’m fine with that, so long as classroom time isn’t wasted on getting to grips with a particular digital tool (e.g. a video editor, which, a year from now, will have been replaced by another video editor).

I’m fine with this because it involves learners doing meaningful things with language, and there is ample evidence to indicate that a good way to acquire language is to do meaningful things with it. However, I am less than convinced by the authors’ claim that such activities will strengthen ‘active and critical viewing, and effective and creative representing’. My scepticism derives firstly from my unease about the vagueness of the terms ‘viewing’ and ‘representing’, but I have bigger reservations.

There is much debate about the extent to which general critical thinking can be taught. General critical viewing has the same problems. I can apply critical viewing skills to some topics, because I have reasonable domain knowledge. In my case, it’s domain knowledge that activates my critical awareness of rhetorical devices, layout, choice of images and pull-out quotes, multimodal add-ons and so on. But without the domain knowledge, my critical viewing skills are likely to remain uncritical.

Perhaps most importantly of all, there is a lack of reliable research about ‘the extent to which language instructors should prioritize multimodality in the classroom’ (Kessler, 2022: 552). There are those, like the authors of this paper, who advocate for a ‘strong version’ of multimodality. Others go for a ‘weak version’ ‘in which non-linguistic modes should only minimally support or supplement linguistic instruction’ (Kessler, 2022: 552). And there are others who argue that multimodal activities may actually detract from or stifle L2 development (e.g. Manchón, 2017). In the circumstances, all the talk of ‘needs to’ and ‘should’ is more than a little premature.

Assessment

The authors of this Oxford paper rightly note that, if we are to adopt a multimodal approach, ‘it is important that assessment requirements take into account the multimodal nature of contemporary communication’. The trouble is that there are no widely used assessments (to my knowledge) that do this (including Oxford’s own tests). English language reading tests (like the Oxford Test of English) measure the comprehension of flat printed texts, as a proxy for reading skills. This is not the place to question the validity of such reading tests. Suffice to say that ‘little consensus exists as to what [the ability to read another language] entails, how it develops, and how progress in development can be monitored and fostered’ (Koda, 2021).

No doubt there are many people beavering away at trying to figure out how to assess multimodal literacy, but the challenges they face are not negligible. Twenty-first century digital (multimodal) literacy includes such things as knowing how to change the language of an online text to your own (and vice versa), how to bring up subtitles, how to convert written text to speech, how to generate audio scripts. All such skills may well be very valuable in this digital age, and all of them limit the need to learn another language.

Final thoughts

I can’t help but wonder why Oxford University Press should bring out a ‘position paper’ that is so at odds with their own publishing and assessing practices, and so at odds with the paper recently published in their flagship journal, ELT Journal. There must be some serious disconnect between the Marketing Department, which commissions papers such as these, and other departments within the company. Why did they allow such overstatement, when it is well known that many ELT practitioners (i.e. their customers) have the view that ‘linguistically based forms are (and should be) the only legitimate form of literacy’ (Choi & Yi, 2016)? Was it, perhaps, the second part of the title of this paper that appealed to the marketing people (‘Communication Skills for Today’s Generation’) and they just thought that ‘multimodality’ had a cool, contemporary ring to it? Or does the use of ‘multimodality’ help the marketing of courses like Headway and English File with additional multimedia bells and whistles? As I say, I can’t help but wonder.

If you want to find out more, I’d recommend the ELT Journal article, which you can access freely without giving your details to the marketing people.

Finally, it is perhaps time to question the logical connection between the fact that much reading these days is multimodal and the idea that multimodal literacy should be taught in a language classroom. Much reading that takes place online, especially with multimodal texts, could be called ‘hyper reading’, characterised as ‘sort of a brew of skimming and scanning on steroids’ (Baron, 2021: 12). Is this the kind of reading that should be promoted with language learners? Baron (2021) argues that the answer to this question depends on the level of reading skills of the learner. The lower the level, the less beneficial it is likely to be. But for ‘accomplished readers with high levels of prior knowledge about the topic’, hyper-reading may be a valuable approach. For many language learners, monomodal deep reading, which demands ‘slower, time-demanding cognitive and reflective functions’ (Baron, 2021: x – xi) may well be much more conducive to learning.

References

Baron, N. S. (2021) How We Read Now. Oxford: Oxford University Press

Choi, J. & Yi, Y. (2016) Teachers’ Integration of Multimodality into Classroom Practices for English Language Learners. TESOL Journal, 7 (2): 304 – 327

Donaghy, K. (author), Karastathi, S. (consultant), Peachey, N. (consultant), (2023). Multimodality in ELT: Communication skills for today’s generation [PDF]. Oxford University Press. https://elt.oup.com/feature/global/expert/multimodality (registration needed)

Dudeney, G., Hockly, N. & Pegrum, M. (2013) Digital Literacies. Harlow: Pearson Education

Kessler, M. (2022) Multimodality. ELT Journal, 76 (4): 551 – 554

Koda, K. (2021) Assessment of Reading. In Chapelle, C. A. (Ed.) The Encyclopedia of Applied Linguistics. Wiley. https://doi.org/10.1002/9781405198431.wbeal0051.pub2

Manchón, R. M. (2017) The Potential Impact of Multimodal Composition on Language Learning. Journal of Second Language Writing, 38: 94 – 95

Pegrum, M., Dudeney, G. & Hockly, N. (2018) Digital Literacies Revisited. The European Journal of Applied Linguistics and TEFL, 7 (2): 3 – 24

Pegrum, M., Hockly, N. & Dudeney, G. (2022) Digital Literacies 2nd Edition. New York: Routledge

There’s an aspect of language learning which everyone agrees is terribly important, but no one can quite agree on what to call it. I’m talking about combinations of words, including fixed expressions, collocations, phrasal verbs and idioms. These combinations are relatively fixed and cannot always be predicted from their elements or generated by grammar rules (Laufer, 2022). They are sometimes referred to as formulaic sequences, formulaic expressions, lexical bundles or lexical chunks, among other terms for multiword items. They matter to English language learners because a large part of English consists of such combinations. Hill (2001) suggests this may be up to 70%. More conservative estimates report 58.6% of writing and 52.3% of speech (Erman & Warren, 2000). Some of these combinations (e.g. ‘of course’, ‘at least’) are so common that they fall into lists of the 1000 most frequent lexical items in the language.

By virtue of their ubiquity and frequency, they are important both for the comprehension of reading and listening texts and for the speed at which texts can be processed. This is because knowledge of these combinations ‘makes discourse relatively predictable’ (Boers, 2020). Similarly, such knowledge can significantly contribute to spoken fluency because combinations ‘can be retrieved from memory as prefabricated units rather than being assembled at the time of speaking’ (Boers, 2020).

So far, so good, but from here on, the waters get a little muddier. Given their importance, what is the best way for a learner to acquire a decent stock of them? Are they best acquired through incidental learning (through meaning-focused reading and listening) or deliberate learning (e.g. with focused exercises or flashcards)? If the former, how on earth can we help learners to make sure that they get exposure to enough combinations enough times? If the latter, what kind of practice works best and, most importantly, which combinations should be selected? With, at the very least, many tens of thousands of such combinations, life is too short to learn them all in a deliberate fashion. Some sort of triage is necessary, but how should we go about this? Frequency of occurrence would be one obvious criterion, but this merely raises the question of what kind of database should be used to calculate frequency – the spoken discourse of children will reveal very different patterns from the written discourse of, say, applied linguists. On top of that, we cannot avoid consideration of the learners’ reasons for learning the language. If, as is statistically most probable, they are learning English to use as a lingua franca, how important or relevant is it to learn combinations that are frequent, idiomatic and comprehensible in native-speaker cultures, but may be rare and opaque in many English as a Lingua Franca contexts?

There are few, if any, answers to these big questions. Research (e.g. Pellicer-Sánchez, 2020) can give us pointers, but the bottom line is that we are left with a series of semi-informed options (see O’Keeffe et al., 2007: 58 – 99). So, when an approach comes along that claims to use software to facilitate the learning of English formulaic expressions (Lin, 2022), I am intrigued, to say the least.

The program is, slightly misleadingly, called IdiomsTube (https://www.idiomstube.com). A more appropriate title would have been IdiomaticityTube (as it focuses on ‘speech formulae, proverbs, sayings, similes, binomials, collocations, and so on’), but I guess ‘idioms’ is a more idiomatic word than ‘idiomaticity’. IdiomsTube allows learners to choose any English-captioned video from YouTube, which is then automatically analysed to identify from two to six formulaic expressions that are presented to the learner as learning objects. Learners are shown these items; the items are hyperlinked to (good) dictionary entries; learners watch the video and are then presented with a small variety of practice tasks. The system recommends particular videos, based on an automated analysis of their difficulty (speech rate and a frequency count of the lexical items they include) and on recommendations from previous users. The system is gamified and, for class use, teachers can track learner progress.

When an article (in my view, more of an advertising piece than an academic one) by the program’s developer, Phoebe Lin, came out in the ReCALL journal, she tweeted that she’d love feedback. I reached out but didn’t hear back. My response here is partly an evaluation of Dr Lin’s program, partly a reflection on how far technology can go in solving some of the knotty problems of language learning.

Incidental and deliberate learning

Researchers have long been interested in looking for ways of making incidental learning of lexical items more likely to happen (Boers, 2021: 39 ff.): that is, ways of making it more likely that learners will notice lexical items while focusing on the content of a text. Most obviously, texts can be selected, written or modified so they contain multiple instances of a particular item (‘input flooding’). Alternatively, texts can be typographically enhanced so that particular items are highlighted in some way. But these approaches are not possible when learners are given the freedom to select any video from YouTube and when the written presentations are in the form of YouTube captions. Instead, IdiomsTube presents the items before the learner watches the video. They are, in effect, told to watch out for these items in advance. They are also given practice tasks after viewing.

The distinction between incidental and deliberate vocabulary learning is not always crystal-clear. In this case, it seems fairly clear that the approach is more slanted to deliberate learning, even though the selection of video by the learner is determined by a focus on content. Whether this works or not will depend on (1) the level-appropriacy of the videos that the learner watches, (2) the effectiveness of the program in recommending / identifying appropriate videos, (3) the ability of the program to identify appropriate formulaic expressions as learning targets in each video, and (4) the ability of the program to generate appropriate practice of these items.

Evaluating the level of YouTube videos

What makes a video easy or hard to understand? IdiomsTube attempts this analytical task by calculating (1) the speed of the speech and (2) the difficulty of the lexis, as determined by the corpus frequency of the lexical items. This gives a score out of five for each category (speed and difficulty). I looked at fifteen videos, all of them recommended by the program. Most were scored at Speed #3 and Difficulty #1. One of them, ‘Bruno Mars Carpool Karaoke’, had a speed of #2 and a difficulty of #1 (i.e. one of the easiest). The video is 15 minutes long. Here’s an extract from the first 90 seconds:

Let’s set this party off right, put yo’ pinky rings up to the moon, twenty four karat magic in the air, head to toe soul player, second verse for the hustlas, gangstas, bad bitches and ya ugly ass friends, I gotta show how a pimp get it in, and they waking up the rocket why you mad

Whoa! Without going into details, it’s clear that something has gone seriously wrong. Evaluating the difficulty of language, especially spoken language, is extremely complex (not least because there’s no objective measure of such a thing). It’s not completely dissimilar to the challenge of evaluating the accuracy, appropriacy and level of sophistication of a learner’s spoken language, and we’re a long way from being able to do that with any acceptable level of reliability. At the very least, we’re a long, long way from being able to do it well when there are no constraints on the kind of text (which is the case when taking the whole of YouTube as a potential source). If we significantly restrict topic and text type, we can train software to do a much better job, but this will require human input: it cannot be fully automated.
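To make this concrete, here’s a minimal sketch of what two-factor scoring of this kind might look like. Everything in it (the cut-offs, the toy frequency table, the function names) is invented for illustration: IdiomsTube’s actual implementation is not public, so this is a guess at the genre, not a description of the program.

```python
from statistics import mean

# Toy frequency ranks (1 = most frequent); a real system would use a large corpus.
FREQ_RANK = {"party": 850, "moon": 1200, "magic": 1900}

def speed_score(word_count: int, duration_minutes: float) -> int:
    """Map words-per-minute onto a 1-5 band (5 = fastest). Cut-offs invented."""
    wpm = word_count / duration_minutes
    return 1 + sum(wpm > cutoff for cutoff in (100, 130, 160, 190))

def difficulty_score(words: list[str]) -> int:
    """Map mean corpus-frequency rank onto a 1-5 band (5 = hardest)."""
    # Design choice: tokens missing from the frequency table are skipped.
    ranks = [FREQ_RANK[w.lower()] for w in words if w.lower() in FREQ_RANK]
    if not ranks:  # nothing recognised: defaults to 'easiest'
        return 1
    return 1 + sum(mean(ranks) > cutoff for cutoff in (2000, 5000, 10000, 20000))
```

Note the failure mode built into a scorer like this: if out-of-vocabulary tokens (‘hustlas’, ‘gangstas’) are simply skipped rather than treated as rare, slang-heavy lyrics can come out looking ‘easy’. Whether or not that is what happens inside IdiomsTube, it illustrates how speech rate and frequency counts say nothing about register, delivery or cultural opacity.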

The length of these 15 videos ranged from 3:02 to 29:27, with the mean length being about 10 minutes and the median 8:32. Too damn long.

Selecting appropriate learning items

The automatic identification of formulaic language in a text presents many challenges: it is, as O’Keeffe et al. (2007: 82) note, only partially possible. A starting point is usually a list, and IdiomsTube begins with a list of 53,635 items compiled by the developer (Lin, 2022) over a number of years. The software has to match word combinations in the text to items in the list, and has to recognise variant forms. Formulaic language cannot always be identified just by matching to lists of forms: a piece of cake may just be a piece of cake, and therefore not a piece of cake to analyse. 53,635 items may sound like a lot, but a common estimate of the number of idioms in English is 25,000. The number of multiword units is much, much higher. 53,635 is not going to be enough for any reliable capture.
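For readers wondering what ‘matching to lists of forms’ involves in practice, here’s a minimal sketch. It assumes both the phrase list and the captions have been lemmatised; the toy lemma table and two-item phrase list are mine, not Lin’s:

```python
LEMMAS = {"got": "get", "gets": "get", "getting": "get"}  # toy lemma table

PHRASE_LIST = {("get", "in", "the", "way"), ("piece", "of", "cake")}
MAX_LEN = 6  # longest multiword item we attempt to match

def lemmatise(token: str) -> str:
    return LEMMAS.get(token.lower(), token.lower())

def find_phrases(tokens: list[str]) -> list[tuple[str, ...]]:
    """Greedy longest-match scan over the lemmatised token stream."""
    lemmas = [lemmatise(t) for t in tokens]
    found, i = [], 0
    while i < len(lemmas):
        for n in range(min(MAX_LEN, len(lemmas) - i), 1, -1):
            if tuple(lemmas[i:i + n]) in PHRASE_LIST:
                found.append(tuple(lemmas[i:i + n]))
                i += n
                break
        else:  # no listed phrase starts at this position
            i += 1
    return found

# find_phrases("It got in the way".split()) returns [('get', 'in', 'the', 'way')]
```

A matcher like this finds forms, but it cannot tell whether ‘a piece of cake’ in a cookery video is an idiom or a dessert: that requires semantic disambiguation, which is a much harder problem.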

Since any given text is likely to contain a lot of formulaic language, the next task is to decide how to select items for presentation (i.e. as learning objects) from those identified. The challenge is, as Lin (2022) remarks, both technical and theoretical: how can frequency and learnability be measured? There are no easy answers, and the approach of IdiomsTube is, by its own admission, crude. The algorithm prioritises longer items that contain lower frequency single items, and which have a low frequency of occurrence in a corpus of 40,000 randomly-sampled YouTube videos. The aim is to focus on formulaic language that is ‘more challenging in terms of composition (i.e. longer and made up of more difficult words) and, therefore, may be easier to miss due to their infrequent appearance on YouTube’. My immediate reaction is to wonder whether this approach will simply prioritise items that are not worth the bother of deliberate learning in the first place.
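As far as I can tell from the description, the prioritisation amounts to something like the following sketch. The functional form and the numbers are my guesses (the paper does not publish a formula); only the three ingredients (length, word rarity, corpus rarity) come from the text:

```python
def priority(item: tuple[str, ...],
             word_freq: dict[str, float],
             phrase_freq: dict[tuple[str, ...], float]) -> float:
    """Score a candidate item: longer, rarer-worded, corpus-rare items win."""
    length_term = len(item)                                 # longer = higher
    rarest_word = min(word_freq.get(w, 0.0) for w in item)  # lowest word frequency
    rarity_term = 1.0 / (1.0 + rarest_word)                 # rarer words = higher
    corpus_term = 1.0 / (1.0 + phrase_freq.get(item, 0.0))  # rare on YouTube = higher
    return length_term * rarity_term * corpus_term

# Candidates are ranked by priority() and the top two to six become learning objects.
```

Notice that a score like this contains no term at all for usefulness to the learner, which is precisely the worry.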

The proof is in the proverbial pudding, so I looked at the learning items that were offered by my sample of 15 recommended videos. Sadly, IdiomsTube does not even begin to cut the mustard. The rest of this section details why the selection was so unsatisfactory: you may want to skip this and rejoin me at the start of the next section.

  • In total, 85 target items were suggested. Of these, 39 (just under half) were not fixed expressions at all: they were single items. Some of these single items (e.g. ‘blog’ and ‘password’) would be extremely easy for most learners. Of the others, 5 were opaque idioms (the most prototypical kind of idiom); the rest were collocations and fixed (but transparent) phrases and frames.
  • Some items (e.g. ‘I rest my case’) are limited in terms of the contexts in which they can be appropriately used.
  • Some items did not appear to be idiomatic in any way. ‘We need to talk’ and ‘able to do it’, for example, are strange selections, compared to others in their respective lists. They are also very ‘easy’: if you don’t readily understand items like these, you wouldn’t have a hope in hell of understanding the video.
  • There were a number of errors in the recommended target items. Errors included duplication of items within one set (‘get in the way’ + ‘get in the way of something’), misreading of an item (‘the shortest’ misread as ‘the shorts’), mislabelling of an item (‘vend’ instead of ‘vending machine’), linking to the wrong dictionary entry (e.g. ‘mini’ links to ‘miniskirt’, although in the video ‘mini’ = ‘small’, or, in another video, ‘stoke’ links to ‘stoked’, which is rather different!).
  • The selection of fixed expressions is sometimes very odd. In one video, the following items have been selected: get into an argument, vend, from the ground up, shovel, we need to talk, prefecture. The video contains others which would seem to be better candidates, including ‘You can’t tell’ (which appears twice), ‘in charge of’, ‘way too’ (which also appears twice), and ‘by the way’. It would seem, therefore, that some inappropriate items are selected, whilst other more appropriate ones are omitted.
  • There is a wide variation in the kind of target item. One set, for example, included: in order to do, friction, upcoming, run out of steam, able to do it, notification. Cross-checking with Pearson’s Global Scale of English, we have items ranging from A2 to C2+.

The challenges of automation

IdiomsTube comes unstuck on many levels. It fails to recommend appropriate videos to watch. It fails to suggest appropriate language to learn. It fails to provide appropriate practice. You wouldn’t know this from reading the article by Phoebe Lin in the ReCALL journal, which does, however, suggest that ‘further improvements in the design and functions of IdiomsTube are needed’. Necessary they certainly are, but the interesting question is how possible they are.

My interest in IdiomsTube comes from my own experience in an app project which attempted to do something not completely dissimilar. We wanted to be able to evaluate the idiomaticity of learner-generated language, and this entailed identifying formulaic patterns in a large corpus. We wanted to develop a recommendation engine for learning objects (i.e. the lexical items) by combining measures of frequency and learnability. We wanted to generate tasks to practise collocational patterns, by trawling the corpus for contexts that lent themselves to gapfills. With some of these challenges, we failed. With others, we found a stopgap solution in human curation, writing and editing.

IdiomsTube is interesting, not because of what it tells us about how technology can facilitate language learning. It’s interesting because it tells us about the limits of technological applications to learning, and about the importance of sorting out theoretical challenges before the technical ones. It’s interesting as a case study in how not to go about developing an app: its ‘special enhancement features such as gamification, idiom-of-the-day posts, the IdiomsTube Teacher’s interface and IdiomsTube Facebook and Instagram pages’ are pointless distractions when the key questions have not been resolved. It’s interesting as a case study of something that should not have been published in an academic journal. It’s interesting as a case study of how techno-enthusiasm can blind you to the possibility that some learning challenges do not have solutions that can be automated.

References

Boers, F. (2020) Factors affecting the learning of multiword items. In Webb, S. (Ed.) The Routledge Handbook of Vocabulary Studies. Abingdon: Routledge. pp. 143 – 157

Boers, F. (2021) Evaluating Second Language Vocabulary and Grammar Instruction. Abingdon: Routledge

Erman, B. & Warren, B. (2000) The idiom principle and the open choice principle. Text, 20 (1): pp. 29 – 62

Hill, J. (2001) Revising priorities: from grammatical failure to collocational success. In Lewis, M. (Ed.) Teaching Collocation: further development in the Lexical Approach. Hove: LTP. pp. 47 – 69

Laufer, B. (2022) Formulaic sequences and second language learning. In Szudarski, P. & Barclay, S. (Eds.) Vocabulary Theory, Patterning and Teaching. Bristol: Multilingual Matters. pp. 89 – 98

Lin, P. (2022). Developing an intelligent tool for computer-assisted formulaic language learning from YouTube videos. ReCALL 34 (2): pp.185–200.

O’Keeffe, A., McCarthy, M. & Carter, R. (2007) From Corpus to Classroom. Cambridge: Cambridge University Press

Pellicer-Sánchez, A. (2020) Learning single words vs. multiword items. In Webb, S. (Ed.) The Routledge Handbook of Vocabulary Studies. Abingdon: Routledge. pp. 158 – 173

The VR experience is nothing if it is not immersive, and in language learning, the value of immersion in VR is seen to be the way in which it can lead to what we might call ‘engagement’ or ‘flow’. When learners are fully immersed in a VR world, learning can be maximized, or so the thinking goes (Lan, 2020; Chen & Hsu, 2020). ‘By blocking out visual and auditory distractions in the classroom, VR has the potential to help students deeply connect with the material’ (Gadelha, 2018). ‘There are no distracting classroom windows to stare out of when students are directly immersed into the topic they are investigating’ (Bonner & Reinders, 2018: 36). Such is the allure of immersion that it is no surprise to find the word in the names of VR language learning products like Immerse and ImmerseMe (although the nod to bilingual immersion programmes, such as those in Canada, is an added bonus).

There is, however, immersion and immersion. A common categorisation of VR is into:

  • non-immersive (e.g. a desktop game with a 2D screen and avatars)
  • semi-immersive (e.g. high-end arcade games and flight simulators with large projections)
  • fully immersive (e.g. with a head-mounted display, headphones, body sensors)

Taking things a little further is the possibility of directly inducing responses in the nervous system with molecular nanotechnology. We’re some way off that, but, fear not, people are working on it. At this point, it’s worth noting that this hierarchy of immersivity is driven by technological considerations: more tech = more immersion.

In ELT, the most common VR applications are currently at the low end of this scale. Probably the most talked about is the use of 360° photography and a very simple headset like Google Cardboard, along with headphones, to take students on virtual field trips – anywhere from a museum or a Disney castle to a coral reef or outer space. See Raquel Ribeiro’s blog post for CUP for more ideas. Then, there are self-study packages, like Velawoods, which is a sort of combination of The Sims with interaction made possible through speech recognition. The syllabus will be familiar to anyone used to using a contemporary coursebook.

And, now, up a technological notch or two, is Immerse, which requires an Oculus headset. It appears to be a sort of Second Life where language learners can interact with each other and a trainer in a number of role plays, set in, for example, a garden barbecue, a pool bar, a conference or a deserted island. In addition to interacting with each other, students can interact with virtual objects, picking up darts and throwing them at questions they want to focus on, for example. ‘Total physical engagement with the environment’ is how this is described by Immerse’s Chief Product Officer. You can find out more in this promotional video.

Paul Driver has suggested that the evolution of VR can be ‘traced back through time as a constant struggle to create more immersive experiences. From the intricate scrolls of twelfth-century China, the huge panoramic paintings of the nineteenth century and early experiments in stereoscopic photography, to the promising but over-hyped 1990s arcade machines (which raised hopes and then dashed expectations for a whole generation), the history of virtual reality has been a meandering march forward, punctuated with long periods of stagnation’. Immerse may be fairly sophisticated as a VR language learning platform, but it has a long way to go as an immersive environment in comparison to games like Meeting Rembrandt: Master of Reality or Project VR Fishing. Its animations are crude and clunky, its scenarios short of detail.

But however ‘lifelike’ games like these are, their immersive potential is extremely limited if you have no interest in Rembrandt or fishing. VR is only as immersive as the intrinsic interest of (1) the ‘real world’ it is attempting to replicate, and (2) what you can do in it. The novelty factor may hold attention for a while, but not for long.

With simpler 360° Google Cardboard versions of VR, you can’t actually do anything in the VR world besides watch, listen and marvel, so the intrinsic interest of the content is even more important. I quite like exploring the Okavango Delta, but I have no interest in rollercoasters or parachute jumps. But, to be immersed, I don’t actually need the 360° experience at all, if the quality of the video is good enough. In many ways, I prefer an old-fashioned screen where my hands are not tied up with holding the phone into the Cardboard and the Cardboard to my nose.

360° videos are usually short, and I can see how they can be used in a language class as a springboard for other work. But as a language learning tool, old-fashioned screens (with good content) may offer more potential than headsets (whether Cardboard or Oculus) because we can do other things (like communicate with other people, use a dictionary or take notes) at the same time.

VR technology in language learning cannot, therefore, (whatever its claims) generate immersion or engagement on its own. For the time being, it can, for some, capture an initial curiosity. For others, already used to high-end Oculus games, programmes like Immerse are more likely to generate a resounding ‘meh’. Engagement in learning is a highly complex phenomenon. Mercer and Dörnyei (2020: 102 ff.) argue that engaging learning materials must be designed for particular groups of learners (in terms of level and interests, for example) and they must get learners emotionally invested. Improvements in VR technology won’t, on their own, change anything.

VR is already well established and successful in some forms of education: military, healthcare and engineering, especially. Virtual reality is obviously a good place to learn how to defuse a bomb or carry out keyhole surgery. In other areas, such as soft skills training in corporate contexts, its use is growing, but its effectiveness is much less clear. In language learning, the purported advantages of VR (see, for example, Alizadeh, 2019, which has a useful bibliography, or Lloyd et al., 2017) are not convincing. There is no problem in language learning for which VR is the solution. This doesn’t mean that VR does not have a place in language learning / teaching. VR field trips may offer occasional moments of variety. Conversation in VR worlds like Facebook Spaces may be welcomed by some. And there will be markets for dedicated platforms like Velawoods, Mondly or Immerse.

Predictions about edtech are often thinly disguised attempts to accelerate a predicted future. Four years ago I went to a conference presentation by Saul Nassé, Chief Executive of Cambridge Assessment. All the participants were given a Cambridge branded Google Cardboard. At the time, Nassé wrote the following:

The technology is only going to get better and cheaper. In two or three years it will be wireless and cost less than a smart phone. That’s the point when you’ll see whole classrooms equipped with VR. And I like to think we’ll find a way of Cambridge English content being used in those classrooms, with people learning English in a whole new way. It may have been a long time coming, but I think the VR revolution is now truly here to stay.

The message was echoed in Lloyd et al. (2017), all three of whom worked for Cambridge Assessment, and amplified in a series of blog posts and conference presentations around that time. Since then, it has all gone rather quiet. There are still people out there (including the investors who have just pumped $1.5 million into Immerse in Series A funding), who believe that VR will be the next big thing in language learning. But edtech investors have a long track record of turning a blind eye to history. VR, as Saul Nassé observed, ‘has been the next big thing for thirty years’. And maybe for the next thirty years, too.

REFERENCES

Alizadeh, M. (2019). Augmented/virtual reality promises for ELT practitioners. In Clements, P., Krause, A. & Bennett, P. (Eds.), Diversity and inclusion. Tokyo: JALT. https://jalt-publications.org/sites/default/files/pdf-article/jalt2018-pcp-048.pdf

Bonner, E., & Reinders, H. (2018). Augmented and virtual reality in the language classroom: Practical ideas. Teaching English with Technology, 18 (3), pp. 33-53. Retrieved from https://files.eric.ed.gov/fulltext/EJ1186392.pdf

Chen, Y. L. & Hsu, C. C. (2020). Self-regulated mobile game-based English learning in a virtual reality environment. Computers and Education, 154 https://www.sciencedirect.com/science/article/abs/pii/S0360131520301093?dgcid=rss_sd_all

Gadelha, R. (2018). Revolutionizing Education: The promise of virtual reality. Childhood Education, 94 (1), pp. 40-43. doi:10.1080/00094056.2018.1420362

Lan, Y. J. (2020). Immersion, interaction and experience-oriented learning: Bringing virtual reality into FL learning. Language Learning & Technology, 24(1), pp. 1–15. http://hdl.handle.net/10125/44704

Lloyd, A., Rogerson, S. & Stead, G. (2017). Imagining the potential for using Virtual Reality technologies in language learning. In Carrier, M., Damerow, R. M. & Bailey, K. M. (Eds.) Digital Language Learning and Teaching. New York: Routledge. pp. 222 – 234

Mercer, S. & Dörnyei, Z. (2020). Engaging Language Learners in Contemporary Classrooms. Cambridge: Cambridge University Press

As both a language learner and a teacher, I have a number of questions about the value of watching subtitled videos for language learning. My interest is in watching extended videos, rather than short clips for classroom use, so I am concerned with incidental, rather than intentional, learning, mostly of vocabulary. My questions include:

  • Is it better to watch a video that is subtitled or unsubtitled?
  • Is it better to watch a video with L1 or L2 subtitles?
  • If a video is watched more than once, what is the best way to start and proceed? In which order (no subtitles, L1 subtitles and L2 subtitles) is it best to watch?

For help, I turned to three recent books about video and language learning: Ben Goldstein and Paul Driver’s Language Learning with Digital Video (CUP, 2015), Kieran Donaghy’s Film in Action (Delta, 2015) and Jamie Keddie’s Bringing Online Video into the Classroom (OUP, 2014). I was surprised to find no advice, but, as I explored more, I discovered that there may be a good reason for these authors’ silence.

There is now a huge literature out there on subtitles and language learning, and I cannot claim to have read it all. But I think I have read enough to understand that I am not going to find clear-cut answers to my questions.

The learning value of subtitles

It has been known for some time that the use of subtitles during extensive viewing of video in another language can help in the acquisition of that language. The main gains are in vocabulary acquisition and the development of listening skills (Montero Perez et al., 2013). This is true of both L1 subtitles (with an L2 audio track), sometimes called interlingual subtitles (Incalcaterra McLoughlin et al., 2011), and L2 subtitles (with an L2 audio track), sometimes called intralingual subtitles or captions (Vanderplank, 1988). Somewhat more surprisingly, vocabulary gains may also come from what are called reversed subtitles (L2 subtitles and an L1 audio track) (Burczyńska, 2015). Of course, certain conditions apply for subtitled video to be beneficial, and I’ll come on to these. But there is general research agreement (an exception is Karakaş & Sariçoban, 2012) that more learning is likely to take place from watching a subtitled video in a target language than an unsubtitled one.

Opposition to the use of subtitles as a tool for language learning has mostly come from three angles. The first of these, which concerns L1 subtitles, is an antipathy to any use at all of L1. Although such an attitude remains entrenched in some quarters, there is no evidence to support it (Hall & Cook, 2012; Kerr, 2016). Researchers and, increasingly, teachers have moved on.

The second reservation that is sometimes expressed is that learners may not attend to either the audio track or the subtitles if they do not need to. They may, for example, ignore the subtitles in the case of reversed subtitles or ignore the L2 audio track when there are L1 subtitles. This can, of course, happen, but it seems that, on the whole, it does not. In an eye-tracking study, for example, Bisson et al. (2012) found that most people followed the subtitles, irrespective of what kind they were. Unsurprisingly, they followed the subtitles more closely when the audio track was in a language that was less familiar. When conditions are right (see below), reading subtitles becomes a very efficient and partly automatized cognitive activity, which does not prevent people from processing the audio track at the same time (d’Ydewalle & Pavakanun, 1997).

Related to the second reservation is the concern that the two sources of information (audio and subtitles), combined with other information (images and music or sound effects), may be in competition and lead to cognitive overload, impacting negatively on both comprehension and learning. Recent research suggests that this concern is unfounded (Kruger et al., 2014). L1 subtitles generate less cognitive load than L2 subtitles, but overload is not normally reached and mental resources are still available for learning (Baranowska, 2020). The absence of subtitles generates more cognitive load.

Conditions for learning

Before looking at the differences between L1 and L2 subtitles, it’s a good idea to look at the conditions under which learning is more likely to take place with subtitles. Some of these are obvious, others less so.

First of all, the video material must be of sufficient intrinsic interest to the learner. Secondly, the subtitles must be of a sufficiently high quality. This is not always the case with automatically generated captions, especially if the speech-to-text software struggles with the accent of the audio. It is also not always the case with professionally produced L1 subtitles, especially when the ‘translations are non-literal and made at the phrase level, making it hard to find connections between the subtitle text and the words in the video’ (Kovacs, 2013, cited by Zabalbeascoa et al., 2015: 112). As a minimum, standard subtitling guidelines, such as those produced for the British Channel 4, should be followed. These limit, for example, lines to about 40 characters and subtitles to a maximum of two lines.
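These two constraints are at least easy to check mechanically. Here’s a quick sketch, assuming one subtitle block per string; the 40-character and two-line limits come from the guidelines cited above, while the function itself is purely illustrative:

```python
MAX_CHARS_PER_LINE = 40
MAX_LINES = 2

def check_subtitle(block: str) -> list[str]:
    """Return a list of guideline violations for one subtitle block."""
    problems = []
    lines = block.splitlines()
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (max {MAX_LINES})")
    for i, line in enumerate(lines, start=1):
        if len(line) > MAX_CHARS_PER_LINE:
            problems.append(f"line {i}: {len(line)} chars (max {MAX_CHARS_PER_LINE})")
    return problems

# check_subtitle("First line of a subtitle\nSecond line") returns []
```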

For reasons that I’ll come on to, learners should be able to switch easily between L1 and L2 subtitles. They are also likely to benefit if reliably accurate glosses or hyperlinks are ‘embedded in the subtitles, making it possible for a learner to simply click for additional verbal, auditory or even pictorial glosses’ (Danan, 2015: 49).

At least as important as considerations of the materials or tools, is a consideration of what the learner brings to the activity (Frumuselu, 2019: 104). Vanderplank (2015) describes these different kinds of considerations as the ‘effects of’ subtitles on a learner and the ‘effects with’ subtitles on learner behaviour.

In order to learn from subtitles, you need to be able to read fast enough to process them. Anyone with a slow reading speed (e.g. some dyslexics) in their own language is going to struggle. Even with L1 subtitles, Vanderplank (2015: 24) estimates that it is only around the age of 10 that children can do this with confidence. Familiarity with both the subject matter and with subtitle use will impact on this ability to read subtitles fast enough.

With L2 subtitles, the learner’s language proficiency relative to the level of difficulty (especially lexical difficulty) of the subtitles will clearly be of some significance. It is unlikely, for example, that L2 subtitles will be of much benefit to beginners (Taylor, 2005). This also suggests that, at lower levels, materials need to be chosen carefully. On the whole, researchers have found that higher proficiency levels correlate with greater learning gains (Pujadas & Muñoz, 2019; Suárez & Gesa, 2019), but one earlier meta-analysis (Montero Perez et al., 2013) did not find that proficiency levels were significant.

Measures of general language proficiency may be too blunt an instrument to help us all of the time. I can learn more from Portuguese than from Arabic subtitles, even though I am a beginner in both languages. The degree of proximity between two languages, especially the script (Winke et al., 2010), is also likely to be significant.

But a wide range of other individual learner differences will also impact on the learning from subtitles. It is known that learners approach subtitles in varied and idiosyncratic ways (Pujolá, 2002), with some using L2 subtitles only as a ‘back-up’ and others relying on them more. Vanderplank (2019) grouped learners into three broad categories: minimal users who were focused throughout on enjoying films as they would in their L1, evolving users who showed marked changes in their viewing behaviour over time, and maximal users who tended to be experienced at using films to enhance their language learning.

Categories like these are only the tip of the iceberg. Sensory preferences, personality types, types of motivation, the impact of subtitles on anxiety levels and metacognitive strategy awareness are all likely to be important. For the last of these, Danan (2015: 47) asks whether learners should be taught ‘techniques to make better use of subtitles and compensate for weaknesses: techniques such as a quick reading of subtitles before listening, confirmation of word recognition or meaning after listening, as well as focus on form for spelling or grammatical accuracy?’

In short, it is, in practice, virtually impossible to determine optimal conditions for learning from subtitles, because we cannot ‘take into account all the psycho-social, cultural and pedagogic parameters’ (Gambier, 2015). With that said, it’s time to take a closer look at the different potential of L1 and L2 subtitles.

L1 vs L2 subtitles

Since all other things are almost never equal, it is not possible to say that one kind of subtitles offers greater potential for learning than another. As regards gains in vocabulary acquisition and listening comprehension, there is no research consensus (Baranowska, 2020: 107). Research does, however, offer us a number of pointers.

Extensive viewing of subtitled video (both L1 and L2) can offer ‘massive quantities of authentic and comprehensible input’ (Vanderplank, 1988: 273). With lower level learners, the input is likely to be more comprehensible with L1 subtitles, and, therefore, more enjoyable and motivating. This makes them often more suitable for what Caimi (2015: 11) calls ‘leisure viewing’. Vocabulary acquisition may be better served with L2 subtitles, because they can help viewers to recognize the words that are being spoken, increase their interaction with the target language, provide further language context, and increase the redundancy of information, thereby enhancing the possibility of this input being stored in long-term memory (Frumuselu et al., 2015). These effects are much more likely with Vanderplank’s (2019) motivated, ‘maximal’ users than with ‘minimal’ users.

There is one further area where L2 subtitles may have the edge over L1. One of the values of extended listening in a target language is the improvement in phonetic retuning (see, for example, Reinisch & Holt, 2013), the ability to adjust the phonetic boundaries in your own language to the boundaries that exist in the target language. Learning how to interpret unusual speech-sounds, learning how to deal with unusual mappings between sounds and words and learning how to deal with the acoustic variations of different speakers of the target language are all important parts of acquiring another language. Research by Mitterer and McQueen (2009) suggests that L2 subtitles help in this process, but L1 subtitles hinder it.

Classroom implications?

The literature on subtitles and language learning echoes with the refrain of ‘more research needed’, but I’m not sure that further research will lead to less ambiguous, practical conclusions. One of my initial questions concerned the optimal order of use of different kinds of subtitles. In most extensive viewing contexts, learners are unlikely to watch something more than twice. If they do (watching a recorded academic lecture, for example), they are likely to be more motivated by a desire to learn from the content than to learn language from the content. L1 subtitles will probably be preferred, and will have the added bonus of facilitating note-taking in the L1. For learners who are more motivated to learn the target language (Vanderplank’s ‘maximal’ users), a sequence of subtitle use, starting with the least cognitively challenging and moving to greater challenge, probably makes sense. Danan (2015: 46) suggests starting with an L1 soundtrack and reversed (L2) subtitles, then moving on to an L2 soundtrack and L2 subtitles, and ending with an L2 soundtrack and no subtitles. I would replace her first stage with an L2 soundtrack and L1 subtitles, but this is based on hunch rather than research.

This sequencing of subtitle use is common practice in language classrooms, but, here, (1) the video clips are usually short, and (2) the aim is often not incidental learning of vocabulary. Typically, the video clip has been selected as a tool for deliberate teaching of language items, so different conditions apply. At least one study has confirmed the value of the common teaching practice of pre-teaching target vocabulary items before viewing (Pujadas & Muñoz, 2019). The drawback is that, by getting learners to focus on particular items, less incidental learning of other language features is likely to take place. Perhaps this doesn’t matter too much. In a short clip of a few minutes, the opportunities for incidental learning are limited, anyway. With short clips and a deliberate learning aim, it seems reasonable to use L2 subtitles for a first viewing, and no subtitles thereafter.

An alternative frequent use of short video clips in classrooms is to use them as a springboard for speaking. In these cases, Baranowska (2020: 113) suggests that teachers may opt for L1 subtitles first, and follow up with L2 subtitles. Of course, with personal viewing devices or in online classes, teachers may want to exploit the possibilities of differentiating the subtitle condition for different learners.

REFERENCES

Baranowska, K. (2020). Learning most with least effort: subtitles and cognitive load. ELT Journal 74 (2): pp.105 – 115

Bisson, M.-J., Van Heuven, W.J.B., Conklin, K. and Tunney, R.J. (2012). Processing of native and foreign language subtitles in films: An eye tracking study. Applied Psycholinguistics, 35 (2): pp. 399 – 418

Burczyńska, P. (2015). Reversed Subtitles as a Powerful Didactic Tool in SLA. In Gambier, Y., Caimi, A. & Mariotti, C. (Eds.), Subtitles and Language Learning. Principles, strategies and practical experiences. Bern: Peter Lang (pp. 221 – 244)

Caimi, A. (2015). Introduction. In Gambier, Y., Caimi, A. & Mariotti, C. (Eds.), Subtitles and Language Learning. Principles, strategies and practical experiences. Bern: Peter Lang (pp. 9 – 18)

Danan, M. (2015). Subtitling as a Language Learning Tool: Past Findings, Current Applications, and Future Paths. In Gambier, Y., Caimi, A. & Mariotti, C. (Eds.), Subtitles and Language Learning. Principles, strategies and practical experiences. Bern: Peter Lang (pp. 41 – 61)

d’Ydewalle, G. & Pavakanun, U. (1997). Could Enjoying a Movie Lead to Language Acquisition?. In: Winterhoff-Spurk, P., van der Voort, T.H.A. (Eds.) New Horizons in Media Psychology. VS Verlag für Sozialwissenschaften, Wiesbaden. https://doi.org/10.1007/978-3-663-10899-3_10

Frumuselu, A.D., de Maeyer, S., Donche, V. & Gutierrez Colon Plana, M. (2015). Television series inside the EFL classroom: bridging the gap between teaching and learning informal language through subtitles. Linguistics and Education, 32: pp. 107 – 117

Frumuselu, A. D. (2019). ‘A Friend in Need is a Film Indeed’: Teaching Colloquial Expressions with Subtitled Television Series. In Herrero, C. & Vanderschelden, I. (Eds.) Using Film and Media in the Language Classroom. Bristol: Multilingual Matters. pp. 92 – 107

Gambier, Y. (2015). Subtitles and Language Learning (SLL): Theoretical background. In Gambier, Y., Caimi, A. & Mariotti, C. (Eds.), Subtitles and Language Learning. Principles, strategies and practical experiences. Bern: Peter Lang (pp. 63 – 82)

Hall, G. & Cook, G. (2012). Own-language Use in Language Teaching and Learning. Language Teaching, 45 (3): pp. 271 – 308

Incalcaterra McLoughlin, L., Biscio, M. & Ní Mhainnín, M. A. (Eds.) (2011). Audiovisual Translation, Subtitles and Subtitling. Theory and Practice. Bern: Peter Lang

Karakaş, A. & Sariçoban, A. (2012). The impact of watching subtitled animated cartoons on incidental vocabulary learning of ELT students. Teaching English with Technology, 12 (4): pp. 3 – 15

Kerr, P. (2016). Questioning ‘English-only’ Classrooms: Own-language Use in ELT. In Hall, G. (Ed.) The Routledge Handbook of English Language Teaching (pp. 513 – 526)

Kruger, J. L., Hefer, E. & Matthew, G. (2014). Attention distribution and cognitive load in a subtitled academic lecture: L1 vs. L2. Journal of Eye Movement Research, 7: pp. 1 – 15

Mitterer, H. & McQueen, J. M. (2009). Foreign Subtitles Help but Native-Language Subtitles Harm Foreign Speech Perception. PLoS ONE 4 (11): e7785. doi:10.1371/journal.pone.0007785

Montero Perez, M., Van Den Noortgate, W., & Desmet, P. (2013). Captioned video for L2 listening and vocabulary learning: A meta-analysis. System, 41, pp. 720–739 doi:10.1016/j.system.2013.07.013

Pujadas, G. & Muñoz, C. (2019). Extensive viewing of captioned and subtitled TV series: a study of L2 vocabulary learning by adolescents, The Language Learning Journal, 47:4, 479-496, DOI: 10.1080/09571736.2019.1616806

Pujolá, J.- T. (2002). CALLing for help: Researching language learning strategies using help facilities in a web-based multimedia program. ReCALL, 14 (2): pp. 235 – 262

Reinisch, E. & Holt, L. L. (2013). Lexically Guided Phonetic Retuning of Foreign-Accented Speech and Its Generalization. Journal of Experimental Psychology: Human Perception and Performance. Advance online publication. doi: 10.1037/a0034409

Suárez, M. & Gesa, F. (2019) Learning vocabulary with the support of sustained exposure to captioned video: do proficiency and aptitude make a difference? The Language Learning Journal, 47:4, 497-517, DOI: 10.1080/09571736.2019.1617768

Taylor, G. (2005). Perceived processing strategies of students watching captioned video. Foreign Language Annals, 38(3), pp. 422-427

Vanderplank, R. (1988). The value of teletext subtitles in language learning. ELT Journal, 42 (4): pp. 272 – 281

Vanderplank, R. (2015). Thirty Years of Research into Captions / Same Language Subtitles and Second / Foreign Language Learning: Distinguishing between ‘Effects of’ Subtitles and ‘Effects with’ Subtitles for Future Research. In Gambier, Y., Caimi, A. & Mariotti, C. (Eds.), Subtitles and Language Learning. Principles, strategies and practical experiences. Bern: Peter Lang (pp. 19 – 40)

Vanderplank, R. (2019). ‘Gist watching can only take you so far’: attitudes, strategies and changes in behaviour in watching films with captions, The Language Learning Journal, 47:4, 407-423, DOI: 10.1080/09571736.2019.1610033

Winke, P., Gass, S. M., & Sydorenko, T. (2010). The Effects of Captioning Videos Used for Foreign Language Listening Activities. Language Learning & Technology, 14 (1): pp. 66 – 87

Zabalbeascoa, P., González-Casillas, S. & Pascual-Herce, R. (2015). In Gambier, Y., Caimi, A. & Mariotti, C. (Eds.), Subtitles and Language Learning. Principles, strategies and practical experiences Bern: Peter Lang (pp. 105–126)