
In my last post, I looked at shortcomings in edtech research, mostly from outside the world of ELT. I made a series of recommendations of ways in which such research could become more useful. In this post, I look at two very recent collections of ELT edtech research. The first of these is Digital Innovations and Research in Language Learning, edited by Mavridi and Saumell, and published this February by the Learning Technologies SIG of IATEFL. I’ll refer to it here as DIRLL. It’s available free to IATEFL LT SIG members, and can be bought for $10.97 as an ebook on Amazon (US). The second is the most recent edition (February 2020) of the Language Learning & Technology journal, which is open access and available here. I’ll refer to it here as LLTJ.

In both of these collections, the focus is not on ‘technology per se, but rather issues related to language learning and language teaching, and how they are affected or enhanced by the use of digital technologies’. However, they are very different kinds of publication. Nobody involved in the production of DIRLL was paid in any way (to the best of my knowledge) and, in keeping with the book’s provenance in a teachers’ association, it has ‘a focus on the practitioner as teacher-researcher’. Almost all of the contributing authors are university-based, but they are typically involved more in language teaching than in research. With one exception (a grant from the EU), their work was unfunded.

The LLTJ, published three times a year, is funded by two American universities and published by the University of Hawaii Press. The editors and associate editors are well-known scholars in their fields. The journal’s impact factor is high, close to that of the paywalled ReCALL (published by Cambridge University Press), which is the highest-ranking journal in the field of CALL. The contributing authors are all university-based, many with a string of published articles (in prestigious journals), chapters or books behind them. At least six of the studies were funded by national grant-awarding bodies.

I should begin by making clear that there was much in both collections that I found interesting. However, it was not usually the research itself that I found informative, but the literature review that preceded it. Two of the chapters in DIRLL were not really research, anyway. One presented a template for evaluating ICT-mediated tasks in CLIL; another advocated comics as a resource for language teaching. Both were new, useful and interesting to me. LLTJ included a valuable literature review of research into VR in FL learning (but no actual new research). With some exceptions in both collections, though, I felt that I would have been better off curtailing my reading after the reviews. Admittedly, there wouldn’t be much in the way of literature reviews if there were no previous research to report …

It was no surprise to see that the learners who were the subjects of this research were overwhelmingly university students. In fact, only one article (about a high-school project in Israel, reported in DIRLL) was not about university students. The research areas reflected this bias towards tertiary contexts: online academic reading skills, academic writing, online reflective practices in teacher training programmes, etc.

In a couple of cases, the selection of experimental subjects seemed plain bizarre. Why, if you want to find out about the extent to which Moodle use can help EAP students become better academic readers (in DIRLL), would you investigate this with a small volunteer cohort of postgraduate students of linguistics, with previous experience of using Moodle and experience of teaching? Is a less representative sample imaginable? Why, if you want to investigate the learning potential of the English File Pronunciation app (reported in LLTJ), which is clearly most appropriate for A1 – B1 levels, would you do this with a group of C1-level undergraduates following a course in phonetics as part of an English Studies programme?

More problematic, in my view, was the small sample size in many of the research projects. The Israeli virtual high school project (DIRLL), previously referred to, started out with only 11 students, but 7 dropped out, primarily, it seems, because of institutional incompetence: ‘the project was probably doomed […] to failure from the start’, according to the author. Interesting as this was as an account of how not to set up a project of this kind, it is simply impossible to draw any conclusions from 4 students about the potential of a VLE for ‘interaction, focus and self-paced learning’. The questionnaire investigating experience of and attitudes towards VR (in DIRLL) was completed by only 7 (out of 36 possible) students and 7 (out of 70+ possible) teachers. The author acknowledges that ‘no great claims can be made’, but then goes on to note the generally ‘positive attitudes to VR’. Perhaps those who did not volunteer had different attitudes? We will never know. The study of motivational videos in tertiary education (DIRLL) started off with 15 subjects, but 5 did not complete the necessary tasks. The research into L1 use in videoconferencing (LLTJ) started off with 10 experimental subjects, all with the same L1 and similar cultural backgrounds, but there was no data available from 4 of them (because they never switched into L1). The author claims that the paper demonstrates ‘how L1 is used by language learners in videoconferencing as a social semiotic resource to support social presence’ – something which, after reading the literature review, we already knew. But the paper also demonstrates quite clearly how L1 is not used by language learners in videoconferencing as a social semiotic resource to support social presence. In all these cases, it is the participants who did not complete or the potential participants who did not want to take part that hold the greatest interest for me.

Unsurprisingly, the LLTJ articles had larger sample sizes than those in DIRLL, but in both collections the duration of the research was limited. The production of one motivational video (DIRLL) does not really allow us to draw any conclusions about the development of students’ critical thinking skills. Two four-week interventions do not really seem long enough to me to discover anything about learner autonomy and Moodle (DIRLL). An experiment looking at different feedback modes needs more than two written assignments to reach any conclusions about student preferences (LLTJ).

More research might well be needed to compensate for the short-term projects with small sample sizes, but I’m not convinced that this is always the case. Lacking sufficient information about the content of the technologically-mediated tools being used, I was often unable to reach any conclusions. A gamified Twitter environment was developed in one project (DIRLL), using principles derived from contemporary literature on gamification. The authors concluded that the game design ‘failed to generate interaction among students’, but without knowing a lot more about the specific details of the activity, it is impossible to say whether the problem was the principles or the particular instantiation of those principles. Another project, looking at the development of pronunciation materials for online learning (LLTJ), came to the conclusion that online pronunciation training was helpful – better than none at all. Claims are then made about the value of the method used (called ‘innovative Cued Pronunciation Readings’), but this is not compared to any other method / materials, and only a very small selection of these materials are illustrated. Basically, the reader of this research has no choice but to take things on trust. The study looking at the use of Alexa to help listening comprehension and speaking fluency (LLTJ) cannot really tell us anything about intelligent personal assistants (IPAs) unless we know more about the particular way that Alexa is being used. Here, it seems that the students were using Alexa in an interactive storytelling exercise, but so little information is given about the exercise itself that I didn’t actually learn anything at all. The author’s own conclusion is that the results, such as they are, need to be treated with caution. Nevertheless, he adds ‘the current study illustrates that IPAs may have some value to foreign language learners’.

This brings me on to my final gripe. To be told that IPAs like Alexa may have some value to foreign language learners is to be told something that I already know. This wasn’t the only time this happened during my reading of these collections. I appreciate that research cannot always tell us something new and interesting, but a little more often would be nice. I ‘learnt’ that goal-setting plays an important role in motivation and that gamification can boost short-term motivation. I ‘learnt’ that reflective journals can take a long time for teachers to look at, and that reflective video journals are also very time-consuming. I ‘learnt’ that peer feedback can be very useful. I ‘learnt’ from two papers that intercultural difficulties may be exacerbated by online communication. I ‘learnt’ that text-to-speech software is pretty good these days. I ‘learnt’ that multimodal literacy can, most frequently, be divided up into visual and auditory forms.

With the exception of a piece about online safety issues (DIRLL), I did not once encounter anything which hinted that there might be problems in using technology. No mention of the use to which student data might be put. No mention of the costs involved (except for the observation that many students would not be happy to spend money on the English File Pronunciation app) or the cost-effectiveness of digital ‘solutions’. No consideration of the institutional (or other) pressures (or the reasons behind them) that may be applied to encourage teachers to ‘leverage’ edtech. No suggestion that a zero-tech option might actually be preferable. In both collections, the language used is invariably positive, or, at least, technology is associated with positive things: uncovering the possibilities, promoting autonomy, etc. Even if the focus of these publications is not on technology per se (although I think this claim doesn’t really stand up to close examination), it’s a little disingenuous to claim (as LLTJ does) that the interest is in how language learning and language teaching are ‘affected or enhanced by the use of digital technologies’. The reality is that the overwhelming interest is in potential enhancements, not potential negative effects.

I have deliberately not mentioned any names in referring to the articles I have discussed. I would, though, like to take my hat off to the editors of DIRLL, Sophia Mavridi and Vicky Saumell, for attempting to do something a little different. I think that Alicia Artusi and Graham Stanley’s article (DIRLL) about CPD for ‘remote’ teachers was very good and should interest the huge number of teachers working online. Chryssa Themelis and Julie-Ann Sime have kindled my interest in the potential of comics as a learning resource (DIRLL). Yu-Ju Lan’s article about VR (LLTJ) is surely the most up-to-date, go-to article on this topic. There were other pieces, or parts of pieces, that I liked, too. But, to me, it’s clear that what is needed is not so much ‘more research’ as (1) better and more critical research, and (2) more digestible summaries of the research we already have.

Colloquium

At the beginning of March, I’ll be going to Cambridge to take part in a Digital Learning Colloquium (for more information about the event, see here). One of the questions that will be explored is how research might contribute to the development of digital language learning. In this, the first of two posts on the subject, I’ll be taking a broad overview of the current state of play in edtech research.

I try my best to keep up to date with research. Of the main journals, there are Language Learning & Technology, which is open access; the CALICO Journal, which offers quite a lot of open access material; and ReCALL, which is the most restricted of the three in terms of access. But there is something deeply frustrating about most of this research, and this is what I want to explore in these posts. More often than not, research articles end with a call for more research. And more often than not, I find myself saying ‘Please, no, not more research like this!’

First, though, I would like to turn to a more reader-friendly source of research findings. Systematic reviews are, basically, literature reviews which can save people like me from having to plough through endless papers on similar subjects, all of which contain the same (or similar) literature review in the opening sections. If only there were more of them. Others agree with me: the conclusion of one systematic review of learning and teaching with technology in higher education (Lillejord et al., 2018) was that more systematic reviews were needed.

Last year saw the publication of a systematic review of research on artificial intelligence applications in higher education (Zawacki-Richter et al., 2019), which caught my eye. The first thing that struck me about this review was that ‘out of 2656 initially identified publications for the period between 2007 and 2018, 146 articles were included for final synthesis’. In other words, only just over 5% of the research was considered worthy of inclusion.

The review did not paint a very pretty picture of the current state of AIEd research. As the second part of the title of this review (‘Where are the educators?’) makes clear, the research, taken as a whole, showed a ‘weak connection to theoretical pedagogical perspectives’. This is not entirely surprising. As Bates (2019) has noted: ‘since AI tends to be developed by computer scientists, they tend to use models of learning based on how computers or computer networks work (since of course it will be a computer that has to operate the AI). As a result, such AI applications tend to adopt a very behaviourist model of learning: present / test / feedback.’ More generally, it is clear that technology adoption (and research) is being driven by technology enthusiasts, with insufficient expertise in education. The danger is that edtech developers ‘will simply ‘discover’ new ways to teach poorly and perpetuate erroneous ideas about teaching and learning’ (Lynch, 2017).

This, then, brings me to the first item on my checklist of things that, collectively, researchers need to do to improve the value of their work. The rest of the list is drawn from observations made mostly, but not exclusively, by the authors of systematic reviews, and mostly of general edtech research. In the next blog post, I’ll look more closely at a recent collection of ELT edtech research (Mavridi & Saumell, 2020) to see how it measures up.

1 Make sure your research is adequately informed by educational research outside the field of edtech

Unproblematised behaviourist assumptions about the nature of learning are all too frequent. References to learning styles are still fairly common. The skill most frequently investigated in the context of edtech is critical thinking (Sosa Neira et al., 2017), but it is rarely defined and almost never problematised, despite a broad literature that questions the construct.

2 Adopt a sceptical attitude from the outset

Know your history. Decades of technological innovation in education have shown precious little in the way of educational gains and, more than anything else, have taught us that we need to be sceptical from the outset. ‘Enthusiasm and praise that are directed towards ‘virtual education’, ‘school 2.0’, ‘e-learning’ and the like’ (Selwyn, 2014: vii) are indications that the lessons of the past have not been sufficiently absorbed (Levy, 2016: 102). The phrase ‘exciting potential’, for example, should be banned from all edtech research. See, for example, a ‘state-of-the-art analysis of chatbots in education’ (Winkler & Söllner, 2018), which has nothing to conclude but ‘exciting potential’. Potential is fine (indeed, it is perhaps the only thing that research can unambiguously demonstrate – see section 3 below), but can we try to be a little more grown-up about things?

3 Know what you are measuring

Measuring learning outcomes is tricky, to say the least, but it’s understandable that researchers should try to focus on them. Unfortunately, ‘the vast array of literature involving learning technology evaluation makes it challenging to acquire an accurate sense of the different aspects of learning that are evaluated, and the possible approaches that can be used to evaluate them’ (Lai & Bower, 2019). Metrics such as student grades are hard to interpret, not least because of the large number of variables and the danger of many things being conflated in one score. Equally, or possibly even more, problematic are self-reporting measures, which are rarely robust. It seems that surveys are the most widely used instrument in qualitative research (Sosa Neira et al., 2017), but these will tell us little or nothing when used for short-term interventions (see point 5 below).

4 Ensure that the sample size is big enough to mean something

In most of the research into digital technology in education that was analysed in a literature review carried out for the Scottish government (ICF Consulting Services Ltd, 2015), there were only ‘small numbers of learners or teachers or schools’.

5 Privilege longitudinal studies over short-term projects

The Scottish government literature review (ICF Consulting Services Ltd, 2015) also noted that ‘most studies that attempt to measure any outcomes focus on short and medium term outcomes’. The fact that the use of a particular technology has some sort of impact over the short or medium term tells us very little of value. Unless there is very good reason to suspect the contrary, we should assume that it is a novelty effect that has been captured (Levy, 2016: 102).

6 Don’t forget the content

The starting point of much edtech research is the technology, but most edtech, whether it’s a flashcard app or a full-blown Moodle course, has content. Research reports rarely give details of this content, assuming perhaps that it’s just fine, and all that’s needed is a little tech to ‘present learners with the ‘right’ content at the ‘right’ time’ (Lynch, 2017). It’s a foolish assumption. Take a random educational app from the Play Store, a random MOOC or whatever, and the chances are you’ll find it’s crap.

7 Avoid anecdotal accounts of technology use in quasi-experiments as the basis of a ‘research article’

Control (i.e. technology-free) groups may not always be possible, but without them we’re unlikely to learn much from a single study. What would be extremely useful, however, is a large, collated collection of such action-research projects, using the same or similar technology, in a variety of settings. There is a marked absence of this kind of work.

8 Enough already of higher education contexts

Researchers typically work in universities, where they have a captive population of students on whom they can carry out research. But we have a problem here. The systematic review of Lundin et al. (2018), for example, found that ‘studies on flipped classrooms are dominated by studies in the higher education sector’ (besides lacking anchors in learning theory or instructional design). With some urgency, primary and secondary contexts need to be investigated in more detail, not just regarding flipped learning.

9 Be critical

Very little edtech research considers the downsides of edtech adoption. Online safety, privacy and data security are hardly peripheral issues, especially with younger learners. Ignoring them won’t make them go away.

More research?

So do we need more research? For me, two things stand out. We might benefit more from, firstly, a different kind of research, and, secondly, more syntheses of the work that has already been done. Although I will probably continue to dip into the pot-pourri of articles published in the main CALL journals, I’m looking forward to a change at the CALICO journal. From September of this year, one issue a year will be thematic, with a lead article written by established researchers which will ‘first discuss in broad terms what has been accomplished in the relevant subfield of CALL. It should then outline which questions have been answered to our satisfaction and what evidence there is to support these conclusions. Finally, this article should pose a “soft” research agenda that can guide researchers interested in pursuing empirical work in this area’. This will be followed by two or three empirical pieces that ‘specifically reflect the research agenda, methodologies, and other suggestions laid out in the lead article’.

But I think I’ll still have a soft spot for some of the other journals that are coyer about their impact factor and that can be freely accessed. How else would I discover (it would be too mean to give the references here) that ‘the effective use of new technologies improves learners’ language learning skills’? Presumably, the ineffective use of new technologies has the opposite effect? Or that ‘the application of modern technology represents a significant advance in contemporary English language teaching methods’?

References

Bates, A. W. (2019). Teaching in a Digital Age Second Edition. Vancouver, B.C.: Tony Bates Associates Ltd. Retrieved from https://pressbooks.bccampus.ca/teachinginadigitalagev2/

ICF Consulting Services Ltd (2015). Literature Review on the Impact of Digital Technology on Learning and Teaching. Edinburgh: The Scottish Government. https://dera.ioe.ac.uk/24843/1/00489224.pdf

Lai, J.W.M. & Bower, M. (2019). How is the use of technology in education evaluated? A systematic review. Computers & Education, 133(1), 27-42. Retrieved January 14, 2020 from https://www.learntechlib.org/p/207137/

Levy, M. 2016. Researching in language learning and technology. In Farr, F. & Murray, L. (Eds.) The Routledge Handbook of Language Learning and Technology. Abingdon, Oxon.: Routledge. pp.101 – 114

Lillejord S., Børte K., Nesje K. & Ruud E. (2018). Learning and teaching with technology in higher education – a systematic review. Oslo: Knowledge Centre for Education https://www.forskningsradet.no/siteassets/publikasjoner/1254035532334.pdf

Lundin, M., Bergviken Rensfeldt, A., Hillman, T. et al. (2018). Higher education dominance and siloed knowledge: a systematic review of flipped classroom research. International Journal of Educational Technology in Higher Education, 15, 20. doi:10.1186/s41239-018-0101-6

Lynch, J. (2017). How AI Will Destroy Education. Medium, November 13, 2017. https://buzzrobot.com/how-ai-will-destroy-education-20053b7b88a6

Mavridi, S. & Saumell, V. (Eds.) (2020). Digital Innovations and Research in Language Learning. Faversham, Kent: IATEFL

Selwyn, N. (2014). Distrusting Educational Technology. New York: Routledge

Sosa Neira, E. A., Salinas, J. & de Benito Crosetti, B. (2017). Emerging Technologies (ETs) in Education: A Systematic Review of the Literature Published between 2006 and 2016. International Journal of Emerging Technologies in Learning, 12(5). https://online-journals.org/index.php/i-jet/article/view/6939

Winkler, R. & Söllner, M. (2018): Unleashing the Potential of Chatbots in Education: A State-Of-The-Art Analysis. In: Academy of Management Annual Meeting (AOM). Chicago, USA. https://www.alexandria.unisg.ch/254848/1/JML_699.pdf

Zawacki-Richter, O., Bond, M., Marin, V. I. & Gouverneur, F. (2019). Systematic review of research on artificial intelligence applications in higher education – where are the educators? International Journal of Educational Technology in Higher Education, 16, 39.

Digital flashcard systems like Memrise and Quizlet remain among the most popular language learning apps. Their focus is on the deliberate learning of vocabulary, an approach described by Paul Nation (Nation, 2005) as ‘one of the least efficient ways of developing learners’ vocabulary knowledge but nonetheless […] an important part of a well-balanced vocabulary programme’. The deliberate teaching of vocabulary also features prominently in most platform-based language courses.

For both vocabulary apps and bigger courses, the lexical items need to be organised into sets for the purposes of both presentation and practice. A common way of doing this, especially at lower levels, is to group the items into semantic clusters (sets with a classifying superordinate, like body part, and a collection of example hyponyms, like arm, leg, head, chest, etc.).

The problem, as Keith Folse puts it, is that such clusters ‘are not only unhelpful, they actually hinder vocabulary retention’ (Folse, 2004: 52). Evidence for this claim may be found in Higa (1963), Tinkham (1993, 1997), Waring (1997), Erten & Tekin (2008) and Barcroft (2015), to cite just some of the better-known studies. The results, says Folse, ‘are clear and, I think, very conclusive’. The explanation that is usually given draws on interference theory: semantic similarity may lead to confusion (e.g. when learners mix up days of the week, colour words or adjectives to describe personality).

It appears, then, to be long past time to get rid of semantic clusters in language teaching. Well … not so fast. First of all, although most of the research sides with Folse, not all of it does. Nakata and Suzuki (2019) in their survey of more recent research found that results were more mixed. They found one study which suggested that there was no significant difference in learning outcomes between presenting words in semantic clusters and semantically unrelated groups (Ishii, 2015). And they found four studies (Hashemi & Gowdasiaei, 2005; Hoshino, 2010; Schneider, Healy, & Bourne, 1998, 2002) where semantic clusters had a positive effect on learning.

Nakata and Suzuki (2019) offer three reasons why semantic clustering might facilitate vocabulary learning: it (1) ‘reflects how vocabulary is stored in the mental lexicon, (2) introduces desirable difficulty, and (3) leads to extra attention, effort, or engagement from learners’. Finkbeiner and Nicol (2003) make a similar point: ‘although learning semantically related words appears to take longer, it is possible that words learned under these conditions are learned better for the purpose of actual language use (e.g., the retrieval of vocabulary during production and comprehension). That is, the very difficulty associated with learning the new labels may make them easier to process once they are learned’. Both pairs of researchers cited in this paragraph conclude that semantic clusters are best avoided, but their discussion of the possible benefits of this clustering is a recognition that the research (for reasons which I will come on to) cannot lead to categorical conclusions.

The problem, as so often with pedagogical research, is the gap between research conditions and real-world classrooms. Before looking at this in a little more detail, one relatively uncontentious observation can be made. Even those scholars who advise against semantic clustering (e.g. Papathanasiou, 2009) acknowledge that the situation is complicated by other factors, especially the level of proficiency of the learner and whether or not one or more of the hyponyms are known to the learner. At higher levels (when it is more likely that one or more of the hyponyms are already, even partially, known), semantic clustering is not a problem. I would add that, on the whole, at higher levels the deliberate learning of vocabulary is even less efficient than at lower levels and should be an increasingly small part of a well-balanced vocabulary programme.

So, why is there a problem drawing practical conclusions from the research? In order to have any scientific validity at all, researchers need to control a large number of variables. They need, for example, to be sure that learners do not already know any of the items that are being presented. The only practical way of doing this is to present sets of invented words, and this is what most of the research does (Sarioğlu, 2018). These artificial words solve one problem, but create others, the most significant of which is item difficulty. Many factors impact on item difficulty, and these include word frequency (obviously a problem with invented words), word length, pronounceability and the familiarity and length of the corresponding item in L1. None of the studies which support the abandonment of semantic clusters have controlled all of these variables (Nakata and Suzuki, 2019). Indeed, it would be practically impossible to do so. Learning pseudo-words is a very different proposition to learning real words, which a learner may subsequently encounter or want to use.

Take, for example, the days of the week. It’s quite common for learners to muddle up Tuesday and Thursday. The reason for this is not just semantic similarity (Tuesday and Monday are less frequently confused). They are also very similar in terms of both spelling and pronunciation. They are ‘synforms’ (see Laufer, 1988), which, like semantic clusters, can hinder learning of new items. But now imagine a French-speaking learner of Spanish studying the days of the week. It is much less likely that martes and jueves will be muddled, because of their similarity to the French words mardi and jeudi. There would appear to be no good reason not to teach the complete set of days of the week to a learner like this. All other things being equal, it is probably a good idea to avoid semantic clusters, but all other things are very rarely equal.

Again, in an attempt to control for variables, researchers typically present the target items in isolation (in bilingual pairings). But, again, the real world does not normally conform to this condition. Leo Selivan (2014) suggests that semantic clusters (e.g. colours) be taught as part of collocations. He gives the examples of red dress, green grass and black coffee, and points out that the alliterative patterns can serve as mnemonic devices which will facilitate learning. The suggestion is, I think, a very good one, but, more generally, it’s worth noting that the presentation of lexical items in both digital flashcards and platform courses is rarely context-free. Contexts will inevitably impact on learning and may well obviate the risks of semantic clustering.

Finally, this kind of research typically gives participants very restricted time to memorize the target words (Sarioğlu, 2018) and they are tested in very controlled recall tasks. In the case of language platform courses, practice of target items is usually spread out over a much longer period of time, with a variety of exposure opportunities (in controlled practice tasks, exposure in texts, personalisation tasks, revision exercises, etc.) both within and across learning units. In this light, it is not unreasonable to argue that laboratory-type research offers only limited insights into what should happen in the real world of language learning and teaching. The choice of learning items, the way they are presented and practised, and the variety of activities in the well-balanced vocabulary programme are probably all more significant than the question of whether items are organised into semantic clusters.
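To make the contrast with laboratory conditions concrete, here is a minimal sketch of the kind of spacing logic a flashcard platform might use. It is an illustrative Leitner-style scheduler in Python, written under my own assumptions (five boxes, roughly doubling intervals); it is not a description of how Memrise, Quizlet or any platform course actually schedules practice.

    # Illustrative Leitner-style scheduler (assumed intervals, not any real app's).
    from datetime import date, timedelta

    INTERVALS = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16}  # box number -> days until next review

    class Card:
        def __init__(self, prompt, answer):
            self.prompt = prompt
            self.answer = answer
            self.box = 1              # new items start in box 1
            self.due = date.today()   # and are due immediately

        def review(self, correct, today=None):
            today = today or date.today()
            # A correct answer promotes the card to a later box (longer gap);
            # an error sends it back to box 1 for frequent re-exposure.
            self.box = min(self.box + 1, 5) if correct else 1
            self.due = today + timedelta(days=INTERVALS[self.box])

    def due_cards(deck, today=None):
        today = today or date.today()
        return [card for card in deck if card.due <= today]

Even on this crude model, an item answered correctly a handful of times is not seen again for over two weeks – a very different learning condition from the single, timed exposure and immediate recall test of a typical laboratory study.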

Although semantic clusters are quite common in language learning materials, much more common are thematic clusters (i.e. groups of words which are topically related, but include a variety of parts of speech – see the extract below). Researchers, it seems, have no problem with this way of organising lexical sets. By way of conclusion, here’s an extract from a recent book:

‘Introducing new words together that are similar in meaning (synonyms), such as scared and frightened, or forms (synforms), like contain and maintain, can be confusing, and students are less likely to remember them. This problem is known as ‘interference’. One way to avoid this is to choose words that are around the same theme, but which include a mix of different parts of speech. For example, if you want to focus on vocabulary to talk about feelings, instead of picking lots of adjectives (happy, sad, angry, scared, frightened, nervous, etc.) include some verbs (feel, enjoy, complain) and some nouns (fun, feelings, nerves). This also encourages students to use a variety of structures with the vocabulary.’ (Hughes et al., 2019: 25)
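For app and materials developers, representing the difference between the two kinds of set is trivial; the pedagogical decision is the hard part. By way of illustration only, here is how the two groupings from the extract above might be encoded for a hypothetical flashcard app (the data structure is my own invention, not any real product’s format):

    # Two ways of grouping vocabulary on the topic of 'feelings'.
    # Semantic cluster: same part of speech throughout
    # (the approach the research warns against).
    semantic_set = {
        "topic": "feelings",
        "items": ["happy", "sad", "angry", "scared", "frightened", "nervous"],
    }

    # Thematic cluster: topically related, but a mix of parts of speech,
    # as recommended in the extract above.
    thematic_set = {
        "topic": "feelings",
        "items": [
            ("feel", "verb"), ("enjoy", "verb"), ("complain", "verb"),
            ("fun", "noun"), ("feelings", "noun"), ("nerves", "noun"),
            ("happy", "adjective"), ("nervous", "adjective"),
        ],
    }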


References

Barcroft, J. 2015. Lexical Input Processing and Vocabulary Learning. Amsterdam: John Benjamins

Erten, I.H., & Tekin, M. 2008. Effects on vocabulary acquisition of presenting new words in semantic sets versus semantically-unrelated sets. System, 36 (3), 407-422

Finkbeiner, M. & Nicol, J. 2003. Semantic category effects in second language word learning. Applied Psycholinguistics 24 (2003), 369–383

Folse, K. S. 2004. Vocabulary Myths. Ann Arbor: University of Michigan Press

Hashemi, M.R., & Gowdasiaei, F. 2005. An attribute-treatment interaction study: Lexical-set versus semantically-unrelated vocabulary instruction. RELC Journal, 36 (3), 341-361

Higa, M. 1963. Interference effects of intralist word relationships in verbal learning. Journal of Verbal Learning and Verbal Behavior, 2, 170-175

Hoshino, Y. 2010. The categorical facilitation effects on L2 vocabulary learning in a classroom setting. RELC Journal, 41, 301–312

Hughes, S. H., Mauchline, F. & Moore, J. 2019. ETpedia Vocabulary. Shoreham-by-Sea: Pavilion Publishing and Media

Ishii, T. 2015. Semantic connection or visual connection: Investigating the true source of confusion. Language Teaching Research, 19, 712–722

Laufer, B. 1988. The concept of ‘synforms’ (similar lexical forms) in vocabulary acquisition. Language and Education, 2 (2): 113 – 132

Nakata, T. & Suzuki, Y. 2019. Effects of massing and spacing on the learning of semantically related and unrelated words. Studies in Second Language Acquisition 41 (2), 287 – 311

Nation, P. 2005. Teaching Vocabulary. Asian EFL Journal. http://www.asian-efl-journal.com/sept_05_pn.pdf

Papathanasiou, E. 2009. An investigation of two ways of presenting vocabulary. ELT Journal 63 (4), 313 – 322

Sarioğlu, M. 2018. A Matter of Controversy: Teaching New L2 Words in Semantic Sets or Unrelated Sets. Journal of Higher Education and Science Vol 8 / 1: 172 – 183

Schneider, V. I., Healy, A. F., & Bourne, L. E. 1998. Contextual interference effects in foreign language vocabulary acquisition and retention. In Healy, A. F. & Bourne, L. E. (Eds.), Foreign language learning: Psycholinguistic studies on training and retention (pp. 77–90). Mahwah, NJ: Erlbaum

Schneider, V. I., Healy, A. F., & Bourne, L. E. 2002. What is learned under difficult conditions is hard to forget: Contextual interference effects in foreign vocabulary acquisition, retention, and transfer. Journal of Memory and Language, 46, 419–440

Selivan, L. 2014. Horizontal alternatives to vertical lists. Blog post: http://leoxicon.blogspot.com/2014/03/horizontal-alternatives-to-vertical.html

Tinkham, T. 1993. The effect of semantic clustering on the learning of second language vocabulary. System 21 (3), 371-380.

Tinkham, T. 1997. The effects of semantic and thematic clustering on the learning of a second language vocabulary. Second Language Research, 13 (2),138-163

Waring, R. 1997. The negative effects of learning words in semantic sets: a replication. System, 25 (2), 261 – 274

by Philip Kerr & Andrew Wickham

from IATEFL 2016 Birmingham Conference Selections (ed. Tania Pattison) Faversham, Kent: IATEFL pp. 75 – 78

ELT publishing, international language testing and private language schools are all industries: products are produced, bought and sold for profit. English language teaching (ELT) is not. It is an umbrella term that is used to describe a range of activities, some of which are industries, and some of which (such as English teaching in high schools around the world) might better be described as public services. ELT, like education more generally, is, nevertheless, often referred to as an ‘industry’.

Education in a neoliberal world

The framing of ELT as an industry is both a reflection of how we understand the term and a force that shapes our understanding. Associated with the idea of ‘industry’ is a constellation of other ideas and words (such as efficacy, productivity, privatization, marketization, consumerization, digitalization and globalization) which become a part of ELT once it is framed as an industry. Repeated often enough, ‘ELT as an industry’ can become a metaphor that we think and live by. Those activities that fall under the ELT umbrella, but which are not industries, become associated with the desirability of industrial practices through such discourse.

The shift from education, seen as a public service, to educational managerialism (where education is seen in industrial terms with a focus on efficiency, free market competition, privatization and a view of students as customers) can be traced to the 1980s and 1990s (Gewirtz, 2001). In 1999, under pressure from developed economies, the General Agreement on Trade in Services (GATS) transformed education into a commodity that could be traded like any other in the marketplace (Robertson, 2006). The global industrialisation and privatization of education continues to be promoted by transnational organisations (such as the World Bank and the OECD), well-funded free-market think-tanks (such as the Cato Institute), philanthro-capitalist foundations (such as the Gates Foundation) and educational businesses (such as Pearson) (Ball, 2012).

Efficacy and learning outcomes

Managerialist approaches to education require educational products and services to be measured and compared. In ELT, the most visible manifestation of this requirement is the current ubiquity of learning outcomes. Contemporary coursebooks are full of ‘can-do’ statements, although these are not necessarily of any value to anyone. Examples from one unit of one best-selling course include ‘Now I can understand advice people give about hotels’ and ‘Now I can read an article about unique hotels’ (McCarthy et al. 2014: 74). However, in a world where accountability is paramount, they are deemed indispensable. The problem from a pedagogical perspective is that teaching input does not necessarily equate with learning uptake. Indeed, there is no reason why it should.

Drawing on the Common European Framework of Reference for Languages (CEFR) for inspiration, new performance scales have emerged in recent years. These include the Cambridge English Scale and the Pearson Global Scale of English. Moving away from the broad six categories of the CEFR, such scales permit finer-grained measurement and we now see individual vocabulary and grammar items tagged to levels. Whilst such initiatives undoubtedly support measurements of efficacy, the problem from a pedagogical perspective is that they assume that language learning is linear and incremental, as opposed to complex and jagged.

Given the importance accorded to the measurement of language learning (or what might pass for language learning), it is unsurprising that attention is shifting towards the measurement of what is probably the most important factor impacting on learning: the teaching. Teacher competency scales have been developed by Cambridge Assessment, the British Council and EAQUALS (Evaluation and Accreditation of Quality Language Services), among others.

The backwash effects of the deployment of such scales are yet to be fully experienced, but the likely increase in the perception of both language learning and teacher learning as the synthesis of granularised ‘bits of knowledge’ is cause for concern.

Digital technology

Digital technology may offer advantages to both English language teachers and learners, but its rapid growth in language learning is the result, primarily but not exclusively, of the way it has been promoted by those who stand to gain financially. In education, generally, and in English language teaching, more specifically, advocacy of the privatization of education is always accompanied by advocacy of digitalization. The global market for digital English language learning products was reported to be $2.8 billion in 2015 and is predicted to reach $3.8 billion by 2020 (Ambient Insight, 2016).

In tandem with the increased interest in measuring learning outcomes, there is fierce competition in the market for high-stakes examinations, and these are increasingly digitally delivered and marked. In the face of this competition and in a climate of digital disruption, companies like Pearson and Cambridge English are developing business models of vertical integration where they can provide and sell everything from placement testing to courseware (either print or delivered through an LMS), teaching, assessment and teacher training. Huge investments are being made in pursuit of such models. Pearson, for example, recently bought GlobalEnglish and Wall Street English, and set up a partnership with Busuu, thus covering all aspects of language learning from resources provision and publishing to off- and online training delivery.

As regards assessment, the most recent adult coursebook from Cambridge University Press (in collaboration with Cambridge English Language Assessment), ‘Empower’ (Doff et al., 2015), sells itself on a combination of course material with integrated, validated assessment.

Besides its potential for scalability (and therefore greater profit margins), the appeal (to some) of platform-delivered English language instruction is that it facilitates assessment that is much finer-grained and actionable in real time. Digitization and testing go hand in hand.

Few English language teachers have been unaffected by the move towards digital. In the state sectors, large-scale digitization initiatives (such as the distribution of laptops for educational purposes, the installation of interactive whiteboards, the move towards blended models of instruction or the move away from printed coursebooks) are becoming commonplace. In the private sectors, online (or partially online) language schools are taking market share from the traditional bricks-and-mortar institutions.

These changes have entailed modifications to the skill-sets that teachers need to have. Two announcements at this conference reflect this shift. First of all, Cambridge English launched their ‘Digital Framework for Teachers’, a matrix of six broad competency areas organised into four levels of proficiency. Secondly, Aqueduto, the Association for Quality Education and Training Online, was launched, setting itself up as an accreditation body for online or blended teacher training courses.

Teachers’ pay and conditions

In the United States, and likely soon in the UK, the move towards privatization is accompanied by an overt attack on teachers’ unions, rights, pay and conditions (Selwyn, 2014). As English language teaching in both public and private sectors is commodified and marketized, it is no surprise to find that the drive to bring down costs has a negative impact on teachers worldwide. Gwynt (2015), for example, catalogues cuts in funding, large-scale redundancies, a narrowing of the curriculum, intensified workloads (including the need to comply with ‘quality control measures’), the deskilling of teachers, dilapidated buildings, minimal resources and low morale in an ESOL department in one British further education college. In France, a large-scale study by Wickham, Cagnol, Wright and Oldmeadow (Linguaid, 2015; Wright, 2016) found that EFL teachers in the very competitive private sector typically had multiple employers, limited or no job security, limited sick pay and holiday pay, very little training and low hourly rates that were deteriorating. One of the principal drivers of the pressure on salaries is the rise of online training delivery through Skype and other online platforms, using offshore teachers in low-cost countries such as the Philippines. This type of training represents 15% in value and up to 25% in volume of all language training in the French corporate sector and is developing fast in emerging countries. These examples are illustrative of a broad global trend.

Implications

Given the current climate, teachers will benefit from closer networking with fellow professionals in order, not least, to be aware of the rapidly changing landscape. It is likely that they will need to develop and extend their skill sets (especially their online skills and visibility and their specialised knowledge), to differentiate themselves from competitors and to be able to demonstrate that they are in tune with current demands. More generally, it is important to recognise that current trends have yet to run their full course. Conditions for teachers are likely to deteriorate further before they improve. More than ever before, teachers who want to have any kind of influence on the way that marketization and industrialization are shaping their working lives will need to do so collectively.

References

Ambient Insight. 2016. The 2015-2020 Worldwide Digital English Language Learning Market. http://www.ambientinsight.com/Resources/Documents/AmbientInsight_2015-2020_Worldwide_Digital_English_Market_Sample.pdf

Ball, S. J. 2012. Global Education Inc. Abingdon, Oxon.: Routledge

Doff, A., Thaine, C., Puchta, H., Stranks, J. and P. Lewis-Jones 2015. Empower. Cambridge: Cambridge University Press

Gewirtz, S. 2001. The Managerial School: Post-welfarism and Social Justice in Education. Abingdon, Oxon.: Routledge

Gwynt, W. 2015. ‘The effects of policy changes on ESOL’. Language Issues 26 / 2: 58 – 60

McCarthy, M., McCarten, J. and H. Sandiford 2014. Touchstone 2 Student’s Book Second Edition. Cambridge: Cambridge University Press

Linguaid, 2015. Le Marché de la Formation Langues à l’Heure de la Mondialisation. Guildford: Linguaid

Robertson, S. L. 2006. ‘Globalisation, GATS and trading in education services.’ Bristol: Centre for Globalisation, Education and Societies, University of Bristol. http://www.bris.ac.uk/education/people/academicStaff/edslr/publications/04slr

Selwyn, N. 2014. Distrusting Educational Technology. New York: Routledge

Wright, R. 2016. ‘My teacher is rich … or not!’ English Teaching Professional 103: 54 – 56


If you’re going to teach vocabulary, you need to organise it in some way. Almost invariably, this organisation is topical, with words grouped into what are called semantic sets. In coursebooks, the example below (from Rogers, M., Taylore-Knowles, J. & S. Taylore-Knowles. 2010. Open Mind Level 1. London: Macmillan, p.68) is fairly typical.

[Image: vocabulary exercise page from Open Mind Level 1, p.68]

Coursebooks are almost always organised in a topical way. The example above comes in a unit (of 10 pages), entitled ‘You have talent!’, which contains two main vocabulary sections. It’s unsurprising to find a section called ‘personality adjectives’ in such a unit. What’s more, such an approach lends itself to the requisite, but largely spurious, ‘can-do’ statement in the self-evaluation section: I can talk about people’s positive qualities. We must have clearly identifiable learning outcomes, after all.

There is, undeniably, a certain intuitive logic in this approach. An alternative might entail a radical overhaul of coursebook architecture – this might not be such a bad thing, but might not go down too well in the markets. How else, after all, could the vocabulary strand of the syllabus be organised?

Well, there are a number of ways in which a vocabulary syllabus could be organised. Including the standard approach described above, here are four possibilities:

1 semantic sets (e.g. bee, butterfly, fly, mosquito, etc.)

2 thematic sets (e.g. ‘pets’: cat, hate, flea, feed, scratch, etc.)

3 unrelated sets

4 sets determined by a group of words’ occurrence in a particular text

Before reading further, you might like to guess what research has to say about the relative effectiveness of these four approaches.

The answer depends, to some extent, on the level of the learner. For advanced learners, it appears to make no, or little, difference (Al-Jabri, 2005, cited by Ellis & Shintani, 2014: 106). But, for the vast majority of English language learners (i.e. those at or below B2 level), the research is clear: the most effective way of organising vocabulary items to be learnt is by grouping them into thematic sets (2) or by mixing words together in a semantically unrelated way (3) – not by teaching sets like ‘personality adjectives’. It is surprising how surprising this finding is to so many teachers and materials writers. It goes back at least to 1988 and West’s article on ‘Catenizing’ in ELTJ, which argued that semantic grouping made little sense from a psycho-linguistic perspective. Since then, a large amount of research has taken place. This is succinctly summarised by Paul Nation (2013: 128) in the following terms: ‘Avoid interference from related words. Words which are similar in form (Laufer, 1989) or meaning (Higa, 1963; Nation, 2000; Tinkham, 1993; Tinkham, 1997; Waring, 1997) are more difficult to learn together than they are to learn separately.’ For anyone who is interested, the most up-to-date review of this research that I can find is in chapter 11 of Barcroft (2015).

The message is clear. So clear that you have to wonder how it is not getting through to materials designers. Perhaps coursebooks are different. They regularly eschew research findings for commercial reasons. But vocabulary apps? There is rarely, if ever, any pressure on the content-creation side of vocabulary apps (except those that are tied to coursebooks) to follow the popular misconceptions that characterise so many coursebooks. It wouldn’t be too hard to organise vocabulary into thematic sets (like, for example, the approach in the A2 level of Memrise German that I’m currently using). Is it simply because the developers of so many vocabulary apps just don’t know much about language learning?

References

Barcroft, J. 2015. Lexical Input Processing and Vocabulary Learning. Amsterdam: John Benjamins

Nation, I. S. P. 2013. Learning Vocabulary in Another Language 2nd edition. Cambridge: Cambridge University Press

Ellis, R. & Shintani, N. 2014. Exploring Language Pedagogy through Second Language Acquisition Research. Abingdon, Oxon: Routledge

West, M. 1988. ‘Catenizing’ English Language Teaching Journal 6: 147 – 151

Pearson’s ‘Efficacy’ initiative is a series of ‘commitments designed to measure and increase the company’s impact on learning outcomes around the world’. The company’s dedicated website offers two glossy brochures with a wide range of interesting articles, a good questionnaire tool that can be used by anyone to measure the efficacy of their own educational products or services, as well as an excellent selection of links to other articles, some of which are critical of the initiative. These include Michael Feldstein’s long blog post ‘Can Pearson Solve the Rubric’s Cube?’, which should be a first port of call for anyone wanting to understand better what is going on.

What does it all boil down to? The preface to Pearson’s ‘Asking More: the Path to Efficacy’ by CEO John Fallon provides a succinct introduction. Efficacy in education, says Fallon, is ‘making a measurable impact on someone’s life through learning’. ‘Measurable’ is the key word, because, as Fallon continues, ‘it is increasingly possible to determine what works and what doesn’t in education, just as in healthcare.’ We need ‘a relentless focus’ on ‘the learning outcomes we deliver’ because it is these outcomes that can be measured in ‘a systematic, evidence-based fashion’. Measurement, of course, is all the easier when education is delivered online, ‘real-time learner data’ can be captured, and the power of analytics can be deployed.

Pearson are very clearly aligning themselves with recent moves towards a more evidence-based education. In the US, Obama’s Race to the Top is one manifestation of this shift. Britain (with, for example, the Education Endowment Foundation) and France (with its Fonds d’Expérimentation pour la Jeunesse) are both going in the same direction. Efficacy is all about evidence-based practice.

Both the terms ‘efficacy’ and ‘evidence-based practice’ come originally from healthcare. Fallon references this connection in the quote two paragraphs above. In the UK last year, Ben Goldacre (medical doctor, author of ‘Bad Science’ and a relentless campaigner against pseudo-science) was commissioned by the UK government to write a paper entitled ‘Building Evidence into Education’. In this, he argued for the need to introduce randomized controlled trials into education in a similar way to their use in medicine.

As Fallon observed in the preface to the Pearson ‘Efficacy’ brochure, this all sounds like ‘common sense’. But, as Ben Goldacre discovered, things are not so straightforward in education. An excellent article in The Guardian outlined some of the problems in Goldacre’s paper.

With regard to ELT, Pearson’s ‘Efficacy’ initiative will stand or fall with the validity of their Global Scale of English, discussed in my March post ‘Knowledge Graphs’. However, there are a number of other considerations that make the whole evidence-based / efficacy business rather less common-sensical than might appear at first glance.

  • The purpose of English language teaching and learning (at least, in compulsory education) is rather more than simply the mastery of grammatical and lexical systems, or the development of particular language skills. Some of these other purposes (e.g. the development of intercultural competence or the acquisition of certain 21st century skills, such as creativity) continue to be debated. There is very little consensus about the details of what these purposes (or outcomes) might be, or how they can be defined. Without consensus about these purposes / outcomes, it is not possible to measure them.
  • Even if we were able to reach a clear consensus, many of these outcomes do not easily lend themselves to measurement, and even less to low-cost measurement.
  • Although we clearly need to know what ‘works’ and what ‘doesn’t work’ in language teaching, there is a problem in assigning numerical values. As the EduThink blog observes, ‘the assignation of numerical values is contestable, problematic and complex. As teachers and researchers we should be engaging with the complexity [of education] rather than the reductive simplicities of [assigning numerical values]’.
  • Evidence-based medicine has resulted in unquestionable progress, but it is not without its fierce critics. A short summary of the criticisms can be found here. It would be extremely risky to assume that a contested research procedure from one discipline can be uncritically applied to another.
  • Kathleen Graves, in her plenary at IATEFL 2014, ‘The Efficiency of Inefficiency’, explicitly linked health care and language teaching. She described a hospital where patient care was as much about human relationships as it was about medical treatment, an aspect of the hospital that went unnoticed by efficiency experts, since this could not be measured. See this blog for a summary of her talk.

These issues need to be discussed much further before we get swept away by the evidence-based bandwagon. If they are not, the real danger is that, as John Fallon cautions, we end up counting things that don’t really count, and we don’t count the things that really do count. Somehow, I doubt that an instrument like the Global Scale of English will do the trick.