Archive for the ‘analytics’ Category

On 21 January, I attended the launch webinar of DEFI (the Digital Education Futures Initiative), an initiative of the University of Cambridge, which seeks to work ‘with partners in industry, policy and practice to explore the field of possibilities that digital technology opens up for education’. The opening keynote speaker was Andrea Schleicher, head of education at the OECD. The OECD’s vision of the future of education is outlined in Schleicher’s book, ‘World Class: How to Build a 21st-Century School System’, freely available from the OECD, but his presentation for DEFI offers a relatively short summary. A recording is available here, and this post will take a closer look at some of the things he had to say.

Schleicher is a statistician and the coordinator of the OECD’s PISA programme. Along with other international organisations, such as the World Economic Forum and the World Bank (see my post here), the OECD promotes the global economization and corporatization of education, ‘based on the [human capital] view that developing work skills is the primary purpose of schooling’ (Spring, 2015: 14). In other words, the main proper function of education is seen to be meeting the needs of global corporate interests. In the early days of the COVID-19 pandemic, with the impact of school closures becoming very visible, Schleicher expressed concern about the disruption to human capital development, but thought it was ‘a great moment’: ‘the current wave of school closures offers an opportunity for experimentation and for envisioning new models of education’. Every cloud has a silver lining, and the pandemic has been a godsend for private companies selling digital learning (see my post about this here) and for those who want to reimagine education in a more corporate way.

Schleicher’s presentation for DEFI was a good opportunity to look again at the way in which organisations like the OECD are shaping educational discourse (see my post about the EdTech imaginary and ELT).

He begins by suggesting that, as a result of the development of digital technology (Google, YouTube, etc.) literacy is ‘no longer just about extracting knowledge’. PISA reading scores, he points out, have remained more or less static since 2000, despite the fact that we have invested (globally) more than 15% extra per student in this time. Only 9% of all 15-year-old students in the industrialised world can distinguish between fact and opinion.

To begin with, one might argue about the reliability and validity of the PISA reading scores (Berliner, 2020). One might also argue, as did a collection of 80 education experts in a letter to the Guardian, that the scores themselves are responsible for damaging global education, raising further questions about their validity. One might argue that the increased investment was spent in the wrong way (e.g. on hardware and software, rather than teacher training, for example), because the advice of organisations like OECD has been uncritically followed. And the statistic about critical reading skills is fairly meaningless unless it is compared to comparable metrics over a long time span: there is no reason to believe that susceptibility to fake news is any more of a problem now than it was, say, one hundred years ago. Nor is there any reason to believe that education can solve the fake-news problem (see my post about fake news and critical thinking here). These are more than just quibbles, but the main point that Schleicher is making is that education needs to change.

Schleicher next presents a graph which is designed to show that the amount of time that students spend studying correlates poorly with the amount they learn. His interest is in the (lack of) productivity of educational activities in some contexts. He goes on to argue that there is greater productivity in educational activities when learners have a growth mindset, implying (but not stating) that mindset interventions in schools would lead to a more productive educational environment.

Schleicher appears to confuse what students learn with the things they have learnt that have been measured by PISA. The two are obviously rather different, since PISA is only interested in a relatively small subset of the possible learning outcomes of schooling. His argument for growth mindset interventions hinges on the assumption that such interventions will lead to gains in reading scores. However, his graph demonstrates a correlation between growth mindset and reading scores, not a causal relationship. A causal relationship has not been clearly and empirically demonstrated (see my post about growth mindsets here) and recent work by Carol Dweck and her associates (e.g. Yeager et al., 2016), as well as other researchers (e.g. McPartlan et al, 2020), indicates that the relationship between gains in learning outcomes and mindset interventions is extremely complex.

Schleicher then turns to digitalisation and briefly discusses the positive and negative affordances of technology. He eulogizes platform companies before showing a slide designed to demonstrate that (in the workplace) there is a strong correlation between ICT use and learning. He concludes: ‘the digital world of learning is a hugely empowering world of learning’.

A brief paraphrase of this very disingenuous part of the presentation would be: technology can be good and bad, but I’ll only focus on the former. The discourse appears balanced, but it is anything but.

During the segment, Schleicher argues that technology is empowering, and gives the examples of ‘the most successful companies these days, they’re not created by a big industry, they’re created by a big idea’. This is plainly counterfactual. In the case of Alphabet and Facebook, profits did not follow from a ‘big idea’: the ideas changed as the companies evolved.

Schleicher then sketches a picture of an unpredictable future (pandemics, climate change, AI, cyber wars, etc.) as a way of framing the importance of being open (and resilient) to different futures and how we respond to them. He offers two different kinds of response: maintenance of the status quo, or ‘outsourcing’ of education. The pandemic, he suggests, has made more countries aware that the latter is the way forward.

In his discussion of the maintenance of the status quo, Schleicher talks about the maintenance of educational monopolies. By this, he must be referring to state monopolies on education: this is a favoured way of neoliberals of referring to state-sponsored education. But the extent to which, in 2021 in many OECD countries, the state has any kind of monopoly of education, is very open to debate. Privatization is advancing fast. Even in 2015, the World Education Forum’s ‘Final Report’ wrote that ‘the scale of engagement of nonstate actors at all levels of education is growing and becoming more diversified’. Schleicher goes on to talk about ‘large, bureaucratic school systems’, suggesting that such systems cannot be sufficiently agile, adaptive or responsive. ‘We should ask this question,’ he says, but his own answer to it is totally transparent: ‘changing education can be like moving graveyards’ is the title of the next slide. Education needs to be more like the health sector, he claims, which has been able to develop a COVID vaccine in such a short period of time. We need an education industry that underpins change in the same way as the health industry underpins vaccine development. In case his message isn’t yet clear enough, I’ll spell it out: education needs to be privatized still further.

Schleicher then turns to the ways in which he feels that digital technology can enhance learning. These include the use of AR, VR and AI. Technology, he says, can make learning so much more personalized: ‘the computer can study how you study, and then adapt learning so that it is much more granular, so much more adaptive, so much more responsive to your learning style’. He moves on to the field of assessment, again singing the praises of technology in the ways that it can offer new modes of assessment and ‘increase the reliability of machine rating for essays’. Through technology, we can ‘reunite learning and assessment’. Moving on to learning analytics, he briefly mentions privacy issues, before enthusing at greater length about the benefits of analytics.

Learning styles? Really? The reliability of machine scoring of essays? How reliable exactly? Data privacy as an area worth only a passing mention? The use of sensors to measure learners’ responses to learning experiences? Any pretence of balance appears now to have been shed. This is in-your-face sales talk.

Next up is a graph which purports to show the number of teachers in OECD countries who use technology for learners’ project work. This is followed by another graph showing the number of teachers who have participated in face-to-face and online CPD. The point of this is to argue that online CPD needs to become more common.

I couldn’t understand what point he was trying to make with the first graph. For the second, it is surely the quality of the CPD, rather than the channel, that matters.

Schleicher then turns to two further possible responses of education to unpredictable futures: ‘schools as learning hubs’ and ‘learn-as-you-go’. In the latter, digital infrastructure replaces physical infrastructure. Neither is explored in any detail. The main point appears to be that we should consider these possibilities, weighing up as we do so the risks and the opportunities (see slide below).

Useful ways to frame questions about the future of education, no doubt, but Schleicher is operating with a set of assumptions about the purpose of education, which he chooses not to explore. His fundamental assumption – that the primary purpose of education is to develop human capital in and for the global economy – is not one that I would share. However, if you do take that view, then privatization, economization, digitalization and the training of social-emotional competences are all reasonable corollaries, and the big question about the future concerns how to go about this in a more efficient way.

Schleicher’s (and the OECD’s) views are very much in accord with the libertarian values of the right-wing philanthro-capitalist foundations of the United States (the Gates Foundation, the Broad Foundation and so on), funded by Silicon Valley and hedge-fund managers. It is to the US that we can trace the spread and promotion of these ideas, but it is also, perhaps, to the US that we can now turn in search of hope for an alternative educational future. The privatization / disruption / reform movement in the US has stalled in recent years, as it has become clear that it failed to deliver on its promise of improved learning. The resistance to privatized and digitalized education is chronicled in Diane Ravitch’s latest book, ‘Slaying Goliath’ (2020). School closures during the pandemic may have been ‘a great moment’ for Schleicher, but for most of us, they have underscored the importance of face-to-face free public schooling. Now, with the electoral victory of Joe Biden and the appointment of a new US Secretary for Education (still to be confirmed), we are likely to see, for the first time in decades, an education policy that is firmly committed to public schools. The US is by far the largest contributor to the budget of the OECD – more than twice any other nation. Perhaps a rethink of the OECD’s educational policies will soon be in order?

References

Berliner D.C. (2020) The Implications of Understanding That PISA Is Simply Another Standardized Achievement Test. In Fan G., Popkewitz T. (Eds.) Handbook of Education Policy Studies. Springer, Singapore. https://doi.org/10.1007/978-981-13-8343-4_13

McPartlan, P., Solanki, S., Xu, D. & Sato, B. (2020) Testing Basic Assumptions Reveals When (Not) to Expect Mindset and Belonging Interventions to Succeed. AERA Open, 6 (4): 1 – 16 https://journals.sagepub.com/doi/pdf/10.1177/2332858420966994

Ravitch, D. (2020) Slaying Goliath: The Passionate Resistance to Privatization and the Fight to Save America’s Public School. New York: Vintage Books

Schleicher, A. (2018) World Class: How to Build a 21st-Century School System. Paris: OECD Publishing https://www.oecd.org/education/world-class-9789264300002-en.htm

Spring, J. (2015) Globalization of Education 2nd Edition. New York: Routledge

Yeager, D. S., et al. (2016) Using design thinking to improve psychological interventions: The case of the growth mindset during the transition to high school. Journal of Educational Psychology, 108(3), 374–391. https://doi.org/10.1037/edu0000098

In the last post, I mentioned a lesson plan from an article by Pegrum, M., Dudeney, G. & Hockly, N. (2018. Digital literacies revisited. The European Journal of Applied Linguistics and TEFL, 7 (2), pp. 3-24) in which students discuss the data that is collected by fitness apps and the possibility of using this data to calculate health insurance premiums, before carrying out and sharing online research about companies that track personal data. It’s a nice plan, unfortunately pay-walled, but you could try requesting a copy through Research Gate.

The only other off-the-shelf lesson plan I have been able to find is entitled ‘You and Your Data’ from the British Council. Suitable for level B2, this plan, along with a photocopiable pdf, contains with a vocabulary task (matching), a reading text (you and your data, who uses our data and why, can you protect your data) with true / false and sentence completion tasks) and a discussion (what do you do to protect our data). The material was written to coincide with Safer Internet Day (an EU project), which takes place in early February (next data 9 February 2021). The related website, Better Internet for Kids, contains links to a wide range of educational resources for younger learners.

For other resources, a good first stop is Ina Sander’s ‘A Critically Commented Guide to Data Literacy Tools’ in which she describes and evaluates a wide range of educational online resources for developing critical data literacy. Some of the resources that I discuss below are also evaluated in this guide. Here are some suggestion for learning / teaching resources.

A glossary

This is simply a glossary of terms that are useful in discussing data issues. It could easily be converted into a matching exercise or flashcards.

A series of interactive videos

Do not Track’ is an award-winning series of interactive videos, produced by a consortium of broadcasters. In seven parts, the videos consider such issues as who profits from the personal data that we generate online, the role of cookies in the internet economy, how online profiling is done, the data generated by mobile phones and how algorithms interpret the data.

Each episode is between 5 and 10 minutes long, and is therefore ideal for asynchronous viewing. In a survey of critical data literacy tools (Sander, 2020), ‘Do not Track’ proved popular with the students who used it. I highly recommend it, but students will probably need a B2 level or higher.

More informational videos

If you do not have time to watch the ‘Do Not Track’ video series, you may want to use something shorter. There are a huge number of freely available videos about online privacy. I have selected just two which I think would be useful. You may be able to find something better!

1 Students watch a video about how cookies work. This video, from Vox, is well-produced and is just under 7 minutes long. The speaker speaks fairly rapidly, so captions may be helpful.

Students watch a video as an introduction to the topic of surveillance and privacy. This video, ‘Reclaim our Privacy’, was produced by ‘La Quadrature du Net’, a French advocacy group that promotes digital rights and freedoms of citizens. It is short (3 mins) and can be watched with or without captions (English or 6 other languages). Its message is simple: political leaders should ensure that our online privacy is respected.

A simple matching task ‘ten principles for online privacy’

1 Share the image below with all the students and ask them to take a few minutes matching the illustrations to the principles on the right. There is no need for anyone to write or say anything, but it doesn’t matter if some students write the answers in the chat box.

(Note: This image and the other ideas for this activity are adapted from https://teachingprivacy.org/ , a project developed by the International Computer Science Institute and the University of California-Berkeley for secondary school students and undergraduates. Each of the images corresponds to a course module, which contains a wide-range of materials (videos, readings, discussions, etc.) which you may wish to explore more fully.)

2 Share the image below (which shows the answers in abbreviated form). Ask if anyone needs anything clarified.

You’re Leaving Footprints Principle: Your information footprint is larger than you think.

There’s No Anonymity Principle: There is no anonymity on the Internet.

Information Is Valuable Principle: Information about you on the Internet will be used by somebody in their interest — including against you.

Someone Could Listen Principle: Communication over a network, unless strongly encrypted, is never just between two parties.

Sharing Releases Control Principle: Sharing information over a network means you give up control over that information — forever.

Search Is Improving Principle: Just because something can’t be found today, doesn’t mean it can’t be found tomorrow.

Online Is Real Principle: The online world is inseparable from the “real” world.

Identity Isn’t Guaranteed Principle: Identity is not guaranteed on the Internet.

You Can’t Escape Principle: You can’t avoid having an information footprint by not going online.

Privacy Requires Work Principle: Only you have an interest in maintaining your privacy.

3 Wrap up with a discussion of these principles.

Hands-on exploration of privacy tools

Click on the link below to download the procedure for the activity, as well as supporting material.

A graphic novel

Written by Michael Keller and Josh Neufeld, and produced by Al Jazeera, this graphic novel ‘Terms of Service. Understanding our role in the world of Big Data’ provides a good overview of critical data literacy issues, offering lots of interesting, concrete examples of real cases. The language is, however, challenging (C1+). It may be especially useful for trainee teachers.

A website

The Privacy International website is an extraordinary goldmine of information and resources. Rather than recommending anything specific, my suggestion is that you, or your students, use the ‘Search’ function on the homepage and see where you end up.

In the first post in this 3-part series, I focussed on data collection practices in a number of ELT websites, as a way of introducing ‘critical data literacy’. Here, I explore the term in more detail.

Although the term ‘big data’ has been around for a while (see this article and infographic) it’s less than ten years ago that it began to enter everyday language, and found its way into the OED (2013). In the same year, Viktor Mayer-Schönberger and Kenneth Cukier published their best-selling ‘Big Data: A Revolution That Will Transform How We Live, Work, and Think’ (2013) and it was hard to avoid enthusiastic references in the media to the transformative potential of big data in every sector of society.

Since then, the use of big data and analytics has become ubiquitous. Massive data collection (and data surveillance) has now become routine and companies like Palantir, which specialise in big data analytics, have become part of everyday life. Palantir’s customers include the LAPD, the CIA, the US Immigration and Customs Enforcement (ICE) and the British Government. Its recent history includes links with Cambridge Analytica, assistance in an operation to arrest the parents of illegal migrant children, and a racial discrimination lawsuit where the company was accused of having ‘routinely eliminated’ Asian job applicants (settled out of court for $1.7 million).

Unsurprisingly, the datafication of society has not gone entirely uncontested. Whilst the vast majority of people seem happy to trade their personal data for convenience and connectivity, a growing number are concerned about who benefits most from this trade-off. On an institutional level, the EU introduced the General Data Protection Regulation (GDPR), which led to Google being fined Ꞓ50 million for insufficient transparency in their privacy policy and their practices of processing personal data for the purposes of behavioural advertising. In the intellectual sphere, there has been a recent spate of books that challenge the practices of ubiquitous data collection, coining new terms like ‘surveillance capitalism’, ‘digital capitalism’ and ‘data colonialism’. Here are four recent books that I have found particularly interesting.

Beer, D. (2019). The Data Gaze. London: Sage

Couldry, N. & Mejias, U. A. (2019). The Costs of Connection. Stanford: Stanford University Press

Sadowski, J. (2020). Too Smart. Cambridge, Mass.: MIT Press

Zuboff, S. (2019). The Age of Surveillance Capitalism. New York: Public Affairs

The use of big data and analytics in education is also now a thriving industry, with its supporters claiming that these technologies can lead to greater personalization, greater efficiency of instruction and greater accountability. Opponents (myself included) argue that none of these supposed gains have been empirically demonstrated, and that the costs to privacy, equity and democracy outweigh any potential gains. There is a growing critical literature and useful, recent books include:

Bradbury, A. & Roberts-Holmes, G. (2018). The Datafication of Primary and Early Years Education. Abingdon: Routledge

Jarke, J. & Breiter, A. (Eds.) (2020). The Datafication of Education. Abingdon: Routledge

Williamson, B. (2017). Big Data in Education: The digital future of learning, policy and practice. London: Sage

Concomitant with the rapid growth in the use of digital tools for language learning and teaching, and therefore the rapid growth in the amount of data that learners were (mostly unwittingly) giving away, came a growing interest in the need for learners to develop a set of digital competencies, or literacies, which would enable them to use these tools effectively. In the same year that Mayer-Schönberger and Cukier brought out their ‘Big Data’ book, the first book devoted to digital literacies in English language teaching came out (Dudeney et al., 2013). They defined digital literacies as the individual and social skills needed to effectively interpret, manage, share and create meaning in the growing range of digital communication channels (Dudeney et al., 2013: 2). The book contained a couple of activities designed to raise students’ awareness of online identity issues, along with others intended to promote critical thinking about digitally-mediated information (what the authors call ‘information literacy’), but ‘critical literacy’ was missing from the authors’ framework.

Critical thinking and critical literacy are not the same thing. Although there is no generally agreed definition of the former (with a small ‘c’), it is focussed primarily on logic and comprehension (Lee, 2011). Paul Dummett and John Hughes (2019: 4) describe it as ‘a mindset that involves thinking reflectively, rationally and reasonably’. The prototypical critical thinking activity involves the analysis of a piece of fake news (e.g. the task where students look at a website about tree octopuses in Dudeney et al. 2013: 198 – 203). Critical literacy, on the other hand, involves standing back from texts and technologies and viewing them as ‘circulating within a larger social and textual context’ (Warnick, 2002). Consideration of the larger social context necessarily entails consideration of unequal power relationships (Leee, 2011; Darvin, 2017), such as that between Google and the average user of Google. And it follows from this that critical literacy has a socio-political emancipatory function.

Critical digital literacy is now a growing field of enquiry (e.g. Pötzsch, 2019) and there is an awareness that digital competence frameworks, such as the Digital Competence Framework of the European Commission, are incomplete and out of date without the inclusion of critical digital literacy. Dudeney et al (2013) clearly recognise the importance of including critical literacy in frameworks of digital literacies. In Pegrum et al. (2018, unfortunately paywalled), they update the framework from their 2013 book, and the biggest change is the inclusion of critical literacy. They divide this into the following:

  • critical digital literacy – closely related to information literacy
  • critical mobile literacy – focussing on issues brought to the fore by mobile devices, ranging from protecting privacy through to safeguarding mental and physical health
  • critical material literacy – concerned with the material conditions underpinning the use of digital technologies, ranging from the socioeconomic influences on technological access to the environmental impacts of technological manufacturing and disposal
  • critical philosophical literacy – concerned with the big questions posed to and about humanity as our lives become conjoined with the existence of our smart devices, robots and AI
  • critical academic literacy, which refers to the pressing need to conduct meaningful studies of digital technologies in place of what is at times ‘cookie-cutter’ research

I’m not entirely convinced by the subdivisions, but labelling in this area is still in its infancy. My particular interest here, in critical data literacy, seems to span across a number of their sub-divisions. And the term that I am using, ‘critical data literacy’, which I’ve taken from Tygel & Kirsch (2016), is sometimes referred to as ‘critical big data literacy’ (Sander, 2020a) or ‘personal data literacy’ (Pangrazio & Selwyn, 2019). Whatever it is called, it is the development of ‘informed and critical stances toward how and why [our] data are being used’ (Pangrazio & Selwyn, 2018). One of the two practical activities in the Pegrum at al article (2018) looks at precisely this area (the task requires students to consider the data that is collected by fitness apps). It will be interesting to see, when the new edition of the ‘Digital Literacies’ book comes out (perhaps some time next year), how many other activities take a more overtly critical stance.

In the next post, I’ll be looking at a range of practical activities for developing critical data literacy in the classroom. This involves both bridging the gaps in knowledge (about data, algorithms and online privacy) and learning, practically, how to implement ‘this knowledge for a more empowered internet usage’ (Sander, 2020b).

Without wanting to invalidate the suggestions in the next post, a word of caution is needed. Just as critical thinking activities in the ELT classroom cannot be assumed to lead to any demonstrable increase in critical thinking (although there may be other benefits to the activities), activities to promote critical literacy cannot be assumed to lead to any actual increase in critical literacy. The reaction of many people may well be ‘It’s not like it’s life or death or whatever’ (Pangrazio & Selwyn, 2018). And, perhaps, education is rarely, if ever, a solution to political and social problems, anyway. And perhaps, too, we shouldn’t worry too much about educational interventions not leading to their intended outcomes. Isn’t that almost always the case? But, with those provisos in mind, I’ll come back next time with some practical ideas.

REFERENCES

Darvin R. (2017). Language, Ideology, and Critical Digital Literacy. In: Thorne S., May S. (eds) Language, Education and Technology. Encyclopedia of Language and Education (3rd ed.). Springer, Cham. pp. 17 – 30 https://doi.org/10.1007/978-3-319-02237-6_35

Dudeney, G., Hockly, N. & Pegrum, M. (2013). Digital Literacies. Harlow: Pearson Education

Dummett, P. & Hughes, J. (2019). Critical Thinking in ELT. Boston: National Geographic Learning

Lee, C. J. (2011). Myths about critical literacy: What teachers need to unlearn. Journal of Language and Literacy Education [Online], 7 (1), 95-102. Available at http://www.coa.uga.edu/jolle/2011_1/lee.pdf

Mayer-Schönberger, V. & Cukier, K. (2013). Big Data: A Revolution That Will Transform How We Live, Work, and Think. London: John Murray

Pangrazio, L. & Selwyn, N. (2018). ‘It’s not like it’s life or death or whatever’: young people’s understandings of social media data. Social Media + Society, 4 (3): pp. 1–9. https://journals.sagepub.com/doi/pdf/10.1177/2056305118787808

Pangrazio, L. & Selwyn, N. (2019). ‘Personal data literacies’: A critical literacies approach to enhancing understandings of personal digital data. New Media and Society, 21 (2): pp. 419 – 437

Pegrum, M., Dudeney, G. & Hockly, N. (2018). Digital literacies revisited. The European Journal of Applied Linguistics and TEFL, 7 (2), pp. 3-24

Pötzsch, H. (2019). Critical Digital Literacy: Technology in Education Beyond Issues of User Competence and Labour-Market Qualifications. tripleC: Communication, Capitalism & Critique, 17: pp. 221 – 240 Available at https://www.triple-c.at/index.php/tripleC/article/view/1093

Sander, I. (2020a). What is critical big data literacy and how can it be implemented? Internet Policy Review, 9 (2). DOI: 10.14763/2020.2.1479 https://www.econstor.eu/bitstream/10419/218936/1/2020-2-1479.pdf

Sander, I. (2020b). Critical big data literacy tools – Engaging citizens and promoting empowered internet usage. Data & Policy, 2: DOI: https://doi.org/10.1017/dap.2020.5

Tygel, A. & Kirsch, R. (2016). Contributions of Paulo Freire for a Critical Data Literacy: a Popular Education Approach. The Journal of Community Informatics, 12 (3). Available at http://www.ci-journal.net/index.php/ciej/article/view/1296

Warnick, B. (2002). Critical Literacy in a Digital Era. Mahwah, NJ, Lawrence Erlbaum Associates

Take the Cambridge Assessment English website, for example. When you connect to the site, you will see, at the bottom of the screen, a familiar (to people in Europe, at least) notification about the site’s use of cookies: the cookies consent.

You probably trust the site, so ignore the notification and quickly move on to find the resource you are looking for. But if you did click on hyperlinked ‘set cookies’, what would you find? The first link takes you to the ‘Cookie policy’ where you will be told that ‘We use cookies principally because we want to make our websites and mobile applications user-friendly, and we are interested in anonymous user behaviour. Generally our cookies don’t store sensitive or personally identifiable information such as your name and address or credit card details’. Scroll down, and you will find out more about the kind of cookies that are used. Besides the cookies that are necessary to the functioning of the site, you will see that there are also ‘third party cookies’. These are explained as follows: ‘Cambridge Assessment works with third parties who serve advertisements or present offers on our behalf and personalise the content that you see. Cookies may be used by those third parties to build a profile of your interests and show you relevant adverts on other sites. They do not store personal information directly but use a unique identifier in your browser or internet device. If you do not allow these cookies, you will experience less targeted content’.

This is not factually inaccurate: personal information is not stored directly. However, it is extremely easy for this information to be triangulated with other information to identify you personally. In addition to the data that you generate by having cookies on your device, Cambridge Assessment will also directly collect data about you. Depending on your interactions with Cambridge Assessment, this will include ‘your name, date of birth, gender, contact data including your home/work postal address, email address and phone number, transaction data including your credit card number when you make a payment to us, technical data including internet protocol (IP) address, login data, browser type and technology used to access this website’. They say they may share this data ‘with other people and/or businesses who provide services on our behalf or at our request’ and ‘with social media platforms, including but not limited to Facebook, Google, Google Analytics, LinkedIn, in pseudonymised or anonymised forms’.

In short, Cambridge Assessment may hold a huge amount of data about you and they can, basically, do what they like with it.

The cookie and privacy policies are fairly standard, as is the lack of transparency in the phrasing of them. Rather more transparency would include, for example, information about which particular ad trackers you are giving your consent to. This information can be found with a browser extension tool like Ghostery, and these trackers can be blocked. As you’ll see below, there are 5 ad trackers on this site. This is rather more than other sites that English language teachers are likely to go to. ETS-TOEFL has 4, Macmillan English and Pearson have 3, CUP ELT and the British Council Teaching English have 1, OUP ELT, IATEFL, BBC Learning English and Trinity College have none. I could only find TESOL, with 6 ad trackers, which has more. The blogs for all these organisations invariably have more trackers than their websites.

The use of numerous ad trackers is probably a reflection of the importance that Cambridge Assessment gives to social media marketing. There is a research paper, produced by Cambridge Assessment, which outlines the significance of big data and social media analytics. They have far more Facebook followers (and nearly 6 million likes) than any other ELT page, and they are proud of their #1 ranking in the education category of social media. The amount of data that can be collected here is enormous and it can be analysed in myriad ways using tools like Ubervu, Yomego and Hootsuite.

A little more transparency, however, would not go amiss. According to a report in Vox, Apple has announced that some time next year ‘iPhone users will start seeing a new question when they use many of the apps on their devices: Do they want the app to follow them around the internet, tracking their behavior?’ Obviously, Google and Facebook are none too pleased about this and will be fighting back. The implications for ad trackers and online advertising, more generally, are potentially huge. I wrote to Cambridge Assessment about this and was pleased to hear that ‘Cambridge Assessment are currently reviewing the process by which we obtain users consent for the use of cookies with the intention of moving to a much more transparent model in the future’. Let’s hope that other ELT organisations are doing the same.

You may be less bothered than I am by the thought of dozens of ad trackers following you around the net so that you can be served with more personalized ads. But the digital profile about you, to which these cookies contribute, may include information about your ethnicity, disabilities and sexual orientation. This profile is auctioned to advertisers when you visit some sites, allowing them to show you ‘personalized’ adverts based on the categories in your digital profile. Contrary to EU regulations, these categories may include whether you have cancer, a substance-abuse problem, your politics and religion (as reported in Fortune https://fortune.com/2019/01/28/google-iab-sensitive-profiles/ ).

But it’s not these cookies that are the most worrying aspect about our lack of digital privacy. It’s the sheer quantity of personal data that is stored about us. Every time we ask our students to use an app or a platform, we are asking them to divulge huge amounts of data. With ClassDojo, for example, this includes names, usernames, passwords, age, addresses, photographs, videos, documents, drawings, or audio files, IP addresses and browser details, clicks, referring URL’s, time spent on site, and page views (Manolev et al., 2019; see also Williamson, 2019).

It is now widely recognized that the ‘consent’ that is obtained through cookie policies and other end-user agreements is largely spurious. These consent agreements, as Sadowski (2019) observes, are non-negotiated, and non-negotiable; you either agree or you are denied access. What’s more, he adds, citing one study, it would take 76 days, working for 8 hours a day, to read the privacy policies a person typically encounters in a year. As a result, most of us choose not to choose when we accept online services (Cobo, 2019: 25). We have little, if any, control over how the data that is collected is used (Birch et al., 2020). More importantly, perhaps, when we ask our students to sign up to an educational app, we are asking / telling them to give away their personal data, not just ours. They are unlikely to fully understand the consequences of doing so.

The extent of this ignorance is also now widely recognized. In the UK, for example, two reports (cited by Sander, 2020) indicate that ‘only a third of people know that data they have not actively chosen to share has been collected’ (Doteveryone, 2018: 5), and that ‘less than half of British adult internet users are aware that apps collect their location and information on their personal preferences’ (Ofcom, 2019: 14).

The main problem with this has been expressed by programmer and activist, Richard Stallman, in an interview with New York magazine (Kulwin, 2018): Companies are collecting data about people. The data that is collected will be abused. That’s not an absolute certainty, but it’s a practical, extreme likelihood, which is enough to make collection a problem.

The abuse that Smallman is referring to can come in a variety of forms. At the relatively trivial end is the personalized advertising. Much more serious is the way that data aggregation companies will scrape data from a variety of sources, building up individual data profiles which can be used to make significant life-impacting decisions, such as final academic grades or whether one is offered a job, insurance or credit (Manolev et al., 2019). Cathy O’Neil’s (2016) best-selling ‘Weapons of Math Destruction’ spells out in detail how this abuse of data increases racial, gender and class inequalities. And after the revelations of Edward Snowden, we all know about the routine collection by states of huge amounts of data about, well, everyone. Whether it’s used for predictive policing or straightforward repression or something else, it is simply not possible for younger people, our students, to know what personal data they may regret divulging at a later date.

Digital educational providers may try to reassure us that they will keep data private, and not use it for advertising purposes, but the reassurances are hollow. These companies may change their terms and conditions further down the line, and examples exist of when this has happened (Moore, 2018: 210). But even if this does not happen, the data can never be secure. Illegal data breaches and cyber attacks are relentless, and education ranked worst at cybersecurity out of 17 major industries in one recent analysis (Foresman, 2018). One report suggests that one in five US schools and colleges have fallen victim to cyber-crime. Two weeks ago, I learnt (by chance, as I happened to be looking at my security settings on Chrome) that my passwords for Quizlet, Future Learn, Elsevier and Science Direct had been compromised by a data breach. To get a better understanding of the scale of data breaches, you might like to look at the UK’s IT Governance site, which lists detected and publicly disclosed data breaches and cyber attacks each month (36.6 million records breached in August 2020). If you scroll through the list, you’ll see how many of them are educational sites. You’ll also see a comment about how leaky organisations have been throughout lockdown … because they weren’t prepared for the sudden shift online.

Recent years have seen a growing consensus that ‘it is crucial for language teaching to […] encompass the digital literacies which are increasingly central to learners’ […] lives’ (Dudeney et al., 2013). Most of the focus has been on the skills that are needed to use digital media. There also appears to be growing interest in developing critical thinking skills in the context of digital media (e.g. Peachey, 2016) – identifying fake news and so on. To a much lesser extent, there has been some focus on ‘issues of digital identity, responsibility, safety and ethics when students use these technologies’ (Mavridi, 2020a: 172). Mavridi (2020b: 91) also briefly discusses the personal risks of digital footprints, but she does not have the space to explore more fully the notion of critical data literacy. This literacy involves an understanding of not just the personal risks of using ‘free’ educational apps and platforms, but of why they are ‘free’ in the first place. Sander (2020b) suggests that this literacy entails ‘an understanding of datafication, recognizing the risks and benefits of the growing prevalence of data collection, analytics, automation, and predictive systems, as well as being able to critically reflect upon these developments. This includes, but goes beyond the skills of, for example, changing one’s social media settings, and rather constitutes an altered view on the pervasive, structural, and systemic levels of changing big data systems in our datafied societies’.

In my next two posts, I will, first of all, explore in more detail the idea of critical data literacy, before suggesting a range of classroom resources.

(I posted about privacy in March 2014, when I looked at the connections between big data and personalized / adaptive learning. In another post, September 2014, I looked at the claims of the CEO of Knewton who bragged that his company had five orders of magnitude more data about you than Google has. … We literally have more data about our students than any company has about anybody else about anything, and it’s not even close. You might find both of these posts interesting.)

References

Birch, K., Chiappetta, M. & Artyushina, A. (2020). ‘The problem of innovation in technoscientific capitalism: data rentiership and the policy implications of turning personal digital data into a private asset’ Policy Studies, 41:5, 468-487, DOI: 10.1080/01442872.2020.1748264

Cobo, C. (2019). I Accept the Terms and Conditions. https://adaptivelearninginelt.files.wordpress.com/2020/01/41acf-cd84b5_7a6e74f4592c460b8f34d1f69f2d5068.pdf

Doteveryone. (2018). People, Power and Technology: The 2018 Digital Attitudes Report. https://attitudes.doteveryone.org.uk

Dudeney, G., Hockly, N. & Pegrum, M. (2013). Digital Literacies. Harlow: Pearson Education

Foresman, B. (2018). Education ranked worst at cybersecurity out of 17 major industries. Edscoop, December 17, 2018. https://edscoop.com/education-ranked-worst-at-cybersecurity-out-of-17-major-industries/

Kulwin, K. (2018). F*ck Them. We Need a Law’: A Legendary Programmer Takes on Silicon Valley, New York Intelligencer, 2018, https://nymag.com/intelligencer/2018/04/richard-stallman-rms-on-privacy-data-and-free-software.html

Manolev, J., Sullivan, A. & Slee, R. (2019). ‘Vast amounts of data about our children are being harvested and stored via apps used by schools’ EduReseach Matters, February 18, 2019. https://www.aare.edu.au/blog/?p=3712

Mavridi, S. (2020a). Fostering Students’ Digital Responsibility, Ethics and Safety Skills (Dress). In Mavridi, S. & Saumell, V. (Eds.) Digital Innovations and Research in Language Learning. Faversham, Kent: IATEFL. pp. 170 – 196

Mavridi, S. (2020b). Digital literacies and the new digital divide. In Mavridi, S. & Xerri, D. (Eds.) English for 21st Century Skills. Newbury, Berks.: Express Publishing. pp. 90 – 98

Moore, M. (2018). Democracy Hacked. London: Oneworld

Ofcom. (2019). Adults: Media use and attitudes report [Report]. https://www.ofcom.org.uk/__data/assets/pdf_file/0021/149124/adults-media-use-and-attitudes-report.pdf

O’Neil, C. (2016). Weapons of Math Destruction. London: Allen Lane

Peachey, N. (2016). Thinking Critically through Digital Media. http://peacheypublications.com/

Sadowski, J. (2019). ‘When data is capital: Datafication, accumulation, and extraction’ Big Data and Society 6 (1) https://doi.org/10.1177%2F2053951718820549

Sander, I. (2020a). What is critical big data literacy and how can it be implemented? Internet Policy Review, 9 (2). DOI: 10.14763/2020.2.1479 https://www.econstor.eu/bitstream/10419/218936/1/2020-2-1479.pdf

Sander, I. (2020b). Critical big data literacy tools—Engaging citizens and promoting empowered internet usage. Data & Policy, 2: e5 doi:10.1017/dap.2020.5

Williamson, B. (2019). ‘Killer Apps for the Classroom? Developing Critical Perspectives on ClassDojo and the ‘Ed-tech’ Industry’ Journal of Professional Learning, 2019 (Semester 2) https://cpl.asn.au/journal/semester-2-2019/killer-apps-for-the-classroom-developing-critical-perspectives-on-classdojo

The use of big data and analytics in education continues to grow.

A vast apparatus of measurement is being developed to underpin national education systems, institutions and the actions of the individuals who occupy them. […] The presence of digital data and software in education is being amplified through massive financial and political investment in educational technologies, as well as huge growth in data collection and analysis in policymaking practices, extension of performance measurement technologies in the management of educational institutions, and rapid expansion of digital methodologies in educational research. To a significant extent, many of the ways in which classrooms function, educational policy departments and leaders make decisions, and researchers make sense of data, simply would not happen as currently intended without the presence of software code and the digital data processing programs it enacts. (Williamson, 2017: 4)

The most common and successful use of this technology so far has been in the identification of students at risk of dropping out of their courses (Jørno & Gynther, 2018: 204). The kind of analytics used in this context may be called ‘academic analytics’ and focuses on educational processes at the institutional level or higher (Gelan et al, 2018: 3). However, ‘learning analytics’, the capture and analysis of learner and learning data in order to personalize learning ‘(1) through real-time feedback on online courses and e-textbooks that can ‘learn’ from how they are used and ‘talk back’ to the teacher, and (2) individualization and personalization of the educational experience through adaptive learning systems that enable materials to be tailored to each student’s individual needs through automated real-time analysis’ (Mayer-Schönberger & Cukier, 2014) has become ‘the main keyword of data-driven education’ (Williamson, 2017: 10). See my earlier posts on this topic here and here and here.

Learning with big dataNear the start of Mayer-Schönberger and Cukier’s enthusiastic sales pitch (Learning with Big Data: The Future of Education) for the use of big data in education, there is a discussion of Duolingo. They quote Luis von Ahn, the founder of Duolingo, as saying ‘there has been little empirical work on what is the best way to teach a foreign language’. This is so far from the truth as to be laughable. Von Ahn’s comment, along with the Duolingo product itself, is merely indicative of a lack of awareness of the enormous amount of research that has been carried out. But what could the data gleaned from the interactions of millions of users with Duolingo tell us of value? The example that is given is the following. Apparently, ‘in the case of Spanish speakers learning English, it’s common to teach pronouns early on: words like “he,” “she,” and “it”.’ But, Duolingo discovered, ‘the term “it” tends to confuse and create anxiety for Spanish speakers, since the word doesn’t easily translate into their language […] Delaying the introduction of “it” until a few weeks later dramatically improves the number of people who stick with learning English rather than drop out.’ Was von Ahn unaware of the decades of research into language transfer effects? Did von Ahn (who grew up speaking Spanish in Guatemala) need all this data to tell him that English personal pronouns can cause problems for Spanish learners of English? Was von Ahn unaware of the debates concerning the value of teaching isolated words (especially grammar words!)?

The area where little empirical research has been done is not in different ways of learning another language: it is in the use of big data and learning analytics to assist language learning. Claims about the value of these technologies in language learning are almost always speculative – they are based on comparison to other school subjects (especially, mathematics). Gelan et al (2018: 2), who note this lack of research, suggest that ‘understanding language learner behaviour could provide valuable insights into task design for instructors and materials designers, as well as help students with effective learning strategies and personalised learning pathways’ (my italics). Reinders (2018: 81) writes ‘that analysis of prior experiences with certain groups or certain courses may help to identify key moments at which students need to receive more or different support. Analysis of student engagement and performance throughout a course may help with early identification of learning problems and may prompt early intervention’ (italics added). But there is some research out there, and it’s worth having a look at. Most studies that have collected learner-tracking data concern glossary use for reading comprehension and vocabulary retention (Gelan et al, 2018: 5), but a few have attempted to go further in scope.

Volk et al (2015) looked at the behaviour of the 20,000 students per day using the platform which accompanies ‘More!’ (Gerngross et al. 2008) to do their English homework for Austrian lower secondary schools. They discovered that

  • the exercises used least frequently were those that are located further back in the course book
  • usage is highest from Monday to Wednesday, declining from Thursday, with a rise again on Sunday
  • most interaction took place between 3:00 and 5:00 pm.
  • repetition of exercises led to a strong improvement in success rate
  • students performed better on multiple choice and matching exercises than they did where they had to produce some language

The authors of this paper conclude by saying that ‘the results of this study suggest a number of new avenues for research. In general, the authors plan to extend their analysis of exercise results and applied exercises to the population of all schools using the online learning platform more-online.at. This step enables a deeper insight into student’s learning behaviour and allows making more generalizing statements.’ When I shared these research findings with the Austrian lower secondary teachers that I work with, their reaction was one of utter disbelief. People get paid to do this research? Why not just ask us?

More useful, more actionable insights may yet come from other sources. For example, Gu Yueguo, Pro-Vice-Chancellor of the Beijing Foreign Studies University has announced the intention to set up a national Big Data research center, specializing in big data-related research topics in foreign language education (Yu, 2015). Meanwhile, I’m aware of only one big research project that has published its results. The EC Erasmus+ VITAL project (Visualisation Tools and Analytics to monitor Online Language Learning & Teaching) was carried out between 2015 and 2017 and looked at the learning trails of students from universities in Belgium, Britain and the Netherlands. It was discovered (Gelan et al, 2015) that:

  • students who did online exercises when they were supposed to do them were slightly more successful than those who were late carrying out the tasks
  • successful students logged on more often, spent more time online, attempted and completed more tasks, revisited both exercises and theory pages more frequently, did the work in the order in which it was supposed to be done and did more work in the holidays
  • most students preferred to go straight into the assessed exercises and only used the theory pages when they felt they needed to; successful students referred back to the theory pages more often than unsuccessful students
  • students made little use of the voice recording functionality
  • most online activity took place the day before a class and the day of the class itself

EU funding for this VITAL project amounted to 274,840 Euros[1]. The technology for capturing the data has been around for a long time. In my opinion, nothing of value, or at least nothing new, has been learnt. Publishers like Pearson and Cambridge University Press who have large numbers of learners using their platforms have been capturing learning data for many years. They do not publish their findings and, intriguingly, do not even claim that they have learnt anything useful / actionable from the data they have collected. Sure, an exercise here or there may need to be amended. Both teachers and students may need more support in using the more open-ended functionalities of the platforms (e.g. discussion forums). But are they getting ‘unprecedented insights into what works and what doesn’t’ (Mayer-Schönberger & Cukier, 2014)? Are they any closer to building better pedagogies? On the basis of what we know so far, you wouldn’t want to bet on it.

It may be the case that all the learning / learner data that is captured could be used in some way that has nothing to do with language learning. Show me a language-learning app developer who does not dream of monetizing the ‘behavioural surplus’ (Zuboff, 2018) that they collect! But, for the data and analytics to be of any value in guiding language learning, it must lead to actionable insights. Unfortunately, as Jørno & Gynther (2018: 198) point out, there is very little clarity about what is meant by ‘actionable insights’. There is a danger that data and analytics ‘simply gravitates towards insights that confirm longstanding good practice and insights, such as “students tend to ignore optional learning activities … [and] focus on activities that are assessed” (Jørno & Gynther, 2018: 211). While this is happening, the focus on data inevitably shapes the way we look at the object of study (i.e. language learning), ‘thereby systematically excluding other perspectives’ (Mau, 2019: 15; see also Beer, 2019). The belief that tech is always the solution, that all we need is more data and better analytics, remains very powerful: it’s called techno-chauvinism (Broussard, 2018: 7-8).

References

Beer, D. 2019. The Data Gaze. London: Sage

Broussard, M. 2018. Artificial Unintelligence. Cambridge, Mass.: MIT Press

Gelan, A., Fastre, G., Verjans, M., Martin, N., Jansenswillen, G., Creemers, M., Lieben, J., Depaire, B. & Thomas, M. 2018. ‘Affordances and limitations of learning analytics for computer­assisted language learning: a case study of the VITAL project’. Computer Assisted Language Learning. pp. 1­26. http://clok.uclan.ac.uk/21289/

Gerngross, G., Puchta, H., Holzmann, C., Stranks, J., Lewis-Jones, P. & Finnie, R. 2008. More! 1 Cyber Homework. Innsbruck, Austria: Helbling

Jørno, R. L. & Gynther, K. 2018. ‘What Constitutes an “Actionable Insight” in Learning Analytics?’ Journal of Learning Analytics 5 (3): 198 – 221

Mau, S. 2019. The Metric Society. Cambridge: Polity Press

Mayer-Schönberger, V. & Cukier, K. 2014. Learning with Big Data: The Future of Education. New York: Houghton Mifflin Harcourt

Reinders, H. 2018. ‘Learning analytics for language learning and teaching’. JALT CALL Journal 14 / 1: 77 – 86 https://files.eric.ed.gov/fulltext/EJ1177327.pdf

Volk, H., Kellner, K. & Wohlhart, D. 2015. ‘Learning Analytics for English Language Teaching.’ Journal of Universal Computer Science, Vol. 21 / 1: 156-174 http://www.jucs.org/jucs_21_1/learning_analytics_for_english/jucs_21_01_0156_0174_volk.pdf

Williamson, B. 2017. Big Data in Education. London: Sage

Yu, Q. 2015. ‘Learning Analytics: The next frontier for computer assisted language learning in big data age’ SHS Web of Conferences, 17 https://www.shs-conferences.org/articles/shsconf/pdf/2015/04/shsconf_icmetm2015_02013.pdf

Zuboff, S. 2019. The Age of Surveillance Capitalism. London: Profile Books

 

[1] See https://ec.europa.eu/programmes/erasmus-plus/sites/erasmusplus2/files/ka2-2015-he_en.pdf

ltsigIt’s hype time again. Spurred on, no doubt, by the current spate of books and articles  about AIED (artificial intelligence in education), the IATEFL Learning Technologies SIG is organising an online event on the topic in November of this year. Currently, the most visible online references to AI in language learning are related to Glossika , basically a language learning system that uses spaced repetition, whose marketing department has realised that references to AI might help sell the product. GlossikaThey’re not alone – see, for example, Knowble which I reviewed earlier this year .

In the wider world of education, where AI has made greater inroads than in language teaching, every day brings more stuff: How artificial intelligence is changing teaching , 32 Ways AI is Improving Education , How artificial intelligence could help teachers do a better job , etc., etc. There’s a full-length book by Anthony Seldon, The Fourth Education Revolution: will artificial intelligence liberate or infantilise humanity? (2018, University of Buckingham Press) – one of the most poorly researched and badly edited books on education I’ve ever read, although that won’t stop it selling – and, no surprises here, there’s a Pearson commissioned report called Intelligence Unleashed: An argument for AI in Education (2016) which is available free.

Common to all these publications is the claim that AI will radically change education. When it comes to language teaching, a similar claim has been made by Donald Clark (described by Anthony Seldon as an education guru but perhaps best-known to many in ELT for his demolition of Sugata Mitra). In 2017, Clark wrote a blog post for Cambridge English (now unavailable) entitled How AI will reboot language learning, and a more recent version of this post, called AI has and will change language learning forever (sic) is available on Clark’s own blog. Given the history of the failure of education predictions, Clark is making bold claims. Thomas Edison (1922) believed that movies would revolutionize education. Radios were similarly hyped in the 1940s and in the 1960s it was the turn of TV. In the 1980s, Seymour Papert predicted the end of schools – ‘the computer will blow up the school’, he wrote. Twenty years later, we had the interactive possibilities of Web 2.0. As each technology failed to deliver on the hype, a new generation of enthusiasts found something else to make predictions about.

But is Donald Clark onto something? Developments in AI and computational linguistics have recently resulted in enormous progress in machine translation. Impressive advances in automatic speech recognition and generation, coupled with the power that can be packed into a handheld device, mean that we can expect some re-evaluation of the value of learning another language. Stephen Heppell, a specialist at Bournemouth University in the use of ICT in Education, has said: ‘Simultaneous translation is coming, making language teachers redundant. Modern languages teaching in future may be more about navigating cultural differences’ (quoted by Seldon, p.263). Well, maybe, but this is not Clark’s main interest.

Less a matter of opinion and much closer to the present day is the issue of assessment. AI is becoming ubiquitous in language testing. Cambridge, Pearson, TELC, Babbel and Duolingo are all using or exploring AI in their testing software, and we can expect to see this increase. Current, paper-based systems of testing subject knowledge are, according to Rosemary Luckin and Kristen Weatherby, outdated, ineffective, time-consuming, the cause of great anxiety and can easily be automated (Luckin, R. & Weatherby, K. 2018. ‘Learning analytics, artificial intelligence and the process of assessment’ in Luckin, R. (ed.) Enhancing Learning and Teaching with Technology, 2018. UCL Institute of Education Press, p.253). By capturing data of various kinds throughout a language learner’s course of study and by using AI to analyse learning development, continuous formative assessment becomes possible in ways that were previously unimaginable. ‘Assessment for Learning (AfL)’ or ‘Learning Oriented Assessment (LOA)’ are two terms used by Cambridge English to refer to the potential that AI offers which is described by Luckin (who is also one of the authors of the Pearson paper mentioned earlier). In practical terms, albeit in a still very limited way, this can be seen in the CUP course ‘Empower’, which combines CUP course content with validated LOA from Cambridge Assessment English.

Will this reboot or revolutionise language teaching? Probably not and here’s why. AIED systems need to operate with what is called a ‘domain knowledge model’. This specifies what is to be learnt and includes an analysis of the steps that must be taken to reach that learning goal. Some subjects (especially STEM subjects) ‘lend themselves much more readily to having their domains represented in ways that can be automatically reasoned about’ (du Boulay, D. et al., 2018. ‘Artificial intelligences and big data technologies to close the achievement gap’ in Luckin, R. (ed.) Enhancing Learning and Teaching with Technology, 2018. UCL Institute of Education Press, p.258). This is why most AIED systems have been built to teach these areas. Language are rather different. We simply do not have a domain knowledge model, except perhaps for the very lowest levels of language learning (and even that is highly questionable). Language learning is probably not, or not primarily, about acquiring subject knowledge. Debate still rages about the relationship between explicit language knowledge and language competence. AI-driven formative assessment will likely focus most on explicit language knowledge, as does most current language teaching. This will not reboot or revolutionise anything. It will more likely reinforce what is already happening: a model of language learning that assumes there is a strong interface between explicit knowledge and language competence. It is not a model that is shared by most SLA researchers.

So, one thing that AI can do (and is doing) for language learning is to improve the algorithms that determine the way that grammar and vocabulary are presented to individual learners in online programs. AI-optimised delivery of ‘English Grammar in Use’ may lead to some learning gains, but they are unlikely to be significant. It is not, in any case, what language learners need.

AI, Donald Clark suggests, can offer personalised learning. Precisely what kind of personalised learning this might be, and whether or not this is a good thing, remains unclear. A 2015 report funded by the Gates Foundation found that we currently lack evidence about the effectiveness of personalised learning. We do not know which aspects of personalised learning (learner autonomy, individualised learning pathways and instructional approaches, etc.) or which combinations of these will lead to gains in language learning. The complexity of the issues means that we may never have a satisfactory explanation. You can read my own exploration of the problems of personalised learning starting here .

What’s left? Clark suggests that chatbots are one area with ‘huge potential’. I beg to differ and I explained my reasons eighteen months ago . Chatbots work fine in very specific domains. As Clark says, they can be used for ‘controlled practice’, but ‘controlled practice’ means practice of specific language knowledge, the practice of limited conversational routines, for example. It could certainly be useful, but more than that? Taking things a stage further, Clark then suggests more holistic speaking and listening practice with Amazon Echo, Alexa or Google Home. If and when the day comes that we have general, as opposed to domain-specific, AI, chatting with one of these tools would open up vast new possibilities. Unfortunately, general AI does not exist, and until then Alexa and co will remain a poor substitute for human-human interaction (which is readily available online, anyway). Incidentally, AI could be used to form groups of online language learners to carry out communicative tasks – ‘the aim might be to design a grouping of students all at a similar cognitive level and of similar interests, or one where the participants bring different but complementary knowledge and skills’ (Luckin, R., Holmes, W., Griffiths, M. & Forceir, L.B. 2016. Intelligence Unleashed: An argument for AI in Education. London: Pearson, p.26).

Predictions about the impact of technology on education have a tendency to be made by people with a vested interest in the technologies. Edison was a businessman who had invested heavily in motion pictures. Donald Clark is an edtech entrepreneur whose company, Wildfire, uses AI in online learning programs. Stephen Heppell is executive chairman of LP+ who are currently developing a Chinese language learning community for 20 million Chinese school students. The reporting of AIED is almost invariably in websites that are paid for, in one way or another, by edtech companies. Predictions need, therefore, to be treated sceptically. Indeed, the safest prediction we can make about hyped educational technologies is that inflated expectations will be followed by disillusionment, before the technology finds a smaller niche.

 

Back in December 2013, in an interview with eltjam , David Liu, COO of the adaptive learning company, Knewton, described how his company’s data analysis could help ELT publishers ‘create more effective learning materials’. He focused on what he calls ‘content efficacy[i]’ (he uses the word ‘efficacy’ five times in the interview), a term which he explains below:

A good example is when we look at the knowledge graph of our partners, which is a map of how concepts relate to other concepts and prerequisites within their product. There may be two or three prerequisites identified in a knowledge graph that a student needs to learn in order to understand a next concept. And when we have hundreds of thousands of students progressing through a course, we begin to understand the efficacy of those said prerequisites, which quite frankly were made by an author or set of authors. In most cases they’re quite good because these authors are actually good in what they do. But in a lot of cases we may find that one of those prerequisites actually is not necessary, and not proven to be useful in achieving true learning or understanding of the current concept that you’re trying to learn. This is interesting information that can be brought back to the publisher as they do revisions, as they actually begin to look at the content as a whole.

One commenter on the post, Tom Ewens, found the idea interesting. It could, potentially, he wrote, give us new insights into how languages are learned much in the same way as how corpora have given us new insights into how language is used. Did Knewton have any plans to disseminate the information publicly, he asked. His question remains unanswered.

At the time, Knewton had just raised $51 million (bringing their total venture capital funding to over $105 million). Now, 16 months later, Knewton have launched their new product, which they are calling Knewton Content Insights. They describe it as the world’s first and only web-based engine to automatically extract statistics comparing the relative quality of content items — enabling us to infer more information about student proficiency and content performance than ever before possible.

The software analyses particular exercises within the learning content (and particular items within them). It measures the relative difficulty of individual items by, for example, analysing how often a question is answered incorrectly and how many tries it takes each student to answer correctly. It also looks at what they call ‘exhaustion’ – how much content students are using in a particular area – and whether they run out of content. The software can correlate difficulty with exhaustion. Lastly, it analyses what they call ‘assessment quality’ – how well  individual questions assess a student’s understanding of a topic.

Knewton’s approach is premised on the idea that learning (in this case language learning) can be broken down into knowledge graphs, in which the information that needs to be learned can be arranged and presented hierarchically. The ‘granular’ concepts are then ‘delivered’ to the learner, and Knewton’s software can optimise the delivery. The first problem, as I explored in a previous post, is that language is a messy, complex system: it doesn’t lend itself terribly well to granularisation. The second problem is that language learning does not proceed in a linear, hierarchical way: it is also messy and complex. The third is that ‘language learning content’ cannot simply be delivered: a process of mediation is unavoidable. Are the people at Knewton unaware of the extensive literature devoted to the differences between synthetic and analytic syllabuses, of the differences between product-oriented and process-oriented approaches? It would seem so.

Knewton’s ‘Content Insights’ can only, at best, provide some sort of insight into the ‘language knowledge’ part of any learning content. It can say nothing about the work that learners do to practise language skills, since these are not susceptible to granularisation: you simply can’t take a piece of material that focuses on reading or listening and analyse its ‘content efficacy at the concept level’. Because of this, I predicted (in the post about Knowledge Graphs) that the likely focus of Knewton’s analytics would be discrete item, sentence-level grammar (typically tenses). It turns out that I was right.

Knewton illustrate their new product with screen shots such as those below.

Content-Insight-Assessment-1

 

 

 

 

 

Content-Insight-Exhaustion-1

 

 

 

 

 

 

 

They give a specific example of the sort of questions their software can answer. It is: do students generally find the present simple tense easier to understand than the present perfect tense? Doh!

It may be the case that Knewton Content Insights might optimise the presentation of this kind of grammar, but optimisation of this presentation and practice is highly unlikely to have any impact on the rate of language acquisition. Students are typically required to study the present perfect at every level from ‘elementary’ upwards. They have to do this, not because the presentation in, say, Headway, is not optimised. What they need is to spend a significantly greater proportion of their time on ‘language use’ and less on ‘language knowledge’. This is not just my personal view: it has been extensively researched, and I am unaware of any dissenting voices.

The number-crunching in Knewton Content Insights is unlikely, therefore, to lead to any actionable insights. It is, however, very likely to lead (as writer colleagues at Pearson and other publishers are finding out) to an obsession with measuring the ‘efficacy’ of material which, quite simply, cannot meaningfully be measured in this way. It is likely to distract from much more pressing issues, notably the question of how we can move further and faster away from peddling sentence-level, discrete-item grammar.

In the long run, it is reasonable to predict that the attempt to optimise the delivery of language knowledge will come to be seen as an attempt to tackle the wrong question. It will make no significant difference to language learners and language learning. In the short term, how much time and money will be wasted?

[i] ‘Efficacy’ is the buzzword around which Pearson has built its materials creation strategy, a strategy which was launched around the same time as this interview. Pearson is a major investor in Knewton.

2014-09-30_2216Jose Ferreira, the fast-talking sales rep-in-chief of Knewton, likes to dazzle with numbers. In a 2012 talk hosted by the US Department of Education, Ferreira rattles off the stats: So Knewton students today, we have about 125,000, 180,000 right now, by December it’ll be 650,000, early next year it’ll be in the millions, and next year it’ll be close to 10 million. And that’s just through our Pearson partnership. For each of these students, Knewton gathers millions of data points every day. That, brags Ferreira, is five orders of magnitude more data about you than Google has. … We literally have more data about our students than any company has about anybody else about anything, and it’s not even close. With just a touch of breathless exaggeration, Ferreira goes on: We literally know everything about what you know and how you learn best, everything.

The data is mined to find correlations between learning outcomes and learning behaviours, and, once correlations have been established, learning programmes can be tailored to individual students. Ferreira explains: We take the combined data problem all hundred million to figure out exactly how to teach every concept to each kid. So the 100 million first shows up to learn the rules of exponents, great let’s go find a group of people who are psychometrically equivalent to that kid. They learn the same ways, they have the same learning style, they know the same stuff, because Knewton can figure out things like you learn math best in the morning between 8:40 and 9:13 am. You learn science best in 42 minute bite sizes the 44 minute mark you click right, you start missing questions you would normally get right.

The basic premise here is that the more data you have, the more accurately you can predict what will work best for any individual learner. But how accurate is it? In the absence of any decent, independent research (or, for that matter, any verifiable claims from Knewton), how should we respond to Ferreira’s contribution to the White House Education Datapalooza?

A 51Oy5J3o0yL._AA258_PIkin4,BottomRight,-46,22_AA280_SH20_OU35_new book by Stephen Finlay, Predictive Analytics, Data Mining and Big Data (Palgrave Macmillan, 2014) suggests that predictive analytics are typically about 20 – 30% more accurate than humans attempting to make the same judgements. That’s pretty impressive and perhaps Knewton does better than that, but the key thing to remember is that, however much data Knewton is playing with, and however good their algorithms are, we are still talking about predictions and not certainties. If an adaptive system could predict with 90% accuracy (and the actual figure is typically much lower than that) what learning content and what learning approach would be effective for an individual learner, it would still mean that it was wrong 10% of the time. When this is scaled up to the numbers of students that use Knewton software, it means that millions of students are getting faulty recommendations. Beyond a certain point, further expansion of the data that is mined is unlikely to make any difference to the accuracy of predictions.

A further problem identified by Stephen Finlay is the tendency of people in predictive analytics to confuse correlation and causation. Certain students may have learnt maths best between 8.40 and 9.13, but it does not follow that they learnt it best because they studied at that time. If strong correlations do not involve causality, then actionable insights (such as individualised course design) can be no more than an informed gamble.

Knewton’s claim that they know how every student learns best is marketing hyperbole and should set alarm bells ringing. When it comes to language learning, we simply do not know how students learn (we do not have any generally accepted theory of second language acquisition), let alone how they learn best. More data won’t help our theories of learning! Ferreira’s claim that, with Knewton, every kid gets a perfectly optimized textbook, except it’s also video and other rich media dynamically generated in real time is equally preposterous, not least since the content of the textbook will be at least as significant as the way in which it is ‘optimized’. And, as we all know, textbooks have their faults.

Cui bono? Perhaps huge data and predictive analytics will benefit students; perhaps not. We will need to wait and find out. But Stephen Finlay reminds us that in gold rushes (and internet booms and the exciting world of Big Data) the people who sell the tools make a lot of money. Far more strike it rich selling picks and shovels to prospectors than do the prospectors. Likewise, there is a lot of money to be made selling Big Data solutions. Whether the buyer actually gets any benefit from them is not the primary concern of the sales people. (p.16/17) Which is, perhaps, one of the reasons that some sales people talk so fast.