Part 4: big data and analytics

Posted: January 29, 2014 in A guide to adaptive learning

In order to understand more complex models of adaptive learning, it is necessary to take a temporary step sideways, away from the world of language learning. Businesses have long used analytics – the analysis of data to find meaningful patterns – in insurance, banking and marketing. With the exponential growth in computer processing power and memory capacity, businesses now have access to volumes of data of almost unimaginable size. This is known as ‘big data’ and has been described as ‘a revolution that will transform how we live, work and think’ (Mayer-Schönberger & Cukier, ‘Big Data’, 2013). Frequently cited examples of the potential of big data are Amazon’s success in analyzing and predicting buying patterns and the use of big data analysis in Barack Obama’s 2012 presidential re-election campaign. Business commentators are all singing the same song on the subject. This will be looked at again in later posts; for the time being, it is enough to be aware of the main message. ‘The high-performing organisation of the future will be one that places great value on data and analytical exploration’ (The Economist Intelligence Unit, ‘In Search of Insight and Foresight: Getting more out of big data’, 2013, p. 15). ‘Almost no sphere of business activity will remain untouched by this movement’ (McAfee & Brynjolfsson, ‘Big Data: The Management Revolution’, Harvard Business Review, October 2012, p. 65).

[Image: The Economist cover]

With the growing bonds between business and education (another topic which will be explored later), it is unsurprising that language learning and teaching materials are rapidly going down the big data route. In comparison with what is now being developed for ELT, the data analyzed in the adaptive learning models I described in an earlier post is very limited, and the algorithms used to shape the content are very simple.

The volume and variety of data and the speed of processing are now of an altogether different order. Jose Ferreira, CEO of Knewton, one of the biggest players in adaptive learning in ELT, spells out the kind of data that can be tapped[1]:

At Knewton, we divide educational data into five types: one pertaining to student identity and onboarding, and four student activity-based data sets that have the potential to improve learning outcomes. They’re listed below in order of how difficult they are to attain:

1) Identity Data: Who are you? Are you allowed to use this application? What admin rights do you have? What district are you in? How about demographic info?

2) User Interaction Data: User interaction data includes engagement metrics, click rate, page views, bounce rate, etc. These metrics have long been the cornerstone of internet optimization for consumer web companies, which use them to improve user experience and retention. This is the easiest to collect of the data sets that affect student outcomes. Everyone who creates an online app can and should get this for themselves.

3) Inferred Content Data: How well does a piece of content “perform” across a group, or for any one subgroup, of students? What measurable student proficiency gains result when a certain type of student interacts with a certain piece of content? How well does a question actually assess what it intends to? Efficacy data on instructional materials isn’t easy to generate — it requires algorithmically normed assessment items. However it’s possible now for even small companies to “norm” small quantities of items. (Years ago, before we developed more sophisticated methods of norming items at scale, Knewton did so using Amazon’s “Mechanical Turk” service.)

4) System-Wide Data: Rosters, grades, disciplinary records, and attendance information are all examples of system-wide data. Assuming you have permission (e.g. you’re a teacher or principal), this information is easy to acquire locally for a class or school. But it isn’t very helpful at small scale because there is so little of it on a per-student basis. At very large scale it becomes more useful, and inferences that may help inform system-wide recommendations can be teased out.

5) Inferred Student Data: Exactly what concepts does a student know, at exactly what percentile of proficiency? Was an incorrect answer due to a lack of proficiency, or forgetfulness, or distraction, or a poorly worded question, or something else altogether? What is the probability that a student will pass next week’s quiz, and what can she do right this moment to increase it?
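Ferreira does not explain what ‘norming’ assessment items involves, and Knewton’s own methods are not described here. As a rough illustration only, the simplest classical approach estimates each item’s difficulty as the proportion of students who answer it correctly, and its discrimination as the correlation between getting the item right and a student’s score on the remaining items. The response matrix and function names below are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical response matrix: one row per student, one column per item.
# 1 = answered correctly, 0 = answered incorrectly.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
]

def item_difficulty(responses, item):
    """Proportion of students answering `item` correctly (classical 'p-value')."""
    return mean(row[item] for row in responses)

def item_discrimination(responses, item):
    """Correlation between the item score and each student's score on the other items."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    sx, sy = pstdev(item_scores), pstdev(rest_scores)
    if sx == 0 or sy == 0:
        return 0.0  # no variation, so the correlation is undefined
    mx, my = mean(item_scores), mean(rest_scores)
    cov = mean((x - mx) * (y - my) for x, y in zip(item_scores, rest_scores))
    return cov / (sx * sy)

for i in range(3):
    print(i, item_difficulty(responses, i), round(item_discrimination(responses, i), 2))
```

A large-scale system would more likely fit a statistical model such as item response theory, but the aim is the same: to describe how each question behaves across a population of students.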

Software of this kind keeps complex personal profiles, with millions of variables per student, on as many students as necessary. The more student profiles (and therefore students) that can be compared, the more useful the data is. Big players in this field, such as Knewton, are aiming for student numbers in the tens to hundreds of millions. Once data volumes of this order are achieved, the ‘analytics’, or the algorithms that convert data into ‘actionable insights’ (J. Spring, ‘Education Networks’, New York: Routledge, 2012, p. 55), become much more reliable.
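One standard statistical way to see why more students make the analytics ‘more reliable’ is sketched below. It is a general illustration and makes no claim about any vendor’s algorithms. It assumes the analytics are estimating a simple proportion, such as the share of a certain type of student who pass a quiz after working with a certain piece of content, and that observations are independent. The function names are hypothetical.

```python
import math

def standard_error(p, n):
    """Standard error of a proportion p estimated from n independent observations."""
    return math.sqrt(p * (1 - p) / n)

def margin_95(p, n):
    """Approximate half-width of a 95% confidence interval (normal approximation)."""
    return 1.96 * standard_error(p, n)

# The same estimated pass rate, based on ever larger groups of students.
for n in (100, 10_000, 1_000_000):
    print(f"n={n:>9,}: 0.60 ± {margin_95(0.6, n):.4f}")
```

Because the error shrinks with the square root of n, each hundredfold increase in the number of students narrows the margin tenfold. This only holds for comparable, independent observations; more data does not correct a measure that is biased in the first place.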

  1. I’m reminded of this study: – inferred data is set to be bigger than we can even imagine…

  2. philipjkerr says:

    Thanks for that link, Gavin. The article looks extremely interesting.

  3. Ania Kolbuszewska says:

    Thanks so much for the very informative and relevant posts – brilliant stuff.

  4. Sarah Cunningham says:

    Hi Philip, and many thanks for the great posts; looking forward to the rest. Weird that all this futuristic data and technology seems to be taking us back into the dark ages methodologically in so many ways. Your examples from Duolingo remind me of the 1949 Hungarian phrasebook I used to have (‘She wore a low-necked lilac evening gown’ was my favourite useful phrase). Back then we used to laugh at it; now we are worried about being steam-rollered by it. Funny world!

    • Jill Hadfield says:

      Agree absolutely – it seems to be going back to mechanistic, formulaic, meaningless, decontextualised stuff: ‘The pen of my aunt’-type language learning.

  5. From the Knewton blog you linked to: ‘We must all commit to the principle that the data ultimately belong to the students and the schools. We are merely custodians, and we must do our utmost to safeguard it while providing maximum openness for those to whom it belongs.’

    Don’t banks also claim to be ‘merely custodians’? Have they been paragons of ‘maximum openness’? Is this the direction we’re heading?

  6. philipjkerr says:

    I’ve just finished reading ‘The Great American Education-Industrial Complex’ by Anthony G. Picciano and Joel Spring, in which there is a very interesting section on the advocacy of big data in education by the World Economic Forum. The Forum’s ‘Global Education Technology Report 2010-2011’, according to Picciano and Spring (pp. 35–37),
    ‘advocates the application of what it calls the ‘Transformation 2.0’ agenda to education and other social services. What is transformation 2.0? It is the application and analysis of data related to school management, operations, student assessment scores, and instructional materials. Data collected on students, in this paradigm, could be used to match instructional materials with student abilities and predict the student’s education future. The person chosen to write the Report’s section on ‘Transformation 2.0 for an Effective Social Strategy’ was Mikael Hagstrom, the Executive Vice President of Europe, the Middle East, Africa and Asia Pacific at SAS, which, as described on its website, ‘is the leader in business analytics software and services, and the largest independent vendor in the business intelligence market’.
    […] Hagstrom argues that the use of data-driven decision making has evolved through three stages; stages that can be used to frame its application to education. […] The third stage uses ‘analytics’, which he defines as the software and processes that convert data into ‘actionable insights’.
    […] Reflecting its advocacy of applying analytics to educational data, the World Economic Forum’s (2011) report ‘Empowering People and Transforming Society’ identified Knewton as one of its three pioneers of Information Technology and New Media for 2011.’

  7. philipjkerr says:

    There’s a very good article by Neil Selwyn in Learning, Media and Technology (2014) called ‘Data entry: towards the critical study of digital data and education’. In it he discusses digital data and the reproduction of inequalities and social relations; digital data and the intensification of managerialism within education; dataveillance; and digital data and the reductive nature of ‘what counts’ as ‘education’. Contact me if you have trouble getting hold of a copy.

  8. […] main keyword of data-driven education’ (Williamson, 2017: 10). See my earlier posts on this topic here and here and […]
