Datapalooza

Posted: September 30, 2014 in analytics, big data
Tags: actionable insights, algorithms, analytics, big data, coursebooks, data mining, Knewton, learning outcomes, marketing

Jose Ferreira, the fast-talking sales rep-in-chief of Knewton, likes to dazzle with numbers. In a 2012 talk hosted by the US Department of Education, Ferreira rattles off the stats: ‘So Knewton students today, we have about 125,000, 180,000 right now, by December it’ll be 650,000, early next year it’ll be in the millions, and next year it’ll be close to 10 million. And that’s just through our Pearson partnership.’ For each of these students, Knewton gathers millions of data points every day. That, brags Ferreira, is ‘five orders of magnitude more data about you than Google has. … We literally have more data about our students than any company has about anybody else about anything, and it’s not even close.’ With just a touch of breathless exaggeration, Ferreira goes on: ‘We literally know everything about what you know and how you learn best, everything.’

The data is mined to find correlations between learning outcomes and learning behaviours, and, once correlations have been established, learning programmes can be tailored to individual students. Ferreira explains: ‘We take the combined data problem all hundred million to figure out exactly how to teach every concept to each kid. So the 100 million first shows up to learn the rules of exponents, great let’s go find a group of people who are psychometrically equivalent to that kid. They learn the same ways, they have the same learning style, they know the same stuff, because Knewton can figure out things like you learn math best in the morning between 8:40 and 9:13 am. You learn science best in 42 minute bite sizes the 44 minute mark you click right, you start missing questions you would normally get right.’

The basic premise here is that the more data you have, the more accurately you can predict what will work best for any individual learner. But how accurate is it? In the absence of any decent, independent research (or, for that matter, any verifiable claims from Knewton), how should we respond to Ferreira’s contribution to the White House Education Datapalooza?

A new book by Stephen Finlay, Predictive Analytics, Data Mining and Big Data (Palgrave Macmillan, 2014) suggests that predictive analytics are typically about 20 – 30% more accurate than humans attempting to make the same judgements. That’s pretty impressive and perhaps Knewton does better than that, but the key thing to remember is that, however much data Knewton is playing with, and however good their algorithms are, we are still talking about predictions and not certainties. If an adaptive system could predict with 90% accuracy (and the actual figure is typically much lower than that) what learning content and what learning approach would be effective for an individual learner, it would still mean that it was wrong 10% of the time. When this is scaled up to the numbers of students that use Knewton software, it means that millions of students are getting faulty recommendations. Beyond a certain point, further expansion of the data that is mined is unlikely to make any difference to the accuracy of predictions.
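
To put rough numbers on the scaling argument above, here is a minimal sketch in Python. The 90% accuracy and the 10 million students are the hypothetical figures used in this post, not anything Knewton has published, and the function names are invented for illustration. The second function also assumes that each recommendation is right or wrong independently of the others, which is a simplification.

```python
# Illustrative arithmetic only: 90% accuracy and 10 million students are the
# hypothetical figures from the post, not measured values.


def expected_faulty(students: int, accuracy: float) -> int:
    """Expected number of students given a wrong recommendation,
    assuming exactly one recommendation per student."""
    return round(students * (1 - accuracy))


def share_with_any_fault(accuracy: float, recommendations: int) -> float:
    """Share of students who receive at least one wrong recommendation,
    assuming each recommendation is right or wrong independently."""
    return 1 - accuracy ** recommendations


if __name__ == "__main__":
    # One recommendation each: one student in ten is misdirected.
    print(expected_faulty(10_000_000, 0.90))
    # More recommendations per student: the share hit at least once grows.
    for k in (1, 5, 20):
        print(k, round(share_with_any_fault(0.90, k), 3))
```

With a single recommendation per student, one in ten of 10 million students (a million learners) is pointed the wrong way. Since an adaptive system makes many recommendations to each student, the share who are misdirected at least once climbs quickly under these assumptions.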

A further problem identified by Stephen Finlay is the tendency of people in predictive analytics to confuse correlation and causation. Certain students may have learnt maths best between 8.40 and 9.13, but it does not follow that they learnt it best because they studied at that time. If strong correlations do not involve causality, then actionable insights (such as individualised course design) can be no more than an informed gamble.
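
The correlation and causation point can be illustrated with a toy simulation. This is only a sketch under invented assumptions: the ‘motivation’ variable, the probabilities and the scores are all made up, and nothing here reflects Knewton’s data or methods. In this toy world, motivated students are more likely to study in the morning and also score higher, while the time of day has no effect at all on scores.

```python
import random
from statistics import mean


def simulate(n: int = 20_000, seed: int = 0):
    """Return (motivated, morning, score) rows for n invented students."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        motivated = rng.random() < 0.5
        # Motivated students are more likely to choose the morning slot.
        morning = rng.random() < (0.8 if motivated else 0.2)
        # The score depends on motivation only, never on the time slot.
        score = 60 + (20 if motivated else 0) + rng.gauss(0, 5)
        rows.append((motivated, morning, score))
    return rows


def gap(rows) -> float:
    """Mean score of morning learners minus mean score of the rest."""
    morning_scores = [s for _, m, s in rows if m]
    other_scores = [s for _, m, s in rows if not m]
    return mean(morning_scores) - mean(other_scores)


if __name__ == "__main__":
    rows = simulate()
    print("raw gap:", round(gap(rows), 2))
    for flag in (True, False):
        group = [r for r in rows if r[0] == flag]
        label = "motivated" if flag else "unmotivated"
        print(label, "gap:", round(gap(group), 2))
```

The raw comparison shows morning learners well ahead, but the gap all but disappears once students are compared only with others of the same motivation. A system that moved everyone to a morning slot on the strength of the raw correlation would, in this toy world, achieve nothing.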

Knewton’s claim that they know how every student learns best is marketing hyperbole and should set alarm bells ringing. When it comes to language learning, we simply do not know how students learn (we do not have any generally accepted theory of second language acquisition), let alone how they learn best. More data won’t help our theories of learning! Ferreira’s claim that, with Knewton, ‘every kid gets a perfectly optimized textbook, except it’s also video and other rich media dynamically generated in real time’ is equally preposterous, not least since the content of the textbook will be at least as significant as the way in which it is ‘optimized’. And, as we all know, textbooks have their faults.

Cui bono? Perhaps huge data and predictive analytics will benefit students; perhaps not. We will need to wait and find out. But Stephen Finlay reminds us that in gold rushes (and internet booms and the exciting world of Big Data) the people who sell the tools make a lot of money. Far more strike it rich selling picks and shovels to prospectors than do the prospectors. Likewise, there is a lot of money to be made selling Big Data solutions. Whether the buyer actually gets any benefit from them is not the primary concern of the sales people. (p.16/17) Which is, perhaps, one of the reasons that some sales people talk so fast.

Comments

Scott Thornbury says:

October 1, 2014 at 6:14 am

“When it comes to language learning, we simply do not know how students learn”. Or what. And the what is of particular relevance to any system predicated on the incremental accumulation of data. What, in short, is the knowledge base of SLA?

In a brilliant article in the latest ELT Journal (‘Moving beyond accuracy: from tests of English to tests of “Englishing”’) Christopher J. Hall makes the point that

‘Most actual learning is not the result of the step-by-step internal reproduction of an externally existing “target” system. It is true that substantial amounts of English knowledge develop through instructionally regulated explicit learning. But for the majority of learners, English resources also inevitably develop through natural acquisition, resulting in the dynamic construction of a unique and constantly shifting mental system, created on the basis of unique, localized experiences of use…. Learning, then, is not confined to internally reproducing the external target system as declarative knowledge, but also involves constructing a unique, personal procedural system which will inevitably differ from the taught system in important ways.’

Data-driven learning (and coursebooks for that matter – I had to get that in) seems to be all about ‘reproducing the external target system as declarative knowledge’. It might, conceivably, work for subjects like maths. But only partially for language.

Russ says:

October 1, 2014 at 6:46 am

I’m curious what Hall bases this on. Is it mentioned in the article?

- Scott Thornbury says:
  
  October 1, 2014 at 10:45 pm
  
  I can’t remember, Russ, and I’m presently 10,000km away from the copy of ELT J that I quoted from this morning. But it’s fairly self-evident (if you can stomach that notion!) that the L2 system that each learner deploys is not a copy, nor even a pale copy, of the system that they have been taught. And yet many learners are able to work wonders (communicatively speaking) with this idiosyncratic, unique, non-standard, unstable system, the elements of which cannot necessarily be predicted on the basis of an aggregation of data from x million other learners.
  
  - Russ says:
    
    October 5, 2014 at 4:07 pm
    
    ah, sorry Scott. I should’ve downloaded the article, and looked myself. A friend of mine suggested I was ‘trolling’ with that question but I was genuinely curious. You see, I’ve recently been trying to find out about the ‘natural order of acquisition’ research and exactly what research there is to back that up. I thought Hall was talking about that here, hence the question. After reading the article I think he was talking about something quite different.
    
    It was quite an eye-opening read. I think I’ll need to read it again before I can really get my head round it. There’s quite a lot packed into the sentences at times and I had trouble unpacking some of it.
    
    You’re right that I have problems with the idea of ‘self evident’ 😉
    http://malingual.blogspot.co.uk/2013/12/the-importance-of-research.html
    (last section)
Scott Thornbury says:

October 3, 2014 at 7:56 am

Further to my comment about the learner’s internalized linguistic system (both grammar and lexicon) being a pale reflection of the pedagogical system – and Russ’s request for references to research that might support such a view – I’ve just attended a talk by Michael Long who is emphatic in arguing that all the research into developmental orders that has been done since the 1970s and 80s provides not a shred of evidence to support the view that what is taught is learned. He cites Manfred Pienemann’s ‘processability theory’ (PT) to explain this, i.e. that learning is constrained by cognitive processing capacity, a theory that, Long adds, has been put to the test empirically, and has yielded a more than 90% success rate in terms of predicting acquisition orders in a whole range of languages. But I suspect that processing capacity is not the only reason that the learner’s system emerges idiosyncratically. Usage-based theorists (with whom Christopher Hall professes an allegiance) would claim that exposure is also crucial, while those of a more sociocultural bent would argue that the learner’s social and communicative needs will determine, to a large extent, what intake becomes uptake.

Long, by the way, does not mince his words. ‘Is there evidence that the structural syllabus works? The answer is no. Clearly no’.

- Russ says:
  
  October 6, 2014 at 10:23 am
  
  Thanks for the info. That’s quite emphatic. Wish I could’ve seen it. I actually downloaded the Pienemann (1988?) paper as well a while ago, as it seems to be the one that is constantly referenced. It was quite tough going.
  
  I know that you’ve read Swan’s collected essays and so you’ve probably seen his article called ‘legislation by hypothesis’ in which he suggests the evidence for order of acquisition is not as strong as some may suggest. Similarly Catherine Walter suggests research shows that grammar teaching ‘does’ work. http://www.theguardian.com/education/2012/sep/18/teach-grammar-rules
  
  I’m no great fan of the structural syllabus but I’m genuinely not sure what to believe or where to look for the answers, or even if I understand the question (I’m beginning to think it’s the latter). I recently had a chat with a new colleague who did a PhD on this topic and I still didn’t feel any better informed after, just more confused. I suggested to her that a phrase like “I should go” could, in my experience, be learnt at any point. She replied that this is ‘lexical’, not grammar, so now I’m not even sure we’re talking about the same thing.
  
  She was suggesting that these studies referred to things like 3rd person s, past tense and these kinds of features. But since all students have different 1st languages and since all of those languages have a greater and lesser amount of similarity with English (some languages having past tense and some not) how can there be a universal order of acquisition for all students?
  
  I think Swan gives the example that in order to make a question in Japanese we just add ‘ka’ at the end of a sentence. He asks if that rule really needs to come at a specific developmental point or if it couldn’t just be learnt at any time.
  
  The more I look into this the less I seem to understand. If you know any writers who’ve laid this all out in a clear, easy to follow way then I’d love to read it.
  
  Confused of Leicester.
  
Big data, learning analytics and language learning / teaching | Adaptive Learning in ELT says:

March 14, 2019 at 12:31 pm

[…] of data-driven education’ (Williamson, 2017: 10). See my earlier posts on this topic here and here and […]

Platforms and big data in ELT – a look back at the last decade | Adaptive Learning in ELT says:

January 7, 2020 at 10:15 am

[…] was clear, from very early on (see, for example, my posts from 2014 here and here) that Knewton’s product was little more than what Michael Feldstein called ‘snake oil’. Why […]

Critical data literacy #1: our data and ELT websites | Adaptive Learning in ELT says:

September 17, 2020 at 9:33 am

[…] I looked at the connections between big data and personalized / adaptive learning. In another post, September 2014, I looked at the claims of the CEO of Knewton who bragged that his company had five orders of […]
