Posts Tagged ‘data mining’

Learners are different, the argument goes, so learning paths will be different, too. And, the argument continues, if learners will benefit from individualized learning pathways, so instruction should be based around an analysis of the optimal learning pathways for individuals and tailored to match them. In previous posts, I have questioned whether such an analysis is meaningful or reliable and whether the tailoring leads to any measurable learning gains. In this post, I want to focus primarily on the analysis of learner differences.

Family / social background and previous educational experiences are obvious ways in which learners differ when they embark on any course of study. The way they impact on educational success is well researched and well established. Despite this research, there are some who disagree. For example, Dominic Cummings (former adviser to Michael Gove when he was UK Education minister and former campaign director of the pro-Brexit Vote Leave group) has argued  that genetic differences, especially in intelligence, account for more than 50% of the differences in educational achievement.

Cummings got his ideas from Robert Plomin , one of the world’s most cited living psychologists. Plomin, in a recent paper in Nature, ‘The New Genetics of Intelligence’ , argues that ‘intelligence is highly heritable and predicts important educational, occupational and health outcomes better than any other trait’. In an earlier paper, ‘Genetics affects choice of academic subjects as well as achievement’, Plomin and his co-authors argued that ‘choosing to do A-levels and the choice of subjects show substantial genetic influence, as does performance after two years studying the chosen subjects’. Environment matters, says Plomin , but it’s possible that genes matter more.

All of which leads us to the field known as ‘educational genomics’. In an article of breathless enthusiasm entitled ‘How genetics could help future learners unlock hidden potential’ , University of Sussex psychologist, Darya Gaysina, describes educational genomics as the use of ‘detailed information about the human genome – DNA variants – to identify their contribution to particular traits that are related to education [… ] it is thought that one day, educational genomics could enable educational organisations to create tailor-made curriculum programmes based on a pupil’s DNA profile’. It could, she writes, ‘enable schools to accommodate a variety of different learning styles – both well-worn and modern – suited to the individual needs of the learner [and] help society to take a decisive step towards the creation of an education system that plays on the advantages of genetic background. Rather than the current system, that penalises those individuals who do not fit the educational mould’.

The goal is not just personalized learning. It is ‘Personalized Precision Education’ where researchers ‘look for patterns in huge numbers of genetic factors that might explain behaviors and achievements in individuals. It also focuses on the ways that individuals’ genotypes and environments interact, or how other “epigenetic” factors impact on whether and how genes become active’. This will require huge amounts of ‘data gathering from learners and complex analysis to identify patterns across psychological, neural and genetic datasets’. Why not, suggests Darya Gaysina, use the same massive databases that are being used to identify health risks and to develop approaches to preventative medicine?

BG-for-educationIf I had a spare 100 Euros, I (or you) could buy Darya Gaysina’s book, ‘Behavioural Genetics for Education’ (Palgrave Macmillan, 2016) and, no doubt, I’d understand the science better as a result. There is much about the science that seems problematic, to say the least (e.g. the definition and measurement of intelligence, the lack of reference to other research that suggests academic success is linked to non-genetic factors), but it isn’t the science that concerns me most. It’s the ethics. I don’t share Gaysina’s optimism that ‘every child in the future could be given the opportunity to achieve their maximum potential’. Her utopianism is my fear of Gattaca-like dystopias. IQ testing, in its early days, promised something similarly wonderful, but look what became of that. When you already have reporting of educational genomics using terms like ‘dictate’, you have to fear for the future of Gaysina’s brave new world.

Futurism.pngEducational genomics could equally well lead to expectations of ‘certain levels of achievement from certain groups of children – perhaps from different socioeconomic or ethnic groups’ and you can be pretty sure it will lead to ‘companies with the means to assess students’ genetic identities [seeking] to create new marketplaces of products to sell to schools, educators and parents’. The very fact that people like Dominic Cummings (described by David Cameron as a ‘career psychopath’ ) have opted to jump on this particular bandwagon is, for me, more than enough cause for concern.

Underlying my doubts about educational genomics is a much broader concern. It’s the apparent belief of educational genomicists that science can provide technical solutions to educational problems. It’s called ‘solutionism’ and it doesn’t have a pretty history.


2014-09-30_2216Jose Ferreira, the fast-talking sales rep-in-chief of Knewton, likes to dazzle with numbers. In a 2012 talk hosted by the US Department of Education, Ferreira rattles off the stats: So Knewton students today, we have about 125,000, 180,000 right now, by December it’ll be 650,000, early next year it’ll be in the millions, and next year it’ll be close to 10 million. And that’s just through our Pearson partnership. For each of these students, Knewton gathers millions of data points every day. That, brags Ferreira, is five orders of magnitude more data about you than Google has. … We literally have more data about our students than any company has about anybody else about anything, and it’s not even close. With just a touch of breathless exaggeration, Ferreira goes on: We literally know everything about what you know and how you learn best, everything.

The data is mined to find correlations between learning outcomes and learning behaviours, and, once correlations have been established, learning programmes can be tailored to individual students. Ferreira explains: We take the combined data problem all hundred million to figure out exactly how to teach every concept to each kid. So the 100 million first shows up to learn the rules of exponents, great let’s go find a group of people who are psychometrically equivalent to that kid. They learn the same ways, they have the same learning style, they know the same stuff, because Knewton can figure out things like you learn math best in the morning between 8:40 and 9:13 am. You learn science best in 42 minute bite sizes the 44 minute mark you click right, you start missing questions you would normally get right.

The basic premise here is that the more data you have, the more accurately you can predict what will work best for any individual learner. But how accurate is it? In the absence of any decent, independent research (or, for that matter, any verifiable claims from Knewton), how should we respond to Ferreira’s contribution to the White House Education Datapalooza?

A 51Oy5J3o0yL._AA258_PIkin4,BottomRight,-46,22_AA280_SH20_OU35_new book by Stephen Finlay, Predictive Analytics, Data Mining and Big Data (Palgrave Macmillan, 2014) suggests that predictive analytics are typically about 20 – 30% more accurate than humans attempting to make the same judgements. That’s pretty impressive and perhaps Knewton does better than that, but the key thing to remember is that, however much data Knewton is playing with, and however good their algorithms are, we are still talking about predictions and not certainties. If an adaptive system could predict with 90% accuracy (and the actual figure is typically much lower than that) what learning content and what learning approach would be effective for an individual learner, it would still mean that it was wrong 10% of the time. When this is scaled up to the numbers of students that use Knewton software, it means that millions of students are getting faulty recommendations. Beyond a certain point, further expansion of the data that is mined is unlikely to make any difference to the accuracy of predictions.

A further problem identified by Stephen Finlay is the tendency of people in predictive analytics to confuse correlation and causation. Certain students may have learnt maths best between 8.40 and 9.13, but it does not follow that they learnt it best because they studied at that time. If strong correlations do not involve causality, then actionable insights (such as individualised course design) can be no more than an informed gamble.

Knewton’s claim that they know how every student learns best is marketing hyperbole and should set alarm bells ringing. When it comes to language learning, we simply do not know how students learn (we do not have any generally accepted theory of second language acquisition), let alone how they learn best. More data won’t help our theories of learning! Ferreira’s claim that, with Knewton, every kid gets a perfectly optimized textbook, except it’s also video and other rich media dynamically generated in real time is equally preposterous, not least since the content of the textbook will be at least as significant as the way in which it is ‘optimized’. And, as we all know, textbooks have their faults.

Cui bono? Perhaps huge data and predictive analytics will benefit students; perhaps not. We will need to wait and find out. But Stephen Finlay reminds us that in gold rushes (and internet booms and the exciting world of Big Data) the people who sell the tools make a lot of money. Far more strike it rich selling picks and shovels to prospectors than do the prospectors. Likewise, there is a lot of money to be made selling Big Data solutions. Whether the buyer actually gets any benefit from them is not the primary concern of the sales people. (p.16/17) Which is, perhaps, one of the reasons that some sales people talk so fast.