Researching research: part 1

Posted: July 19, 2014 in research

In the 8th post on this blog (‘Theory, Research and Practice’), I referred to the lack of solid research into learning analytics. Whilst adaptive learning enthusiasts might disagree with much, or even most, of what I have written on this subject, here, at least, was an area of agreement. May of this year, however, saw the launch of the inaugural issue of the Journal of Learning Analytics, the first journal ‘dedicated to research into the challenges of collecting, analysing and reporting data with the specific intent to improve learning’. It is a peer-reviewed, open-access journal, published by the Society for Learning Analytics Research (SoLAR), a consortium of academics from 9 universities in the US, Canada, Britain and Australia.

I decided to take a closer look. In this and my next two posts, I will focus on one article from this inaugural issue. It’s called Early Alert of Academically At‐Risk Students: An Open Source Analytics Initiative and it is co-authored by Sandeep M. Jayaprakash, Erik W. Moody, Eitel J.M. Lauría, James R. Regan, and Joshua D. Baron of Marist College in the US. Bear with me, please – it’s more interesting than it might sound!

The background to this paper is the often referred to problem of college drop-outs in the US, and the potential of learning analytics to address what is seen as a ‘national challenge’. The most influential work that has been done in this area to date was carried out at Purdue University. Purdue developed an analytical system, called Course Signals, which identified students at risk of course failure and offered a range of interventions (more about these in the next post) which were designed to improve student outcomes. I will have more to say about the work at Purdue in my third post, but, for the time being, it is enough to say that, in the field, it has been considered very successful, and that the authors of the paper I looked at have based their approach on the work done at Purdue.

Jayaprakash et al developed their own analytical system, based on Purdue’s Course Signals, and used it at their own institution, Marist College. Basically, they wanted to know if they could replicate the good results that had been achieved at Purdue. They then took the same analytical system to four different institutions, of very different kinds (public, as opposed to private; community colleges offering 2-year programmes rather than universities) to see if the results could be replicated there, too. They also wanted to find out if the interventions with students who had been signalled as at-risk would be as effective as they had been at Purdue. So far, so good: it is clearly very important to know if one particular piece of research has any significance beyond its immediate local context.

So, what did Jayaprakash et al find out? Basically, they learnt that their software worked as well at Marist as Course Signals had done at Purdue. They collected data on student demographics and aptitude, course grades and course related data, data on students’ interactions with the LMS they were using and performance data captured by the LMS. Oh, yes, and absenteeism. At the other institutions where they trialled their software, the system was 10% less accurate in predicting drop-outs, but the authors of the research still felt that ‘predictive models developed based on data from one institution may be scalable to other institutions’.

But more interesting than the question of whether or not the predictive analytics worked is the question of which specific features of the data were the most powerful predictors. What they discovered was that absenteeism was highly significant. No surprises there. They also learnt that the other most powerful predictors were (1) the students’ cumulative grade point average (GPA), an average of a student’s academic scores over their entire academic career, and (2) the scores recorded by the LMS of the work that students had done during the course which would contribute to their final grade. No surprises there, either. As the authors point out, ‘given that these two attributes are such fundamental aspects of academic success, it is not surprising that the predictive model has fared so well across these different institutions’.
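
For readers who like to see this sort of thing spelled out, here is a toy sketch of how the three predictors named above (absenteeism, cumulative GPA and the partial grades recorded by the LMS) might be combined into an alert. It is emphatically not the model that Jayaprakash et al built: the field names, the 0-4 GPA scale, the equal weighting and the 0.5 threshold are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    name: str
    cumulative_gpa: float     # assumed 0.0-4.0 scale
    lms_partial_grade: float  # assumed 0-100, graded work so far in the course
    absence_rate: float       # assumed fraction of sessions missed, 0.0-1.0


def risk_score(student: StudentRecord) -> float:
    """Average three 0-1 'risk' components with equal (hypothetical) weights."""
    gpa_risk = 1.0 - student.cumulative_gpa / 4.0
    grade_risk = 1.0 - student.lms_partial_grade / 100.0
    return (gpa_risk + grade_risk + student.absence_rate) / 3.0


def flag_at_risk(students, threshold=0.5):
    """Return the names of students whose score reaches the (assumed) threshold."""
    return [s.name for s in students if risk_score(s) >= threshold]


# Two invented students, for illustration only.
students = [
    StudentRecord("A", cumulative_gpa=3.6, lms_partial_grade=85.0, absence_rate=0.05),
    StudentRecord("B", cumulative_gpa=1.8, lms_partial_grade=40.0, absence_rate=0.40),
]
flagged = flag_at_risk(students)
print(flagged)
```

With these made-up numbers, only student B is flagged, which is exactly what any teacher glancing at the same three figures would have concluded.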

Agreed, it is not surprising at all that students with lower scores and a history of lower scores are more likely to drop out of college than students with higher scores. But, I couldn’t help wondering, do we really need sophisticated learning analytics to tell us this? Wouldn’t any teacher know this already? They would, of course, if they knew their students, but if the teacher:student ratio is in the order of 1:100 (not unheard of in lower-funded courses delivered primarily through an LMS), many teachers (and their students) might benefit from automated alert systems.

But back to the differences between the results at Purdue and Marist and at the other institutions. Why were the predictive analytics less successful at the latter? The answer is in the nature of the institutions. Essentially, it boils down to this. In institutions with low drop-out rates, the analytics are more reliable than in institutions with high drop-out rates, because the more at-risk students there are, the harder it is to predict the particular individuals who will actually drop out. Jayaprakash et al provide the key information in a useful table. Students at Marist College are relatively well-off (only 16% receive Pell Grants, which are awarded to students in financial need), and only a small number (12%) are from ‘ethnic minorities’. The rate of course non-completion in normal time is relatively low (at 20%). In contrast, at one of the other institutions, the College of the Redwoods in California, 44% of the students receive Pell Grants and 22% of them are from ‘ethnic minorities’. The non-completion rate is a staggering 96%. At Savannah State University, 78% of the students receive Pell Grants, and the non-completion rate is 70%. The table also shows the strong correlation between student poverty and high student:faculty ratios.

In other words, the poorer you are, the less likely you are to complete your course of study, and the less likely you are to know your tutors (these two factors also correlate). In other other words, the whiter you are, the more likely you are to complete your course of study (because of the strong correlations between race and poverty). While we are playing the game of statistical correlations, let’s take it a little further. As the authors point out, ‘there is considerable evidence that students with lower socio-economic status have lower GPAs and graduation rates’. If, therefore, GPAs are one of the most significant predictors of academic success, we can say that socio-economic status (and therefore race) is one of the most significant predictors of academic success … even if the learning analytics do not capture this directly.

Actually, we have known this for a long time. The socio-economic divide in education is frequently cited as one of the big reasons for moving towards digitally delivered courses. This particular piece of research was funded (more about this in the next posts) with the stipulation that it ‘investigated and demonstrated effective techniques to improve student retention in socio-economically disadvantaged populations’. We have also known for some time that digitally delivered education increases the academic divide between socio-economic groups. So what we now have is a situation where a digital technology (learning analytics) is being used as a partial solution to a problem that has always been around, but which has been exacerbated by the increasing use of another digital technology (LMSs) in education. We could say, then, that if we weren’t using LMSs, learning analytics would not be possible … but we would need them less, anyway.

My next post will look at the results of the interventions with students that were prompted by the alerts generated by the learning analytics. Advance warning: it will make what I have written so far seem positively rosy.

  1. One of the unintended consequences of data-driven educational policies is widespread and massive cheating. In the latest New Yorker, there’s a long piece describing how otherwise dedicated teachers and head teachers in publicly-funded schools in Atlanta ‘conspired’ to alter students’ test-scores so as to be able to keep their schools – serving depressed, lower-income and predominantly black inner-city areas – from being closed and the students relocated. The school superintendent for Atlanta is described as belonging ‘to a movement of reformers who believed that the value of the marketplace could resuscitate public education. She approached the job like a business executive: she courted philanthropists, set accountability measures, and created performance objectives that were more rigorous than those required by No Child Left Behind, which became law in 2002’.

    David Berliner, former dean of the school of education at Arizona State University, is quoted as saying that, ‘with the passage of the law, teachers were asked to compensate for factors outside their control. He said, “The people who say poverty is no excuse for low performance are now using teacher accountability as an excuse for doing nothing about poverty”’.

    Teachers who wanted to keep their jobs, and to save the school from closing, were driven to such desperate measures as fiddling attendance records, razoring open examination papers, and doctoring the students’ papers. The turnaround that resulted attracted national attention, and the Atlanta superintendent was named national Superintendent of the Year. Under her leadership, the district received more than forty million dollars from charities, including the Bill and Melinda Gates Foundation. Finally, suspicions were roused, investigations mounted, and it was discovered that 44 schools had cheated and that ‘a culture of fear, intimidation and retaliation has infested the district, allowing cheating – at all levels – to go unchecked for years’ and that ‘data had been used as an abusive and cruel weapon to embarrass and punish’.

    The article goes on to quote John Ewing, who served as the executive director of the American Mathematical Society for fifteen years (and this is where the story starts to converge with your own project, Philip, so it’s worth quoting in full):

    ‘[Ewing] told me that he is perplexed by educators’ “infatuation with data,” their faith that it is more authoritative than using their own judgement. He explains the problems in terms of Campbell’s law, a principle that describes the risks of using a single indicator to measure complex social phenomena: the greater the value placed on a quantitative measure, like test scores, the more likely it is that the people using it and the process it measures will be corrupted […] In a 2011 paper in Notices of the American Mathematical Society, he warned that policymakers were using mathematics “to intimidate – to preempt debate about the goals of education and measures of success”.’

    Reference: Aviv, R. (2014) ‘Wrong Answer’, in The New Yorker, July 21, 2014, 54-65.

  2. Hi Philip,
    Thanks for all this information – it’s really helping to see the bigger picture behind the drive to digital.
    In the last paragraph you state: ‘We have also known for some time that digitally delivered education increases the academic divide between socio-economic groups.’ I’d like to know where this information comes from, please. The rest of your argument is so well backed up that I feel such a significant statement needs more solid justification.

    • philipjkerr says:

      Hi Dan
      Thanks for the question. I think that I have, perhaps, overstated my point a little. Let me nuance it. Claims from edtech advocates that digital technologies offer the potential for increasingly democratized systems of educational provision seem reasonable enough at first glance. However, as Selwyn (Education and Technology, 2011, p. 102) has remarked, such claims are ‘compromised by what appears to be a complex and divided pattern of digital technology (non-) use within society’. He quotes one major study (Pew Internet and American Life Project, ‘The Ever-Shifting Internet Population: A New Look at Internet Access and the Digital Divide’, 2003, p.41) which observed that ‘demography is destiny when it comes to predicting who will go online’. Other studies which indicate that the digital divide is rooted in socio-economic conditions include Dutton, W. & Helsper, E. ‘Oxford Internet Survey’ (2009), Jones, S. & Fox, S. ‘Generations Online in 2009’ (Pew Internet and American Life Project, 2009), and Jones, S., Johnson-Yale, C., Millermaier, S. & Seoane Perez, F. ‘US college students’ internet use’ in the Journal of Computer-Mediated Communication, 14, pp.244-264 (2009). At the very least, then, we can say that digital technologies in education have done nothing to bridge the differences in educational opportunities between different socio-economic groups.
      But have they actually increased such differences? In my next post, I want to provide a partial answer to this question, but here are a few other observations. The increasing use of predictive analytics in the US has, as the article I am reviewing describes, been motivated by a desire to increase student retention rates. Knowing that students from less economically privileged groups are less likely to succeed than others, ‘many institutions chose to address the issue of retention via the admission process’ (Jayaprakash et al, p.13). This is one, but not the only, reason why students from less privileged backgrounds in both K-12 and higher education have found an increasing number of doors closed to them. The growing gap between well-to-do and poor students is reported and discussed in Vincent Tinto’s ‘Research and practice of student retention: what next?’ (Journal of College Student Retention, 2007, 8 (1), 1-19).
      Another significant reason is the ever-increasing expansion of private education in the US. Digital technology use in education and privatization of education are, however, closely connected, as I have argued elsewhere. The advocates of digitalization of education (including learning analytics and adaptive learning) are also the advocates of privatization, managerialism and more and more testing. It is all of these things put together that is resulting in the growing gap in educational opportunity. So, it is not the case that digital educational technology, per se, is responsible for the growing gap in educational opportunities. It is the uses to which that technology is being put.

  3. Thanks for the information – it’s certainly an improvement on the didactic approach which I was seeing die out when I was in school, which only catered for good students. Regardless of how trivial the findings sound, it’s always validating to see the research point them out. It does seem odd that the internet isn’t the great leveller I thought it was – I’ve done volunteer teaching and saw kids from impoverished backgrounds do very well thanks to it.
