In the 8th post on this blog (‘Theory, Research and Practice’), I referred to the lack of solid research into learning analytics. Whilst adaptive learning enthusiasts might disagree with much, or even most, of what I have written on this subject, here, at least, was an area of agreement. May of this year, however, saw the launch of the inaugural issue of the Journal of Learning Analytics, the first journal ‘dedicated to research into the challenges of collecting, analysing and reporting data with the specific intent to improve learning’. It is a peer-reviewed, open-access journal, available here , which is published by the Society for Learning Analytics Research (SoLAR), a consortium of academics from 9 universities in the US, Canada, Britain and Australia.
I decided to take a closer look. In this and my next two posts, I will focus on one article from this inaugural issue. It’s called Early Alert of Academically At‐Risk Students: An Open Source Analytics Initiative and it is co-authored by Sandeep M. Jayaprakash, Erik W. Moody, Eitel J.M. Lauría, James R. Regan, and Joshua D. Baron of Marist College in the US. Bear with me, please – it’s more interesting than it might sound!
The background to this paper is the often referred to problem of college drop-outs in the US, and the potential of learning analytics to address what is seen as a ‘national challenge’. The most influential work that has been done in this area to date was carried out at Purdue University. Purdue developed an analytical system, called Course Signals, which identified students at risk of course failure and offered a range of interventions (more about these in the next post) which were designed to improve student outcomes. I will have more to say about the work at Purdue in my third post, but, for the time being, it is enough to say that, in the field, it has been considered very successful, and that the authors of the paper I looked at have based their approach on the work done at Purdue.
Jayaprakash et al developed their own analytical system, based on Purdue’s Course Signals, and used it at their own institution, Marist College. Basically, they wanted to know if they could replicate the good results that had been achieved at Purdue. They then took the same analytical system to four different institutions, of very different kinds (public, as opposed to private; community colleges offering 2-year programmes rather than universities) to see if the results could be replicated there, too. They also wanted to find out if the interventions with students who had been signalled as at-risk would be as effective as they had been at Purdue. So far, so good: it is clearly very important to know if one particular piece of research has any significance beyond its immediate local context.
So, what did Jayaprakash et al find out? Basically, they learnt that their software worked as well at Marist as Course Signals had done at Purdue. They collected data on student demographics and aptitude, course grades and course related data, data on students’ interactions with the LMS they were using and performance data captured by the LMS. Oh, yes, and absenteeism. At the other institutions where they trialled their software, the system was 10% less accurate in predicting drop-outs, but the authors of the research still felt that ‘predictive models developed based on data from one institution may be scalable to other institutions’.
But more interesting than the question of whether or not the predictive analytics worked is the question of which specific features of the data were the most powerful predictors. What they discovered was that absenteeism was highly significant. No surprises there. They also learnt that the other most powerful predictors were (1) the students’ cumulative grade point average (GPA), an average of a student’s academic scores over their entire academic career, and (2) the scores recorded by the LMS of the work that students had done during the course which would contribute to their final grade. No surprises there, either. As the authors point out, ‘given that these two attributes are such fundamental aspects of academic success, it is not surprising that the predictive model has fared so well across these different institutions’.
Agreed, it is not surprising at all that students with lower scores and a history of lower scores are more likely to drop out of college than students with higher scores. But, I couldn’t help wondering, do we really need sophisticated learning analytics to tell us this? Wouldn’t any teacher know this already? They would, of course, if they knew their students, but if the teacher: student ratio is in the order of 1: 100 (not unheard of in lower-funded courses delivered primarily through an LMS), many teachers (and their students) might benefit from automated alert systems.
But back to the differences between the results at Purdue and Marist and at the other institutions. Why were the predictive analytics less successful at the latter? The answer is in the nature of the institutions. Essentially, it boils down to this. In institutions with low drop-out rates, the analytics are more reliable than in institutions with high drop-out rates, because the more at-risk students there are, the harder it is to predict the particular individuals who will actually drop out. Jayaprakash et al provide the key information in a useful table. Students at Marist College are relatively well-off (only 16% receive Pell Grants, which are awarded to students in financial need), and only a small number (12%) are from ‘ethnic minorities’. The rate of course non-completion in normal time is relatively low (at 20%). In contrast, at one of the other institutions, the College of the Redwoods in California, 44% of the students receive Pell Grants and 22% of them are from ‘ethnic minorities’. The non-completion rate is a staggering 96%. At Savannah State University, 78% of the students receive Pell Grants, and the non-completion rate is 70%. The table also shows the strong correlation between student poverty and high student: faculty ratios.
In other words, the poorer you are, the less likely you are to complete your course of study, and the less likely you are to know your tutors (these two factors also correlate). In other other words, the whiter you are, the more likely you are to complete your course of study (because of the strong correlations between race and poverty). While we are playing the game of statistical correlations, let’s take it a little further. As the authors point out, ‘there is considerable evidence that students with lower socio-economic status have lower GPAs and graduation rates’. If, therefore, GPAs are one of the most significant predictors of academic success, we can say that socio-economic status (and therefore race) is one of the most significant predictors of academic success … even if the learning analytics do not capture this directly.
Actually, we have known this for a long time. The socio-economic divide in education is frequently cited as one of the big reasons for moving towards digitally delivered courses. This particular piece of research was funded (more about this in the next posts) with the stipulation that it ‘investigated and demonstrated effective techniques to improve student retention in socio-economically disadvantaged populations’. We have also known for some time that digitally delivered education increases the academic divide between socio-economic groups. So what we now have is a situation where a digital technology (learning analytics) is being used as a partial solution to a problem that has always been around, but which has been exacerbated by the increasing use of another digital technology (LMSs) in education. We could say, then, that if we weren’t using LMSs, learning analytics would not be possible … but we would need them less, anyway.
My next post will look at the results of the interventions with students that were prompted by the alerts generated by the learning analytics. Advance warning: it will make what I have written so far seem positively rosy.