Abstract
This paper outlines a multi-source, data-driven solution to the problem of early prediction of academic success for students on a Software Development programme delivered in both online and hybrid teaching modes.
Academic failure and subsequent dropout among students enrolled on computer science and software development (programming-based) courses are significant problems, with some UK institutions reporting first-year dropout rates of 11%. A key strategy to combat academic failure and dropout is to provide timely and meaningful interventions, directly targeted at the students who need them most. Central to this is the requirement to identify, quickly and accurately, the students who require such interventions.
Many existing approaches either identify struggling students too late for any meaningful academic intervention to be deployed or, where they do make an early prediction, rely on pre-matriculation data alone, which often does not provide a clear enough indicator of academic performance; accordingly, such systems have a low chance of successfully predicting which students will struggle academically.
Our approach mines multiple sources of data simultaneously, drawing on a diverse range of inputs: pre-course aptitude test scores; results from weekly summative assessments; interim results from formative assessments; attendance data from online lectures; and Learning Management System (LMS) activity data.
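As a minimal sketch, assuming each source arrives as a pandas DataFrame keyed by a shared `student_id` column (every other name here is hypothetical, not our actual schema), the sources might be joined into a single per-student feature table like this:

```python
import pandas as pd

def build_feature_table(aptitude: pd.DataFrame,
                        summative: pd.DataFrame,
                        formative: pd.DataFrame,
                        attendance: pd.DataFrame,
                        lms_activity: pd.DataFrame) -> pd.DataFrame:
    """Join the five data sources into one per-student feature table.
    All frames are assumed to share a 'student_id' column; the
    helper and its inputs are illustrative only."""
    features = aptitude
    for frame in (summative, formative, attendance, lms_activity):
        features = features.merge(frame, on="student_id", how="left")
    # Early in the course many columns are still empty; a simple fill
    # keeps the table usable, though a real pipeline would impute
    # missing values more carefully.
    return features.fillna(0)
```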
Another problem with existing approaches is that predictions can quickly become stale. The learning path of a student is rarely linear; rather, their academic ability develops at key points during the lifetime of the course they are undertaking. Our approach avoids staleness by using machine learning algorithms to recalculate the prediction of likely academic success for each student at frequent intervals. For example, at the outset of a course our system relies entirely on pre-matriculation data (aptitude test scores). By the end of the first week, however, and each week thereafter, the model is recalculated using freshly available online lecture attendance, assessment results (both summative and formative) and LMS activity data.
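A minimal sketch of one weekly recalculation step is given below; the feature matrices are assumed to be already assembled, and the choice of a random forest is purely illustrative rather than a statement of the model evaluated in this paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def recalculate_predictions(X_train: np.ndarray, y_train: np.ndarray,
                            X_live: np.ndarray) -> np.ndarray:
    """Refit on everything known so far and return an updated
    probability of academic success (label 1) for each current student."""
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    return model.predict_proba(X_live)[:, 1]

# Week 0: X contains only pre-matriculation columns (aptitude scores).
# Each subsequent week, attendance, assessment and LMS-activity columns
# are appended and the model is refit on the enlarged matrices.
```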
Our reliance on a suite of data, along with our ability to dynamically recalculate our prediction model, has two key benefits over other systems:
1. Students who were not flagged as 'at risk' of poor academic performance on pre-matriculation data alone can be identified quickly once the course is running, because the model recalculates on a weekly cycle. Any 'at risk' students can then be offered an intervention in short order.
2. Students who were flagged as 'at risk' at the outset of the course (or early on) can be monitored on a weekly basis: their most recent prediction of academic success can be compared with previous predictions, and if no improvement is observed, further interventions or pastoral care can be provided (see the sketch after this list).
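The week-over-week comparison behind benefit 2 can be sketched as follows; the history structure, threshold and example values are hypothetical illustrations, not our production logic:

```python
def flag_non_improving(history: dict[str, list[float]],
                       at_risk_threshold: float = 0.5) -> list[str]:
    """Flag students whose latest weekly success probability is below
    the at-risk threshold and shows no improvement on earlier weeks."""
    flagged = []
    for student, probs in history.items():
        latest = probs[-1]
        best_previous = max(probs[:-1]) if len(probs) > 1 else latest
        if latest < at_risk_threshold and latest <= best_previous:
            flagged.append(student)
    return flagged

# Toy example: s001 is stuck below threshold, s002 is improving.
weekly = {"s001": [0.35, 0.33, 0.30], "s002": [0.40, 0.55, 0.70]}
print(flag_non_improving(weekly))  # ['s001']
```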
This paper describes the process of training a supervised learning model that can predict academic success, then provides a performance evaluation of a number of predictive machine learning methods. Finally, it offers a discussion of, and recommendations for, feature selection, training sets and learning algorithms suitable for predicting academic success.
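The evaluation loop follows the standard cross-validation pattern sketched below; the candidate algorithms, metric and synthetic data are placeholders, not the methods or results reported in this paper:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Synthetic stand-ins for the real feature table and pass/fail labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))
y = rng.integers(0, 2, size=120)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": SVC(),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```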
Original language | English |
---|---|
Title of host publication | INTED 2023: 17th International Technology, Education and Development Conference: proceedings |
Editors | Luis Gómez Chova, Chelo González Martínez, Joanna Lees |
Publisher | IATED |
Pages | 7374 |
Number of pages | 1 |
ISBN (Electronic) | 9788409490264 |
Publication status | Published - 08 Mar 2023 |
Event | 17th International Technology, Education and Development Conference, Valencia, Spain, 06 Mar 2023 → 08 Mar 2023 |
Publication series
Name | INTED Proceedings |
---|---|
ISSN (Electronic) | 2340-1079 |
Conference
Conference | 17th International Technology, Education and Development Conference |
---|---|
Abbreviated title | INTED 2023 |
Country/Territory | Spain |
City | Valencia |
Period | 06/03/2023 → 08/03/2023 |
Keywords
- academic performance
- academic success
- machine learning
- intervention
- hybrid teaching