Durham Research Online
Explaining Individual and Collective Programming Students’ Behavior by Interpreting a Black-Box Predictive Model

Pereira, Filipe Dwan and Fonseca, Samuel C. and Oliveira, Elaine H. T. and Cristea, Alexandra I. and Bellhauser, Henrik and Rodrigues, Luiz and Oliveira, David B. F. and Isotani, Seiji and Carvalho, Leandro S. G. (2021) 'Explaining Individual and Collective Programming Students’ Behavior by Interpreting a Black-Box Predictive Model', IEEE Access, 9, pp. 117097-117119.


Predicting student performance as early as possible, and analysing to what extent initial student behaviour could lead to failure or success, is critical in introductory programming (CS1) courses, since it allows prompt intervention to alleviate their high failure rate. However, in CS1 performance prediction, there is a serious lack of studies that interpret the predictive model’s decisions. We therefore designed a long-term study using very fine-grained log data of 2056 students, collected from the first two weeks of CS1 courses. We extract features that measure how students deal with deadlines, how they fix errors, how much time they spend programming, and so forth. Subsequently, we construct a predictive model that achieved cutting-edge results, with an area under the curve (AUC) of 0.89 and an average accuracy of 81.3%. To allow effective intervention and to facilitate human-AI collaboration towards prescriptive analytics, we, for the first time to the best of our knowledge, go a step further than the prediction itself and advance this field by proposing an approach to explaining our predictive model’s decisions individually and collectively using a game-theory-based framework (SHAP; Lundberg et al., 2020) that allows our black-box non-linear model to be interpreted linearly. In other words, we explain the feature effects clearly by visualising and analysing individual predictions, the overall importance of features, and typical prediction paths. This method can be further applied to other emerging competitive models as the CS1 prediction field progresses, ensuring transparency of the process for key stakeholders: administrators, teachers, and students.
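The idea of interpreting a non-linear model "linearly" can be illustrated with a minimal sketch of exact Shapley values, the game-theoretic quantity on which SHAP is built. This is not the authors' pipeline or the SHAP library: the toy model, the three feature names, the input values, and the baseline are all hypothetical, chosen only to show that the per-feature contributions add up to the gap between one prediction and a baseline prediction.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction.

    Features absent from a coalition are replaced by their baseline value
    (an assumption of this sketch; other value functions are possible).
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                # Marginal contribution of feature i to this coalition.
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical non-linear "black box" with an interaction term.
    on_time, fix_rate, hours = z
    return 0.2 * on_time + 0.3 * fix_rate * hours + 0.1 * hours


# Hypothetical student and hypothetical "average" baseline student.
x = [1.0, 0.8, 2.0]
baseline = [0.5, 0.5, 1.0]

phi = shapley_values(toy_model, x, baseline)
base_value = toy_model(baseline)
prediction = toy_model(x)

# Additive explanation: baseline + sum of contributions equals the prediction.
reconstructed = base_value + sum(phi)
```

Each entry of `phi` is the contribution of one feature to this single prediction, so `reconstructed` matches `prediction` up to floating-point error. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.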

Item Type: Article
Full text: (VoR) Version of Record
Available under License: Creative Commons Attribution 4.0.
Publisher statement: This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
Date accepted: No date available
Date deposited: 28 October 2021
Date of first online publication: 18 August 2021
Date first made open access: 28 October 2021