Durham Research Online
Fine-grained Main Ideas Extraction and Clustering of Online Course Reviews

Xiao, Chenghao and Shi, Lei and Cristea, Alexandra and Li, Zhaoxing and Pan, Ziqi (2022) 'Fine-grained Main Ideas Extraction and Clustering of Online Course Reviews', in Artificial Intelligence in Education, pp. 294-306. Lecture Notes in Computer Science, 13355.


Online course reviews are an essential way for course providers to gain insight into students’ perceptions of course quality, especially in massive open online courses (MOOCs), where further interaction between the two parties is difficult. Analyzing these reviews is therefore a necessary step for course providers in improving course quality and structuring future courses. However, reading through what are often thousands of comments to extract key ideas is inefficient and risks missing important ones. In this work, we propose a key idea extractor based on fine-grained, aspect-level semantic units from comments, powered by different variations of state-of-the-art pre-trained language models (PLMs). Our approach differs from previous topic modeling and keyword extraction methods in two ways. First, we aim not only to eliminate the heavy reliance on human intervention and statistical characteristics on which traditional topic models like LDA are based, but also to overcome the coarse granularity of state-of-the-art topic models like top2vec. Second, unlike previous keyword extraction methods, we do not extract keywords to summarize each comment, which we argue does not necessarily help human readers grasp key ideas at the course level. Instead, we cluster the ideas and concerns expressed most often throughout the whole course, without relying on students’ verbatim wording. We show that this method provides high and stable coverage of students’ ideas.
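The pipeline the abstract outlines (split comments into fine-grained units, embed them, cluster units across the whole course, rank clusters by how often the idea recurs) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes three loud assumptions: a toy bag-of-words vector stands in for the PLM embeddings, a greedy cosine-threshold grouping stands in for whatever clustering the paper uses, and the function names (`split_into_units`, `embed`, `cluster_units`), the `threshold` value and the sample reviews are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_units(review):
    """Split a review into rough clause-level units (assumed heuristic)."""
    parts = re.split(r"[.!?;,]|\bbut\b|\band\b", review.lower())
    return [p.strip() for p in parts if p.strip()]


def embed(unit):
    """Toy bag-of-words vector; a stand-in for a PLM sentence embedding."""
    return Counter(re.findall(r"[a-z']+", unit))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_units(units, threshold=0.6):
    """Greedy grouping: a unit joins the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of units; element 0 is its seed
    for unit in units:
        vec = embed(unit)
        for cluster in clusters:
            if cosine(vec, embed(cluster[0])) >= threshold:
                cluster.append(unit)
                break
        else:
            clusters.append([unit])
    # Largest clusters first: the ideas expressed most often across the course.
    return sorted(clusters, key=len, reverse=True)


# Made-up sample reviews, for illustration only.
reviews = [
    "The lectures were clear, but the quizzes were too hard.",
    "Great instructor and the lectures were clear.",
    "The quizzes were too hard!",
]
units = [u for r in reviews for u in split_into_units(r)]
clusters = cluster_units(units)
for cluster in clusters:
    print(len(cluster), cluster[0])
```

Because units are grouped at the course level rather than per comment, an idea that appears in several reviews surfaces once with its frequency. With real PLM embeddings, differently worded units expressing the same concern could also fall into the same cluster, which the bag-of-words stand-in cannot capture.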

Item Type: Book chapter
Full text: Publisher-imposed embargo until 27 July 2023.
(AM) Accepted Manuscript
File format - PDF
Publisher Web site:
Publisher statement: The final authenticated version is available online at
Date accepted: 25 April 2022
Date deposited: 01 September 2022
Date of first online publication: 27 July 2022
Date first made open access: 27 July 2023
