Durham Research Online
Finding Records in Social Media: A Natural Language Processing Fundamentals Exploration

Oladejo, Babatunde Kazeem and Hadžidedić, Sunčica and Ganić, Emir (2021) 'Finding Records in Social Media: A Natural Language Processing Fundamentals Exploration', in Mediterranean Forum - Data Science Conference, pp. 151-164. Communications in Computer and Information Science, 1343.


Social media postings are now routinely used as proof of activities, events, or transactions in news media, academic institutions, governments, judicial courts, commerce, and various other organizations. The need to preserve social media content as records has drawn the interest of academic researchers, industry professionals, and policy makers. Despite the importance of this research area, the selection of records from a pool of social media content remains an area of low research activity. This paper explores the use of Natural Language Processing (NLP) methods to classify and select records from a pool of tweets (Twitter social media content). We experiment with various characteristics of the data and NLP parameters, with the goal of determining optimal parameters for training a supervised machine learning classifier. This paper can serve as an aid for understanding the fundamental elements of automating the selection of social media records.

Item Type: Book chapter
Full text: (AM) Accepted Manuscript
Publisher statement: The final authenticated version is available online at
Date accepted: No date available
Date deposited: 07 September 2021
Date of first online publication: 02 April 2021
Date first made open access: 07 September 2021
