Affiliations 

  • 1 School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia
  • 2 Regenerative Medicine Cluster, Advanced Medical and Dental Institute, Universiti Sains Malaysia, Penang, Malaysia
Healthc Inform Res, 2021 Jul;27(3):200-213.
PMID: 34384202 DOI: 10.4258/hir.2021.27.3.200

Abstract

Objective: The main aim of this study was to apply text mining to social media posts to gain insight into the health-related concerns of thalassemia patients, thalassemia carriers, and their caregivers.

Methods: Posts were extracted with the Data Miner tool from two Facebook groups in Malaysia whose members were thalassemia patients, thalassemia carriers, and caregivers. A new framework, Malay-English social media text pre-processing, was proposed to pre-process the noisy mixed-language (Malay-English) social media posts. Topic modeling was used to identify hidden topics within the posts shared among members. Three topic models (latent Dirichlet allocation [LDA] in GenSim, LDA in MALLET, and latent semantic analysis) were applied to the dataset in Python, with and without stemming.

Results: LDA in MALLET without stemming was found to be the best topic model for this dataset. Eight topics were identified within the posts shared by members. Of those eight topics, four were newly discovered by this study, and the other four corresponded to the findings of previous studies that used an interview approach.

Conclusions: Topic 2 (the challenges faced by thalassemia patients) received the most attention and engagement. Healthcare practitioners and other concerned parties should make an effort to build a stronger support system that addresses this issue for those affected by thalassemia.
