Methods: Posts from two Facebook groups whose members were thalassemia patients, thalassemia carriers, and caregivers in Malaysia were extracted using the Data Miner tool. This study proposes a new framework, Malay-English social media text pre-processing, for cleaning the noisy, mixed-language (Malay-English) text of social media posts. Topic modeling was then used to identify hidden topics in the posts shared among members. Three topic models were applied to the dataset in Python, both with and without stemming: latent Dirichlet allocation (LDA) in Gensim, LDA in MALLET, and latent semantic analysis (LSA).
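The abstract does not list the individual steps of the Malay-English pre-processing framework. As a rough illustration of what pre-processing noisy mixed-language posts can involve, the sketch below lowercases a post, strips URLs, mentions, and non-letter characters, expands a few Malay short forms, removes stopwords, and optionally stems, mirroring the "with and without stemming" comparison. The `NORMALIZE` dictionary, the `STOPWORDS` list, and the `toy_stem` suffix stripper are all hypothetical stand-ins, not the authors' resources and not a real Malay or English stemmer.

```python
import re

# Hypothetical lookup table mapping a few Malay social media short forms
# to standard spellings; a real normalization dictionary would be far larger.
NORMALIZE = {
    "x": "tidak",
    "tak": "tidak",
    "yg": "yang",
    "sy": "saya",
    "utk": "untuk",
    "sbb": "sebab",
}

# Hypothetical mixed Malay-English stopword list (illustrative only).
STOPWORDS = {"yang", "dan", "untuk", "saya", "di", "ini", "the", "and", "is", "to"}

# Toy suffix stripper standing in for a real stemmer. It is NOT a proper
# Malay or English stemming algorithm (it ignores prefixes entirely).
SUFFIXES = ("kan", "an", "ing", "ed")


def toy_stem(token):
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(post, stem=False):
    """Turn one noisy Malay-English post into a list of clean tokens."""
    text = post.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # drop digits, punctuation, emoji
    # Expand short forms before filtering so e.g. "x" survives as "tidak".
    tokens = [NORMALIZE.get(tok, tok) for tok in text.split()]
    tokens = [tok for tok in tokens if tok not in STOPWORDS and len(tok) > 1]
    if stem:
        tokens = [toy_stem(tok) for tok in tokens]
    return tokens


if __name__ == "__main__":
    post = "Anak sy x sihat, tengah waiting rawatan https://example.com @admin"
    print(preprocess(post))             # ['anak', 'tidak', 'sihat', 'tengah', 'waiting', 'rawatan']
    print(preprocess(post, stem=True))  # ['anak', 'tidak', 'sihat', 'tengah', 'wait', 'rawat']
```

Token lists like these are the usual input to topic models; in Gensim, for example, they would be passed to `corpora.Dictionary` and `doc2bow` before fitting LDA or LSA.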
Results: LDA in MALLET without stemming was the best-performing topic model for this dataset. Eight topics were identified in the posts shared by members. Four of these topics were newly discovered in this study, and the other four corresponded to findings of previous interview-based studies.
Conclusions: Topic 2 (the challenges faced by thalassemia patients) received the highest attention and engagement. Healthcare practitioners and other concerned parties should work to build a stronger support system around these challenges for those affected by thalassemia.