Predicting dengue transmission rates by comparing different machine learning models with vector indices and meteorological data

Ong SQ; Isawasan P; Ngesom AMM; Shahar H; Lasim AM; Nair G

doi:10.1038/s41598-023-46342-2

Ong SQ ¹ , Isawasan P ² , Ngesom AMM ³ , Shahar H ⁴ , Lasim AM ⁵ , Nair G ⁶

Affiliations

¹ Entomology Laboratory, Institute for Tropical Biology and Conservation, Universiti Malaysia Sabah, Jalan UMS, 88400, Kota Kinabalu, Sabah, Malaysia. songquan.ong@ums.edu.my
² Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Perak Branch, Tapah Campus, 35400, Tapah, Malaysia
³ Centre for Communicable Diseases Research, Institute for Public Health, National Institutes of Health, Ministry of Health, Shah Alam, Malaysia
⁴ Entomology and Pest Unit, Federal Territory of Kuala Lumpur and Putrajaya Health Department, Jalan Cenderasari, 50590, Kuala Lumpur, Malaysia
⁵ Phytochemistry Unit, Herbal Medicine Research Centre, Institute for Medical Research, National Health Institute, Setia Alam, Malaysia
⁶ School of Electrical and Electronics Engineering, Universiti Sains Malaysia, Penang, Malaysia

Sci Rep, 2023 Nov 05;13(1):19129.

PMID: 37926755 DOI: 10.1038/s41598-023-46342-2

Abstract

Machine learning (ML) algorithms are receiving a lot of attention in the development of predictive models for monitoring dengue transmission rates. Previous work has focused only on specific weather variables and algorithms, and there remains a need for models that draw on more variables and higher-performing algorithms. In this study, we use vector indices and meteorological data as predictors to develop the ML models. We trained and validated seven ML algorithms, including ensemble ML methods, and compared their performance using the area under the receiver operating characteristic (ROC) curve (AUC), accuracy and F1 score. Our results show that ensemble ML methods such as XGBoost, AdaBoost and Random Forest perform better than logistic regression, Naïve Bayes, decision tree, and support vector machine (SVM), with XGBoost having the highest AUC, accuracy and F1 score. Analysis of variable importance showed that the container index was the least important predictor. Removing this variable improved the performance of the ML models by at least 6% in AUC and F1 score. Our results provide a framework for future studies on the use of predictive models in the development of an early warning system.
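
The three comparison metrics named in the abstract can be illustrated with a minimal sketch. The labels, scores, model names and function names below are hypothetical and invented purely for illustration; they are not data or code from the study, and the 0.5 decision threshold is an assumption.

```python
# Minimal sketch of comparing classifiers by AUC, accuracy and F1 score.
# All data and names here are hypothetical; nothing is taken from the study.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def evaluate(y_true, scores, threshold=0.5):
    """Compute all three metrics; the threshold is an assumed cut-off."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return {
        "auc": roc_auc(y_true, scores),
        "accuracy": accuracy(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }

# Toy binary labels (1 = high transmission) and predicted probabilities
# from two hypothetical models.
y_true = [1, 1, 1, 0, 0, 0]
model_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.6, 0.2, 0.1],
    "model_b": [0.7, 0.6, 0.55, 0.3, 0.2, 0.1],
}

results = {name: evaluate(y_true, s) for name, s in model_scores.items()}
best = max(results, key=lambda name: results[name]["auc"])

for name, metrics in results.items():
    print(name, metrics)
print("highest AUC:", best)
```

AUC is computed from the raw scores and so does not depend on the threshold, whereas accuracy and F1 score do. This is one reason to report all three metrics together when ranking models.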
