Affiliations 

  • 1 Department of Community Health, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
  • 2 Department of Electrical & Electronic Engineering, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
PeerJ, 2025;13:e18851.
PMID: 40061226 DOI: 10.7717/peerj.18851

Abstract

BACKGROUND: Leptospirosis is endemic in tropical regions such as South America, Southern Asia, and Southeast Asia. In Malaysia, leptospirosis incidence increased from 1.45 to 25.94 cases per 100,000 population between 2005 and 2014. With incidence increasing in Selangor, Malaysia, and frequent shifts in climate, a study of disease hotspot areas and their association with hydroclimatic factors could enhance disease surveillance and public health interventions.

METHODS: This ecological cross-sectional study used a geographic information system (GIS) and remote sensing techniques to analyse the spatiotemporal distribution of leptospirosis in Selangor from 2011 to 2019. Laboratory-confirmed leptospirosis cases (n = 1,045) were obtained from the Selangor State Health Department. Using ArcGIS Pro, spatial autocorrelation analysis (Moran's I) and hotspot analysis (Getis-Ord Gi*) were conducted to identify hotspots based on the monthly aggregated cases for each subdistrict. Satellite-derived rainfall and land surface temperature (LST) data were acquired from NASA's Giovanni EarthData website and processed into monthly averages. These data were integrated into ArcGIS Pro as thematic layers. Machine learning algorithms, including support vector machine (SVM), Random Forest (RF), and light gradient boosting machine (LGBM), were employed to develop predictive models for leptospirosis hotspot areas. Model performance was then evaluated using cross-validation and metrics such as accuracy, precision, sensitivity, and F1-score.
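The global Moran's I statistic named above can be illustrated with a minimal sketch. This is not the ArcGIS Pro implementation used in the study. The function name `morans_i`, the binary adjacency weights, and the four-unit toy data are assumptions made purely for illustration.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for one variable observed on n spatial units.

    values  : length-n sequence (e.g. case counts per subdistrict)
    weights : n x n spatial weights matrix with a zero diagonal
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    z = x - x.mean()                    # deviations from the mean
    s0 = w.sum()                        # sum of all spatial weights
    num = (w * np.outer(z, z)).sum()    # sum_ij w_ij * z_i * z_j
    den = (z ** 2).sum()                # sum_i z_i^2
    return (n / s0) * num / den

# Hypothetical example: four units in a row, neighbours share an edge.
w = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

smooth = morans_i([1, 2, 3, 4], w)       # similar values sit next to each other
alternating = morans_i([1, 4, 1, 4], w)  # dissimilar values sit next to each other
```

Positive values of I suggest clustering, negative values suggest dispersion, and values close to the expected value of -1/(n - 1) are consistent with a random pattern. A significance test, as reported by GIS software, is still needed before classifying a pattern as clustered.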

RESULTS: Moran's I analysis revealed a primarily random distribution of cases across Selangor, with only 20 of the 103 observations showing a clustered distribution. Hotspot areas were mainly scattered in subdistricts throughout Selangor, with clustering in the central region. Machine learning analysis revealed that the LGBM algorithm had the best performance scores compared with SVM and RF, with a cross-validation score of 0.61, a precision score of 0.16, and an F1-score of 0.23. The feature importance scores indicated that river water level and rainfall contribute most to the model.
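To make the reported metrics easier to interpret, the sketch below computes accuracy, precision, sensitivity, and F1-score from a binary confusion matrix. The function name `classification_metrics` and the counts in the example are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, sensitivity (recall) and F1-score.

    tp, fp, tn, fn are the counts of true positives, false positives,
    true negatives and false negatives for the 'hotspot' class.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and sensitivity.
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }

# Invented counts for illustration only: 14 true hotspots among 100 units.
metrics = classification_metrics(tp=4, fp=6, tn=80, fn=10)
```

When the positive class is rare, as in this invented example, accuracy can remain high while precision and F1-score stay low, which is why the positive-class metrics are usually read alongside accuracy.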

CONCLUSIONS: This GIS-based study identified a primarily sporadic occurrence of leptospirosis in Selangor with minimal spatial clustering. The LGBM algorithm effectively predicted leptospirosis hotspots based on the analysed hydroclimatic factors. The integration of GIS and machine learning offers a promising framework for disease surveillance, facilitating targeted public health interventions in areas at high risk for leptospirosis.
