Accurate assessment of epidemiological associations between health outcomes and routinely observed proximal and distal determinants of health is fundamental to the execution of effective public health interventions and policies. Methods that couple big public health data with modern statistical techniques offer greater granularity for describing and understanding data quality, disease distributions, and potential predictive connections between population-level indicators and area-based health outcomes. This study applied clustering techniques to explore patterns of diabetes burden correlated with local socio-economic inequalities in Malaysia, with the goal of better understanding the factors influencing the formation of these clusters. Using multi-modal secondary data sources, district-level crude diabetes rates were computed from 271,553 individuals with diabetes sampled from 914 primary care clinics throughout Malaysia. Unsupervised machine learning, in the form of hierarchical clustering, was applied to a set of 144 administrative districts. Differences in the characteristics of the areas were evaluated using multivariate non-parametric test statistics. Five statistically significant clusters were identified, each reflecting a different level of local diabetes burden and each showing contrasting patterns associated with population-level characteristics. The hierarchical clustering analysis, which grouped local areas with varying socio-economic, demographic, and geographic characteristics, offers local public health authorities opportunities to implement targeted interventions to control the local diabetes burden.
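The workflow summarised above (hierarchical clustering of districts, followed by a non-parametric comparison of the resulting clusters) can be sketched as follows. This is a minimal illustration, not the study's implementation. The feature matrix is synthetic random data, and the Ward linkage, the z-score standardisation, and the helper names `cluster_districts` and `compare_clusters` are all assumptions. The Kruskal-Wallis test is a univariate stand-in for the multivariate non-parametric statistics mentioned in the abstract.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import kruskal


def cluster_districts(features, n_clusters=5):
    """Standardise each column, then cut a Ward-linkage tree into at most n_clusters groups.

    Ward linkage and z-scoring are assumptions; the abstract does not name them.
    """
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    tree = linkage(z, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def compare_clusters(values, labels):
    """Kruskal-Wallis test of one indicator across clusters (univariate stand-in)."""
    groups = [values[labels == k] for k in np.unique(labels)]
    return kruskal(*groups)


# Synthetic placeholder data: 144 districts x 4 hypothetical indicators
# (e.g. crude diabetes rate plus socio-economic measures). Not real data.
rng = np.random.default_rng(0)
n_districts = 144
features = rng.normal(size=(n_districts, 4))

labels = cluster_districts(features, n_clusters=5)
stat, p_value = compare_clusters(features[:, 0], labels)
print("clusters found:", len(np.unique(labels)))
print("Kruskal-Wallis p-value for indicator 0:", p_value)
```

Because the indicator being tested also feeds the clustering, a small p-value here is expected by construction. In practice the comparison is more informative for characteristics that were held out of the clustering step.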