Risk prediction analysis for classifying type 2 diabetes occurrence using local dataset

M. Hafiz Fazren Abd Rahman; Wan Wardatul Amani Wan Salim; M. Firdaus Abd-Wahab

The steep rise in cases of Diabetes Mellitus (DM) in the global population has encouraged extensive research on DM, which has led to a large accumulation of DM-related data. Data mining and machine learning have proven to be powerful tools for transforming such data into meaningful deductions. Several machine learning tools have shown great promise in diabetes classification. However, challenges remain in obtaining an accurate model suitable for real-world application. Most disease risk-prediction models are found to be specific to a local population. Moreover, real-world data are likely to be complex, incomplete and unorganized, which complicates efforts to develop models from them. This research aims to develop a robust prediction model for the classification of type 2 diabetes mellitus (T2DM), focusing on a Malaysian population, using three different machine learning algorithms: Decision Tree, Support Vector Machine and Naïve Bayes. Data pre-processing methods are applied to the raw data to improve model performance. This study uses datasets obtained from the IIUM Medical Centre for classification and modelling. Ultimately, the performance of each model is validated, evaluated and compared using several statistical metrics that measure accuracy, precision, sensitivity and efficiency. This study shows that the random forest model provides the best overall prediction performance in terms of accuracy (0.87), sensitivity (0.9), specificity (0.8), precision (0.9), F1-score (0.9) and AUC (0.93) (Normal).
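The evaluation metrics reported above have standard definitions in terms of a binary confusion matrix. The minimal Python sketch below computes them, assuming the T2DM class is treated as positive (the abstract does not state which class is positive; swapping it exchanges sensitivity and specificity). AUC is computed with the pairwise (Mann-Whitney) formulation, which equals the area under the empirical ROC curve; this is one standard choice, not necessarily the one used in the study. The names `ConfusionCounts`, `threshold_metrics` and `roc_auc` are hypothetical, and the counts and scores in the usage block are made up for illustration, not taken from the IIUM Medical Centre data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts; the positive class is assumed to be T2DM."""
    tp: int  # positives correctly predicted as positive
    fp: int  # negatives wrongly predicted as positive
    tn: int  # negatives correctly predicted as negative
    fn: int  # positives wrongly predicted as negative


def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def threshold_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Accuracy, sensitivity, specificity, precision and F1 from one confusion matrix."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # recall, true-positive rate
    precision = _ratio(c.tp, c.tp + c.fp)    # positive predictive value
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": sensitivity,
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true-negative rate
        "precision": precision,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive scores above a random negative.

    Ties count as half a win. O(P * N) pairwise comparison, fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return _ratio(wins, len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not the study's results.
    metrics = threshold_metrics(ConfusionCounts(tp=45, fp=5, tn=40, fn=10))
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
    # Hypothetical labels (1 = T2DM) and predicted probabilities.
    print(f"auc: {roc_auc([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]):.3f}")
```

With these hypothetical inputs the script prints accuracy 0.850, sensitivity 0.818, specificity 0.889, precision 0.900, F1 0.857 and AUC 0.889. Accuracy, sensitivity, specificity, precision and F1 depend on a fixed decision threshold, whereas AUC summarises ranking quality across all thresholds, which is why it is reported separately.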
