METHODS: To predict CD while protecting patient privacy, we anonymized the data by adding Laplace noise to sensitive features such as age and gender. The anonymized dataset was then analyzed within a differential privacy (DP) framework, which preserved confidentiality while still allowing insights to be extracted. Logistic Regression (LR), Gaussian Naïve Bayes (GNB), and Random Forest (RF) models were compared, and the methodology integrated feature selection, statistical analysis, and two interpretability tools: SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). This approach supports transparent and interpretable AI decision-making, in line with responsible AI development principles. Overall, it combines privacy preservation, interpretability, and ethical considerations for accurate CD prediction.
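The noise-addition step can be illustrated with a minimal sketch of the Laplace mechanism, using only the Python standard library. The abstract does not report the privacy budget or the sensitivity used, so `epsilon`, `sensitivity`, the function names, and the example ages are all assumptions made for illustration, not the study's settings.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with the same
    rate (1 / scale) follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def anonymize(values, sensitivity: float, epsilon: float, rng: random.Random):
    """Add Laplace noise with scale = sensitivity / epsilon to each value."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


rng = random.Random(0)
ages = [34, 51, 67, 45, 72]  # made-up ages, not study data
# Assumed settings: sensitivity 1.0 and epsilon 0.5 give a noise scale of 2.0.
noisy_ages = anonymize(ages, sensitivity=1.0, epsilon=0.5, rng=rng)
print(noisy_ages)
```

A smaller `epsilon` gives a larger noise scale, which means stronger privacy but less accurate features. This is the trade-off behind comparing the private and non-private frameworks.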
RESULTS: The DP framework with LR gave promising results: an area under the curve (AUC) of 0.848 ± 0.03, accuracy of 0.797 ± 0.02, precision of 0.789 ± 0.02, recall of 0.797 ± 0.02, and an F1 score of 0.787 ± 0.02, comparable to the performance of the non-private framework. The SHAP- and LIME-based results support clinical findings and show a commitment to transparent and interpretable AI decision-making, in line with the principles of responsible AI development.
CONCLUSIONS: Our study presents a novel approach to predicting CD that combines data anonymization, privacy-preserving methods, the interpretability tools SHAP and LIME, and ethical considerations. This responsible AI framework supports accurate predictions, privacy preservation, and user trust, underscoring the significance of comprehensive and transparent ML models in healthcare. This research therefore strengthens the ability to forecast CD, providing a vital lifeline to millions of CD patients globally and potentially preventing numerous fatalities.
METHODS: After 10 min of supine rest, each subject was tilted to 70 degrees on a tilt table for approximately 35 min in total. Glyceryl trinitrate (GTN, 400 µg) was administered sublingually after the first 20 min, and monitoring continued for another 15 min. Mean imputation and K-nearest neighbors (KNN) imputation were used to handle missing values. Next, feature selection techniques, including a genetic algorithm, recursive feature elimination, and feature importance, were applied to identify the crucial features. The Mann-Whitney U test was then performed to assess the statistical difference between the two groups. Patients with VVS were classified using machine learning models including Support Vector Machine (SVM), Gaussian Naïve Bayes (GNB), Multinomial Naïve Bayes (MNB), KNN, Logistic Regression (LR), and Random Forest (RF). The developed model was interpreted using partial dependence plots, an explainable artificial intelligence (XAI) technique.
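The imputation and classification steps can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the data are synthetic stand-ins rather than tilt-table measurements, and the number of neighbors, the RBF kernel, the scaling step, and the 5-fold cross-validation are choices made here that the abstract does not specify.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 137 subjects, 3 features, 54 positives.
n_subjects, n_features = 137, 3
y = np.array([1] * 54 + [0] * 83)
X = rng.normal(size=(n_subjects, n_features)) + y[:, None] * 1.5

# Remove one value in each of 30 randomly chosen rows to mimic missing data.
rows = rng.choice(n_subjects, size=30, replace=False)
cols = rng.integers(0, n_features, size=30)
X[rows, cols] = np.nan

# KNN imputation, scaling, and an SVM chained so that imputation is fitted
# on the training folds only during cross-validation.
model = make_pipeline(
    KNNImputer(n_neighbors=5),
    StandardScaler(),
    SVC(kernel="rbf"),
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"mean cross-validated accuracy: {scores.mean():.3f}")
```

Keeping the imputer inside the pipeline prevents information from the held-out fold leaking into the imputed training values. Swapping `KNNImputer` for `SimpleImputer(strategy="mean")` would give the mean-imputation variant.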
RESULTS: A total of 137 subjects aged between 9 and 93 years were recruited for this study. Of these, 54 experienced clinical symptoms and were considered to have tested positive, while the remaining 83 tested negative. Optimal results were obtained by combining KNN imputation and three tilting features with SVM, yielding 90.5% accuracy, 87.0% sensitivity, 92.7% specificity, 88.6% precision, an 87.8% F1 score, and a 95.4% area under the receiver operating characteristic curve (ROC AUC).
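The reported metrics are related through the confusion matrix, which the sketch below illustrates. The abstract does not report a confusion matrix, so the counts here are hypothetical: they were chosen only to match the 54 positive and 83 negative subjects and to land close to the reported percentages. The published figures may be averages over cross-validation folds rather than a single matrix.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard binary classification metrics from confusion counts."""
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts (assumption): 47 + 7 = 54 positives, 77 + 6 = 83 negatives.
metrics = classification_metrics(tp=47, fn=7, tn=77, fp=6)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

ROC AUC cannot be recovered from a single confusion matrix, because it depends on the classifier's scores across all decision thresholds.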
CONCLUSIONS: The proposed algorithm effectively classifies VVS patients with over 90% accuracy. However, the study was limited by its small sample size, and more clinical datasets are required to confirm that the approach generalizes.