Methods: Sample sizes were calculated manually using Microsoft Excel software, and sample size tables were tabulated for testing a single coefficient alpha and for comparing two coefficients alpha.
Results: For the test of a single coefficient alpha, setting the Cronbach's alpha coefficient to zero in the null hypothesis yields a small sample size of fewer than 30 to detect a minimum desired effect size of 0.7. However, it may be necessary to set the coefficient larger than zero in the null hypothesis, and this yields a larger sample size. For the comparison of two Cronbach's alpha coefficients, a larger sample size is needed when testing for smaller effect sizes.
Conclusions: In assessing the internal consistency of an instrument, the present study proposes setting the Cronbach's alpha coefficient at 0.5 in the null hypothesis, which requires a larger sample size. For the comparison of two Cronbach's alpha coefficients, justification is needed as to whether testing for extremely small or extremely large effect sizes is scientifically necessary.
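As a minimal sketch of how such sample size tables can be tabulated, the function below uses Bonett's (2002) normal approximation for ln(1 − α̂) to size the test of H0: alpha = alpha0 against H1: alpha = alpha1. This particular approximation is an assumption for illustration and may differ from the exact method used in the study; it does, however, reproduce the qualitative finding that moving alpha0 from 0 to 0.5 inflates the required sample size.

```python
from math import ceil, log
from statistics import NormalDist

def alpha_test_sample_size(k, alpha0, alpha1, sig=0.05, power=0.80):
    """Approximate n for testing H0: alpha = alpha0 vs H1: alpha = alpha1
    (alpha1 > alpha0) on a k-item scale, via Bonett's (2002) approximation
    in which ln(1 - alpha-hat) has variance ~ 2k / ((k - 1)(n - 2))."""
    z = NormalDist().inv_cdf(1 - sig / 2) + NormalDist().inv_cdf(power)
    effect = log((1 - alpha0) / (1 - alpha1))  # log "effect size" of the test
    return ceil(2 + (2 * k / (k - 1)) * (z / effect) ** 2)

# With a 5-item scale and a target alpha of 0.7:
n_null_zero = alpha_test_sample_size(5, 0.0, 0.7)   # null alpha = 0 -> n = 16
n_null_half = alpha_test_sample_size(5, 0.5, 0.7)   # null alpha = 0.5 -> n = 78
```

Under this approximation, a null of zero needs fewer than 30 subjects, whereas the recommended null of 0.5 requires considerably more, mirroring the trade-off discussed above.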
PATIENTS AND METHODS: The dataset encompassed patient data from a tertiary cardiothoracic center in Malaysia between 2011 and 2015, sourced from electronic health records. Extensive preprocessing and feature selection ensured data quality and relevance. Four machine learning algorithms were applied: Logistic Regression, Gradient Boosted Trees, Support Vector Machine, and Random Forest. The dataset was split into training and validation sets, and hyperparameters were tuned. Evaluation criteria included accuracy, area under the ROC curve (AUC), precision, F-measure, sensitivity, and specificity. Ethical guidelines for data use and patient privacy were rigorously followed throughout the study.
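The split-tune-evaluate workflow described above can be sketched as follows with scikit-learn. The synthetic data, the 80:20 split ratio, and the hyperparameter grid are illustrative assumptions; the study's actual features, split, and grids are not specified in the abstract.

```python
# Sketch of the described workflow: split, tune hyperparameters, evaluate.
# make_classification stands in for the preprocessed patient dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Tune hyperparameters by cross-validation on the training set only.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    scoring="roc_auc", cv=3)
grid.fit(X_train, y_train)

# Evaluate the tuned model on the held-out validation set.
pred = grid.predict(X_val)
prob = grid.predict_proba(X_val)[:, 1]
print(f"accuracy={accuracy_score(y_val, pred):.3f} "
      f"AUC={roc_auc_score(y_val, prob):.3f}")
```

Keeping tuning inside the training folds, as here, prevents the validation metrics from being inflated by hyperparameter leakage.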
RESULTS: With the highest accuracy (88.66%), AUC (94.61%), and sensitivity (91.30%), Gradient Boosted Trees emerged as the top performer. Random Forest displayed strong AUC (94.78%) and accuracy (87.39%). In contrast, the Support Vector Machine showed higher sensitivity (98.57%) with lower specificity (59.55%), along with lower accuracy (79.02%) and precision (70.81%). Logistic Regression maintained a balance between sensitivity (87.70%) and specificity (87.05%).
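The sensitivity/specificity trade-off reported above can be made concrete by computing both from a confusion matrix. The counts below are invented so that the ratios reproduce the reported SVM figures; the study's actual confusion matrices are not given in the abstract.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts for a model that flags almost every patient as
# positive: it misses only 1 true case (high sensitivity) but raises
# many false alarms (low specificity).
sens, spec = sensitivity_specificity(tp=69, fn=1, tn=53, fp=36)
print(f"sensitivity={sens:.2%} specificity={spec:.2%}")
```

This illustrates why a high-sensitivity screen such as the SVM here can still be unattractive when false positives carry real clinical cost.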
CONCLUSION: These findings imply that Gradient Boosted Trees and Random Forest might be effective methods for identifying patients who will develop acute kidney injury (AKI) following heart surgery. However, the specific goals, sensitivity/specificity trade-offs, and practical ramifications should all be considered when choosing an algorithm.
METHODS: This study analyzed death records from January 2017 to June 2022, sourced from Malaysia's Health Informatics Centre, coded into ICD-10. Data anonymization adhered to ethical standards, with 387,650 death registrations included after quality checks. The dataset, limited to three-digit ICD-10 codes, underwent cleaning and an 80:20 training-testing split. Preprocessing involved HTML tag removal and tokenization. ML approaches, including BERT (Bidirectional Encoder Representations from Transformers), Gzip+KNN (K-Nearest Neighbors), XGBoost (Extreme Gradient Boosting), TensorFlow, SVM (Support Vector Machine), and Naive Bayes, were evaluated for automated ICD-10 coding. Models were fine-tuned and assessed across accuracy, F1-score, precision, recall, specificity, and precision-recall curves using Amazon SageMaker (Amazon Web Services, Seattle, WA). Sensitivity analysis addressed unbalanced data scenarios, enhancing model robustness.
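Of the approaches listed, Gzip+KNN is the least conventional, so a minimal sketch of the idea may help: texts are compared by normalized compression distance (NCD) computed from gzip-compressed lengths, and a record is assigned the majority label of its k nearest training examples. The toy cause-of-death strings and ICD-10 codes below are invented for illustration, not drawn from the study's data.

```python
import gzip

def ncd(x: str, y: str) -> float:
    """Normalized compression distance via gzip-compressed lengths:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress((x + " " + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

def gzip_knn_predict(text, train, k=3):
    """train is a list of (text, label) pairs; return the majority
    label among the k training texts nearest to `text` under NCD."""
    neighbours = sorted(train, key=lambda pair: ncd(text, pair[0]))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

# Toy training records: free-text cause of death -> three-digit ICD-10 code.
train = [
    ("acute myocardial infarction", "I21"),
    ("myocardial infarction anterior wall", "I21"),
    ("cerebral infarction due to thrombosis", "I63"),
    ("cerebral infarction embolic", "I63"),
]
code = gzip_knn_predict("old myocardial infarction anterior", train, k=3)
```

Because shared substrings compress well together, records with similar wording end up close under NCD, which is why this compressor-based method can compete with trained classifiers without any model fitting.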
RESULTS: In assessing ICD-10 coding with ML, Gzip+KNN had the longest training time (10 hours), while BERT consumed the most memory. BERT performed best on F1-score (0.71) and accuracy (0.82), closely followed by Gzip+KNN. TensorFlow excelled in recall, whereas SVM had the highest specificity but lower overall performance. XGBoost was notably less effective across metrics. Precision-recall analysis showed Gzip+KNN's superiority. On an unbalanced dataset, BERT and Gzip+KNN demonstrated consistent accuracy.
CONCLUSION: Our study highlights that BERT and Gzip+KNN optimize ICD-10 coding, balancing efficiency, resource use, and accuracy. BERT excels in precision with higher memory demands, while Gzip+KNN offers robust accuracy and recall. This suggests significant potential for improving healthcare analytics and decision-making through advanced ML models.