Methods: Sample sizes were calculated manually in Microsoft Excel, and sample size tables were tabulated for the test of a single coefficient alpha and for the comparison of two coefficients alpha.
Results: For the test of a single coefficient alpha, assuming that Cronbach's alpha equals zero in the null hypothesis yields a small sample size of fewer than 30 to detect a minimum desired effect size of 0.7. However, setting Cronbach's alpha to a value larger than zero in the null hypothesis could be necessary, and this yields a larger sample size. For the comparison of two Cronbach's alpha coefficients, a larger sample size is needed when testing for smaller effect sizes.
Conclusions: For assessing the internal consistency of an instrument, the present study proposes setting Cronbach's alpha at 0.5 in the null hypothesis, which requires a larger sample size. For the comparison of two Cronbach's alpha coefficients, justification is needed as to whether testing for extremely small or extremely large effect sizes is scientifically necessary.
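The single-coefficient calculation described above can be illustrated with a minimal sketch. It assumes Bonett's (2002) approximation for testing coefficient alpha, which the abstract does not name, so the formula, the function name `sample_size_single_alpha`, the number of items `k`, and the default significance level and power are all assumptions made for illustration rather than the study's own procedure.

```python
from math import ceil, log
from statistics import NormalDist


def sample_size_single_alpha(alpha0, alpha1, k, sig_level=0.05, power=0.90):
    """Approximate sample size for a two-sided test of H0: alpha = alpha0
    against the alternative alpha = alpha1, for an instrument with k items.

    Assumption: Bonett's (2002) approximation,
        n = [2k / (k - 1)] * (z_{a/2} + z_b)^2 / ln(delta)^2 + 2,
    with delta = (1 - alpha0) / (1 - alpha1).
    """
    if k < 2:
        raise ValueError("k must be at least 2 items")
    if alpha0 >= 1 or alpha1 >= 1 or alpha0 == alpha1:
        raise ValueError("alpha0 and alpha1 must differ and be below 1")

    z_a = NormalDist().inv_cdf(1 - sig_level / 2)  # two-sided critical value
    z_b = NormalDist().inv_cdf(power)              # quantile for the desired power
    delta = (1 - alpha0) / (1 - alpha1)

    n = (2 * k / (k - 1)) * (z_a + z_b) ** 2 / log(delta) ** 2 + 2
    return ceil(n)  # round up to a whole participant


# Hypothetical inputs: a 5-item instrument and a target alpha of 0.7.
n_null_zero = sample_size_single_alpha(alpha0=0.0, alpha1=0.7, k=5)
n_null_half = sample_size_single_alpha(alpha0=0.5, alpha1=0.7, k=5)
print(n_null_zero, n_null_half)
```

Under these assumed inputs, moving the null value from 0 to 0.5 shrinks `delta` toward 1, which is why the required sample size grows, in line with the pattern reported in the Results.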
PATIENTS AND METHODS: The dataset comprised patient data from a tertiary cardiothoracic center in Malaysia collected between 2011 and 2015 and sourced from electronic health records. Extensive preprocessing and feature selection were performed to ensure data quality and relevance. Four machine learning algorithms were applied: Logistic Regression, Gradient Boosted Trees, Support Vector Machine, and Random Forest. The dataset was split into training and validation sets, and the hyperparameters were tuned. The evaluation criteria included accuracy, area under the ROC curve (AUC), precision, F-measure, sensitivity, and specificity. Ethical guidelines for data use and patient privacy were followed throughout the study.
RESULTS: Gradient Boosted Trees emerged as the top performer overall, with the highest accuracy (88.66%) together with a strong AUC (94.61%) and sensitivity (91.30%). Random Forest displayed a comparable AUC (94.78%) and accuracy (87.39%). In contrast, the Support Vector Machine showed higher sensitivity (98.57%) but lower specificity (59.55%), accuracy (79.02%), and precision (70.81%). Logistic Regression maintained a balance between sensitivity (87.70%) and specificity (87.05%).
CONCLUSION: These findings imply that Gradient Boosted Trees and Random Forest may be effective methods for identifying patients who will develop acute kidney injury (AKI) following heart surgery. However, the specific goals, the sensitivity/specificity trade-off, and the practical ramifications should all be considered when choosing an algorithm.
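The threshold-based evaluation criteria named in the methods (accuracy, precision, F-measure, sensitivity, and specificity) all derive from the four cells of a confusion matrix. The following is a minimal sketch of those standard definitions. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the study; AUC is omitted because it requires predicted scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp / fn: patients with the outcome predicted positive / negative
    tn / fp: patients without the outcome predicted negative / positive
    """
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("counts must include both classes and at least one positive prediction")

    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)              # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "f_measure": f_measure,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

A model that flags almost every patient as positive pushes `fn` toward zero and `fp` upward, which raises sensitivity while lowering specificity and precision. This is the kind of trade-off the conclusion asks readers to weigh when choosing an algorithm.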