Objective: To explore the predictive value of machine learning for cognitive impairment and to identify important predictors of cognitive impairment. Methods: A total of 2,326 middle-aged and elderly people completed questionnaires and physical examinations at baseline and at the Year 2 and Year 4 follow-ups. A random forest machine learning (ML) model was used to longitudinally predict cognitive impairment at Year 2 and Year 4 from baseline data. The same method was also applied to the Year 4 cross-sectional data to establish a prediction model and to verify the accuracy of the longitudinal predictions. In addition, the ability of the random forest model to longitudinally predict 2-year and 4-year cognitive impairment was compared with that of a traditional logistic regression model. Results: Random forest models showed higher accuracy than logistic regression for all outcomes at Year 2, Year 4, and cross-sectional Year 4 (AUC = 0.81, 0.79, and 0.80 vs. 0.61, 0.62, and 0.70, respectively). Baseline characteristics from physical examination (e.g., BMI, blood pressure), biomarkers (e.g., cholesterol), functioning (e.g., functional limitations), demographics (e.g., age), and emotional status (e.g., depression) were identified as the ten most important predictors of cognitive impairment. Conclusion: ML algorithms could improve the prediction of cognitive impairment among middle-aged and older Chinese adults over a 4-year period and identify essential risk markers.
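The model comparison above rests on the area under the ROC curve (AUC). As a minimal sketch of what that metric measures, the snippet below computes AUC as the probability that a randomly chosen impaired case receives a higher predicted risk than a randomly chosen unimpaired case, counting ties as half. The function name `auc`, the labels, and the two score lists are hypothetical toy values chosen for illustration; they are not data or code from the study.

```python
from itertools import product


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 for a positive case (e.g., impaired), 0 for a negative case.
    scores: predicted risk for each case, higher meaning more likely positive.
    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return correct / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 3 positives, 4 negatives.
labels = [1, 1, 1, 0, 0, 0, 0]
model_a_scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]  # ranks most positives first
model_b_scores = [0.6, 0.3, 0.5, 0.5, 0.4, 0.2, 0.6]  # ranks close to chance

print(f"model A AUC = {auc(labels, model_a_scores):.3f}")
print(f"model B AUC = {auc(labels, model_b_scores):.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the reported random forest and logistic regression values are compared.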