RESULTS: Our models learned several syntactic, lexical, and n-gram linguistic biomarkers that distinguish the probable AD group from the healthy group. Compared with the healthy group, the probable AD patients used significantly fewer syntactic components and significantly more lexical components in their language. We also observed a significant difference in n-gram use: the healthy group identified and made sense of more objects in their n-grams than the probable AD group did. Our best diagnostic model, which used a Support Vector Machine (SVM), significantly distinguished the probable AD group from the healthy elderly group and achieved a better Area Under the Receiver Operating Characteristic Curve (AUC).
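Because AUC is the headline metric here, a minimal sketch of how it can be computed may help. It uses the pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher = more likely positive.
    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only (not study data).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.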
CONCLUSIONS: Experimental and statistical evaluations suggest that using ML algorithms to learn linguistic biomarkers from the verbal utterances of elderly individuals could support the clinical diagnosis of probable AD. We emphasise that the best ML model for predicting the disease group combines significant syntactic, lexical and top n-gram features. However, the diagnostic models need to be trained on larger datasets, which could lead to a better AUC and better clinical diagnosis of probable AD.
METHODS: Four databases (Web of Science, PubMed, Dimensions, and Scopus) were searched for research papers dated between January 2016 and September 2021. The search keywords were 'data mining' and 'machine learning' in combination with 'suicidal behaviour', 'suicide', 'suicide attempt', 'suicidal ideation', 'suicide plan' and 'self-harm'. The studies that used machine learning techniques were synthesized according to the country of the article, sample description, sample size, classification tasks, number of features used to develop the models, types of machine learning techniques, and performance evaluation metrics.
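The keyword combination described above can be sketched as a Boolean search string. This is a minimal illustration that assumes the method terms were joined with OR, the outcome terms were joined with OR, and the two groups were combined with AND. The exact query syntax used for each database is not stated in the source, and `build_query` is a hypothetical helper.

```python
# Keywords as listed in the METHODS text.
METHOD_TERMS = ["data mining", "machine learning"]
OUTCOME_TERMS = [
    "suicidal behaviour",
    "suicide",
    "suicide attempt",
    "suicidal ideation",
    "suicide plan",
    "self-harm",
]


def build_query(method_terms, outcome_terms):
    """Return '(m1 OR m2 ...) AND (o1 OR o2 ...)' with each term quoted.

    Assumption: a generic Boolean syntax; real databases may need
    field tags or different operators.
    """
    methods = " OR ".join(f'"{term}"' for term in method_terms)
    outcomes = " OR ".join(f'"{term}"' for term in outcome_terms)
    return f"({methods}) AND ({outcomes})"


query = build_query(METHOD_TERMS, OUTCOME_TERMS)
print(query)
```

In practice the string would be adapted to each database's own search syntax and combined with the January 2016 to September 2021 date filter.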
RESULTS: Thirty-five empirical articles met the inclusion criteria for the current review. We provide a general overview of machine learning techniques, examine the feature categories, describe methodological challenges, and suggest areas for improvement and research directions. Across the reviewed studies, ensemble prediction models were more accurate and useful than single prediction models.
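To illustrate what combining single models into an ensemble can look like, here is a minimal sketch of majority voting, one simple ensembling scheme among many. The review does not say which ensemble methods the included studies used, so this choice is an assumption. The predictions are hypothetical toy values, built so that each model errs on a different sample.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists by taking the most common label per sample.

    predictions_per_model: list of equal-length lists of class labels.
    With an odd number of models and binary labels there are no ties;
    otherwise ties resolve to the label seen first among the tied ones.
    """
    if not predictions_per_model:
        raise ValueError("need at least one model's predictions")
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    for sample_votes in zip(*predictions_per_model):
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined


def accuracy(truth, predicted):
    """Fraction of samples where the predicted label matches the true label."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


# Hypothetical toy labels (1 = at risk, 0 = not at risk); not study data.
truth = [1, 0, 1, 1, 0]
model_a = [1, 0, 0, 1, 0]
model_b = [1, 1, 1, 1, 0]
model_c = [0, 0, 1, 1, 0]

ensemble = majority_vote([model_a, model_b, model_c])
print(accuracy(truth, model_a), accuracy(truth, ensemble))
```

The vote corrects individual errors here only because the three toy models make mistakes on different samples. Models with correlated errors would gain much less.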
CONCLUSIONS: Machine learning has great potential for improving estimates of future suicidal behaviour and monitoring changes in risk over time. Further research can address important challenges and potential opportunities that may contribute to significant advances in suicide prediction.