Utilizing machine learning techniques to predict the blood-brain barrier permeability of compounds detected using LCQTOF-MS in Malaysian Kelulut honey

Edros R ¹, Feng TW ¹, Dong RH ²

Affiliations

¹ Faculty of Chemical and Process Engineering Technology, Universiti Malaysia Pahang, Gambang, Malaysia
² The Insight Centre for Data Analytics, School of Computer Science, University College Dublin, Dublin, Ireland

SAR QSAR Environ Res, 2023;34(6):475-500.

PMID: 37409842 DOI: 10.1080/1062936X.2023.2230868

Abstract

Current in silico modelling techniques, such as molecular dynamics, typically restrict bioactivity screening to the compounds found at the highest concentrations in chromatographic analyses. This reduces the need for labour-intensive in vitro studies but limits the use of the extensive chromatographic data and molecular diversity available for compound classification. Compound permeability across the blood-brain barrier (BBB) is a key concern in central nervous system (CNS) drug development, and this limitation can be addressed by applying cheminformatics with codeless machine learning (ML). Of the four models developed in this study, the Random Forest (RF) model showed the most robust performance in both internal and external validation and was therefore selected, with an accuracy (ACC) of 87.5% and 86.9% and an area under the curve (AUC) of 0.907 and 0.726, respectively. The RF model was deployed to classify 285 compounds detected in Kelulut honey using liquid chromatography quadrupole time-of-flight mass spectrometry (LCQTOF-MS); of these, 140 compounds were screened with 94 descriptors. Seventeen compounds were predicted to permeate the BBB, indicating their potential as drugs for treating neurodegenerative diseases. Our results highlight the importance of employing ML pattern recognition to identify compounds with neuroprotective potential from the entire pool of chromatographic data.
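The abstract reports two metrics, ACC and AUC, that measure different things: ACC counts correct class calls at a fixed decision threshold, while AUC measures how well the predicted scores rank permeable compounds above non-permeable ones. The following is a minimal sketch of both definitions in plain Python. It is not the authors' workflow (the study used codeless ML tools), and the labels, scores, and the 0.5 threshold are hypothetical toy values chosen only for illustration, not data from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds whose predicted class matches the label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study):
# 1 = BBB-permeable, 0 = non-permeable; scores are model probabilities.
labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.8, 0.7, 0.7, 0.6, 0.6, 0.55, 0.75, 0.3]

# Assumed decision threshold of 0.5 for turning scores into class calls.
predicted = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(labels, predicted)
auc = roc_auc(labels, scores)
print(f"ACC = {acc:.4f}, AUC = {auc:.4f}")
```

On this toy set, ACC is 0.9 while AUC is 0.6875, which shows that the two metrics can diverge when one class dominates. The abstract does not state why AUC fell from 0.907 to 0.726 in external validation, so this sketch should not be read as an explanation of that result.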
