MyMedR

Displaying publications 41 - 60 of 325 in total

Abstract:

Sort:

Current strategies to address data scarcity in artificial intelligence-based drug discovery: A comprehensive review

Gangwal A, Ansari A, Ahmad I, Azad AK, Wan Sulaiman WMA

Comput Biol Med, 2024 Sep;179:108734.
PMID: 38964243 DOI: 10.1016/j.compbiomed.2024.108734

Artificial intelligence (AI) has played a vital role in computer-aided drug design (CADD). This development has been further accelerated with the increasing use of machine learning (ML), mainly deep learning (DL), and computing hardware and software advancements. As a result, initial doubts about the application of AI in drug discovery have been dispelled, leading to significant benefits in medicinal chemistry. At the same time, it is crucial to recognize that AI is still in its infancy and faces a few limitations that need to be addressed to harness its full potential in drug discovery. Some notable limitations are insufficient, unlabeled, and non-uniform data, the resemblance of some AI-generated molecules with existing molecules, unavailability of inadequate benchmarks, intellectual property rights (IPRs) related hurdles in data sharing, poor understanding of biology, focus on proxy data and ligands, lack of holistic methods to represent input (molecular structures) to prevent pre-processing of input molecules (feature engineering), etc. The major component in AI infrastructure is input data, as most of the successes of AI-driven efforts to improve drug discovery depend on the quality and quantity of data, used to train and test AI algorithms, besides a few other factors. Additionally, data-gulping DL approaches, without sufficient data, may collapse to live up to their promise. Current literature suggests a few methods, to certain extent, effectively handle low data for better output from the AI models in the context of drug discovery. These are transferring learning (TL), active learning (AL), single or one-shot learning (OSL), multi-task learning (MTL), data augmentation (DA), data synthesis (DS), etc. One different method, which enables sharing of proprietary data on a common platform (without compromising data privacy) to train ML model, is federated learning (FL). In this review, we compare and discuss these methods, their recent applications, and limitations while modeling small molecule data to get the improved output of AI methods in drug discovery. Article also sums up some other novel methods to handle inadequate data.

Matched MeSH terms: Machine Learning
Hierarchical gated recurrent neural network with adversarial and virtual adversarial training on text classification

Poon HK, Yap WS, Tee YK, Lee WK, Goi BM

Neural Netw, 2019 Nov;119:299-312.
PMID: 31499354 DOI: 10.1016/j.neunet.2019.08.017

Document classification aims to assign one or more classes to a document for ease of management by understanding the content of a document. Hierarchical attention network (HAN) has been showed effective to classify documents that are ambiguous. HAN parses information-intense documents into slices (i.e., words and sentences) such that each slice can be learned separately and in parallel before assigning the classes. However, introducing hierarchical attention approach leads to the redundancy of training parameters which is prone to overfitting. To mitigate the concern of overfitting, we propose a variant of hierarchical attention network using adversarial and virtual adversarial perturbations in 1) word representation, 2) sentence representation and 3) both word and sentence representations. The proposed variant is tested on eight publicly available datasets. The results show that the proposed variant outperforms the hierarchical attention network with and without using random perturbation. More importantly, the proposed variant achieves state-of-the-art performance on multiple benchmark datasets. Visualizations and analysis are provided to show that perturbation can effectively alleviate the overfitting issue and improve the performance of hierarchical attention network.

Matched MeSH terms: Machine Learning*
Fulltext DeepMoney: counterfeit money detection using generative adversarial networks

Ali T, Jan S, Alkhodre A, Nauman M, Amin M, Siddiqui MS

PeerJ Comput Sci, 2019;5:e216.
PMID: 33816869 DOI: 10.7717/peerj-cs.216

Conventional paper currency and modern electronic currency are two important modes of transactions. In several parts of the world, conventional methodology has clear precedence over its electronic counterpart. However, the identification of forged currency paper notes is now becoming an increasingly crucial problem because of the new and improved tactics employed by counterfeiters. In this paper, a machine assisted system-dubbed DeepMoney-is proposed which has been developed to discriminate fake notes from genuine ones. For this purpose, state-of-the-art models of machine learning called Generative Adversarial Networks (GANs) are employed. GANs use unsupervised learning to train a model that can then be used to perform supervised predictions. This flexibility provides the best of both worlds by allowing unlabelled data to be trained on whilst still making concrete predictions. This technique was applied to Pakistani banknotes. State-of-the-art image processing and feature recognition techniques were used to design the overall approach of a valid input. Augmented samples of images were used in the experiments which show that a high-precision machine can be developed to recognize genuine paper money. An accuracy of 80% has been achieved. The code is available as an open source to allow others to reproduce and build upon the efforts already made.

Matched MeSH terms: Machine Learning; Unsupervised Machine Learning
Fulltext An Improved Vulnerability Exploitation Prediction Model with Novel Cost Function and Custom Trained Word Vector Embedding

Hoque MS, Jamil N, Amin N, Lam KY

Sensors (Basel), 2021 Jun 20;21(12).
PMID: 34202977 DOI: 10.3390/s21124220

Successful cyber-attacks are caused by the exploitation of some vulnerabilities in the software and/or hardware that exist in systems deployed in premises or the cloud. Although hundreds of vulnerabilities are discovered every year, only a small fraction of them actually become exploited, thereby there exists a severe class imbalance between the number of exploited and non-exploited vulnerabilities. The open source national vulnerability database, the largest repository to index and maintain all known vulnerabilities, assigns a unique identifier to each vulnerability. Each registered vulnerability also gets a severity score based on the impact it might inflict upon if compromised. Recent research works showed that the cvss score is not the only factor to select a vulnerability for exploitation, and other attributes in the national vulnerability database can be effectively utilized as predictive feature to predict the most exploitable vulnerabilities. Since cybersecurity management is highly resource savvy, organizations such as cloud systems will benefit when the most likely exploitable vulnerabilities that exist in their system software or hardware can be predicted with as much accuracy and reliability as possible, to best utilize the available resources to fix those first. Various existing research works have developed vulnerability exploitation prediction models by addressing the existing class imbalance based on algorithmic and artificial data resampling techniques but still suffer greatly from the overfitting problem to the major class rendering them practically unreliable. In this research, we have designed a novel cost function feature to address the existing class imbalance. We also have utilized the available large text corpus in the extracted dataset to develop a custom-trained word vector that can better capture the context of the local text data for utilization as an embedded layer in neural networks. Our developed vulnerability exploitation prediction models powered by a novel cost function and custom-trained word vector have achieved very high overall performance metrics for accuracy, precision, recall, F1-Score and AUC score with values of 0.92, 0.89, 0.98, 0.94 and 0.97, respectively, thereby outperforming any existing models while successfully overcoming the existing overfitting problem for class imbalance.

Matched MeSH terms: Machine Learning*
Fulltext Ridge regression and its applications in genetic studies

Arashi M, Roozbeh M, Hamzah NA, Gasparini M

PLoS One, 2021;16(4):e0245376.
PMID: 33831027 DOI: 10.1371/journal.pone.0245376

With the advancement of technology, analysis of large-scale data of gene expression is feasible and has become very popular in the era of machine learning. This paper develops an improved ridge approach for the genome regression modeling. When multicollinearity exists in the data set with outliers, we consider a robust ridge estimator, namely the rank ridge regression estimator, for parameter estimation and prediction. On the other hand, the efficiency of the rank ridge regression estimator is highly dependent on the ridge parameter. In general, it is difficult to provide a satisfactory answer about the selection for the ridge parameter. Because of the good properties of generalized cross validation (GCV) and its simplicity, we use it to choose the optimum value of the ridge parameter. The GCV function creates a balance between the precision of the estimators and the bias caused by the ridge estimation. It behaves like an improved estimator of risk and can be used when the number of explanatory variables is larger than the sample size in high-dimensional problems. Finally, some numerical illustrations are given to support our findings.

Matched MeSH terms: Machine Learning*
A survey of recently emerged genome-wide computational enhancer predictor tools

Lim LWK, Chung HH, Chong YL, Lee NK

Comput Biol Chem, 2018 Jun;74:132-141.
PMID: 29602043 DOI: 10.1016/j.compbiolchem.2018.03.019

The race for the discovery of enhancers at a genome-wide scale has been on since the commencement of next generation sequencing decades after the discovery of the first enhancer, SV40. A few enhancer-predicting features such as chromatin feature, histone modifications and sequence feature had been implemented with varying success rates. However, to date, there is no consensus yet on the single enhancer marker that can be employed to ultimately distinguish and uncover enhancers from the enormous genomic regions. Many supervised, unsupervised and semi-supervised computational approaches had emerged to complement and facilitate experimental approaches in enhancer discovery. In this review, we placed our focus on the recently emerged enhancer predictor tools that work on general enhancer features such as sequences, chromatin states and histone modifications, eRNA and of multiple feature approach. Comparisons of their prediction methods and outcomes were done across their functionally similar counterparts. We provide some recommendations and insights for future development of more comprehensive and robust tools.

Matched MeSH terms: Machine Learning*
Fulltext An improved adaptive memetic differential evolution optimization algorithms for data clustering problems

Mustafa HMJ, Ayob M, Nazri MZA, Kendall G

PLoS One, 2019;14(5):e0216906.
PMID: 31137034 DOI: 10.1371/journal.pone.0216906

The performance of data clustering algorithms is mainly dependent on their ability to balance between the exploration and exploitation of the search process. Although some data clustering algorithms have achieved reasonable quality solutions for some datasets, their performance across real-life datasets could be improved. This paper proposes an adaptive memetic differential evolution optimisation algorithm (AMADE) for addressing data clustering problems. The memetic algorithm (MA) employs an adaptive differential evolution (DE) mutation strategy, which can offer superior mutation performance across many combinatorial and continuous problem domains. By hybridising an adaptive DE mutation operator with the MA, we propose that it can lead to faster convergence and better balance the exploration and exploitation of the search. We would also expect that the performance of AMADE to be better than MA and DE if executed separately. Our experimental results, based on several real-life benchmark datasets, shows that AMADE outperformed other compared clustering algorithms when compared using statistical analysis. We conclude that the hybridisation of MA and the adaptive DE is a suitable approach for addressing data clustering problems and can improve the balance between global exploration and local exploitation of the optimisation algorithm.

Matched MeSH terms: Machine Learning*
Multivariate relationship modeling using nested fuzzy cognitive map

Motlagh O, Papageorgiou E, Tang S, Zamberi Jamaludin

Sains Malaysiana, 2014;43:1781-1790.

Soft computing is an alternative to hard and classic math models especially when it comes to uncertain and incomplete data. This includes regression and relationship modeling of highly interrelated variables with applications in curve fitting, interpolation, classification, supervised learning, generalization, unsupervised learning and forecast. Fuzzy cognitive map (FCM) is a recurrent neural structure that encompasses all possible connections including relationships among inputs, inputs to outputs and feedbacks. This article examines a new methods for nonlinear multivariate regression using fuzzy cognitive map. The main contribution is the application of nested FCM structure to define edge weights in form of meaningful functions rather than crisp values. There are example cases in this article which serve as a platform to modelling even more complex engineering systems. The obtained results, analysis and comparison with similar techniques are included to show the robustness and accuracy of the developed method in multivariate regression, along with future lines of research.

Matched MeSH terms: Supervised Machine Learning; Unsupervised Machine Learning
Fulltext A static analysis approach for Android permission-based malware detection systems

Mohamad Arif J, Ab Razak MF, Awang S, Tuan Mat SR, Ismail NSN, Firdaus A

PLoS One, 2021;16(9):e0257968.
PMID: 34591930 DOI: 10.1371/journal.pone.0257968

The evolution of malware is causing mobile devices to crash with increasing frequency. Therefore, adequate security evaluations that detect Android malware are crucial. Two techniques can be used in this regard: Static analysis, which meticulously examines the full codes of applications, and dynamic analysis, which monitors malware behaviour. While both perform security evaluations successfully, there is still room for improvement. The goal of this research is to examine the effectiveness of static analysis to detect Android malware by using permission-based features. This study proposes machine learning with different sets of classifiers was used to evaluate Android malware detection. The feature selection method in this study was applied to determine which features were most capable of distinguishing malware. A total of 5,000 Drebin malware samples and 5,000 Androzoo benign samples were utilised. The performances of the different sets of classifiers were then compared. The results indicated that with a TPR value of 91.6%, the Random Forest algorithm achieved the highest level of accuracy in malware detection.

Matched MeSH terms: Machine Learning*
Fulltext Machine learning assisted hepta band THz metamaterial absorber for biomedical applications

Jain P, Chhabra H, Chauhan U, Prakash K, Gupta A, Soliman MS, et al.

Sci Rep, 2023 Jan 31;13(1):1792.
PMID: 36720922 DOI: 10.1038/s41598-023-29024-x

A hepta-band terahertz metamaterial absorber (MMA) with modified dual T-shaped resonators deposited on polyimide is presented for sensing applications. The proposed polarization sensitive MMA is ultra-thin (0.061 λ) and compact (0.21 λ) at its lowest operational frequency, with multiple absorption peaks at 1.89, 4.15, 5.32, 5.84, 7.04, 8.02, and 8.13 THz. The impedance matching theory and electric field distribution are investigated to understand the physical mechanism of hepta-band absorption. The sensing functionality is evaluated using a surrounding medium with a refractive index between 1 and 1.1, resulting in good Quality factor (Q) value of 117. The proposed sensor has the highest sensitivity of 4.72 THz/RIU for glucose detection. Extreme randomized tree (ERT) model is utilized to predict absorptivities for intermediate frequencies with unit cell dimensions, substrate thickness, angle variation, and refractive index values to reduce simulation time. The effectiveness of the ERT model in predicting absorption values is evaluated using the Adjusted R2 score, which is close to 1.0 for nmin = 2, demonstrating the prediction efficiency in various test cases. The experimental results show that 60% of simulation time and resources can be saved by simulating absorber design using the ERT model. The proposed MMA sensor with an ERT model has potential applications in biomedical fields such as bacterial infections, malaria, and other diseases.

Matched MeSH terms: Machine Learning*
An intrusion detection model to detect zero-day attacks in unseen data using machine learning

Dai Z, Por LY, Chen YL, Yang J, Ku CS, Alizadehsani R, et al.

PLoS One, 2024;19(9):e0308469.
PMID: 39259729 DOI: 10.1371/journal.pone.0308469

In an era marked by pervasive digital connectivity, cybersecurity concerns have escalated. The rapid evolution of technology has led to a spectrum of cyber threats, including sophisticated zero-day attacks. This research addresses the challenge of existing intrusion detection systems in identifying zero-day attacks using the CIC-MalMem-2022 dataset and autoencoders for anomaly detection. The trained autoencoder is integrated with XGBoost and Random Forest, resulting in the models XGBoost-AE and Random Forest-AE. The study demonstrates that incorporating an anomaly detector into traditional models significantly enhances performance. The Random Forest-AE model achieved 100% accuracy, precision, recall, F1 score, and Matthews Correlation Coefficient (MCC), outperforming the methods proposed by Balasubramanian et al., Khan, Mezina et al., Smith et al., and Dener et al. When tested on unseen data, the Random Forest-AE model achieved an accuracy of 99.9892%, precision of 100%, recall of 99.9803%, F1 score of 99.9901%, and MCC of 99.8313%. This research highlights the effectiveness of the proposed model in maintaining high accuracy even with previously unseen data.

Matched MeSH terms: Machine Learning*
Fulltext Local interpretable model-agnostic explanation approach for medical imaging analysis: A systematic literature review

Hassan SU, Abdulkadir SJ, Zahid MSM, Al-Selwi SM

Comput Biol Med, 2025 Feb;185:109569.
PMID: 39705792 DOI: 10.1016/j.compbiomed.2024.109569

BACKGROUND: The interpretability and explainability of machine learning (ML) and artificial intelligence systems are critical for generating trust in their outcomes in fields such as medicine and healthcare. Errors generated by these systems, such as inaccurate diagnoses or treatments, can have serious and even life-threatening effects on patients. Explainable Artificial Intelligence (XAI) is emerging as an increasingly significant area of research nowadays, focusing on the black-box aspect of sophisticated and difficult-to-interpret ML algorithms. XAI techniques such as Local Interpretable Model-Agnostic Explanations (LIME) can give explanations for these models, raising confidence in the systems and improving trust in their predictions. Numerous works have been published that respond to medical problems through the use of ML models in conjunction with XAI algorithms to give interpretability and explainability. The primary objective of the study is to evaluate the performance of the newly emerging LIME techniques within healthcare domains that require more attention in the realm of XAI research.
METHOD: A systematic search was conducted in numerous databases (Scopus, Web of Science, IEEE Xplore, ScienceDirect, MDPI, and PubMed) that identified 1614 peer-reviewed articles published between 2019 and 2023.
RESULTS: 52 articles were selected for detailed analysis that showed a growing trend in the application of LIME techniques in healthcare, with significant improvements in the interpretability of ML models used for diagnostic and prognostic purposes.
CONCLUSION: The findings suggest that the integration of XAI techniques, particularly LIME, enhances the transparency and trustworthiness of AI systems in healthcare, thereby potentially improving patient outcomes and fostering greater acceptance of AI-driven solutions among medical professionals.

Matched MeSH terms: Machine Learning*
Fulltext Prediction of Poisson's ratio for a petroleum engineering application: Machine learning methods

Alakbari FS, Mahmood SM, Ayoub MA, Khan MJ, Afolabi F, Mohyaldinn ME, et al.

PLoS One, 2025;20(2):e0317754.
PMID: 39982951 DOI: 10.1371/journal.pone.0317754

Static Poisson's ratio (νs) is an essential property used in petroleum calculations, namely fracture pressure (FP). The νs is often determined in the laboratory; however, due to time and cost constraints, quicker and cheaper alternatives are sought, such as data-driven models. However, existing methods lack the accuracy needed for critical applications, necessitating the need to explore more accurate methods. In addition, the previous studies used limited datasets and they do not show the relationships between the inputs and output. Therefore, this study developed a reliable model to predict the νs accurately using the nineteen most common learning methods. The proposed models were created based on a large data of 1691 datasets from different countries. The best-performing model of the nineteen models was selected and further enhanced using various approaches such as trend analysis to improve the model's performance and robustness as some models show high accuracy but show incorrect relationships between the inputs and output because the machine learning model only built based on the data and do not consider the physical behavior of the model. The proposed Gaussian process regression (GPR) model was also compared with published models. After the proposed GPR model was developed, the FP was determined based on the proposed GPR νs model and the previous νs models to evaluate their accuracy on the FP determinations. The best approach out of the published and proposed methods was GPR with a coefficient of determination (R2) and average-absolute-percentage-relative-error (AAPRE) of 0.95 and 2.73%. The GPR model showed proper trends for all inputs. The cross-plotting and group error analyses also confirmed that the proposed GPR approach had high precision and surpassed other methods within all practical ranges. The GPR model decreased the residual error of FP from 87% to 26%. It is believed that such a significant improvement in the accuracy of the GPR model will have a significant effect on realistic FP determination.

Matched MeSH terms: Machine Learning*
Fulltext Identification of Diagnostic Schizophrenia Biomarkers Based on the Assessment of Immune and Systemic Inflammation Parameters Using Machine Learning Modeling

Malashenkova IK, Krynskiy SA, Ogurtsov DP, Khailov NA, Druzhinina PV, Bernstein AV, et al.

Sovrem Tekhnologii Med, 2023;15(6):5-12.
PMID: 39944368 DOI: 10.17691/stm2023.15.6.01

Disorders of systemic immunity and immune processes in the brain have now been shown to play an essential role in the development and progression of schizophrenia. Nevertheless, only a few works were devoted to the study of some immune parameters to objectify the diagnosis by means of machine learning. At the same time, machine learning methods have not yet been applied to a set of data fully reflecting systemic characteristics of the immune status (parameters of adaptive immunity, the level of inflammatory markers, the content of major cytokines). Considering a complex nature of immune system disorders in schizophrenia, incorporation of a broad panel of immunological data into machine learning models is promising for improving classification accuracy and identifying the parameters reflecting the immune disorders typical for the majority of patients. The aim of the study is to assess the possibility of using immunological parameters to objectify the diagnosis of schizophrenia applying machine learning models.
MATERIALS AND METHODS: We have analyzed 17 immunological parameters in 63 schizophrenia patients and 36 healthy volunteers. The parameters of humoral immunity, systemic level of the key cytokines of adaptive immunity, anti-inflammatory and pro-inflammatory cytokines, and other inflammatory markers were determined by enzyme immunoassay. Applied methods of machine learning covered the main group of approaches to supervised learning such as linear models (logistic regression), quadratic discriminant analysis (QDA), support vector machine (linear SVM, RBF SVM), k-nearest neighbors algorithm, Gaussian processes, naive Bayes classifier, decision trees, and ensemble models (AdaBoost, random forest, XGBoost). The importance of features for prediction from the best fold has been analyzed for the machine learning methods, which demonstrated the best quality. The most significant features were selected using 70% quantile threshold.
RESULTS: The AdaBoost ensemble model with ROC AUC of 0.71±0.15 and average accuracy (ACC) of 0.78±0.11 has demonstrated the best quality on a 10-fold cross validation test sample. Within the frameworks of the present investigation, the AdaBoost model has shown a good quality of classification between the patients with schizophrenia and healthy volunteers (ROC AUC over 0.70) at a high stability of the results (σ less than 0.2). The most important immunological parameters have been established for differentiation between the patients and healthy volunteers: the level of some systemic inflammatory markers, activation of humoral immunity, pro-inflammatory cytokines, immunoregulatory cytokines and proteins, Th1 and Th2 immunity cytokines. It was for the first time that the possibility of differentiating schizophrenia patients from healthy volunteers was shown with the accuracy of more than 70% with the help of machine learning using only immune parameters.The results of this investigation confirm a high importance of the immune system in the pathogenesis of schizophrenia.

Matched MeSH terms: Machine Learning*
Fulltext Multi-population Black Hole Algorithm for the problem of data clustering

Salih SQ, Alsewari AA, Wahab HA, Mohammed MKA, Rashid TA, Das D, et al.

PLoS One, 2023;18(7):e0288044.
PMID: 37406006 DOI: 10.1371/journal.pone.0288044

The retrieval of important information from a dataset requires applying a special data mining technique known as data clustering (DC). DC classifies similar objects into a groups of similar characteristics. Clustering involves grouping the data around k-cluster centres that typically are selected randomly. Recently, the issues behind DC have called for a search for an alternative solution. Recently, a nature-based optimization algorithm named Black Hole Algorithm (BHA) was developed to address the several well-known optimization problems. The BHA is a metaheuristic (population-based) that mimics the event around the natural phenomena of black holes, whereby an individual star represents the potential solutions revolving around the solution space. The original BHA algorithm showed better performance compared to other algorithms when applied to a benchmark dataset, despite its poor exploration capability. Hence, this paper presents a multi-population version of BHA as a generalization of the BHA called MBHA wherein the performance of the algorithm is not dependent on the best-found solution but a set of generated best solutions. The method formulated was subjected to testing using a set of nine widespread and popular benchmark test functions. The ensuing experimental outcomes indicated the highly precise results generated by the method compared to BHA and comparable algorithms in the study, as well as excellent robustness. Furthermore, the proposed MBHA achieved a high rate of convergence on six real datasets (collected from the UCL machine learning lab), making it suitable for DC problems. Lastly, the evaluations conclusively indicated the appropriateness of the proposed algorithm to resolve DC issues.

Matched MeSH terms: Machine Learning*
Fulltext Knowledge-Based Methods To Train and Optimize Virtual Screening Ensembles

Swift RV, Jusoh SA, Offutt TL, Li ES, Amaro RE

J Chem Inf Model, 2016 05 23;56(5):830-42.
PMID: 27097522 DOI: 10.1021/acs.jcim.5b00684

Ensemble docking can be a successful virtual screening technique that addresses the innate conformational heterogeneity of macromolecular drug targets. Yet, lacking a method to identify a subset of conformational states that effectively segregates active and inactive small molecules, ensemble docking may result in the recommendation of a large number of false positives. Here, three knowledge-based methods that construct structural ensembles for virtual screening are presented. Each method selects ensembles by optimizing an objective function calculated using the receiver operating characteristic (ROC) curve: either the area under the ROC curve (AUC) or a ROC enrichment factor (EF). As the number of receptor conformations, N, becomes large, the methods differ in their asymptotic scaling. Given a set of small molecules with known activities and a collection of target conformations, the most resource intense method is guaranteed to find the optimal ensemble but scales as O(2(N)). A recursive approximation to the optimal solution scales as O(N(2)), and a more severe approximation leads to a faster method that scales linearly, O(N). The techniques are generally applicable to any system, and we demonstrate their effectiveness on the androgen nuclear hormone receptor (AR), cyclin-dependent kinase 2 (CDK2), and the peroxisome proliferator-activated receptor δ (PPAR-δ) drug targets. Conformations that consisted of a crystal structure and molecular dynamics simulation cluster centroids were used to form AR and CDK2 ensembles. Multiple available crystal structures were used to form PPAR-δ ensembles. For each target, we show that the three methods perform similarly to one another on both the training and test sets.

Matched MeSH terms: Machine Learning*
Fulltext Landslide Susceptibility Mapping Using Machine Learning Algorithms and Remote Sensing Data in a Tropical Environment

Nhu VH, Mohammadi A, Shahabi H, Ahmad BB, Al-Ansari N, Shirzadi A, et al.

Int J Environ Res Public Health, 2020 07 08;17(14).
PMID: 32650595 DOI: 10.3390/ijerph17144933

We used AdaBoost (AB), alternating decision tree (ADTree), and their combination as an ensemble model (AB-ADTree) to spatially predict landslides in the Cameron Highlands, Malaysia. The models were trained with a database of 152 landslides compiled using Synthetic Aperture Radar Interferometry, Google Earth images, and field surveys, and 17 conditioning factors (slope, aspect, elevation, distance to road, distance to river, proximity to fault, road density, river density, normalized difference vegetation index, rainfall, land cover, lithology, soil types, curvature, profile curvature, stream power index, and topographic wetness index). We carried out the validation process using the area under the receiver operating characteristic curve (AUC) and several parametric and non-parametric performance metrics, including positive predictive value, negative predictive value, sensitivity, specificity, accuracy, root mean square error, and the Friedman and Wilcoxon sign rank tests. The AB model (AUC = 0.96) performed better than the ensemble AB-ADTree model (AUC = 0.94) and successfully outperformed the ADTree model (AUC = 0.59) in predicting landslide susceptibility. Our findings provide insights into the development of more efficient and accurate landslide predictive models that can be used by decision makers and land-use managers to mitigate landslide hazards.

Matched MeSH terms: Machine Learning*
Fulltext Automatic Annotation of Unlabeled Data from Smartphone-Based Motion and Location Sensors

Pius Owoh N, Mahinderjit Singh M, Zaaba ZF

Sensors (Basel), 2018 Jul 03;18(7).
PMID: 29970823 DOI: 10.3390/s18072134

Automatic data annotation eliminates most of the challenges we faced due to the manual methods of annotating sensor data. It significantly improves users’ experience during sensing activities since their active involvement in the labeling process is reduced. An unsupervised learning technique such as clustering can be used to automatically annotate sensor data. However, the lingering issue with clustering is the validation of generated clusters. In this paper, we adopted the k-means clustering algorithm for annotating unlabeled sensor data for the purpose of detecting sensitive location information of mobile crowd sensing users. Furthermore, we proposed a cluster validation index for the k-means algorithm, which is based on Multiple Pair-Frequency. Thereafter, we trained three classifiers (Support Vector Machine, K-Nearest Neighbor, and Naïve Bayes) using cluster labels generated from the k-means clustering algorithm. The accuracy, precision, and recall of these classifiers were evaluated during the classification of “non-sensitive” and “sensitive” data from motion and location sensors. Very high accuracy scores were recorded from Support Vector Machine and K-Nearest Neighbor classifiers while a fairly high accuracy score was recorded from the Naïve Bayes classifier. With the hybridized machine learning (unsupervised and supervised) technique presented in this paper, unlabeled sensor data was automatically annotated and then classified.

Matched MeSH terms: Machine Learning; Unsupervised Machine Learning
A soft computing approach for diabetes disease classification

Nilashi M, Bin Ibrahim O, Mardani A, Ahani A, Jusoh A

Health Informatics J, 2018 12;24(4):379-393.
PMID: 30376769 DOI: 10.1177/1460458216675500

As a chronic disease, diabetes mellitus has emerged as a worldwide epidemic. The aim of this study is to classify diabetes disease by developing an intelligence system using machine learning techniques. Our method is developed through clustering, noise removal and classification approaches. Accordingly, we use expectation maximization, principal component analysis and support vector machine for clustering, noise removal and classification tasks, respectively. We also develop the proposed method for incremental situation by applying the incremental principal component analysis and incremental support vector machine for incremental learning of data. Experimental results on Pima Indian Diabetes dataset show that proposed method remarkably improves the accuracy of prediction and reduces computation time in relation to the non-incremental approaches. The hybrid intelligent system can assist medical practitioners in the healthcare practice as a decision support system.

Matched MeSH terms: Machine Learning*
Kernel Bayesian ART and ARTMAP

Masuyama N, Loo CK, Dawood F

Neural Netw, 2018 Feb;98:76-86.
PMID: 29202265 DOI: 10.1016/j.neunet.2017.11.003

Adaptive Resonance Theory (ART) is one of the successful approaches to resolving "the plasticity-stability dilemma" in neural networks, and its supervised learning model called ARTMAP is a powerful tool for classification. Among several improvements, such as Fuzzy or Gaussian based models, the state of art model is Bayesian based one, while solving the drawbacks of others. However, it is known that the Bayesian approach for the high dimensional and a large number of data requires high computational cost, and the covariance matrix in likelihood becomes unstable. This paper introduces Kernel Bayesian ART (KBA) and ARTMAP (KBAM) by integrating Kernel Bayes' Rule (KBR) and Correntropy Induced Metric (CIM) to Bayesian ART (BA) and ARTMAP (BAM), respectively, while maintaining the properties of BA and BAM. The kernel frameworks in KBA and KBAM are able to avoid the curse of dimensionality. In addition, the covariance-free Bayesian computation by KBR provides the efficient and stable computational capability to KBA and KBAM. Furthermore, Correntropy-based similarity measurement allows improving the noise reduction ability even in the high dimensional space. The simulation experiments show that KBA performs an outstanding self-organizing capability than BA, and KBAM provides the superior classification ability than BAM, respectively.

Matched MeSH terms: Machine Learning/standards*

Filters

Please provide feedback to Administrator (afdal@afpm.org.my)

External Links