MyMedR

Displaying publications 61 - 80 of 269 in total

Fulltext Improved prediction and characterization of anticancer activities of peptides using a novel flexible scoring card method

Charoenkwan P, Chiangjong W, Lee VS, Nantasenamat C, Hasan MM, Shoombuatong W

Sci Rep, 2021 Feb 04;11(1):3017.
PMID: 33542286 DOI: 10.1038/s41598-021-82513-9

As anticancer peptides (ACPs) have attracted great interest for cancer treatment, several approaches based on machine learning have been proposed for ACP identification. Although existing methods have afforded high prediction accuracies, such models use a large number of descriptors together with complex ensemble approaches, which leads to low interpretability and thus poses a challenge for biologists and biochemists. Therefore, it is desirable to develop a simple, interpretable and efficient predictor for accurate ACP identification as well as providing the means for the rational design of new anticancer peptides with promising potential for clinical application. Herein, we propose a novel flexible scoring card method (FSCM) making use of propensity scores of local and global sequential information for the development of a sequence-based ACP predictor (named iACP-FSCM) for improving the prediction accuracy and model interpretability. To the best of our knowledge, iACP-FSCM represents the first sequence-based ACP predictor for rationalizing an in-depth understanding into the molecular basis for the enhancement of anticancer activities of peptides via the use of FSCM-derived propensity scores. The independent testing results showed that the iACP-FSCM provided accuracies of 0.825 and 0.910 as evaluated on the main and alternative datasets, respectively. Results from comparative benchmarking demonstrated that iACP-FSCM could outperform seven other existing ACP predictors with marked improvements of 7% and 17% for accuracy and MCC, respectively, on the main dataset. Furthermore, the iACP-FSCM (0.910) achieved very comparable results to that of the state-of-the-art ensemble model AntiCP2.0 (0.920) as evaluated on the alternative dataset. Comparative results demonstrated that iACP-FSCM was the most suitable choice for ACP identification and characterization considering its simplicity, interpretability and generalizability. It is highly anticipated that the iACP-FSCM may be a robust tool for the rapid screening and identification of promising ACPs for clinical use.

Matched MeSH terms: Machine Learning
Morphometric dataset of Varanus salvator for non-invasive sex identification using machine learning

Alymann AA, Alymann IA, Ong SQ, Rusli MU, Ahmad AH, Salim H

Sci Data, 2024 Apr 05;11(1):337.
PMID: 38580692 DOI: 10.1038/s41597-024-03172-9

Reliable sex identification in Varanus salvator traditionally relied on invasive methods like genetic analysis or dissection, as less invasive techniques such as hemipenes inversion are unreliable. Given the ecological importance of this species and skewed sex ratios in disturbed habitats, a dataset that allows ecologists or zoologists to study the sex determination of the lizard is crucial. We present a new dataset containing morphometric measurements of V. salvator individuals from the skin trade, with sex confirmed by dissection post-measurement. The dataset consists of a mixture of primary and secondary data such as weight, skull size, tail length and condition, and can be used in modelling studies for ecological and conservation research to monitor the sex ratio of this species. Validity was demonstrated by training and testing six machine learning models. This dataset has the potential to streamline sex determination, offering a non-invasive alternative to complement existing methods in V. salvator research, mitigating the need for invasive procedures.

Matched MeSH terms: Machine Learning
Multivariate relationship modeling using nested fuzzy cognitive map

Motlagh O, Papageorgiou E, Tang S, Zamberi Jamaludin

Sains Malaysiana, 2014;43:1781-1790.

Soft computing is an alternative to hard and classic math models, especially when it comes to uncertain and incomplete data. This includes regression and relationship modeling of highly interrelated variables with applications in curve fitting, interpolation, classification, supervised learning, generalization, unsupervised learning and forecasting. Fuzzy cognitive map (FCM) is a recurrent neural structure that encompasses all possible connections including relationships among inputs, inputs to outputs and feedbacks. This article examines a new method for nonlinear multivariate regression using fuzzy cognitive map. The main contribution is the application of a nested FCM structure to define edge weights in the form of meaningful functions rather than crisp values. There are example cases in this article which serve as a platform for modelling even more complex engineering systems. The obtained results, analysis and comparison with similar techniques are included to show the robustness and accuracy of the developed method in multivariate regression, along with future lines of research.

Matched MeSH terms: Supervised Machine Learning; Unsupervised Machine Learning
Utilizing machine learning techniques to predict the blood-brain barrier permeability of compounds detected using LCQTOF-MS in Malaysian Kelulut honey

Edros R, Feng TW, Dong RH

SAR QSAR Environ Res, 2023;34(6):475-500.
PMID: 37409842 DOI: 10.1080/1062936X.2023.2230868

Current in silico modelling techniques, such as molecular dynamics, typically focus on compounds with the highest concentration from chromatographic analyses for bioactivity screening. Consequently, they reduce the need for labour-intensive in vitro studies but limit the utilization of extensive chromatographic data and molecular diversity for compound classification. Compound permeability across the blood-brain barrier (BBB) is a key concern in central nervous system (CNS) drug development, and this limitation can be addressed by applying cheminformatics with codeless machine learning (ML). Among the four models developed in this study, the Random Forest (RF) algorithm with the most robust performance in both internal and external validation was selected for model construction, with an accuracy (ACC) of 87.5% and 86.9% and area under the curve (AUC) of 0.907 and 0.726, respectively. The RF model was deployed to classify 285 compounds detected using liquid chromatography quadrupole time-of-flight mass spectrometry (LCQTOF-MS) in Kelulut honey; of which, 140 compounds were screened with 94 descriptors. Seventeen compounds were predicted to permeate the BBB, revealing their potential as drugs for treating neurodegenerative diseases. Our results highlight the importance of employing ML pattern recognition to identify compounds with neuroprotective potential from the entire pool of chromatographic data.

Matched MeSH terms: Machine Learning
Using artificial intelligence methods for systematic review in health sciences: A systematic review

Blaizot A, Veettil SK, Saidoung P, Moreno-Garcia CF, Wiratunga N, Aceves-Martins M, et al.

Res Synth Methods, 2022 May;13(3):353-362.
PMID: 35174972 DOI: 10.1002/jrsm.1553

The exponential increase in published articles makes a thorough and expedient review of literature increasingly challenging. This review delineated automated tools and platforms that employ artificial intelligence (AI) approaches and evaluated the reported benefits and challenges in using such methods. A search was conducted in 4 databases (Medline, Embase, CDSR, and Epistemonikos) up to April 2021 for systematic reviews and other related reviews implementing AI methods. To be included, the review must use any form of AI method, including machine learning, deep learning, neural network, or any other applications used to enable the full or semi-autonomous performance of one or more stages in the development of evidence synthesis. Twelve reviews were included, using nine different tools to implement 15 different AI methods. Eleven methods were used in the screening stages of the review (73%). The rest were divided: two in data extraction (13%) and two in risk of bias assessment (13%). The ambiguous benefits of the data extractions, combined with the reported advantages from 10 reviews, indicate that AI platforms have taken hold with varying success in evidence synthesis. However, the results are qualified by the reliance on the self-reporting of the review authors. Extensive human validation still appears required at this stage in implementing AI methods, though further evaluation is required to define the overall contribution of such platforms in enhancing efficiency and quality in evidence synthesis.

Matched MeSH terms: Machine Learning
Machine learning approaches in diagnosing tuberculosis through biomarkers - A systematic review

Balakrishnan V, Kherabi Y, Ramanathan G, Paul SA, Tiong CK

Prog Biophys Mol Biol, 2023 May;179:16-25.
PMID: 36931609 DOI: 10.1016/j.pbiomolbio.2023.03.001

Biomarker-based tests may facilitate Tuberculosis (TB) diagnosis, accelerate treatment initiation, and thus improve outcomes. This review synthesizes the literature on biomarker-based detection for TB diagnosis using machine learning. The systematic review approach follows the PRISMA guideline. Articles were sought using relevant keywords from Web of Science, PubMed, and Scopus, resulting in 19 eligible studies after a meticulous screening. All the studies were found to have focused on the supervised learning approach, with Support Vector Machine (SVM) and Random Forest emerging as the top two algorithms, with the highest accuracy, sensitivity and specificity reported to be 97.0%, 99.2%, and 98.0%, respectively. Further, protein-based biomarkers were widely explored, followed by gene-based biomarkers such as RNA sequences and spoligotypes. Publicly available datasets were observed to be popularly used by the studies reviewed, whilst studies targeting specific cohorts such as HIV patients or children gathered their own data from healthcare facilities, leading to smaller datasets. Of these, most studies used the leave-one-out cross-validation technique to mitigate overfitting. The review shows that machine learning is increasingly assessed in research to improve TB diagnosis through biomarkers, as promising results were shown in terms of the models' detection performance. This provides insights on the possible application of machine learning approaches to diagnose TB using biomarkers as opposed to the traditional methods that can be time-consuming. Low-middle income settings, where access to basic biomarkers could be provided as compared to sputum-based tests that are not always available, could be a major application of such models.

Matched MeSH terms: Machine Learning
Fulltext Predictive analysis across spatial scales links zoonotic malaria to deforestation

Brock PM, Fornace KM, Grigg MJ, Anstey NM, William T, Cox J, et al.

Proc Biol Sci, 2019 Jan 16;286(1894):20182351.
PMID: 30963872 DOI: 10.1098/rspb.2018.2351

The complex transmission ecologies of vector-borne and zoonotic diseases pose challenges to their control, especially in changing landscapes. Human incidence of zoonotic malaria (Plasmodium knowlesi) is associated with deforestation, although mechanisms are unknown. Here, a novel application of a method for predicting disease occurrence that combines machine learning and statistics is used to identify the key spatial scales that define the relationship between zoonotic malaria cases and environmental change. Using data from satellite imagery, a case-control study, and a cross-sectional survey, predictive models of household-level occurrence of P. knowlesi were fitted with 16 variables summarized at 11 spatial scales simultaneously. The method identified a strong and well-defined peak of predictive influence of the proportion of cleared land within 1 km of households on P. knowlesi occurrence. Aspect (1 and 2 km), slope (0.5 km) and canopy regrowth (0.5 km) were important at small scales. By contrast, fragmentation of deforested areas influenced P. knowlesi occurrence probability most strongly at large scales (4 and 5 km). The identification of these spatial scales narrows the field of plausible mechanisms that connect land use change and P. knowlesi, allowing for the refinement of disease occurrence predictions and the design of spatially-targeted interventions.

Matched MeSH terms: Machine Learning*
Machine learning improves early prediction of small-for-gestational-age births and reveals nuchal fold thickness as unexpected predictor

Saw SN, Biswas A, Mattar CNZ, Lee HK, Yap CH

Prenat Diagn, 2021 Mar;41(4):505-516.
PMID: 33462877 DOI: 10.1002/pd.5903

OBJECTIVE: To investigate the performance of the machine learning (ML) model in predicting small-for-gestational-age (SGA) at birth, using second-trimester data.
METHODS: Retrospective data of 347 patients, consisting of maternal demographics and ultrasound parameters collected between the 20th and 25th gestational weeks, were studied. ML models were applied to different combinations of the parameters to predict SGA and severe SGA at birth (defined as 10th and third centile birth weight).
RESULTS: Using second-trimester measurements, ML models achieved an accuracy of 70% and 73% in predicting SGA and severe SGA whereas clinical guidelines had accuracies of 64% and 48%. Uterine PI (Ut PI) was found to be an important predictor, corroborating with existing literature, but surprisingly, so was nuchal fold thickness (NF). Logistic regression showed that Ut PI and NF were significant predictors and statistical comparisons showed that these parameters were significantly different in disease. Further, including NF was found to improve ML model performance, and vice versa.
CONCLUSION: ML could potentially improve the prediction of SGA at birth from second-trimester measurements, and demonstrated reduced NF to be an important predictor. Early prediction of SGA allows closer clinical monitoring, which provides an opportunity to discover any underlying diseases associated with SGA.

Matched MeSH terms: Machine Learning
Fulltext SMARTbot: A Behavioral Analysis Framework Augmented with Machine Learning to Identify Mobile Botnet Applications

Karim A, Salleh R, Khan MK

PLoS One, 2016;11(3):e0150077.
PMID: 26978523 DOI: 10.1371/journal.pone.0150077

The botnet phenomenon in smartphones is evolving with the proliferation of mobile phone technologies, after leaving a significant impact on personal computers. It refers to a network of computers, laptops, mobile devices or tablets that is remotely controlled by cybercriminals to initiate various distributed coordinated attacks, including spam emails, ad-click fraud, Bitcoin mining, Distributed Denial of Service (DDoS), dissemination of other malware and much more. Like traditional PC-based botnets, mobile botnets have the same operational impact, except that the target audience is specific to smartphone users. Therefore, it is important to uncover this security issue prior to its widespread adaptation. We propose SMARTbot, a novel dynamic analysis framework augmented with machine learning techniques to automatically detect botnet binaries from a malicious corpus. SMARTbot is a component-based off-device behavioral analysis framework which can generate a mobile botnet learning model by inducing Artificial Neural Networks' back-propagation method. Moreover, this framework can detect mobile botnet binaries with remarkable accuracy even in the case of obfuscated program code. The results show that a classifier model based on simple logistic regression outperforms other machine learning classifiers for botnet app detection, i.e., 99.49% accuracy is achieved. Further, from manual inspection of the botnet dataset we have extracted interesting trends in those applications. As an outcome of this research, a mobile botnet dataset is devised which will become the benchmark for future studies.

Matched MeSH terms: Machine Learning
Fulltext Malay sentiment analysis based on combined classification approaches and Senti-lexicon algorithm

Al-Saffar A, Awang S, Tao H, Omar N, Al-Saiagh W, Al-Bared M

PLoS One, 2018;13(4):e0194852.
PMID: 29684036 DOI: 10.1371/journal.pone.0194852

Sentiment analysis techniques are increasingly exploited to categorize opinion text into one or more predefined sentiment classes for the creation and automated maintenance of review-aggregation websites. In this paper, a Malay sentiment analysis classification model is proposed to improve classification performance based on the semantic orientation and machine learning approaches. First, a total of 2,478 Malay sentiment-lexicon phrases and words are assigned a synonym and stored with the help of more than one Malay native speaker, and the polarity is manually allotted a score. In addition, the supervised machine learning approaches and lexicon knowledge method are combined for Malay sentiment classification with the evaluation of thirteen features. Finally, three individual classifiers and a combined classifier are used to evaluate the classification accuracy. In experimental results, a wide range of comparative experiments is conducted on a Malay Reviews Corpus (MRC), and it demonstrates that the feature extraction improves the performance of Malay sentiment analysis based on the combined classification. However, the results depend on three factors: the features, the number of features and the classification approach.

Matched MeSH terms: Supervised Machine Learning/classification
Fulltext A machine learning approach of predicting high potential archers by means of physical fitness indicators

Muazu Musa R, P P Abdul Majeed A, Taha Z, Chang SW, Ab Nasir AF, Abdullah MR

PLoS One, 2019;14(1):e0209638.
PMID: 30605456 DOI: 10.1371/journal.pone.0209638

k-nearest neighbour (k-NN) has been shown to be an effective learning algorithm for classification and prediction. However, the application of k-NN for prediction and classification in specific sports is still in its infancy. The present study classified and predicted high and low potential archers from a set of physical fitness variables trained on a variation of k-NN algorithms and logistic regression. Fifty youth archers with a mean age and standard deviation of (17.0 ± 0.56) years drawn from various archery programmes completed a one end archery shooting score test. Standard fitness measurements of the handgrip, vertical jump, standing broad jump, static balance, upper muscle strength and the core muscle strength were conducted. Multiple linear regression was utilised to ascertain the significant variables that affect the shooting score. It was demonstrated from the analysis that core muscle strength and vertical jump were statistically significant. Hierarchical agglomerative cluster analysis (HACA) was used to cluster the archers based on the significant variables identified. k-NN model variations, i.e., fine, medium, coarse, cosine, cubic and weighted functions as well as logistic regression, were trained based on the significant performance variables. The HACA clustered the archers into high potential archers (HPA) and low potential archers (LPA). The weighted k-NN outperformed all the tested models, as it demonstrated reasonably good classification on the evaluated indicators with an accuracy of 82.5 ± 4.75% for the prediction of the HPA and the LPA. Moreover, the performance of the classifiers was further investigated against fresh data, which also indicates the efficacy of the weighted k-NN model. These findings could be valuable to coaches and sports managers to recognise high potential archers from a combination of the selected few physical fitness performance indicators identified which would subsequently save cost, time and energy for a talent identification programme.

Matched MeSH terms: Machine Learning
Fulltext Multi-stage feature selection (MSFS) algorithm for UWB-based early breast cancer size prediction

Vijayasarveswari V, Andrew AM, Jusoh M, Sabapathy T, Raof RAA, Yasin MNM, et al.

PLoS One, 2020;15(8):e0229367.
PMID: 32790672 DOI: 10.1371/journal.pone.0229367

Breast cancer is the most common cancer among women and it is one of the main causes of death for women worldwide. To attain optimum medical treatment for breast cancer, early breast cancer detection is crucial. This paper proposes a multi-stage feature selection method that extracts statistically significant features for breast cancer size detection using proposed data normalization techniques. Ultra-wideband (UWB) signals, controlled using a microcontroller, are transmitted via an antenna from one end of the breast phantom and are received on the other end. These ultra-wideband analogue signals are represented in both time and frequency domain. The preprocessed digital data is passed to the proposed multi-stage feature selection algorithm. This algorithm has four selection stages. It comprises data normalization methods, feature extraction, data dimensional reduction and feature fusion. The output data is fused together to form the proposed datasets, namely, 8-HybridFeature, 9-HybridFeature and 10-HybridFeature datasets. The classification performance of these datasets is tested using the Support Vector Machine, Probabilistic Neural Network and Naïve Bayes classifiers for breast cancer size classification. The research findings indicate that the 8-HybridFeature dataset performs better in comparison to the other two datasets. For the 8-HybridFeature dataset, the Naïve Bayes classifier (91.98%) outperformed the Support Vector Machine (90.44%) and Probabilistic Neural Network (80.05%) classifiers in terms of classification accuracy. The finalized method is tested and visualized in the MATLAB based 2D and 3D environment.

Matched MeSH terms: Machine Learning
Fulltext Ridge regression and its applications in genetic studies

Arashi M, Roozbeh M, Hamzah NA, Gasparini M

PLoS One, 2021;16(4):e0245376.
PMID: 33831027 DOI: 10.1371/journal.pone.0245376

With the advancement of technology, analysis of large-scale data of gene expression is feasible and has become very popular in the era of machine learning. This paper develops an improved ridge approach for the genome regression modeling. When multicollinearity exists in the data set with outliers, we consider a robust ridge estimator, namely the rank ridge regression estimator, for parameter estimation and prediction. On the other hand, the efficiency of the rank ridge regression estimator is highly dependent on the ridge parameter. In general, it is difficult to provide a satisfactory answer about the selection for the ridge parameter. Because of the good properties of generalized cross validation (GCV) and its simplicity, we use it to choose the optimum value of the ridge parameter. The GCV function creates a balance between the precision of the estimators and the bias caused by the ridge estimation. It behaves like an improved estimator of risk and can be used when the number of explanatory variables is larger than the sample size in high-dimensional problems. Finally, some numerical illustrations are given to support our findings.

Matched MeSH terms: Machine Learning*
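
The ridge regression abstract above describes choosing the ridge parameter by generalized cross validation (GCV). A minimal sketch of that idea follows, with loud caveats: it uses the ordinary least-squares ridge estimator, not the rank ridge estimator of the paper, it fits no intercept, and the function name `ridge_gcv` and the grid of candidate values are illustrative assumptions. The score used is the standard GCV(λ) = n · RSS(λ) / (n − tr H(λ))², where H(λ) is the ridge hat matrix.

```python
import numpy as np

def ridge_gcv(X, y, lambdas):
    """Pick a ridge parameter by generalized cross validation.

    Plain (least-squares) ridge only; all lambdas must be > 0.
    Returns (best_lambda, coefficients, gcv_scores).
    """
    n = X.shape[0]
    # Thin SVD: X = U diag(s) Vt, valid whether or not p > n.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y
    scores = []
    for lam in lambdas:
        shrink = s**2 / (s**2 + lam)      # non-zero eigenvalues of the hat matrix
        fitted = U @ (shrink * Uty)       # H(lam) @ y
        rss = np.sum((y - fitted) ** 2)   # residual sum of squares
        edf = np.sum(shrink)              # trace of the hat matrix
        scores.append(n * rss / (n - edf) ** 2)
    scores = np.array(scores)
    best = lambdas[int(np.argmin(scores))]
    # Ridge solution for the selected lambda: V diag(s / (s^2 + lam)) U^T y
    coef = Vt.T @ ((s / (s**2 + best)) * Uty)
    return best, coef, scores
```

Because the SVD is computed once and reused for every candidate value, scanning a grid is cheap even when the number of explanatory variables exceeds the sample size, which is the high-dimensional setting the abstract mentions.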
Fulltext Spoken language identification based on the enhanced self-adjusting extreme learning machine approach

Albadr MAA, Tiun S, Al-Dhief FT, Sammour MAM

PLoS One, 2018;13(4):e0194770.
PMID: 29672546 DOI: 10.1371/journal.pone.0194770

Spoken Language Identification (LID) is the process of determining and classifying natural language from a given content and dataset. Typically, data must be processed to extract useful features to perform LID. Feature extraction for LID, based on the literature, is a mature process where the standard features for LID have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC), the Gaussian Mixture Model (GMM) and ending with the i-vector based framework. However, the process of learning based on extracted features remains to be improved (i.e. optimised) to capture all embedded knowledge on the extracted features. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful to train a single hidden layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) due to the random selection of weights within the input hidden layer. In this study, the ELM is selected as a learning model for LID based on standard feature extraction. One of the optimisation approaches of ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM), is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process is performed incorporating both the Split-Ratio and K-Tournament methods; the improved SA-ELM is named Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are generated based on LID with the datasets created from eight different languages. The results of the study showed the superior performance of the Enhanced Self-Adjusting Extreme Learning Machine LID (ESA-ELM LID) compared with the SA-ELM LID, with ESA-ELM LID achieving an accuracy of 96.25%, as compared to the accuracy of SA-ELM LID of only 95.00%.

Matched MeSH terms: Machine Learning*
Fulltext An improved adaptive memetic differential evolution optimization algorithms for data clustering problems

Mustafa HMJ, Ayob M, Nazri MZA, Kendall G

PLoS One, 2019;14(5):e0216906.
PMID: 31137034 DOI: 10.1371/journal.pone.0216906

The performance of data clustering algorithms is mainly dependent on their ability to balance between the exploration and exploitation of the search process. Although some data clustering algorithms have achieved reasonable quality solutions for some datasets, their performance across real-life datasets could be improved. This paper proposes an adaptive memetic differential evolution optimisation algorithm (AMADE) for addressing data clustering problems. The memetic algorithm (MA) employs an adaptive differential evolution (DE) mutation strategy, which can offer superior mutation performance across many combinatorial and continuous problem domains. By hybridising an adaptive DE mutation operator with the MA, we propose that it can lead to faster convergence and better balance the exploration and exploitation of the search. We would also expect the performance of AMADE to be better than that of MA and DE if executed separately. Our experimental results, based on several real-life benchmark datasets, show that AMADE outperformed other compared clustering algorithms when compared using statistical analysis. We conclude that the hybridisation of MA and the adaptive DE is a suitable approach for addressing data clustering problems and can improve the balance between global exploration and local exploitation of the optimisation algorithm.

Matched MeSH terms: Machine Learning*
Fulltext Predicting the number of defects in a new software version

Felix EA, Lee SP

PLoS One, 2020;15(3):e0229131.
PMID: 32187181 DOI: 10.1371/journal.pone.0229131

Predicting the number of defects in software at the method level is important. However, little or no research has focused on method-level defect prediction. Therefore, considerable efforts are still required to demonstrate how method-level defect prediction can be achieved for a new software version. In the current study, we present an analysis of the relevant information obtained from the current version of a software product to construct regression models to predict the estimated number of defects in a new version using the variables of defect density, defect velocity and defect introduction time, which show considerable correlation with the number of method-level defects. These variables also show a mathematical relationship between defect density and defect acceleration at the method level, further indicating that the increase in the number of defects and the defect density are functions of the defect acceleration. We report an experiment conducted on the Finding Faults Using Ensemble Learners (ELFF) open-source Java projects, which contain 289,132 methods. The results show correlation coefficients of 60% for the defect density, -4% for the defect introduction time, and 93% for the defect velocity. These findings indicate that the average defect velocity shows a firm and considerable correlation with the number of defects at the method level. The proposed approach also motivates an investigation and comparison of the average performances of classifiers before and after method-level data preprocessing and of the level of entropy in the datasets.

Matched MeSH terms: Machine Learning
Fulltext Advanced machine learning model for better prediction accuracy of soil temperature at different depths

Alizamir M, Kisi O, Ahmed AN, Mert C, Fai CM, Kim S, et al.

PLoS One, 2020;15(4):e0231055.
PMID: 32287272 DOI: 10.1371/journal.pone.0231055

Soil temperature is of vital importance in biological, physical and chemical processes of the terrestrial ecosystem, and its modeling at different depths is very important for land-atmosphere interactions. The study compares four machine learning techniques, extreme learning machine (ELM), artificial neural networks (ANN), classification and regression trees (CART) and group method of data handling (GMDH), in estimating monthly soil temperatures at four different depths. Various combinations of climatic variables are utilized as input to the developed models. The models' outcomes are also compared with multi-linear regression based on Nash-Sutcliffe efficiency, root mean square error, and coefficient of determination statistics. ELM is found to generally perform better than the other four alternatives in estimating soil temperatures. A decrease in performance of the models is observed with an increase in soil depth. It is found that soil temperatures at three depths (5, 10 and 50 cm) could be mapped utilizing only air temperature data as input while solar radiation and wind speed information are also required for estimating soil temperature at the depth of 100 cm.

Matched MeSH terms: Machine Learning
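
The soil temperature abstract above compares models on Nash-Sutcliffe efficiency (NSE) and root mean square error (RMSE). As a hedged illustration of those two statistics only, the sketch below uses their standard textbook definitions; the function names are assumptions, and nothing here reproduces the study's data or models.

```python
import math

def nash_sutcliffe(observed, predicted):
    """NSE = 1 - sum((o - p)^2) / sum((o - mean(o))^2).

    1.0 is a perfect fit; 0.0 means no better than predicting the mean.
    Undefined when all observed values are identical (zero variance).
    """
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

def rmse(observed, predicted):
    """Root mean square error, in the same units as the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
```

NSE is unitless and so can be compared across depths, whereas RMSE stays in degrees and is easier to read as a typical error size.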
Fulltext Credit card fraud detection using a hierarchical behavior-knowledge space model

Nandi AK, Randhawa KK, Chua HS, Seera M, Lim CP

PLoS One, 2022 01 20;17(1):e0260579.
PMID: 35051184 DOI: 10.1371/journal.pone.0260579

With the advancement in machine learning, researchers continue to devise and implement effective intelligent methods for fraud detection in the financial sector. Indeed, credit card fraud leads to billions of dollars in losses for merchants every year. In this paper, a multi-classifier framework is designed to address the challenges of credit card fraud detection. An ensemble model with multiple machine learning classification algorithms is designed, in which the Behavior-Knowledge Space (BKS) is leveraged to combine the predictions from multiple classifiers. To ascertain the effectiveness of the developed ensemble model, publicly available data sets as well as real financial records are employed for performance evaluations. Through statistical tests, the results positively indicate the effectiveness of the developed model as compared with the commonly used majority voting method for combination of predictions from multiple classifiers in tackling noisy data classification as well as credit card fraud detection problems.

Matched MeSH terms: Machine Learning*
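
The credit card fraud abstract above combines classifier outputs with a Behavior-Knowledge Space (BKS) and contrasts it with majority voting. The sketch below shows only the basic, flat BKS idea under stated assumptions: each distinct tuple of classifier predictions is a cell, and the cell is mapped to the true label seen most often for it in training. The hierarchical structure of the paper is not reproduced, and the function names and the majority-vote fallback for unseen cells are illustrative choices.

```python
from collections import Counter, defaultdict

def fit_bks(pred_rows, labels):
    """Build a BKS lookup table.

    pred_rows: one tuple of base-classifier predictions per training sample.
    labels:    the true label of each training sample.
    Returns a dict mapping each observed prediction tuple to its most
    frequent true label.
    """
    cells = defaultdict(Counter)
    for preds, label in zip(pred_rows, labels):
        cells[tuple(preds)][label] += 1
    return {cell: counts.most_common(1)[0][0] for cell, counts in cells.items()}

def predict_bks(table, preds):
    """Look up the cell; fall back to majority voting for unseen cells."""
    cell = tuple(preds)
    if cell in table:
        return table[cell]
    return Counter(preds).most_common(1)[0][0]
```

The table can overrule the vote: if two of three classifiers say "legitimate" but that exact pattern was mostly fraud in training, BKS returns "fraud", which is the behaviour that distinguishes it from majority voting.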
Fulltext Investigating the association of CD36 gene polymorphisms (rs1761667 and rs1527483) with T2DM and dyslipidemia: Statistical analysis, machine learning based prediction, and meta-analysis

Hatmal MM, Alshaer W, Mahmoud IS, Al-Hatamleh MAI, Al-Ameer HJ, Abuyaman O, et al.

PLoS One, 2021;16(10):e0257857.
PMID: 34648514 DOI: 10.1371/journal.pone.0257857

CD36 (cluster of differentiation 36) is a membrane protein involved in lipid metabolism and has been linked to pathological conditions associated with metabolic disorders, such as diabetes and dyslipidemia. A case-control study including 177 patients with type-2 diabetes mellitus (T2DM) and 173 control subjects was conducted to study the involvement of CD36 gene rs1761667 (G>A) and rs1527483 (C>T) polymorphisms in the pathogenesis of T2DM and dyslipidemia among the Jordanian population. Lipid profile, blood sugar, gender and age were measured and recorded. Genotyping analysis for both polymorphisms was also performed. Following statistical analysis, 10 different neural networks and machine learning (ML) tools were used to predict subjects with diabetes or dyslipidemia. To further understand the role of the CD36 protein and gene in T2DM and dyslipidemia, a protein-protein interaction network analysis and a meta-analysis were carried out. For both polymorphisms, the genotypic frequencies were not significantly different between the two groups (p > 0.05). On the other hand, some ML tools, such as the multilayer perceptron, gave high prediction accuracy (≥ 0.75) and Cohen's kappa (κ) (≥ 0.5). Interestingly, with the K-star tool, the accuracy and Cohen's κ values were enhanced by including the genotyping results as inputs (0.73 and 0.46, respectively, compared to 0.67 and 0.34 without including them). This study confirmed, for the first time, that there is no association between CD36 polymorphisms and T2DM or dyslipidemia among the Jordanian population. Prediction of T2DM and dyslipidemia, using these extensive ML tools and based on such input data, is a promising approach for developing diagnostic and prognostic prediction models for a wide spectrum of diseases, especially based on large medical databases.

Matched MeSH terms: Machine Learning
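
The abstract above reports both accuracy and Cohen's kappa. As a minimal sketch of how the two relate, the snippet below computes kappa as the observed agreement corrected for the agreement expected by chance. The function name and the labels are illustrative assumptions, not data from the study.

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n  # this is the accuracy
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal class proportions, summed over classes.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    # Note: undefined (division by zero) when expected == 1, e.g. a single class throughout.
    return (observed - expected) / (1 - expected)

# Made-up labels: 1 = case, 0 = control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)
print(cohens_kappa(y_true, y_pred))
```

With balanced classes, chance agreement is 0.5, so an accuracy of 0.75 corresponds to a kappa of 0.5, the same pair of thresholds the abstract uses for "high" performance.
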
Fulltext Short- and long-term mortality prediction after an acute ST-elevation myocardial infarction (STEMI) in Asians: A machine learning approach

Aziz F, Malek S, Ibrahim KS, Raja Shariff RE, Wan Ahmad WA, Ali RM, et al.

PLoS One, 2021;16(8):e0254894.
PMID: 34339432 DOI: 10.1371/journal.pone.0254894

BACKGROUND: Conventional risk scores for predicting short- and long-term mortality following an ST-segment elevation myocardial infarction (STEMI) are often not population specific.
OBJECTIVE: To apply machine learning for the prediction and identification of factors associated with short- and long-term mortality in Asian STEMI patients and to compare it with a conventional risk score.
METHODS: The National Cardiovascular Disease Database for Malaysia registry, covering a multi-ethnic, heterogeneous Asian population, was used for in-hospital (6299 patients), 30-day (3130 patients), and 1-year (2939 patients) model development. Fifty variables were considered. Mortality prediction was analysed using feature selection methods with machine learning algorithms and compared to the Thrombolysis in Myocardial Infarction (TIMI) score. Invasive management of varying degrees was selected as an important variable that improved mortality prediction.
RESULTS: Models using the complete and reduced variable sets produced an area under the receiver operating characteristic curve (AUC) from 0.73 to 0.90. The best machine learning models for in-hospital, 30-day, and 1-year mortality outperformed the TIMI risk score (AUC = 0.88, 95% CI: 0.846-0.910 vs AUC = 0.81, 95% CI: 0.772-0.845; AUC = 0.90, 95% CI: 0.870-0.935 vs AUC = 0.80, 95% CI: 0.746-0.838; and AUC = 0.84, 95% CI: 0.798-0.872 vs AUC = 0.76, 95% CI: 0.715-0.802, respectively; p < 0.0001 for all). The TIMI score underestimates patients' risk of mortality: 90% of non-surviving patients were classified as high risk (>50%) by the machine learning algorithm, compared with 10-30% of non-surviving patients by TIMI. Common predictors identified for short- and long-term mortality were age, heart rate, Killip class, fasting blood glucose, prior primary PCI or pharmaco-invasive therapy, and diuretics. The final algorithm was converted into an online tool with a database for continuous data archiving for algorithm validation.
CONCLUSIONS: In a multi-ethnic population, patients with STEMI were better classified using the machine learning method compared to TIMI scoring. Machine learning allows for the identification of distinct factors in individual Asian populations for better mortality prediction. Ongoing continuous testing and validation will allow for better risk stratification and potentially alter management and outcomes in the future.

Matched MeSH terms: Machine Learning*
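
The STEMI abstract above compares models by the area under the ROC curve (AUC). As a minimal sketch of what that number measures, the snippet below computes AUC as the probability that a randomly chosen non-survivor receives a higher risk score than a randomly chosen survivor, counting ties as one half. The function name and the scores are illustrative assumptions, not registry data, and the confidence intervals reported in the abstract are not reproduced here.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (Mann-Whitney form)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# Made-up example: 1 = died, 0 = survived; scores are predicted mortality risks.
labels = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.6, 0.7, 0.2, 0.1, 0.8]

print(roc_auc(labels, scores))
```

The pairwise loop is quadratic in the number of patients, which is fine for a sketch; a rank-based implementation is usually preferred for registry-sized data.
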
