Displaying publications 81 - 100 of 269 in total

  1. May Z, Alam MK, Nayan NA, Rahman NAA, Mahmud MS
    PLoS One, 2021;16(12):e0261040.
    PMID: 34914761 DOI: 10.1371/journal.pone.0261040
    Corrosion in carbon-steel pipelines leads to failure, which is a major cause of breakdown maintenance in the oil and gas industries. The acoustic emission (AE) signal is a reliable method for corrosion detection and classification in the modern Structural Health Monitoring (SHM) system. The efficiency of this system in detection and classification depends mainly on suitable AE features. Therefore, many feature extraction and classification methods have been developed for corrosion detection and severity assessment. However, extracting appropriate AE features and classifying various levels of corrosion using these extracted features remain challenging issues. To overcome these issues, this article proposes a hybrid machine learning approach that combines Wavelet Packet Transform (WPT) integrated with Fast Fourier Transform (FFT) for multiresolution feature extraction and a Linear Support Vector Classifier (L-SVC) for predicting corrosion severity levels. A laboratory-based Linear Polarization Resistance (LPR) test was performed on carbon-steel samples for AE data acquisition over different time spans. AE signals were collected at a high sampling rate with a sound-well AE sensor using AEWin software. Simulation results show a linear relationship between the extracted AE features and the corrosion process. For the multi-class problem, three corrosion severity stages were defined based on the corrosion rate over time and AE activity. The ANOVA test results indicate significance within and between the feature groups, where F-values (F-value > 1) reject the null hypothesis and P-values (P-value < 0.05) fall below the significance level. The L-SVC classifier achieves a higher prediction accuracy (99.0%) than the other benchmarked classifiers. These findings confirm that the proposed machine learning approach can be effectively utilized for corrosion detection and severity assessment in SHM applications.
    Matched MeSH terms: Machine Learning*
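A minimal sketch of the WPT-plus-FFT feature pipeline feeding a linear SVC, in the spirit of the entry above. The synthetic AE-like signals, the db4 wavelet, the decomposition level, and the per-sub-band spectral-energy feature are all assumptions; the paper's real AE data and exact feature set are not reproduced here (requires PyWavelets and scikit-learn).

```python
import numpy as np
import pywt  # PyWavelets
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def wpt_fft_features(signal, wavelet="db4", level=3):
    """FFT spectral energy of each terminal wavelet-packet node."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    energies = [np.sum(np.abs(np.fft.rfft(node.data)) ** 2)
                for node in wp.get_level(level, order="freq")]
    return np.log1p(energies)  # compress the dynamic range

rng = np.random.default_rng(0)
X, y = [], []
for severity in range(3):                  # three assumed corrosion severity stages
    for _ in range(100):
        t = np.linspace(0, 1, 1024)
        sig = rng.normal(scale=0.3, size=t.size)
        sig += (severity + 1) * np.sin(2 * np.pi * (50 + 100 * severity) * t)
        X.append(wpt_fft_features(sig))
        y.append(severity)

Xtr, Xte, ytr, yte = train_test_split(np.array(X), y, random_state=0)
clf = LinearSVC(max_iter=5000).fit(Xtr, ytr)
print("held-out accuracy:", clf.score(Xte, yte))
```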
  2. Mohamad Arif J, Ab Razak MF, Awang S, Tuan Mat SR, Ismail NSN, Firdaus A
    PLoS One, 2021;16(9):e0257968.
    PMID: 34591930 DOI: 10.1371/journal.pone.0257968
    The evolution of malware is causing mobile devices to crash with increasing frequency. Therefore, adequate security evaluations that detect Android malware are crucial. Two techniques can be used in this regard: static analysis, which meticulously examines the full code of applications, and dynamic analysis, which monitors malware behaviour. While both perform security evaluations successfully, there is still room for improvement. The goal of this research is to examine the effectiveness of static analysis in detecting Android malware using permission-based features. This study proposes a machine learning approach in which different sets of classifiers are used to evaluate Android malware detection. A feature selection method was applied to determine which features were most capable of distinguishing malware. A total of 5,000 Drebin malware samples and 5,000 Androzoo benign samples were utilised. The performances of the different sets of classifiers were then compared. The results indicated that, with a TPR value of 91.6%, the Random Forest algorithm achieved the highest level of accuracy in malware detection.
    Matched MeSH terms: Machine Learning*
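The workflow above reduces each app to permission-based features before classification. A hedged sketch under that reading follows; the binary app-by-permission matrix and the label rule are invented stand-ins for the Drebin/Androzoo data, and the TPR computation mirrors the reported metric.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_apps, n_perms = 2000, 50
X = rng.integers(0, 2, size=(n_apps, n_perms))       # 1 = permission requested
# Toy label rule: "malware" tends to request a particular permission subset.
y = (X[:, :5].sum(axis=1) + rng.normal(0, 0.8, n_apps) > 2.5).astype(int)

Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=1)
clf = RandomForestClassifier(n_estimators=200, random_state=1).fit(Xtr, ytr)
tn, fp, fn, tp = confusion_matrix(yte, clf.predict(Xte)).ravel()
print("TPR:", round(tp / (tp + fn), 3))
```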
  3. Hui KH, Ooi CS, Lim MH, Leong MS, Al-Obaidi SM
    PLoS One, 2017;12(12):e0189143.
    PMID: 29261689 DOI: 10.1371/journal.pone.0189143
    A major issue in machinery fault diagnosis using vibration signals is that it is over-reliant on personnel knowledge and experience in interpreting the signal. Thus, machine learning has been adapted for machinery fault diagnosis. The quantity and quality of the input features, however, influence fault classification performance. Feature selection plays a vital role in selecting the most representative feature subset for the machine learning algorithm. However, in the wrapper-based feature selection (WFS) method, a trade-off between the capability to select the best feature subset and the computational effort is inevitable. This paper proposes an improved WFS technique integrated with a support vector machine (SVM) classifier as a complete fault diagnosis system for a rolling element bearing case study. The bearing vibration dataset made available by the Case Western Reserve University Bearing Data Centre was processed using the proposed WFS, and its performance has been analysed and discussed. The results reveal that the proposed WFS secures the best feature subset with lower computational effort by eliminating the redundancy of re-evaluation. The proposed WFS has therefore been found capable of carrying out feature selection tasks efficiently.
    Matched MeSH terms: Machine Learning
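The key idea above is eliminating redundant re-evaluation inside a wrapper search. A minimal sequential-forward wrapper with an SVM scorer and a score cache illustrates that mechanism; the wine dataset and the greedy search are stand-ins, not the paper's improved WFS.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cache = {}  # evaluated subset -> CV score, so no subset is scored twice

def score(subset):
    key = tuple(sorted(subset))
    if key not in cache:
        cache[key] = cross_val_score(svm, X[:, key], y, cv=5).mean()
    return cache[key]

selected, remaining, best = [], list(range(X.shape[1])), 0.0
while remaining:
    cand_score, cand = max((score(selected + [f]), f) for f in remaining)
    if cand_score <= best:
        break                     # no remaining feature improves the wrapped SVM
    best = cand_score
    selected.append(cand)
    remaining.remove(cand)
print("selected:", selected, "CV accuracy:", round(best, 3))
```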
  4. Atee HA, Ahmad R, Noor NM, Rahma AM, Aljeroudi Y
    PLoS One, 2017;12(2):e0170329.
    PMID: 28196080 DOI: 10.1371/journal.pone.0170329
    In image steganography, determining the optimum location for embedding the secret message precisely, with minimum distortion of the host medium, remains a challenging issue. An effective approach for selecting the best embedding location with the least deformation is still far from being achieved. To attain this goal, we propose a novel high-performance approach for image steganography in which the extreme learning machine (ELM) algorithm is modified to create a supervised mathematical model. The ELM is first trained on part of an image or other host medium before being tested in regression mode. This allows the optimal location for embedding the message to be chosen using the best values of the predicted evaluation metrics. Contrast, homogeneity, and other texture features are used for training on a new metric. Furthermore, the developed ELM is exploited to counter over-fitting during training. The performance of the proposed steganography approach is evaluated by computing the correlation, structural similarity (SSIM) index, fusion matrices, and mean square error (MSE). The modified ELM is found to outperform existing approaches in terms of imperceptibility. The experimental results demonstrate that the proposed steganographic approach is highly proficient at preserving the visual information of an image. An improvement in imperceptibility of as much as 28% is achieved compared to existing state-of-the-art methods.
    Matched MeSH terms: Machine Learning*
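For context on the mechanism the entry above modifies: an ELM fixes a random hidden layer and solves the output weights in closed form by least squares. The regressor below is a generic ELM sketch on synthetic data, assuming tanh activations; it is not the paper's modified, steganography-aware variant.

```python
import numpy as np

class ELMRegressor:
    """Plain extreme learning machine: random hidden layer + least squares."""
    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)                    # random feature map
        self.beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # closed-form solve
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.05, 500)
model = ELMRegressor().fit(X[:400], y[:400])
print("test MSE:", round(float(np.mean((model.predict(X[400:]) - y[400:]) ** 2)), 4))
```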
  5. Ferdous N, Reza MN, Hossain MU, Mahmud S, Napis S, Chowdhury K, et al.
    PLoS One, 2023;18(6):e0287179.
    PMID: 37352252 DOI: 10.1371/journal.pone.0287179
    The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic emerged in 2019 and still requires treatments with fast clinical translatability. The frequent occurrence of mutations in the spike glycoprotein of SARS-CoV-2 has led to the consideration of alternative therapeutic targets to combat the ongoing pandemic. The main protease (Mpro) is one such attractive drug target due to its importance in maturating several polyproteins during the replication process. In the present study, we used a classification structure-activity relationship (CSAR) model to find substructures that lead to anti-Mpro activity among 758 non-redundant compounds. A set of 12 fingerprints was used to describe Mpro inhibitors, and the random forest approach was used to build prediction models from 100 distinct data splits. The data set's modelability (MODI index) was found to be robust, with a value of 0.79, above the 0.65 threshold. The accuracy (89%), sensitivity (89%), specificity (73%), and Matthews correlation coefficient (79%) used to quantify prediction performance were also found to be statistically robust. An extensive analysis of the top significant descriptors unveiled the importance of methyl side chains, aromatic rings, and halogen groups for Mpro inhibition. Finally, the predictive model is made publicly accessible as a web app named Mpropred to allow users to predict the bioactivity of compounds against SARS-CoV-2 Mpro. CMNPD, a marine compound database, was then screened by the app to predict the bioactivity of all its compounds, and the results revealed significant correlation with their binding affinity to Mpro. Molecular dynamics (MD) simulation and molecular mechanics/Poisson-Boltzmann surface area (MM/PBSA) analysis showed improved properties of the complexes. Thus, the knowledge and web app presented herein can be used to develop more effective and specific inhibitors against SARS-CoV-2 Mpro. The web app can be accessed at https://share.streamlit.io/nadimfrds/mpropred/Mpropred_app.py.
    Matched MeSH terms: Machine Learning
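A sketch of the fingerprint-to-random-forest CSAR workflow described above, assuming precomputed binary fingerprints (a toolkit such as RDKit or PaDEL would generate real ones) and a toy activity rule; the MCC mirrors one of the reported statistics.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(758, 881))   # e.g. 881-bit PubChem-style keys (assumed)
y = (X[:, :10].sum(axis=1) + rng.normal(0, 1, 758) > 5).astype(int)  # toy "activity"

Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=2)
clf = RandomForestClassifier(n_estimators=300, random_state=2).fit(Xtr, ytr)
print("MCC:", round(matthews_corrcoef(yte, clf.predict(Xte)), 3))
```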
  6. Ng GYL, Tan SC, Ong CS
    PLoS One, 2023;18(10):e0292961.
    PMID: 37856458 DOI: 10.1371/journal.pone.0292961
    Cell type identification is one of the fundamental tasks in single-cell RNA sequencing (scRNA-seq) studies. It is a key step in facilitating downstream interpretations such as differential expression, trajectory inference, etc. scRNA-seq data contains technical variations that could affect the interpretation of cell types. Therefore, gene selection, also known as feature selection in data science, plays an important role in selecting informative genes for scRNA-seq cell type identification. Generally speaking, feature selection methods are categorized into filter-, wrapper-, and embedded-based approaches. In the existing literature, methods from the filter- and embedded-based approaches are widely applied to scRNA-seq gene selection tasks. The wrapper-based approach, which gives promising results in other fields, has not yet been extensively utilized for selecting gene features from scRNA-seq data; in addition, most of the existing wrapper methods used in this field are clustering-based rather than classification-based. With large amounts of annotated data available today, this study applied a classification-based approach as an alternative to the clustering-based wrapper method. In our work, a quantum-inspired differential evolution (QDE) algorithm wrapped with a classification method was introduced to select subsets of genes from twelve well-known scRNA-seq transcriptomic datasets to identify cell types. In particular, the QDE was combined with different machine-learning (ML) classifiers, namely logistic regression, decision tree, support vector machine (SVM) with linear and radial basis function kernels, and extreme learning machine. The linear SVM wrapped with QDE, namely QDE-SVM, was chosen based on the feature selection results from the experiment. QDE-SVM showed superior cell type classification performance compared with QDE wrapping other ML classifiers as well as recent wrapper methods (i.e., FSCAM, SSD-LAHC, MA-HS, and BSF). QDE-SVM achieved an average accuracy of 0.9559, while the other wrapper methods achieved average accuracies in the range of 0.8292 to 0.8872.
    Matched MeSH terms: Machine Learning
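A loose sketch of the wrapper idea above: a binary differential-evolution search over feature masks, scored by a wrapped linear SVM's cross-validated accuracy. The quantum-inspired operators of real QDE are not reproduced; the binary mutation/crossover scheme, the population settings, and the breast-cancer stand-in dataset are all assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
rng = np.random.default_rng(3)
n_feat, pop_size, gens = X.shape[1], 12, 10

def fitness(mask):
    return cross_val_score(svm, X[:, mask], y, cv=3).mean() if mask.any() else 0.0

pop = rng.random((pop_size, n_feat)) < 0.5          # population of feature masks
fit = np.array([fitness(m) for m in pop])
for _ in range(gens):
    for i in range(pop_size):
        a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
        trial = a ^ ((b ^ c) & (rng.random(n_feat) < 0.5))         # binary "mutation"
        trial = np.where(rng.random(n_feat) < 0.9, trial, pop[i])  # crossover
        f = fitness(trial)
        if f > fit[i]:                               # greedy selection
            pop[i], fit[i] = trial, f
best = pop[fit.argmax()]
print("features kept:", int(best.sum()), "CV accuracy:", round(fit.max(), 3))
```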
  7. Kaleem S, Sohail A, Tariq MU, Babar M, Qureshi B
    PLoS One, 2023;18(10):e0292587.
    PMID: 37819992 DOI: 10.1371/journal.pone.0292587
    Coronavirus disease (COVID-19), which has caused a global pandemic, continues to have severe effects on human lives worldwide. Characterized by symptoms similar to pneumonia, its rapid spread requires innovative strategies for its early detection and management. In response to this crisis, data science and machine learning (ML) offer crucial solutions to complex problems, including those posed by COVID-19. One cost-effective approach to detect the disease is the use of chest X-rays, which is a common initial testing method. Although existing techniques are useful for detecting COVID-19 using X-rays, there is a need for further improvement in efficiency, particularly in terms of training and execution time. This article introduces an advanced architecture that leverages an ensemble learning technique for COVID-19 detection from chest X-ray images. Using a parallel and distributed framework, the proposed model integrates ensemble learning with big data analytics to facilitate parallel processing. This approach aims to enhance both execution and training times, ensuring a more effective detection process. The model's efficacy was validated through a comprehensive analysis of predicted and actual values, and its performance was meticulously evaluated for accuracy, precision, recall, and F-measure, and compared to state-of-the-art models. The work presented here not only contributes to the ongoing fight against COVID-19 but also showcases the wider applicability and potential of ensemble learning techniques in healthcare.
    Matched MeSH terms: Machine Learning
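The entry above does not spell out its base learners, so the following is only one plausible reading of the ensemble idea: a soft-voting ensemble whose members are fitted in parallel via joblib (n_jobs=-1), with the sklearn digits set standing in for chest X-ray images.

```python
from sklearn.datasets import load_digits  # stand-in for X-ray image features
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=4)
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=4)),
        ("lr", LogisticRegression(max_iter=2000)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="soft",   # average predicted probabilities
    n_jobs=-1,       # fit base learners in parallel
).fit(Xtr, ytr)
print("accuracy:", round(ensemble.score(Xte, yte), 3))
```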
  8. Salih SQ, Alsewari AA, Wahab HA, Mohammed MKA, Rashid TA, Das D, et al.
    PLoS One, 2023;18(7):e0288044.
    PMID: 37406006 DOI: 10.1371/journal.pone.0288044
    The retrieval of important information from a dataset requires applying a special data mining technique known as data clustering (DC). DC groups similar objects into clusters with similar characteristics. Clustering involves grouping the data around k cluster centres that are typically selected randomly. The issues behind DC have called for a search for alternative solutions. Recently, a nature-inspired optimization algorithm named the Black Hole Algorithm (BHA) was developed to address several well-known optimization problems. The BHA is a population-based metaheuristic that mimics the natural phenomenon of black holes, whereby individual stars represent potential solutions revolving around the solution space. The original BHA showed better performance than other algorithms when applied to benchmark datasets, despite its poor exploration capability. Hence, this paper presents a multi-population generalization of the BHA, called MBHA, wherein the performance of the algorithm depends not on a single best-found solution but on a set of generated best solutions. The formulated method was tested on a set of nine widespread and popular benchmark test functions. The ensuing experimental outcomes indicated that the method generated highly precise results compared with the BHA and comparable algorithms in the study, as well as excellent robustness. Furthermore, the proposed MBHA achieved a high rate of convergence on six real datasets (collected from the UCI Machine Learning Repository), making it suitable for DC problems. Lastly, the evaluations conclusively indicated the appropriateness of the proposed algorithm for resolving DC issues.
    Matched MeSH terms: Machine Learning*
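A compact single-population Black Hole Algorithm applied to k-means-style clustering, to make the star/black-hole mechanics above concrete. This is a baseline sketch on the iris data; the paper's MBHA runs several such populations and pools their best solutions, which is not shown here.

```python
import numpy as np
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)
rng = np.random.default_rng(5)
k, n_stars, iters = 3, 25, 100
lo, hi = X.min(axis=0), X.max(axis=0)

def sse(centres):  # clustering cost: distance of each point to its nearest centre
    d = np.linalg.norm(X[:, None, :] - centres[None], axis=2)
    return d.min(axis=1).sum()

stars = rng.uniform(lo, hi, size=(n_stars, k, X.shape[1]))  # candidate centre sets
for _ in range(iters):
    fit = np.array([sse(s) for s in stars])
    black_hole = stars[fit.argmin()].copy()      # best star becomes the black hole
    radius = fit.min() / fit.sum()               # event-horizon radius
    for i in range(n_stars):
        stars[i] += rng.random() * (black_hole - stars[i])  # drift toward it
        if np.linalg.norm(stars[i] - black_hole) < radius:  # crossed the horizon:
            stars[i] = rng.uniform(lo, hi, size=(k, X.shape[1]))  # respawn a star
print("best clustering SSE:", round(min(sse(s) for s in stars), 2))
```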
  9. Asim Shahid M, Alam MM, Mohd Su'ud M
    PLoS One, 2023;18(4):e0284209.
    PMID: 37053173 DOI: 10.1371/journal.pone.0284209
    The benefits and opportunities offered by cloud computing have made it one of the fastest-growing technologies in the computer industry, and addressing its difficulties and issues makes users more likely to accept and use the technology. The proposed research compares machine learning (ML) algorithms, namely Naïve Bayes (NB), Library Support Vector Machine (LibSVM), Multinomial Logistic Regression (MLR), Sequential Minimal Optimization (SMO), K-Nearest Neighbour (KNN), and Random Forest (RF), to determine which classifier gives better accuracy and lower fault prediction error. In this research, on the secondary data, the NB classifier gives the highest accuracy and lowest fault prediction on CPU-Mem Mono under 80/20 (77.01%), 70/30 (76.05%), and 5-fold cross-validation (74.88%), and on CPU-Mem Multi under 80/20 (89.72%), 70/30 (90.28%), and 5-fold cross-validation (92.83%). Furthermore, on HDD Mono the SMO classifier gives the highest accuracy and lowest fault prediction under 80/20 (87.72%), 70/30 (89.41%), and 5-fold cross-validation (88.38%), and on HDD Multi under 80/20 (93.64%), 70/30 (90.91%), and 5-fold cross-validation (88.20%). On the primary data, the RF classifier gives the highest accuracy and lowest fault prediction under 80/20 (97.14%), 70/30 (96.19%), and 5-fold cross-validation (95.85%), but its time complexity (0.17 seconds) is not good. With 80/20 (95.71%), 70/30 (95.71%), and 5-fold cross-validation (95.71%), SMO has the second-highest accuracy and lowest fault prediction, but its time complexity is good (0.3 seconds). The difference in accuracy and fault prediction between RF and SMO is only 0.13%, and the difference in time complexity is 14 seconds. We therefore decided to modify SMO. Finally, a Modified Sequential Minimal Optimization (MSMO) algorithm is proposed to obtain the highest accuracy and lowest fault prediction error, achieving 80/20 (96.42%), 70/30 (96.42%), and 5-fold cross-validation (96.50%).
    Matched MeSH terms: Machine Learning*
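A sketch of the comparison protocol above: several classifiers evaluated under 80/20 and 70/30 hold-out splits plus 5-fold cross-validation. The breast-cancer dataset stands in for the (non-public) cloud fault logs, and the SVC is only SMO-like in spirit.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
models = {"NB": GaussianNB(),
          "SVM (SMO-like)": SVC(),
          "RF": RandomForestClassifier(random_state=0)}
for name, model in models.items():
    for test_size in (0.2, 0.3):
        Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=test_size,
                                              random_state=0)
        acc = model.fit(Xtr, ytr).score(Xte, yte)
        print(f"{name} {int((1 - test_size) * 100)}/{int(test_size * 100)}: {acc:.4f}")
    print(f"{name} 5-fold CV: {cross_val_score(model, X, y, cv=5).mean():.4f}")
```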
  10. Hameed MM, Razali SFM, Mohtar WHMW, Rahman NA, Yaseen ZM
    PLoS One, 2023;18(10):e0290891.
    PMID: 37906556 DOI: 10.1371/journal.pone.0290891
    The Great Lakes are critical freshwater sources, supporting millions of people, agriculture, and ecosystems. However, climate change has worsened droughts, leading to significant economic and social consequences. Accurate multi-month drought forecasting is, therefore, essential for effective water management and mitigating these impacts. This study introduces the Multivariate Standardized Lake Water Level Index (MSWI), a modified drought index that utilizes water level data collected from 1920 to 2020. Four hybrid models are developed: Support Vector Regression with Beluga whale optimization (SVR-BWO), Random Forest with Beluga whale optimization (RF-BWO), Extreme Learning Machine with Beluga whale optimization (ELM-BWO), and Regularized ELM with Beluga whale optimization (RELM-BWO). The models forecast droughts up to six months ahead for Lake Superior and Lake Michigan-Huron. The best-performing model is then selected to forecast droughts for the remaining three lakes, which have not experienced severe droughts in the past 50 years. The results show that incorporating the BWO improves the accuracy of all classical models, particularly in forecasting drought turning and critical points. Among the hybrid models, the RELM-BWO model achieves the highest level of accuracy, surpassing both classical and hybrid models by a significant margin (7.21 to 76.74%). Furthermore, Monte Carlo simulation is employed to analyze uncertainties and ensure the reliability of the forecasts. Accordingly, the RELM-BWO model reliably forecasts droughts for all lakes, with a lead time ranging from 2 to 6 months. The study's findings offer valuable insights for policymakers, water managers, and other stakeholders to better prepare drought mitigation strategies.
    Matched MeSH terms: Machine Learning
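A heavily hedged sketch of the hybrid idea above: lagged water levels feed an SVR whose hyperparameters are tuned by a search loop. Beluga Whale Optimization is not implemented here; plain random search stands in for it, and the sinusoidal "lake level" series is invented.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(6)
t = np.arange(600)
level = np.sin(2 * np.pi * t / 120) + 0.1 * rng.normal(size=t.size)  # toy series

lag, horizon = 6, 6                         # 6 lagged inputs, 6-step-ahead target
N = level.size - (lag - 1) - horizon
X = np.column_stack([level[i:i + N] for i in range(lag)])
y = level[lag - 1 + horizon:]

best_score, best_params = -np.inf, None
for _ in range(30):                         # BWO would steer this search instead
    params = {"C": 10 ** rng.uniform(-1, 2), "gamma": 10 ** rng.uniform(-3, 0)}
    score = cross_val_score(SVR(**params), X, y, cv=3, scoring="r2").mean()
    if score > best_score:
        best_score, best_params = score, params
print("best R2:", round(best_score, 3), "params:", best_params)
```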
  11. Yang S, Li X, Jiang Z, Xiao M
    PLoS One, 2023;18(10):e0290126.
    PMID: 37844110 DOI: 10.1371/journal.pone.0290126
    Based on data for Chinese A-share firms listed on the Shanghai and Shenzhen Stock Exchanges from 2014 to 2021, this article explores the relationship between common institutional investors and the quality of management earnings forecasts. Using a multiple linear regression model, the study empirically finds that common institutional investors positively impact the precision of earnings forecasts. This article also uses graph neural networks to predict the precision of earnings forecasts. Our findings show that common institutional investors provide external supervision that restricts management from releasing excessively wide earnings forecasts, which helps to improve the risk-warning function of earnings forecasts and promotes the sustainable development of information disclosure by management in the Chinese capital market. One marginal contribution of this paper is that it enriches the literature on the economic consequences of common institutional shareholding. Second, the graph neural network method used to predict the quality of management forecasts extends the research methods applied to institutional investors and management earnings-forecast behaviour. Third, this paper calls for strengthening information sharing and circulation among institutional investors to reduce information asymmetry between investors and management.
    Matched MeSH terms: Machine Learning
  12. T A, G G, P AMD, Assaad M
    PLoS One, 2024;19(3):e0299653.
    PMID: 38478485 DOI: 10.1371/journal.pone.0299653
    Mechanical ventilation techniques are vital for preserving the lives of critically ill patients in prolonged hospitalization units. Nevertheless, an imbalance between patient demand and the respiratory support system can cause inconsistencies in the patient's inhalation. To tackle this problem, this study presents an Iterative Learning PID Controller (ILC-PID), a unique current-cycle feedback controller that helps attain the correct pressure and volume. The paper also offers a clear and complete examination of an efficient neural approach for generating optimal inhalation strategies. Moreover, machine learning-based classifiers are used to evaluate the precision and performance of the ILC-PID controller. These classifiers are able to forecast and choose the ideal configuration for various inhalation modes, helping avoid the need for prolonged mechanical ventilation. In pressure control, the proposed neural classification exhibited an average accuracy of 88.2% in continuous positive airway pressure (CPAP) mode and 91.7% in proportional assist ventilation (PAV) mode, whereas the ensemble classifier achieved lower accuracies of 69.5% in CPAP mode and 71.7% in PAV mode; averaged over the other classifiers, accuracy in CPAP mode was 78.9%. In the volume investigation, for 20 cmH2O generated by the neural network classifier, the neural model averaged 81.6% in CPAP mode and 84.59% in PAV mode, while the other classifiers averaged 72.17% in CPAP mode and 77.83% in PAV mode for volume control. Other approaches, including decision trees, optimizable Bayes trees, naïve Bayes trees, nearest-neighbour trees, and an ensemble of trees, were also evaluated in terms of confusion-matrix accuracy, training duration, specificity, sensitivity, and F1 score.
    Matched MeSH terms: Machine Learning
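To make the control scheme above concrete: an iterative learning controller reuses the previous breath cycle's error as feedforward, while a PID handles within-cycle feedback. The toy first-order "lung" plant, its time constant, and all gains below are invented for illustration.

```python
import numpy as np

dt, T = 0.01, 2.0
t = np.arange(0, T, dt)
target = 15 * (1 - np.exp(-5 * t))         # desired pressure profile (cmH2O)
Kp, Ki, Kd, L = 2.0, 8.0, 0.05, 0.6        # PID gains + ILC learning gain

u_ff = np.zeros_like(t)                    # feedforward learned across cycles
for cycle in range(8):
    y, integ, prev_e = 0.0, 0.0, 0.0
    err = np.zeros_like(t)
    for i in range(t.size):
        e = target[i] - y
        integ += e * dt
        u = u_ff[i] + Kp * e + Ki * integ + Kd * (e - prev_e) / dt
        prev_e = e
        y += dt * (-y / 0.3 + u)           # first-order plant, tau = 0.3 s
        err[i] = e
    u_ff += L * err                        # ILC update: learn from this cycle
    print(f"cycle {cycle}: RMS tracking error = {np.sqrt(np.mean(err ** 2)):.3f}")
```

Run across cycles, the printed RMS error should shrink as the learned feedforward absorbs the repetitive part of the tracking error.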
  13. Hameed SS, Hassan R, Hassan WH, Muhammadsharif FF, Latiff LA
    PLoS One, 2021;16(1):e0246039.
    PMID: 33507983 DOI: 10.1371/journal.pone.0246039
    The selection and classification of genes is essential for identifying the genes related to a specific disease. Developing a user-friendly application that combines statistical rigour with machine learning functionality to help biomedical researchers and end users is of great importance. In this work, a novel stand-alone application with a graphical user interface (GUI) is developed to perform the full functionality of gene selection and classification on high-dimensional datasets. The so-called HDG-select application is validated on eleven high-dimensional datasets in CSV and GEO SOFT formats. The proposed tool uses an efficient combined filter-GBPSO-SVM algorithm and has been made freely available to users. The proposed HDG-select was found to outperform other tools reported in the literature, presenting competitive performance, accessibility, and functionality.
    Matched MeSH terms: Machine Learning*
  14. Albadr MAA, Tiun S, Ayob M, Al-Dhief FT, Omar K, Hamzah FA
    PLoS One, 2020;15(12):e0242899.
    PMID: 33320858 DOI: 10.1371/journal.pone.0242899
    Coronavirus disease (COVID-19) is an ongoing global pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Chest Computed Tomography (CT) is an effective method for detecting lung illnesses, including COVID-19. However, CT scans are expensive and time-consuming. Therefore, this work focuses on detecting COVID-19 using chest X-ray images, which are widely available, faster, and cheaper than CT scans. Many machine learning approaches, such as deep learning, neural networks, and support vector machines, have used X-rays for detecting COVID-19. Although the performance of those approaches is acceptable in terms of accuracy, they require high computational time and more memory space. Therefore, this work employs an Optimised Genetic Algorithm-Extreme Learning Machine (OGA-ELM) with three selection criteria (i.e., random, K-tournament, and roulette wheel) to detect COVID-19 using X-ray images. The most crucial strengths of the Extreme Learning Machine (ELM) are: (i) its high capability to avoid overfitting; (ii) its applicability to binary and multi-class classification; and (iii) its ability to work as a kernel-based support vector machine with a neural network structure. These advantages make the ELM efficient in achieving excellent learning performance. ELMs have been applied successfully in many domains, including medical ones such as breast cancer detection, pathological brain detection, and ductal carcinoma in situ detection, but had not yet been tested on COVID-19 detection. Hence, this work aims to identify the effectiveness of employing OGA-ELM in detecting COVID-19 using chest X-ray images. To reduce the dimensionality of the histogram of oriented gradients (HOG) features, we use principal component analysis. The performance of OGA-ELM is evaluated on a benchmark dataset containing 188 chest X-ray images with two classes: healthy and COVID-19-infected. The experimental results show that OGA-ELM achieves 100.00% accuracy with fast computation time, demonstrating that OGA-ELM is an efficient method for detecting COVID-19 using chest X-ray images.
    Matched MeSH terms: Machine Learning*
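A pipeline sketch in the spirit of OGA-ELM: HOG features, PCA for dimensionality reduction, then a plain ELM classifier solved by least squares. The genetic-algorithm optimisation of the ELM weights is omitted, and sklearn digits stand in for the 188 chest X-rays (requires scikit-image).

```python
import numpy as np
from skimage.feature import hog
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

digits = load_digits()
feats = np.array([hog(img, pixels_per_cell=(4, 4), cells_per_block=(1, 1))
                  for img in digits.images])          # HOG per 8x8 image
X = PCA(n_components=20, random_state=0).fit_transform(feats)
Y = np.eye(10)[digits.target]                         # one-hot targets

Xtr, Xte, Ytr, Yte = train_test_split(X, Y, random_state=0)
rng = np.random.default_rng(0)
W, b = rng.normal(size=(X.shape[1], 100)), rng.normal(size=100)    # random hidden layer
beta, *_ = np.linalg.lstsq(np.tanh(Xtr @ W + b), Ytr, rcond=None)  # analytic solve
pred = (np.tanh(Xte @ W + b) @ beta).argmax(axis=1)
print("accuracy:", round(float((pred == Yte.argmax(axis=1)).mean()), 3))
```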
  15. Mandala S, Rizal A, Adiwijaya, Nurmaini S, Suci Amini S, Almayda Sudarisman G, et al.
    PLoS One, 2024;19(4):e0297551.
    PMID: 38593145 DOI: 10.1371/journal.pone.0297551
    Arrhythmia is a life-threatening cardiac condition characterized by irregular heart rhythm. Early and accurate detection is crucial for effective treatment. However, single-lead electrocardiogram (ECG) methods have limited sensitivity and specificity. This study proposes an improved ensemble learning approach for arrhythmia detection using multi-lead ECG data. The proposed method, a boosting-based model named Fine-Tuned Boosting (FTBO), detects multiple arrhythmia classes. For feature extraction, we introduce a new technique that utilizes a sliding window with a window size of 5 R-peaks. This study compared it with other models, including bagging and stacking, and assessed the impact of parameter tuning. Rigorous experiments on the MIT-BIH arrhythmia database focused on Premature Ventricular Contraction (PVC), Atrial Premature Contraction (PAC), and Atrial Fibrillation (AF). The results showed that the proposed method achieved high sensitivity, specificity, and accuracy for all three classes of arrhythmia. It accurately detected Atrial Fibrillation (AF) with 100% sensitivity and specificity. For PVC detection, it achieved 99% sensitivity and specificity in both leads. Similarly, for PAC detection, the proposed method achieved almost 96% sensitivity and specificity in both leads. The proposed method shows great potential for early arrhythmia detection using multi-lead ECG data.
    Matched MeSH terms: Machine Learning
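The feature-extraction step above (a sliding window of 5 R-peaks) can be pictured as turning each window's RR intervals into one feature row for a boosting classifier. The sketch below uses synthetic R-peak trains, a gradient-boosting stand-in, and simple RR statistics; none of it is the FTBO model or MIT-BIH data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)

def rr_windows(r_peaks, label, win=5):
    """Slide a window over the RR intervals; each window becomes one row."""
    rr = np.diff(r_peaks)
    return [[rr[i:i + win].mean(), rr[i:i + win].std(),
             rr[i:i + win].min(), rr[i:i + win].max(), label]
            for i in range(len(rr) - win + 1)]

data = []
for _ in range(60):   # regular rhythm: small RR variability
    data += rr_windows(np.cumsum(rng.normal(280, 8, size=40)), 0)
for _ in range(60):   # AF-like rhythm: irregular RR intervals
    data += rr_windows(np.cumsum(rng.normal(280, 60, size=40)), 1)

data = np.array(data)
Xtr, Xte, ytr, yte = train_test_split(data[:, :4], data[:, 4], random_state=7)
clf = GradientBoostingClassifier(random_state=7).fit(Xtr, ytr)
print("accuracy:", round(clf.score(Xte, yte), 3))
```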
  16. Tian X, Tian Z, Khatib SFA, Wang Y
    PLoS One, 2024;19(4):e0300195.
    PMID: 38625972 DOI: 10.1371/journal.pone.0300195
    Internet finance has permeated into myriad households, bringing about lifestyle convenience alongside potential risks. Presently, internet finance enterprises are progressively adopting machine learning and other artificial intelligence methods for risk alertness. What is the current status of the application of various machine learning models and algorithms across different institutions? Is there an optimal machine learning algorithm suited for the majority of internet finance platforms and application scenarios? Scholars have embarked on a series of studies addressing these questions; however, the focus predominantly lies in comparing different algorithms within specific platforms and contexts, lacking a comprehensive discourse and summary on the utilization of machine learning in this domain. Thus, based on data from the Web of Science and Scopus databases, this paper conducts a systematic literature review on all aspects of machine learning in internet finance risk in recent years, covering publication trends, geographical distribution, literature focus, machine learning models and algorithms, and evaluations. The research reveals that machine learning, as a nascent technology, whether through basic algorithms or intricate algorithmic combinations, has made significant strides compared to traditional credit scoring methods in prediction accuracy, time efficiency, and robustness in internet finance risk management. Nonetheless, there exist noticeable disparities among different algorithms, and factors such as model structure, sample data, and parameter settings also influence prediction accuracy, although generally, updated algorithms tend to achieve higher accuracy. Consequently, there is no one-size-fits-all approach applicable to all platforms; each platform should enhance its machine learning models and algorithms based on its unique characteristics, data, and the development of AI technology, starting from key evaluation indicators to mitigate internet finance risks.
    Matched MeSH terms: Machine Learning*
  17. Idris NF, Ismail MA, Jaya MIM, Ibrahim AO, Abulfaraj AW, Binzagr F
    PLoS One, 2024;19(5):e0302595.
    PMID: 38718024 DOI: 10.1371/journal.pone.0302595
    Diabetes Mellitus is one of the oldest diseases known to humankind, dating back to ancient Egypt. The disease is a chronic metabolic disorder that heavily burdens healthcare providers worldwide due to the steady yearly increase in patients. Worryingly, diabetes affects not only the aging population but also children. It is imperative to control this problem, as diabetes can lead to many health complications. As technology evolves, humankind has begun integrating computer technology into the healthcare system. The utilization of artificial intelligence helps healthcare become more efficient in diagnosing diabetes patients, deliver better care, and be more patient-centric. Among the advanced data mining techniques in artificial intelligence, stacking is among the most prominent methods applied in the diabetes domain. Hence, this study investigates the potential of stacking ensembles. The aim of this study is to reduce the high complexity inherent in stacking, which contributes to longer training times, and to remove outliers in the diabetes data to improve classification performance. To address these concerns, a novel machine learning method called Stacking Recursive Feature Elimination-Isolation Forest is introduced for diabetes prediction. Stacking is applied with Recursive Feature Elimination to design an efficient diabetes-diagnosis model that uses fewer features as resources, and Isolation Forest is incorporated as an outlier removal method. The study uses accuracy, precision, recall, F1 measure, training time, and standard deviation metrics to assess classification performance. The proposed method acquired an accuracy of 79.077% on PIMA Indians Diabetes and 97.446% on the Diabetes Prediction dataset, outperforming many existing methods and demonstrating effectiveness in the diabetes domain.
    Matched MeSH terms: Machine Learning*
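The recipe above composes three standard pieces, sketched here on synthetic data (the PIMA set is not bundled with scikit-learn): Isolation Forest to drop outliers, RFE to shrink the feature set, and a stacking ensemble for the final prediction. The particular base learners are guesses, not the paper's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (IsolationForest, RandomForestClassifier,
                              StackingClassifier)
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           random_state=8)

keep = IsolationForest(random_state=8).fit_predict(X) == 1   # 1 = inlier
X, y = X[keep], y[keep]                                      # outliers removed

rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
X = rfe.transform(X)                                         # fewer features

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=8)),
                ("svm", SVC(probability=True, random_state=8))],
    final_estimator=LogisticRegression(max_iter=1000),
)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=8)
print("accuracy:", round(stack.fit(Xtr, ytr).score(Xte, yte), 3))
```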
  18. Hasan RI, Yusuf SM, Alzubaidi L
    Plants (Basel), 2020 Oct 01;9(10).
    PMID: 33019765 DOI: 10.3390/plants9101302
    Deep learning (DL) represents the golden era in the machine learning (ML) domain, and it has gradually become the leading approach in many fields. It is currently playing a vital role in the early detection and classification of plant diseases. The use of ML techniques in this field is viewed as having brought considerable improvement in cultivation productivity sectors, particularly with the recent emergence of DL, which seems to have increased accuracy levels. Recently, many DL architectures have been implemented accompanying visualisation techniques that are essential for determining symptoms and classifying plant diseases. This review investigates and analyses the most recent methods, developed over three years leading up to 2020, for training, augmentation, feature fusion and extraction, recognising and counting crops, and detecting plant diseases, including how these methods can be harnessed to feed deep classifiers and their effects on classifier accuracy.
    Matched MeSH terms: Machine Learning
  19. Kumar, Yogan Jaya, Naomie Salim, Ahmed Hamza Osman, Abuobieda, Albaraa
    MyJurnal
    Cross-document Structure Theory (CST) has recently been proposed to facilitate tasks related to multi-document analysis. Classifying and identifying the CST relationships between sentences across topically related documents has since been proven necessary. However, insufficient studies have been presented in the literature on automatically identifying these CST relationships. In this study, a supervised machine learning technique, namely Support Vector Machines (SVMs), was applied to identify four types of CST relationships, namely "Identity", "Overlap", "Subsumption", and "Description", on datasets obtained from the CSTBank corpus. The performance of the SVM classification was measured using Precision, Recall, and F-measure. In addition, the results obtained using SVMs were compared with those from the previous literature using a boosting classification algorithm. SVMs were found to yield better results in classifying the four CST relationships.
    Matched MeSH terms: Supervised Machine Learning
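A toy illustration of SVM-based CST relation classification as described above: each sentence pair is reduced to a few overlap features and a linear SVM is fitted. The pairs, features, and replication factor are invented; CSTBank and the study's real feature set are far richer.

```python
from sklearn.metrics import classification_report
from sklearn.svm import SVC

pairs = [
    ("the storm hit the coast", "the storm hit the coast", "Identity"),
    ("the storm hit the coast on monday", "the storm hit the coast", "Subsumption"),
    ("the storm hit the coast", "rescue teams were deployed quickly", "Description"),
    ("the storm damaged homes on the coast", "the storm hit homes near the coast", "Overlap"),
] * 5   # replicate the toy pairs so the SVM has something to fit

def features(s1, s2):
    a, b = set(s1.split()), set(s2.split())
    return [len(a & b) / len(a | b),   # Jaccard word overlap
            len(a) / len(b),           # length ratio
            float(s1 == s2)]           # exact-match flag

X = [features(s1, s2) for s1, s2, _ in pairs]
y = [rel for _, _, rel in pairs]
clf = SVC(kernel="linear").fit(X, y)
print(classification_report(y, clf.predict(X)))  # Precision / Recall / F-measure
```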
  20. Nurfadhlina Mohd Sharef, Rozilah Rosli
    MyJurnal
    Sentiment analysis classification has typically been performed by combining features that represent the dataset at hand. Existing works have employed various features individually, such as syntactic, lexical, and machine-learning-based features, and some have hybridized them to reach promising results. Since the debate on the best combination is still unresolved, this paper presents an empirical investigation of feature combinations for product review classification. Results indicate that the Support Vector Machine classification model, combined with any of the observed lexicons (namely MPQA, BingLiu, and General Inquirer) and either unigram features or the integration of unigram and bigram features, is the top performer.
    Matched MeSH terms: Machine Learning
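A minimal sketch of the feature-combination idea above: unigram-plus-bigram counts concatenated with lexicon-based counts, fed to a linear SVM. The tiny reviews and the positive/negative word lists are invented; the real MPQA, BingLiu, and General Inquirer lexicons are external resources.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

reviews = ["great battery and screen", "terrible battery life",
           "i love this phone", "awful screen and bad value",
           "good value and great camera", "bad camera and poor build"] * 3
labels = [1, 0, 1, 0, 1, 0] * 3
POS, NEG = {"great", "love", "good"}, {"terrible", "awful", "bad", "poor"}

vec = CountVectorizer(ngram_range=(1, 2))             # unigrams + bigrams
X_ngram = vec.fit_transform(reviews).toarray()
X_lex = np.array([[sum(w in POS for w in r.split()),  # lexicon hit counts
                   sum(w in NEG for w in r.split())] for r in reviews])
X = np.hstack([X_ngram, X_lex])                       # hybrid feature set

clf = LinearSVC().fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```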