MyMedR

Displaying publications 1 - 20 of 289 in total

Earthquake multi-classification detection based velocity and displacement data filtering using machine learning algorithms

Murti MA, Junior R, Ahmed AN, Elshafie A

Sci Rep, 2022 Dec 08;12(1):21200.
PMID: 36482200 DOI: 10.1038/s41598-022-25098-1

Earthquakes are among the natural disasters that have a large impact on society. Currently, there are many studies on earthquake detection. However, the vibrations detected by sensors are caused not only by earthquakes but also by other sources. Therefore, this study proposed an earthquake multi-classification detection method with machine learning algorithms that can distinguish among earthquake, non-earthquake, and vandalism vibrations using acceleration seismic waves. In addition, velocity and displacement, as integration products of acceleration, have been considered as additional features to improve the performance of the machine learning algorithms. Several machine learning algorithms such as Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and Artificial Neural Network (ANN) have been used to develop the best algorithm for earthquake multi-classification detection. The results of this study indicate that the ANN algorithm is the best algorithm for distinguishing among earthquake, non-earthquake, and vandalism vibrations. Moreover, it is also more resistant to various input features. Furthermore, using velocity and displacement as additional features has been proven to increase the performance of every model.
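
The abstract above treats velocity and displacement as integration products of acceleration. A minimal sketch of that idea follows, assuming uniformly sampled data and plain trapezoidal integration; the sample values, the time step, and the function name `cumulative_trapezoid` are hypothetical, and the filtering mentioned in the article title is not reproduced here.

```python
def cumulative_trapezoid(samples, dt):
    """Integrate a uniformly sampled signal with the trapezoidal rule.

    Returns a list as long as `samples`; the first value is 0.0
    (zero initial condition, which is an assumption of this sketch).
    """
    if not samples:
        return []
    integral = [0.0]
    for previous, current in zip(samples, samples[1:]):
        integral.append(integral[-1] + 0.5 * (previous + current) * dt)
    return integral


# Hypothetical constant acceleration of 2 m/s^2 sampled every 0.5 s.
acceleration = [2.0, 2.0, 2.0, 2.0, 2.0]
dt = 0.5

velocity = cumulative_trapezoid(acceleration, dt)   # m/s
displacement = cumulative_trapezoid(velocity, dt)   # m

# Each time step now carries three features instead of one.
features = list(zip(acceleration, velocity, displacement))
```

With real sensor data, small offsets in the acceleration accumulate into drift after integration, so some detrending or filtering is normally applied before or after this step.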

Matched MeSH terms: Machine Learning*
A Heterogeneous Ensemble Approach for Travel Time Prediction Using Hybridized Feature Spaces and Support Vector Regression

Chughtai JU, Haq IU, Islam SU, Gani A

Sensors (Basel), 2022 Dec 12;22(24).
PMID: 36560104 DOI: 10.3390/s22249735

Travel time prediction is essential to intelligent transportation systems directly affecting smart cities and autonomous vehicles. Accurately predicting traffic based on heterogeneous factors is highly beneficial but remains a challenging problem. The literature shows significant performance improvements when traditional machine learning and deep learning models are combined using an ensemble learning approach. This research mainly contributes by proposing an ensemble learning model based on hybridized feature spaces obtained from a bidirectional long short-term memory module and a bidirectional gated recurrent unit, followed by support vector regression to produce the final travel time prediction. The proposed approach consists of three stages. Initially, six state-of-the-art deep learning models are applied to traffic data obtained from sensors. Then the feature spaces and decision scores (outputs) of the model with the highest performance are fused to obtain hybridized deep feature spaces. Finally, a support vector regressor is applied to the hybridized feature spaces to get the final travel time prediction. The performance of our proposed heterogeneous ensemble using test data showed significant improvements compared to the baseline techniques in terms of the root mean square error (53.87±3.50), mean absolute error (12.22±1.35) and the coefficient of determination (0.99784±0.00019). The results demonstrated that the hybridized deep feature space concept could produce more stable and superior results than the other baseline techniques.
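
The three evaluation measures reported in the abstract above (root mean square error, mean absolute error, coefficient of determination) can be computed as in the following sketch. It uses the standard textbook definitions; the travel times are made-up values for illustration and are unrelated to the study's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for two equally long sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Hypothetical observed and predicted travel times in seconds.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

rmse, mae, r2 = regression_metrics(observed, predicted)
```

RMSE penalises large errors more heavily than MAE, which is why the two can differ substantially on the same predictions, as they do in the reported results.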

Matched MeSH terms: Machine Learning*
Review on machine learning-based bioprocess optimization, monitoring, and control systems

Mondal PP, Galodha A, Verma VK, Singh V, Show PL, Awasthi MK, et al.

Bioresour Technol, 2023 Feb;370:128523.
PMID: 36565820 DOI: 10.1016/j.biortech.2022.128523

Machine Learning is quickly becoming an impending game changer for transforming big data thrust from the bioprocessing industry into actionable output. However, the complex data sets from bioprocesses, lagging cyber-integrated sensor systems, and issues with storage scalability limit the real-time application of machine learning. Hence, it is imperative to know the state of technology to address prevailing issues. This review first gives an insight into the basic understanding of the machine learning domain and discusses its complexities for more comprehensive applications. It then outlines how relevant machine learning models are for statistical and logical analysis of the enormous datasets generated to control bioprocess operations. Next, this review critically discusses the current knowledge, its limitations, and future aspects in different subfields of the bioprocessing industry. Further, this review discusses the prospects of adopting a hybrid method to dovetail different modeling strategies, cyber-networking, and integrated sensors to develop new digital biotechnologies.

Matched MeSH terms: Machine Learning*
Application of machine learning on understanding biomolecule interactions in cellular machinery

Dixit R, Khambhati K, Supraja KV, Singh V, Lederer F, Show PL, et al.

Bioresour Technol, 2023 Feb;370:128522.
PMID: 36565819 DOI: 10.1016/j.biortech.2022.128522

Machine learning (ML) applications have become ubiquitous in all fields of research including protein science and engineering. Apart from protein structure and mutation prediction, scientists are focusing on knowledge gaps with respect to the molecular mechanisms involved in protein binding and interactions with other components in the experimental setups or the human body. Researchers are working on several wet-lab techniques and generating data for a better understanding of the concepts and mechanics involved. The volume of information, such as biomolecular structures, binding affinities, and structural fluctuations and movements, is enormous, and it can be handled and analyzed by ML. Therefore, this review highlights the significance of ML in understanding biomolecular interactions while assisting in various fields of research such as drug discovery, nanomedicine, nanotoxicity and material science. Hence, the way ahead would be for laboratory work and computational techniques to go hand in hand.

Matched MeSH terms: Machine Learning*
BOO-ST and CBCEC: two novel hybrid machine learning methods aim to reduce the mortality of heart failure patients

Sutradhar A, Al Rafi M, Shamrat FMJM, Ghosh P, Das S, Islam MA, et al.

Sci Rep, 2023 Dec 18;13(1):22874.
PMID: 38129433 DOI: 10.1038/s41598-023-48486-7

Heart failure (HF) is a leading cause of mortality worldwide. Machine learning (ML) approaches have shown potential as an early detection tool for improving patient outcomes. Enhancing the effectiveness and clinical applicability of the ML model necessitates training an efficient classifier with a diverse set of high-quality datasets. Hence, we proposed two novel hybrid ML methods ((a) consisting of Boosting, SMOTE, and Tomek links (BOO-ST); (b) combining the best-performing conventional classifier with ensemble classifiers (CBCEC)) to serve as an efficient early warning system for HF mortality. The BOO-ST was introduced to tackle the challenge of class imbalance, while CBCEC was responsible for training on the processed and selected features derived from the Feature Importance (FI) and Information Gain (IG) feature selection techniques. We also conducted an explicit and intuitive analysis to explore the impact of potential characteristics correlating with the fatality cases of HF. The experimental results demonstrated that the proposed CBCEC classifier achieves an accuracy of 93.67% in the early forecasting of HF mortality. Therefore, our proposed methods (BOO-ST and CBCEC) can play a crucial role in reducing the death rate of HF and reducing stress in the healthcare sector.
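
One ingredient named in the abstract above is Tomek links. By the usual definition, a Tomek link is a pair of samples from different classes that are each other's nearest neighbour. The sketch below only detects such pairs on made-up one-dimensional points. It is not the authors' BOO-ST pipeline, and the helper names are hypothetical.

```python
import math


def nearest_neighbor(index, points):
    """Index of the point closest to points[index], excluding itself."""
    best, best_distance = None, math.inf
    for j, other in enumerate(points):
        if j == index:
            continue
        distance = math.dist(points[index], other)
        if distance < best_distance:
            best, best_distance = j, distance
    return best


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form Tomek links."""
    links = []
    for i in range(len(points)):
        j = nearest_neighbor(i, points)
        if (
            j is not None
            and j > i                               # report each pair once
            and labels[i] != labels[j]              # opposite classes
            and nearest_neighbor(j, points) == i    # mutual nearest neighbours
        ):
            links.append((i, j))
    return links


# Hypothetical feature vectors (one feature each) and class labels.
points = [(0.0,), (1.0,), (1.2,), (3.0,), (3.1,)]
labels = [0, 0, 1, 1, 1]

links = tomek_links(points, labels)
```

In resampling schemes, one or both members of each detected link are typically removed to clean the class boundary after oversampling.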

Matched MeSH terms: Machine Learning*
An intelligent decision support system for crop yield prediction using hybrid machine learning algorithms

Anbananthen KSM, Subbiah S, Chelliah D, Sivakumar P, Somasundaram V, Velshankar KH, et al.

F1000Res, 2021;10:1143.
PMID: 34987773 DOI: 10.12688/f1000research.73009.1

Background: In recent times, digitization has been gaining importance in different domains of knowledge such as agriculture, medicine, recommendation platforms, the Internet of Things (IoT), and weather forecasting. In agriculture, crop yield estimation is essential for improving productivity and decision-making processes such as financial market forecasting, and addressing food security issues. The main objective of the article is to predict and improve the accuracy of crop yield forecasting using hybrid machine learning (ML) algorithms. Methods: This article proposes hybrid ML algorithms that use specialized ensembling methods such as stacked generalization, gradient boosting, random forest, and least absolute shrinkage and selection operator (LASSO) regression. Stacked generalization is a new model which learns how to best combine the predictions from two or more models trained on the dataset. To demonstrate the applications of the proposed algorithm, aerial-intel datasets from the GitHub data science repository are used. Results: Based on the experiments performed on the agricultural data, the following observations have been made. The performance of the individual algorithms and the hybrid ML algorithms is compared using cross-validation to identify the most promising performers for the agricultural dataset. The accuracy of the random forest regressor, gradient boosted tree regression, and stacked generalization ensemble methods is 87.71%, 86.98%, and 88.89% respectively. Conclusions: The proposed stacked generalization ML algorithm statistically outperforms the other methods with an accuracy of 88.89% and hence demonstrates that the proposed approach is an effective algorithm for predicting crop yield. The system also gives fast and accurate responses to the farmers.

Matched MeSH terms: Machine Learning*
Credit card fraud detection using a hierarchical behavior-knowledge space model

Nandi AK, Randhawa KK, Chua HS, Seera M, Lim CP

PLoS One, 2022;17(1):e0260579.
PMID: 35051184 DOI: 10.1371/journal.pone.0260579

With the advancement in machine learning, researchers continue to devise and implement effective intelligent methods for fraud detection in the financial sector. Indeed, credit card fraud leads to billions of dollars in losses for merchants every year. In this paper, a multi-classifier framework is designed to address the challenges of credit card fraud detection. An ensemble model with multiple machine learning classification algorithms is designed, in which the Behavior-Knowledge Space (BKS) is leveraged to combine the predictions from multiple classifiers. To ascertain the effectiveness of the developed ensemble model, publicly available data sets as well as real financial records are employed for performance evaluations. Through statistical tests, the results positively indicate the effectiveness of the developed model as compared with the commonly used majority voting method for combination of predictions from multiple classifiers in tackling noisy data classification as well as credit card fraud detection problems.
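
The abstract above contrasts Behavior-Knowledge Space (BKS) combination with majority voting. A minimal sketch of a flat BKS table is given below: each combination of classifier decisions seen in training becomes a cell, and the cell's most frequent true label is the combined prediction. The training data are invented, the function names are hypothetical, the fallback to majority voting for unseen cells is an assumption of this sketch, and the paper's hierarchical variant is not reproduced.

```python
from collections import Counter, defaultdict


def fit_bks(classifier_outputs, true_labels):
    """Build the BKS table: one cell per combination of classifier decisions."""
    table = defaultdict(Counter)
    for decisions, label in zip(classifier_outputs, true_labels):
        table[tuple(decisions)][label] += 1
    return table


def predict_bks(table, decisions):
    """Return the most frequent true label seen for this decision combination."""
    cell = table.get(tuple(decisions))
    if cell:
        return cell.most_common(1)[0][0]
    # Assumed fallback for combinations never seen in training: majority vote.
    return Counter(decisions).most_common(1)[0][0]


# Hypothetical decisions of three classifiers on five training transactions.
train_outputs = [
    ("fraud", "legit", "legit"),
    ("fraud", "legit", "legit"),
    ("fraud", "legit", "legit"),
    ("legit", "legit", "legit"),
    ("fraud", "fraud", "legit"),
]
train_labels = ["fraud", "fraud", "legit", "legit", "fraud"]

table = fit_bks(train_outputs, train_labels)

seen = predict_bks(table, ("fraud", "legit", "legit"))    # cell lookup
unseen = predict_bks(table, ("legit", "fraud", "fraud"))  # fallback vote
```

For the first query, majority voting alone would answer "legit", while the BKS cell records that this pattern was more often fraud in training. That difference is the behaviour the method exploits.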

Matched MeSH terms: Machine Learning*
Morphometric dataset of Varanus salvator for non-invasive sex identification using machine learning

Alymann AA, Alymann IA, Ong SQ, Rusli MU, Ahmad AH, Salim H

Sci Data, 2024 Apr 05;11(1):337.
PMID: 38580692 DOI: 10.1038/s41597-024-03172-9

Reliable sex identification in Varanus salvator traditionally relied on invasive methods like genetic analysis or dissection, as less invasive techniques such as hemipenes inversion are unreliable. Given the ecological importance of this species and skewed sex ratios in disturbed habitats, a dataset that allows ecologists or zoologists to study the sex determination of the lizard is crucial. We present a new dataset containing morphometric measurements of V. salvator individuals from the skin trade, with sex confirmed by dissection post-measurement. The dataset consists of a mixture of primary and secondary data such as weight, skull size, tail length, condition etc. and can be used in modelling studies for ecological and conservation research to monitor the sex ratio of this species. Validity was demonstrated by training and testing six machine learning models. This dataset has the potential to streamline sex determination, offering a non-invasive alternative to complement existing methods in V. salvator research, mitigating the need for invasive procedures.

Matched MeSH terms: Machine Learning
Deep-Learning-Based Approach for Iraqi and Malaysian Vehicle License Plate Recognition

Habeeb D, Noman F, Alkahtani AA, Alsariera YA, Alkawsi G, Fazea Y, et al.

Comput Intell Neurosci, 2021;2021:3971834.
PMID: 34782832 DOI: 10.1155/2021/3971834

Recognizing vehicle plate numbers is a key step towards implementing the legislation on traffic and reducing the number of daily traffic accidents. Although machine learning has advanced considerably, the recognition of license plates remains an obstacle, particularly in countries whose plate numbers are written in different languages or blended with Latin alphabets. This paper introduces a recognition system for Arabic and Latin alphabet license plates using a deep-learning-based approach in conjunction with data collected from two specific countries: Iraq and Malaysia. The system under study is proposed to detect, segment, and recognize vehicle plate numbers. Moreover, Iraqi and Malaysian plates were used to compare these processes. A total of 404 Iraqi images and 681 Malaysian images were tested and used for the proposed techniques. The evaluation took place under various atmospheric environments, including fog, different contrasts, dirt, different colours, and distortion problems. The proposed approach showed an average recognition rate of 85.56% and 88.86% on Iraqi and Malaysian datasets, respectively. This provides evidence that the deep-learning-based method outperforms other state-of-the-art methods, as it can successfully detect plate numbers regardless of the deterioration level of image quality.

Matched MeSH terms: Machine Learning
Cardiovascular complications in a diabetes prediction model using machine learning: a systematic review

Kee OT, Harun H, Mustafa N, Abdul Murad NA, Chin SF, Jaafar R, et al.

Cardiovasc Diabetol, 2023 Jan 19;22(1):13.
PMID: 36658644 DOI: 10.1186/s12933-023-01741-7

Prediction models have been the focus of studies since the last century in the diagnosis and prognosis of various diseases. With the advancement in computational technology, machine learning (ML) has become a widely used tool to develop prediction models. This review investigates the current development of prediction models for the risk of cardiovascular disease (CVD) among type 2 diabetes (T2DM) patients using machine learning. A systematic search on Scopus and Web of Science (WoS) was conducted to look for relevant articles based on the research question. The risk of bias (ROB) for all articles was assessed based on the Prediction model Risk of Bias Assessment Tool (PROBAST) statement. A neural network with 76.6% precision, 88.06% sensitivity, and an area under the curve (AUC) of 0.91 was found to be the most reliable algorithm for developing prediction models for cardiovascular disease among type 2 diabetes patients. The overall concern of applicability of all included studies is low. While two out of 10 studies were shown to have high ROB, the ROB of the other studies is unknown due to the lack of information. Adherence to reporting standards was assessed based on the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) standard, where the overall score is 53.75%. It is highly recommended that future model development adhere to the PROBAST and TRIPOD assessments to reduce the risk of bias and ensure applicability in clinical settings. Potential lipid peroxidation markers are also recommended for future cardiovascular disease prediction models to improve overall model applicability.

Matched MeSH terms: Machine Learning
Relating molecular descriptors to frontier orbital energy levels, singlet and triplet excited states of fused tricyclics using machine learning

Woon KL, Chong ZX, Ariffin A, Chan CS

J Mol Graph Model, 2021 06;105:107891.
PMID: 33765526 DOI: 10.1016/j.jmgm.2021.107891

Fused tricyclic organic compounds are an important class of organic electronic materials. In designing molecules for organic electronics, knowing which chemical structures can be used to tune the molecular properties is one of the keys to improving material performance. In this research, we applied machine learning and data analytic approaches to address this problem. The energy states (Highest Occupied Molecular Orbital (HOMO), Lowest Unoccupied Molecular Orbital (LUMO), singlet (ES) and triplet (ET) energy) of more than 10 thousand fused tricyclics are calculated. Corresponding descriptors are also generated. We find that the Coulomb matrix is a poorer descriptor than high-level descriptors in a multilayer perceptron neural network. Correlations as high as 0.95 are obtained using a multilayer perceptron neural network, with a Mean Absolute Error as low as 0.08 eV. The descriptors that are important in tuning the energy levels are revealed using the Random Forest algorithm. Correlations of such descriptors are also plotted. We found that the higher the number of tertiary amines, the deeper the HOMO and LUMO levels. The presence of NN in the aromatic rings can be used to tune the ES. However, there is no single dominant descriptor that can be correlated with the ET. A collection of descriptors is found to give a far better correlation with ET. This research demonstrated the use of machine learning and data analytics in revealing how certain chemical substructures correlate with the molecular energy states.

Matched MeSH terms: Machine Learning*
Spoken language identification based on the enhanced self-adjusting extreme learning machine approach

Albadr MAA, Tiun S, Al-Dhief FT, Sammour MAM

PLoS One, 2018;13(4):e0194770.
PMID: 29672546 DOI: 10.1371/journal.pone.0194770

Spoken Language Identification (LID) is the process of determining and classifying natural language from a given content and dataset. Typically, data must be processed to extract useful features to perform LID. According to the literature, feature extraction for LID is a mature process in which the standard features have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC), the Gaussian Mixture Model (GMM) and, finally, the i-vector based framework. However, the process of learning from the extracted features remains to be improved (i.e. optimised) to capture all the knowledge embedded in them. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful for training a single hidden layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) due to the random selection of weights within the input hidden layer. In this study, the ELM is selected as a learning model for LID based on standard feature extraction. One of the optimisation approaches of ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM), is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process incorporates both the Split-Ratio and K-Tournament methods, and the improved SA-ELM is named the Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are generated based on LID with datasets created from eight different languages. The results of the study showed that the Enhanced Self-Adjusting Extreme Learning Machine LID (ESA-ELM LID) outperformed the SA-ELM LID, with ESA-ELM LID achieving an accuracy of 96.25%, compared with only 95.00% for SA-ELM LID.

Matched MeSH terms: Machine Learning*
A soft artificial muscle driven robot with reinforcement learning

Yang T, Xiao Y, Zhang Z, Liang Y, Li G, Zhang M, et al.

Sci Rep, 2018 09 28;8(1):14518.
PMID: 30266999 DOI: 10.1038/s41598-018-32757-9

Soft robots driven by stimuli-responsive materials have their own unique advantages over traditional rigid robots such as large actuation, light weight, good flexibility and biocompatibility. However, the large actuation of soft robots inherently co-exists with difficulty in control with high precision. This article presents a soft artificial muscle driven robot mimicking cuttlefish with a fully integrated on-board system including power supply and wireless communication system. Without any motors, the movements of the cuttlefish robot are solely actuated by dielectric elastomer which exhibits muscle-like properties including large deformation and high energy density. Reinforcement learning is used to optimize the control strategy of the cuttlefish robot instead of manual adjustment. From scratch, the swimming speed of the robot is enhanced by 91% with reinforcement learning, reaching 21 mm/s (0.38 body length per second). The design principle behind the structure and the control of the robot can be potentially useful in guiding device designs for demanding applications such as flexible devices and soft robots.

Matched MeSH terms: Machine Learning*
Incremental Learning of Human Activities in Smart Homes

Chua SL, Foo LK, Guesgen HW, Marsland S

Sensors (Basel), 2022 Nov 03;22(21).
PMID: 36366154 DOI: 10.3390/s22218458

Sensor-based human activity recognition has been extensively studied. Systems learn from a set of training samples to classify actions into a pre-defined set of ground truth activities. However, human behaviours vary over time, and so a recognition system should ideally be able to continuously learn and adapt, while retaining the knowledge of previously learned activities, and without failing to highlight novel, and therefore potentially risky, behaviours. In this paper, we propose a method based on compression that can incrementally learn new behaviours, while retaining prior knowledge. Evaluation was conducted on three publicly available smart home datasets.

Matched MeSH terms: Machine Learning*
Convolutional Neural Network Model Based on 2D Fingerprint for Bioactivity Prediction

Hentabli H, Bengherbia B, Saeed F, Salim N, Nafea I, Toubal A, et al.

Int J Mol Sci, 2022 Oct 30;23(21).
PMID: 36362018 DOI: 10.3390/ijms232113230

Determining and modeling the possible behaviour and actions of molecules requires investigating the basic structural features and physicochemical properties that determine their behaviour during chemical, physical, biological, and environmental processes. Computational approaches such as machine learning methods are alternatives for predicting the physicochemical properties of molecules based on their structures. However, the limited accuracy and high error rates of such predictions restrict their use. In this paper, a novel technique based on a deep learning convolutional neural network (CNN) for the prediction of chemical compounds' bioactivity is proposed and developed. The molecules are represented in the new matrix format Mol2mat, a molecular matrix representation adapted from the well-known 2D-fingerprint descriptors. To evaluate the performance of the proposed methods, a series of experiments were conducted using two standard datasets, namely the MDL Drug Data Report (MDDR) and Sutherland, datasets comprising 10 homogeneous and 14 heterogeneous activity classes. After analysing the eight fingerprints, all the probable combinations were investigated using the five best descriptors. The results showed that a combination of three fingerprints, ECFP4, EPFP4, and ECFC4, along with a CNN activity prediction process, achieved the highest performance of 98% AUC when compared to the state-of-the-art ML algorithms NaiveB, LSVM, and RBFN.

Matched MeSH terms: Machine Learning*
A Novel Feature-Engineered-NGBoost Machine-Learning Framework for Fraud Detection in Electric Power Consumption Data

Hussain S, Mustafa MW, Al-Shqeerat KHA, Saeed F, Al-Rimy BAS

Sensors (Basel), 2021 Dec 17;21(24).
PMID: 34960516 DOI: 10.3390/s21248423

This study presents a novel feature-engineered-natural gradient descent ensemble-boosting (NGBoost) machine-learning framework for detecting fraud in power consumption data. The proposed framework was sequentially executed in three stages: data pre-processing, feature engineering, and model evaluation. It utilized the random forest algorithm-based imputation technique initially to impute the missing data entries in the acquired smart meter dataset. In the second phase, the majority weighted minority oversampling technique (MWMOTE) algorithm was used to avoid an unequal distribution of data samples among different classes. The time-series feature-extraction library and whale optimization algorithm were utilized to extract and select the most relevant features from the kWh reading of consumers. Once the most relevant features were acquired, the model training and testing process was initiated by using the NGBoost algorithm to classify the consumers into two distinct categories ("Healthy" and "Theft"). Finally, each input feature's impact (positive or negative) in predicting the target variable was recognized with the tree SHAP additive-explanations algorithm. The proposed framework achieved an accuracy of 93%, recall of 91%, and precision of 95%, which was greater than all the competing models, and thus validated its efficacy and significance in the studied field of research.

Matched MeSH terms: Machine Learning*
A novel framework for addressing uncertainties in machine learning-based geospatial approaches for flood prediction

Adnan MSG, Siam ZS, Kabir I, Kabir Z, Ahmed MR, Hassan QK, et al.

J Environ Manage, 2023 Jan 15;326(Pt B):116813.
PMID: 36435143 DOI: 10.1016/j.jenvman.2022.116813

Globally, many studies on machine learning (ML)-based flood susceptibility modeling have been carried out in recent years. While the majority of those models produce reasonably accurate flood predictions, the outcomes are subject to uncertainty since flood susceptibility models (FSMs) may produce varying spatial predictions. However, there have not been many attempts to address these uncertainties because identifying spatial agreement in flood projections is a complex process. This study presents a framework for reducing spatial disagreement among four standalone and hybridized ML-based FSMs: random forest (RF), k-nearest neighbor (KNN), multilayer perceptron (MLP), and hybridized genetic algorithm-Gaussian radial basis function-support vector regression (GA-RBF-SVR). Besides, an optimized model was developed combining the outcomes of those four models. The southwest coastal region of Bangladesh was selected as the case area. A comparable percentage of flood potential area (approximately 60% of the total land areas) was produced by all ML-based models. Despite achieving high prediction accuracy, spatial discrepancy in the model outcomes was observed, with pixel-wise correlation coefficients across different models ranging from 0.62 to 0.91. The optimized model exhibited high prediction accuracy and improved spatial agreement by reducing the number of classification errors. The framework presented in this study might aid in the formulation of risk-based development plans and enhancement of current early warning systems.
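
The abstract above measures spatial agreement with pixel-wise correlation coefficients between model outputs. A minimal sketch follows, assuming the Pearson coefficient over flattened grids. The abstract does not name the coefficient, so Pearson is an assumption here, and the two 2×2 susceptibility maps are invented.

```python
import math


def flatten(grid):
    """Turn a 2D grid (list of rows) into a flat list of pixel values."""
    return [value for row in grid for value in row]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical flood-susceptibility maps from two models (values in [0, 1]).
model_a = [[0.1, 0.4],
           [0.7, 0.9]]
model_b = [[0.2, 0.3],
           [0.8, 0.9]]

agreement = pearson(flatten(model_a), flatten(model_b))
```

Two models can each score well against observed flood points and still correlate weakly pixel by pixel, which is the kind of spatial disagreement the study sets out to reduce.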

Matched MeSH terms: Machine Learning*
Machine Learning-Based Anomaly Detection in NFV: A Comprehensive Survey

Zehra S, Faseeha U, Syed HJ, Samad F, Ibrahim AO, Abulfaraj AW, et al.

Sensors (Basel), 2023 Jun 05;23(11).
PMID: 37300067 DOI: 10.3390/s23115340

Network function virtualization (NFV) is a rapidly growing technology that enables the virtualization of traditional network hardware components, offering benefits such as cost reduction, increased flexibility, and efficient resource utilization. Moreover, NFV plays a crucial role in sensor and IoT networks by ensuring optimal resource usage and effective network management. However, adopting NFV in these networks also brings security challenges that must be promptly and effectively addressed. This survey paper focuses on exploring the security challenges associated with NFV. It proposes the utilization of anomaly detection techniques as a means to mitigate the potential risks of cyber attacks. The research evaluates the strengths and weaknesses of various machine learning-based algorithms for detecting network-based anomalies in NFV networks. By providing insights into the most efficient algorithm for timely and effective anomaly detection in NFV networks, this study aims to assist network administrators and security professionals in enhancing the security of NFV deployments, thus safeguarding the integrity and performance of sensors and IoT systems.

Matched MeSH terms: Machine Learning*
Improved accuracy and less fault prediction errors via modified sequential minimal optimization algorithm

Asim Shahid M, Alam MM, Mohd Su'ud M

PLoS One, 2023;18(4):e0284209.
PMID: 37053173 DOI: 10.1371/journal.pone.0284209

Cloud computing is among the fastest-growing technologies in the computer industry because of the benefits and opportunities it offers. Additionally, it addresses difficulties and issues, which makes users more likely to accept and use the technology. The proposed research compares the machine learning (ML) algorithms Naïve Bayes (NB), Library Support Vector Machine (LibSVM), Multinomial Logistic Regression (MLR), Sequential Minimal Optimization (SMO), K Nearest Neighbor (KNN), and Random Forest (RF) to determine which classifier gives better accuracy and fewer fault prediction errors. On the secondary data, the NB classifier gives the highest accuracy and fewest fault prediction errors on CPU-Mem Mono, with 80/20 (77.01%), 70/30 (76.05%), and 5-fold cross-validation (74.88%), and on CPU-Mem Multi, with 80/20 (89.72%), 70/30 (90.28%), and 5-fold cross-validation (92.83%). Furthermore, the SMO classifier gives the highest accuracy and fewest fault prediction errors on HDD Mono, with 80/20 (87.72%), 70/30 (89.41%), and 5-fold cross-validation (88.38%), and on HDD-Multi, with 80/20 (93.64%), 70/30 (90.91%), and 5-fold cross-validation (88.20%). On the primary data, the RF classifier gives the highest accuracy and fewest fault prediction errors, with 80/20 (97.14%), 70/30 (96.19%), and 5-fold cross-validation (95.85%), but the algorithm complexity (0.17 seconds) is not good. SMO has the second-highest accuracy and second-fewest fault prediction errors, with 80/20 (95.71%), 70/30 (95.71%), and 5-fold cross-validation (95.71%), but the algorithm complexity is good (0.3 seconds). The difference in accuracy and fault prediction between RF and SMO is only 0.13%, and the difference in time complexity is 14 seconds. We have therefore decided to modify SMO. Finally, the Modified Sequential Minimal Optimization (MSMO) algorithm is proposed to obtain the highest accuracy and fewest fault prediction errors, with 80/20 (96.42%), 70/30 (96.42%), and 5-fold cross-validation (96.50%).

Matched MeSH terms: Machine Learning*
Predicting dengue transmission rates by comparing different machine learning models with vector indices and meteorological data

Ong SQ, Isawasan P, Ngesom AMM, Shahar H, Lasim AM, Nair G

Sci Rep, 2023 Nov 05;13(1):19129.
PMID: 37926755 DOI: 10.1038/s41598-023-46342-2

Machine learning (ML) algorithms are receiving a lot of attention in the development of predictive models for monitoring dengue transmission rates. Previous work has focused only on specific weather variables and algorithms, and there is still a need for a model that uses more variables and algorithms that have higher performance. In this study, we use vector indices and meteorological data as predictors to develop the ML models. We trained and validated seven ML algorithms, including an ensemble ML method, and compared their performance using the receiver operating characteristic (ROC) with the area under the curve (AUC), accuracy and F1 score. Our results show that ensemble ML methods such as XGBoost, AdaBoost and Random Forest perform better than logistic regression, Naïve Bayes, decision tree, and support vector machine (SVM), with XGBoost having the highest AUC, accuracy and F1 score. Analysis of the importance of the variables showed that the container index was the least important. By removing this variable, the ML models improved their performance by at least 6% in AUC and F1 score. Our result provides a framework for future studies on the use of predictive models in the development of an early warning system.
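
The abstract above compares models by AUC, accuracy and F1 score. The sketch below computes all three from predicted scores using standard definitions, with AUC taken as the fraction of positive/negative pairs ranked correctly. The labels, scores, and the 0.5 decision threshold are illustrative assumptions and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


def accuracy_and_f1(labels, scores, threshold=0.5):
    """Threshold the scores, then return (accuracy, F1) for the positive class."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Hypothetical outbreak labels (1 = outbreak) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

auc = roc_auc(labels, scores)
accuracy, f1 = accuracy_and_f1(labels, scores)
```

AUC is threshold-free, whereas accuracy and F1 depend on the chosen cut-off, so reporting all three gives a fuller picture than any single number.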

Matched MeSH terms: Machine Learning*
