Displaying publications 21 - 40 of 325 in total

  1. Asim Shahid M, Alam MM, Mohd Su'ud M
    PLoS One, 2023;18(4):e0284209.
    PMID: 37053173 DOI: 10.1371/journal.pone.0284209
    Cloud computing is among the fastest-growing technologies in the computer industry, offering benefits and opportunities while also raising difficulties and issues that affect how readily users accept and adopt the technology. The proposed research compares machine learning (ML) algorithms, namely Naïve Bayes (NB), Library Support Vector Machine (LibSVM), Multinomial Logistic Regression (MLR), Sequential Minimal Optimization (SMO), K-Nearest Neighbor (KNN), and Random Forest (RF), to determine which classifier gives higher accuracy and fewer fault prediction errors. In this research, the secondary data results show that the NB classifier gives the highest accuracy and lowest fault prediction on CPU-Mem Mono, with 80/20 (77.01%), 70/30 (76.05%), and 5-fold cross-validation (74.88%), and on CPU-Mem Multi, with 80/20 (89.72%), 70/30 (90.28%), and 5-fold cross-validation (92.83%). On HDD Mono, the SMO classifier gives the highest accuracy and lowest fault prediction, with 80/20 (87.72%), 70/30 (89.41%), and 5-fold cross-validation (88.38%), and on HDD Multi, with 80/20 (93.64%), 70/30 (90.91%), and 5-fold cross-validation (88.20%). In the primary data results, the RF classifier gives the highest accuracy and lowest fault prediction, with 80/20 (97.14%), 70/30 (96.19%), and 5-fold cross-validation (95.85%), but its algorithm complexity (0.17 seconds) is not good. SMO has the second-highest accuracy and lowest fault prediction, with 80/20 (95.71%), 70/30 (95.71%), and 5-fold cross-validation (95.71%), but good algorithm complexity (0.3 seconds). The difference between RF and SMO in accuracy and fault prediction is only (.13%), and the difference in time complexity is (14 seconds). We therefore decided to modify SMO. Finally, the Modified Sequential Minimal Optimization (MSMO) algorithm is proposed, which achieves the highest accuracy and lowest fault prediction error, with 80/20 (96.42%), 70/30 (96.42%), and 5-fold cross-validation (96.50%).
    Matched MeSH terms: Machine Learning*
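    Several entries in this listing, this one included, report results under 80/20 and 70/30 hold-out splits and 5-fold cross-validation. As a minimal illustrative sketch of those evaluation protocols only (plain Python with synthetic indices; not the study's data, classifiers, or code):

    ```python
    import random

    def holdout_split(indices, train_frac, seed=0):
        """Shuffle indices and split them into train/test by the given fraction."""
        rng = random.Random(seed)
        shuffled = indices[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        return shuffled[:cut], shuffled[cut:]

    def kfold_splits(indices, k=5, seed=0):
        """Yield (train, test) index lists for k-fold cross-validation."""
        rng = random.Random(seed)
        shuffled = indices[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for i in range(k):
            test = folds[i]
            train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield train, test

    data = list(range(100))
    train, test = holdout_split(data, 0.8)   # the 80/20 protocol
    print(len(train), len(test))             # 80 20
    for train_idx, test_idx in kfold_splits(data, k=5):
        assert len(train_idx) + len(test_idx) == 100
    ```

    Each classifier is then fitted on every training portion and scored on the corresponding held-out portion, which is how the per-split accuracy figures above would be produced.
    
    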
  2. Teo BG, Dhillon SK
    BMC Bioinformatics, 2019 Dec 24;20(Suppl 19):658.
    PMID: 31870297 DOI: 10.1186/s12859-019-3210-x
    BACKGROUND: Studying the structural and functional morphology of small organisms such as monogeneans is difficult due to the lack of visualization in three dimensions. One possible way to resolve this visualization issue is to create digital 3D models, which may aid researchers in studying the morphology and function of the monogenean. However, the development of 3D models is a tedious procedure, as one has to repeat an entire complicated modelling process for every new target 3D shape using comprehensive 3D modelling software. This study was designed to develop an alternative 3D modelling approach to build 3D models of monogenean anchors, which can be used to understand these morphological structures in three dimensions. This alternative approach aims to avoid repeating the tedious modelling procedure for every single target 3D model from scratch.

    RESULT: An automated 3D modeling pipeline empowered by an Artificial Neural Network (ANN) was developed. This automated 3D modelling pipeline enables automated deformation of a generic 3D model of monogenean anchor into another target 3D anchor. The 3D modelling pipeline empowered by ANN has managed to automate the generation of the 8 target 3D models (representing 8 species: Dactylogyrus primaries, Pellucidhaptor merus, Dactylogyrus falcatus, Dactylogyrus vastator, Dactylogyrus pterocleidus, Dactylogyrus falciunguis, Chauhanellus auriculatum and Chauhanellus caelatus) of monogenean anchor from the respective 2D illustrations input without repeating the tedious modelling procedure.

    CONCLUSIONS: Despite some constraints and limitations, the automated 3D modelling pipeline developed in this study demonstrates a working application of a machine learning approach to 3D modelling work. This study has not only developed an automated 3D modelling pipeline but has also demonstrated a cross-disciplinary research design that integrates machine learning into a specific domain of study, such as 3D modelling of biological structures.

    Matched MeSH terms: Machine Learning*
  3. Ong SQ, Isawasan P, Ngesom AMM, Shahar H, Lasim AM, Nair G
    Sci Rep, 2023 Nov 05;13(1):19129.
    PMID: 37926755 DOI: 10.1038/s41598-023-46342-2
    Machine learning (ML) algorithms are receiving a lot of attention in the development of predictive models for monitoring dengue transmission rates. Previous work has focused only on specific weather variables and algorithms, and there is still a need for a model that uses more variables and algorithms with higher performance. In this study, we use vector indices and meteorological data as predictors to develop the ML models. We trained and validated seven ML algorithms, including an ensemble ML method, and compared their performance using the receiver operating characteristic (ROC) with the area under the curve (AUC), accuracy, and F1 score. Our results show that ensemble ML methods such as XGBoost, AdaBoost, and Random Forest perform better than logistic regression, Naïve Bayes, decision tree, and support vector machine (SVM), with XGBoost having the highest AUC, accuracy, and F1 score. Analysis of variable importance showed that the container index was the least important. By removing this variable, the ML models improved their performance by at least 6% in AUC and F1 score. Our results provide a framework for future studies on the use of predictive models in the development of an early warning system.
    Matched MeSH terms: Machine Learning*
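    The AUC reported by this entry has a simple rank-statistic interpretation: the probability that a randomly chosen positive case is scored above a randomly chosen negative case. A minimal sketch of that computation (toy labels and scores, not the study's data):

    ```python
    def roc_auc(labels, scores):
        """AUC as the probability that a random positive outscores a random
        negative (ties count half): the rank-statistic view of ROC AUC."""
        pos = [s for y, s in zip(labels, scores) if y == 1]
        neg = [s for y, s in zip(labels, scores) if y == 0]
        wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
                   for p in pos for n in neg)
        return wins / (len(pos) * len(neg))

    # Toy example: one positive is outscored by one negative,
    # so 3 of 4 positive/negative pairs are ordered correctly.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.3, 0.7]
    print(roc_auc(labels, scores))  # 0.75
    ```
    
    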
  4. Yeow MYH, Chong CY, Lim MK, Yee Yen Y
    PLoS One, 2025;20(2):e0314512.
    PMID: 39946354 DOI: 10.1371/journal.pone.0314512
    Software reuse is an essential practice to increase efficiency and reduce costs in software production. Software reuse practices include reusing artifacts, libraries, components, packages, and APIs. Identifying suitable software for reuse requires pinpointing potential candidates. However, there are no objective methods in place to measure software reuse. This makes it challenging to identify highly reusable software. Software reuse research mainly addresses two hurdles: 1) identifying reusable candidates effectively and efficiently, and 2) selecting high-quality software components that improve maintainability and extensibility. This paper proposes automating software reuse prediction by leveraging machine learning (ML) algorithms, enabling future researchers and practitioners to better identify highly reusable software. Our approach uses cross-project code clone detection to establish the ground truth for software reuse, identifying code clones across popular GitHub projects as indicators of potential reuse candidates. Software metrics were extracted from Maven artifacts and used to train classification and regression models to predict and estimate software reuse. The average F1-score of the ML classification models is 77.19%. The best-performing model, Ridge Regression, achieved an F1-score of 79.17%. Additionally, this research aims to assist developers by identifying key metrics that significantly impact software reuse. Our findings suggest that the file-level PUA (Public Undocumented API) metric is the most important factor influencing software reuse. We also present suitable value ranges for the top five important metrics that developers can follow to create highly reusable software. Furthermore, we developed a tool that utilizes the trained models to predict the reuse potential of existing GitHub projects and rank Maven artifacts by their domain.
    Matched MeSH terms: Machine Learning*
  5. Hassan MK, Syed Ariffin SH, Ghazali NE, Hamad M, Hamdan M, Hamdi M, et al.
    Sensors (Basel), 2022 May 09;22(9).
    PMID: 35591282 DOI: 10.3390/s22093592
    Recently, there has been an increasing need for new applications and services such as big data, blockchains, vehicle-to-everything (V2X), the Internet of things, 5G, and beyond. Therefore, to maintain quality of service (QoS), accurate network resource planning and forecasting are essential steps for resource allocation. This study proposes a reliable hybrid dynamic bandwidth slice forecasting framework that combines the long short-term memory (LSTM) neural network and local smoothing methods to improve the network forecasting model. Moreover, the proposed framework can dynamically react to all the changes occurring in the data series. Backbone traffic was used to validate the proposed method. As a result, the forecasting accuracy improved significantly with the proposed framework and with minimal data loss from the smoothing process. The results showed that the hybrid moving average LSTM (MLSTM) achieved the most remarkable improvement in the training and testing forecasts, with 28% and 24% for long-term evolution (LTE) time series and with 35% and 32% for the multiprotocol label switching (MPLS) time series, respectively, while robust locally weighted scatter plot smoothing and LSTM (RLWLSTM) achieved the most significant improvement for upstream traffic with 45%; moreover, the dynamic learning framework achieved improvement percentages that can reach up to 100%.
    Matched MeSH terms: Machine Learning*
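    The hybrid framework in this entry pairs an LSTM forecaster with local smoothing of the traffic series. The sketch below shows only the smoothing stage, a trailing moving average in plain Python over made-up traffic values; the LSTM component and the study's actual smoothing methods are not reproduced here:

    ```python
    def moving_average(series, window):
        """Trailing moving average; the first window-1 points are dropped,
        which is the small data loss a smoothing pass introduces."""
        if window < 1 or window > len(series):
            raise ValueError("window must be between 1 and len(series)")
        out = []
        acc = sum(series[:window])
        out.append(acc / window)
        for i in range(window, len(series)):
            acc += series[i] - series[i - window]   # slide the window in O(1)
            out.append(acc / window)
        return out

    traffic = [10, 12, 11, 40, 12, 13, 11, 12]   # a spike at index 3
    print(moving_average(traffic, 3))            # spike is flattened out
    ```

    The smoothed series, rather than the raw one, would then be fed to the forecaster, which is the design choice the abstract credits for the accuracy gain.
    
    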
  6. Rosa D, Elya B, Hanafi M, Khatib A, Budiarto E, Nur S, et al.
    PLoS One, 2025;20(1):e0313592.
    PMID: 39752479 DOI: 10.1371/journal.pone.0313592
    One way to treat type II diabetes mellitus is by using an α-glucosidase inhibitor, which slows down postprandial glucose intake. Metabolomics analysis of Artabotrys sumatranus leaf extract was used in this research to predict the active compounds acting as α-glucosidase inhibitors in this extract. Both multivariate statistical analysis and machine learning approaches were used to improve the confidence of the predictions. After performance comparisons with other machine learning methods, random forest was chosen to build the predictive model for the activity of the extract samples. Feature importance analysis (using random feature permutation and Shapley score calculation) was used to identify the predicted active compounds as the important features that influenced the activity prediction of the extract samples. The combined multivariate statistical analysis and machine learning predicted 9 active compounds, 6 of which were identified as mangiferin, neomangiferin, norisocorydine, apigenin-7-O-galactopyranoside, lirioferine, and 15,16-dihydrotanshinone I. The activities of norisocorydine, apigenin-7-O-galactopyranoside, and lirioferine as α-glucosidase inhibitors have not been reported before. Molecular docking simulation, both to 3A4A (the α-glucosidase enzyme from Saccharomyces cerevisiae, usually used in bioassay tests) and to 3TOP (a part of the α-glucosidase enzyme in the human gut), showed strong to very strong binding of the identified predicted active compounds to both receptors, with the exception of neomangiferin, which only showed strong binding to the 3TOP receptor. Isolation based on bioassay-guided fractionation further verified the metabolomics prediction by successfully isolating mangiferin from the extract, which showed strong α-glucosidase activity when subjected to a bioassay test. The correlation analysis also showed a possibility of 3 groups in the predicted active compounds, which might be related to the biosynthesis pathway (further research is needed for verification). Another result from the correlation analysis was that, in general, the α-glucosidase inhibition activity in the extract had a strong correlation with antioxidant activity, which was also reflected in the predicted active compounds. Only one predicted compound had a very low positive correlation with antioxidant activity.
    Matched MeSH terms: Machine Learning*
  7. Hasan RI, Yusuf SM, Alzubaidi L
    Plants (Basel), 2020 Oct 01;9(10).
    PMID: 33019765 DOI: 10.3390/plants9101302
    Deep learning (DL) represents the golden era in the machine learning (ML) domain, and it has gradually become the leading approach in many fields. It is currently playing a vital role in the early detection and classification of plant diseases. The use of ML techniques in this field is viewed as having brought considerable improvement in cultivation productivity sectors, particularly with the recent emergence of DL, which seems to have increased accuracy levels. Recently, many DL architectures have been implemented accompanying visualisation techniques that are essential for determining symptoms and classifying plant diseases. This review investigates and analyses the most recent methods, developed over three years leading up to 2020, for training, augmentation, feature fusion and extraction, recognising and counting crops, and detecting plant diseases, including how these methods can be harnessed to feed deep classifiers and their effects on classifier accuracy.
    Matched MeSH terms: Machine Learning
  8. Rahman MM, Khatun F, Uzzaman A, Sami SI, Bhuiyan MA, Kiong TS
    Int J Health Serv, 2021 10;51(4):446-461.
    PMID: 33999732 DOI: 10.1177/00207314211017469
    The novel coronavirus disease (COVID-19) has spread over 219 countries of the globe as a pandemic, creating alarming impacts on health care, socioeconomic environments, and international relationships. The principal objective of the study is to provide the current technological aspects of artificial intelligence (AI) and other relevant technologies and their implications for confronting COVID-19 and preventing the pandemic's dreadful effects. This article presents AI approaches that have significant contributions in the fields of health care, then highlights and categorizes their applications in confronting COVID-19, such as detection and diagnosis, data analysis and treatment procedures, research and drug development, social control and services, and the prediction of outbreaks. The study addresses the link between the technologies and the epidemics as well as the potential impacts of technology in health care with the introduction of machine learning and natural language processing tools. It is expected that this comprehensive study will support researchers in modeling health care systems and drive further studies in advanced technologies. Finally, we propose future directions in research and conclude that persuasive AI strategies, probabilistic models, and supervised learning are required to tackle future pandemic challenges.
    Matched MeSH terms: Machine Learning
  9. Melisa Anak Adeh, Mohd Ibrahim Shapiai, Ayman Maliha, Muhammad Hafiz Md Zaini
    MyJurnal
    Nowadays, applications that track moving objects are commonly used in various areas, especially in computer vision. Many tracking algorithms have been introduced, and they are divided into three groups: generative trackers, discriminative trackers, and hybrid trackers. One such method is the Tracking-Learning-Detection (TLD) framework, an example of a hybrid tracker that combines generative and discriminative trackers. In TLD, the detector consists of three stages: patch variance, an ensemble classifier, and a K-Nearest Neighbor classifier. In the second stage, the ensemble classifier depends on simple pixel comparison, so it is likely to fail to offer a good generalization of the appearances of the target object in the detection process. In this paper, an Online Sequential Extreme Learning Machine (OS-ELM) was used to replace the ensemble classifier in the TLD framework. In addition, different types of Haar-like features were used for the feature extraction process instead of using raw pixel values as the features. The objectives of this study are to improve the classifier in the second stage of the detector in the TLD framework by using Haar-like features as input to the classifier, and to obtain a more generalized detector in the TLD framework by using an OS-ELM-based detector. The results showed that the proposed method performs better on Pedestrian 1 in terms of F-measure and also offers good performance in terms of precision in four out of six videos.
    Matched MeSH terms: Machine Learning
  10. Zafar R, Qayyum A, Mumtaz W
    J Integr Neurosci, 2019 Sep 30;18(3):217-229.
    PMID: 31601069 DOI: 10.31083/j.jin.2019.03.164
    Electroencephalogram recordings are often confounded with artifacts, especially eye blinks. Different methods for artifact detection and removal are discussed in the literature, including automatic detection and removal. Here, an automatic method of eye blink detection and correction is proposed in which sparse coding is applied to an electroencephalogram dataset. In this method, a hybrid dictionary based on a ridgelet transformation is used to capture prominent features by analyzing independent components extracted from a different number of electroencephalogram channels. In this study, the proposed method has been tested and validated with five different datasets for artifact detection and correction. Results show that the proposed technique is promising, as it successfully extracted the exact locations of eye blink artifacts. The accuracy of the method (automatic detection) is 89.6%, which represents a better estimate than that obtained by an extreme learning machine classifier.
    Matched MeSH terms: Machine Learning
  11. Almaleeh AA, Zakaria A, Kamarudin LM, Rahiman MHF, Ndzi DL, Ismail I
    Sensors (Basel), 2022 Jan 05;22(1).
    PMID: 35009947 DOI: 10.3390/s22010405
    The moisture content of stored rice is dependent on the surrounding and environmental factors which in turn affect the quality and economic value of the grains. Therefore, the moisture content of grains needs to be measured frequently to ensure that optimum conditions that preserve their quality are maintained. The current state of the art for moisture measurement of rice in a silo is based on grab sampling or relies on single rod sensors placed randomly into the grain. The sensors that are currently used are very localized and are, therefore, unable to provide continuous measurement of the moisture distribution in the silo. To the authors' knowledge, there is no commercially available 3D volumetric measurement system for rice moisture content in a silo. Hence, this paper presents results of work carried out using low-cost wireless devices that can be placed around the silo to measure changes in the moisture content of rice. This paper proposes a novel technique based on radio frequency tomographic imaging using low-cost wireless devices and regression-based machine learning to provide contactless non-destructive 3D volumetric moisture content distribution in stored rice grain. This proposed technique can detect multiple levels of localized moisture distributions in the silo with accuracies greater than or equal to 83.7%, depending on the size and shape of the sample under test. Unlike other approaches proposed in open literature or employed in the sector, the proposed system can be deployed to provide continuous monitoring of the moisture distribution in silos.
    Matched MeSH terms: Machine Learning
  12. Singh OP, Vallejo M, El-Badawy IM, Aysha A, Madhanagopal J, Mohd Faudzi AA
    Comput Biol Med, 2021 Sep;136:104650.
    PMID: 34329865 DOI: 10.1016/j.compbiomed.2021.104650
    Due to the continued evolution of the SARS-CoV-2 pandemic, researchers worldwide are working to mitigate and suppress its spread and to better understand it by deploying digital signal processing (DSP) and machine learning approaches. This study presents an alignment-free approach to classify SARS-CoV-2 using complementary DNA, which is DNA synthesized from the single-stranded RNA virus. Herein, a total of 1582 samples, with different lengths of genome sequences from different regions, were collected from various data sources and divided into a SARS-CoV-2 and a non-SARS-CoV-2 group. We extracted eight biomarkers based on three-base periodicity, using DSP techniques, and ranked those based on a filter-based feature selection. The ranked biomarkers were fed into k-nearest neighbor, support vector machine, decision tree, and random forest classifiers for the classification of SARS-CoV-2 from other coronaviruses. The training dataset was used to test the performance of the classifiers based on accuracy and F-measure via 10-fold cross-validation. Kappa-scores were estimated to check the influence of unbalanced data. Further, a 10 × 10 cross-validation paired t-test was utilized to test the best model with unseen data. Random forest was selected as the best model, differentiating the SARS-CoV-2 coronavirus from other coronaviruses and a control group with an accuracy of 97.4%, sensitivity of 96.2%, and specificity of 98.2%, when tested with unseen samples. Moreover, the proposed algorithm was computationally efficient, taking only 0.31 s to compute the genome biomarkers, outperforming previous studies.
    Matched MeSH terms: Machine Learning
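    Three-base periodicity, the signal this entry's biomarkers build on, is classically measured as the power of a base-indicator sequence's DFT at the N/3 frequency bin. A minimal sketch of that single measurement (toy sequence; the study's eight actual biomarkers and feature ranking are not reproduced here):

    ```python
    import cmath

    def indicator(seq, base):
        """Binary indicator: 1.0 where the sequence has the given base."""
        return [1.0 if b == base else 0.0 for b in seq]

    def dft_power_at(u, k):
        """Power of the length-N discrete Fourier transform of u at bin k."""
        n = len(u)
        x = sum(u[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        return abs(x) ** 2

    # A perfectly 3-periodic toy sequence: 'A' appears at every 3rd site,
    # so the indicator puts all its energy at bin N/3.
    seq = "ACGACGACG"          # N = 9, period 3
    u = indicator(seq, "A")
    print(round(dft_power_at(u, len(seq) // 3), 6))  # 9.0, i.e. (N/3)**2
    ```

    On real genomes the peak at N/3 is partial rather than total, and its strength relative to the rest of the spectrum is what makes it usable as a classification feature.
    
    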
  13. Huqh MZU, Abdullah JY, Wong LS, Jamayet NB, Alam MK, Rashid QF, et al.
    Int J Environ Res Public Health, 2022 Aug 31;19(17).
    PMID: 36078576 DOI: 10.3390/ijerph191710860
    OBJECTIVE: The objective of this systematic review was (a) to explore the current clinical applications of AI/ML (Artificial intelligence and Machine learning) techniques in diagnosis and treatment prediction in children with CLP (Cleft lip and palate), (b) to create a qualitative summary of results of the studies retrieved.

    MATERIALS AND METHODS: An electronic search was carried out using databases such as PubMed, Scopus, and the Web of Science Core Collection. Two reviewers searched the databases separately and concurrently. The initial search was conducted on 6 July 2021. The publishing period was unrestricted; however, the search was limited to articles involving human participants and published in English. Combinations of Medical Subject Headings (MeSH) phrases and free-text terms were used as search keywords in each database. The following data were taken from the methods and results sections of the selected papers: the number of AI training datasets used to train the intelligent system, along with their conditional properties, and the problems studied, which include unilateral CLP, bilateral CLP, unilateral cleft lip and alveolus, unilateral cleft lip, hypernasality, dental characteristics, and the sagittal jaw relationship in children with CLP.

    RESULTS: Based on the predefined search strings with accompanying database keywords, a total of 44 articles were found in Scopus, PubMed, and Web of Science search results. After reading the full articles, 12 papers were included for systematic analysis.

    CONCLUSIONS: Artificial intelligence provides an advanced technology that can be employed in AI-enabled computerized programming software for accurate landmark detection, rapid digital cephalometric analysis, clinical decision-making, and treatment prediction. In children with corrected unilateral cleft lip and palate, ML can help detect cephalometric predictors of future need for orthognathic surgery.

    Matched MeSH terms: Machine Learning
  14. Nilashi M, Abumalloh RA, Yusuf SYM, Thi HH, Alsulami M, Abosaq H, et al.
    Comput Biol Chem, 2023 Feb;102:107788.
    PMID: 36410240 DOI: 10.1016/j.compbiolchem.2022.107788
    Predicting the Unified Parkinson's Disease Rating Scale (UPDRS) on the Total-UPDRS and Motor-UPDRS clinical scales is an important part of controlling PD. Computational intelligence approaches have been used effectively in the early diagnosis of PD by predicting UPDRS. In this research, we aim to present a combined approach for PD diagnosis using an ensemble learning approach with the ability of online learning from large clinical datasets. The method is developed using Deep Belief Network (DBN) and Neuro-Fuzzy approaches. A clustering approach, Expectation-Maximization (EM), is used to handle large datasets. The Principal Component Analysis (PCA) technique is employed for noise removal from the data. The UPDRS prediction models are constructed for PD diagnosis. To handle missing data, K-NN is used in the proposed method. We use incremental machine learning approaches to improve the efficiency of the proposed method. We assess our approach on a real-world PD dataset, and the findings are compared with other PD diagnosis approaches developed with machine learning techniques. The findings revealed that the approach can improve UPDRS prediction accuracy and the time complexity of previous methods in handling large datasets.
    Matched MeSH terms: Machine Learning
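    This entry handles missing data with K-NN, an approach several other entries in this listing also lean on. A minimal sketch of nearest-neighbour imputation in plain Python (tiny made-up table, distances over shared observed columns only; not the study's implementation):

    ```python
    def knn_impute(rows, k=2):
        """Fill None entries using the mean of that column over the k complete
        rows nearest in squared distance on mutually observed columns."""
        def dist(a, b):
            shared = [(x, y) for x, y in zip(a, b)
                      if x is not None and y is not None]
            return sum((x - y) ** 2 for x, y in shared) / len(shared)

        complete = [r for r in rows if None not in r]
        filled = []
        for r in rows:
            if None not in r:
                filled.append(r[:])
                continue
            neighbours = sorted(complete, key=lambda c: dist(r, c))[:k]
            filled.append([sum(n[j] for n in neighbours) / k if v is None else v
                           for j, v in enumerate(r)])
        return filled

    rows = [[1.0, 2.0], [1.1, 2.2], [9.0, 9.5], [1.05, None]]
    print(knn_impute(rows, k=2))   # last row's gap filled from its 2 neighbours
    ```
    
    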
  15. Sharma V, Singh A, Chauhan S, Sharma PK, Chaudhary S, Sharma A, et al.
    Curr Drug Deliv, 2024;21(6):870-886.
    PMID: 37670704 DOI: 10.2174/1567201821666230905090621
    Drug discovery and development (DDD) is a highly complex process that necessitates precise monitoring and extensive data analysis at each stage. Furthermore, the DDD process is both time-consuming and costly. To tackle these concerns, artificial intelligence (AI) technology can be used, which facilitates rapid and precise analysis of extensive datasets within a limited timeframe. The pathophysiology of cancer is complicated and requires extensive research for novel drug discovery and development. The first stage in the process of drug discovery and development involves identifying targets. Cell structure and molecular functioning are complex due to the vast number of molecules that function constantly, performing various roles. Furthermore, scientists are continually discovering novel cellular mechanisms and molecules, expanding the range of potential targets. Accurately identifying the correct target is a crucial step in the preparation of a treatment strategy. Various forms of AI, such as machine learning, neural-based learning, deep learning, and network-based learning, are currently being utilised in applications, online services, and databases. These technologies facilitate the identification and validation of targets, ultimately contributing to the success of projects. This review focuses on the different types and subcategories of AI databases utilised in the field of drug discovery and target identification for cancer.
    Matched MeSH terms: Machine Learning
  16. Masseran N, Safari MAM, Tajuddin RRM
    Environ Monit Assess, 2024 May 08;196(6):523.
    PMID: 38717514 DOI: 10.1007/s10661-024-12700-4
    Air pollution events can be categorized as extreme or non-extreme on the basis of their magnitude of severity. High-risk extreme air pollution events will exert a disastrous effect on the environment. Therefore, public health and policy-making authorities must be able to determine the characteristics of these events. This study proposes a probabilistic machine learning technique for predicting the classification of extreme and non-extreme events on the basis of data features to address the above issue. The use of the naïve Bayes model in the prediction of air pollution classes is proposed to leverage its simplicity as well as high accuracy and efficiency. A case study was conducted on the air pollution index data of Klang, Malaysia, for the period of January 01, 1997, to August 31, 2020. The trained naïve Bayes model achieves high accuracy, sensitivity, and specificity on the training and test datasets. Therefore, the naïve Bayes model can be easily applied in air pollution analysis while providing a promising solution for the accurate and efficient prediction of extreme or non-extreme air pollution events. The findings of this study provide reliable information to public authorities for monitoring and managing sustainable air quality over time.
    Matched MeSH terms: Machine Learning
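    The naïve Bayes model this entry proposes is simple enough to sketch end to end. Below is a from-scratch Gaussian naïve Bayes on a single feature, with entirely hypothetical air-pollution-index values standing in for the Klang data (which is not reproduced here):

    ```python
    import math
    from collections import defaultdict

    def fit_gnb(xs, ys):
        """Per-class mean/variance of one feature plus log-priors."""
        groups = defaultdict(list)
        for x, y in zip(xs, ys):
            groups[y].append(x)
        model = {}
        for y, vals in groups.items():
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-9
            model[y] = (mu, var, math.log(len(vals) / len(xs)))
        return model

    def predict_gnb(model, x):
        """Class with the highest log posterior under a Gaussian likelihood."""
        def log_post(y):
            mu, var, log_prior = model[y]
            return (log_prior - 0.5 * math.log(2 * math.pi * var)
                    - (x - mu) ** 2 / (2 * var))
        return max(model, key=log_post)

    # Hypothetical air pollution index values: extreme days cluster high.
    api = [45, 50, 55, 60, 150, 160, 170]
    label = ["non-extreme"] * 4 + ["extreme"] * 3
    model = fit_gnb(api, label)
    print(predict_gnb(model, 48), predict_gnb(model, 155))  # non-extreme extreme
    ```

    The model's appeal, as the abstract notes, is exactly this: training reduces to per-class summary statistics, so it stays fast and easy to apply.
    
    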
  17. Kaleem S, Sohail A, Tariq MU, Babar M, Qureshi B
    PLoS One, 2023;18(10):e0292587.
    PMID: 37819992 DOI: 10.1371/journal.pone.0292587
    Coronavirus disease (COVID-19), which has caused a global pandemic, continues to have severe effects on human lives worldwide. Characterized by symptoms similar to pneumonia, its rapid spread requires innovative strategies for its early detection and management. In response to this crisis, data science and machine learning (ML) offer crucial solutions to complex problems, including those posed by COVID-19. One cost-effective approach to detect the disease is the use of chest X-rays, which is a common initial testing method. Although existing techniques are useful for detecting COVID-19 using X-rays, there is a need for further improvement in efficiency, particularly in terms of training and execution time. This article introduces an advanced architecture that leverages an ensemble learning technique for COVID-19 detection from chest X-ray images. Using a parallel and distributed framework, the proposed model integrates ensemble learning with big data analytics to facilitate parallel processing. This approach aims to enhance both execution and training times, ensuring a more effective detection process. The model's efficacy was validated through a comprehensive analysis of predicted and actual values, and its performance was meticulously evaluated for accuracy, precision, recall, and F-measure, and compared to state-of-the-art models. The work presented here not only contributes to the ongoing fight against COVID-19 but also showcases the wider applicability and potential of ensemble learning techniques in healthcare.
    Matched MeSH terms: Machine Learning
  18. Alabsi BA, Anbar M, Rihan SDA
    Sensors (Basel), 2023 Jun 16;23(12).
    PMID: 37420810 DOI: 10.3390/s23125644
    The increasing use of Internet of Things (IoT) devices has led to a rise in Distributed Denial of Service (DDoS) and Denial of Service (DoS) attacks on these networks. These attacks can have severe consequences, resulting in the unavailability of critical services and financial losses. In this paper, we propose an Intrusion Detection System (IDS) based on a Conditional Tabular Generative Adversarial Network (CTGAN) for detecting DDoS and DoS attacks on IoT networks. Our CTGAN-based IDS utilizes a generator network to produce synthetic traffic that mimics legitimate traffic patterns, while the discriminator network learns to differentiate between legitimate and malicious traffic. The synthetic tabular data generated by CTGAN is employed to train multiple shallow machine-learning and deep-learning classifiers, enhancing their detection performance. The proposed approach is evaluated using the Bot-IoT dataset, measuring detection accuracy, precision, recall, and F1 measure. Our experimental results demonstrate the accurate detection of DDoS and DoS attacks on IoT networks using the proposed approach. Furthermore, the results highlight the significant contribution of CTGAN in improving the performance of detection models in machine learning and deep learning classifiers.
    Matched MeSH terms: Machine Learning
  19. Letchumanan N, Wong JHD, Tan LK, Ab Mumin N, Ng WL, Chan WY, et al.
    J Digit Imaging, 2023 Aug;36(4):1533-1540.
    PMID: 37253893 DOI: 10.1007/s10278-022-00753-1
    This study investigates the feasibility of using texture radiomics features extracted from mammography images to distinguish between benign and malignant breast lesions and to classify benign lesions into different categories, and determines the best machine learning (ML) model to perform the tasks. Six hundred and twenty-two breast lesions from 200 retrospective patients' data were segmented and analysed. Three hundred fifty radiomics features were extracted using the Standardized Environment for Radiomics Analysis (SERA) library, one of the radiomics implementations endorsed by the Image Biomarker Standardisation Initiative (IBSI). The radiomics features and selected patient characteristics were used to train selected machine learning models to classify the breast lesions. A fivefold cross-validation was used to evaluate the performance of the ML models, and the top 10 most important features were identified. The random forest (RF) ensemble gave the highest accuracy (89.3%), positive predictive value (66%), and likelihood ratio (13.5) in categorising benign and malignant lesions. For the classification of benign lesions, the RF model again gave the highest likelihood ratio (3.4) compared to the other models. Morphological and textural radiomics features were identified as the top 10 most important features from the random forest models. Patient age was also identified as one of the significant features in the RF model. We concluded that machine learning models trained against texture-based radiomics features and patient features give reasonable performance in differentiating benign versus malignant breast lesions. Our study also demonstrated that the radiomics-based machine learning models were able to emulate the visual assessment of mammography lesions typically used by radiologists, leading to a better understanding of how the machine learning models arrive at their decisions.
    Matched MeSH terms: Machine Learning
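    The texture features this entry relies on include simple first-order statistics of the intensity values inside a region of interest. A minimal sketch of three such radiomics-style features (a made-up 3x3 patch; the SERA library's 350 standardized features are far richer than this):

    ```python
    import math

    def first_order_texture(patch):
        """First-order texture statistics of a 2D intensity patch: mean,
        variance, and skewness, the simplest radiomics-style features."""
        vals = [v for row in patch for v in row]
        n = len(vals)
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        sd = math.sqrt(var)
        skew = 0.0 if sd == 0 else sum((v - mean) ** 3 for v in vals) / (n * sd ** 3)
        return {"mean": mean, "variance": var, "skewness": skew}

    # Hypothetical 3x3 region of interest with one bright spot (arbitrary units).
    roi = [[10, 12, 11],
           [13, 50, 12],
           [11, 12, 10]]
    print(first_order_texture(roi))   # large positive skewness from the spot
    ```

    Feature vectors like this one, computed per lesion, are what get fed to the classifiers described above.
    
    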
  20. Sayeed S, Ahmad AF, Peng TC
    F1000Res, 2022;11:17.
    PMID: 38269303 DOI: 10.12688/f1000research.73613.1
    The Internet of Things (IoT) is leading the physical and digital worlds of technology to converge. Real-time and massive-scale connections produce a large amount of versatile data, which is where Big Data comes into the picture. Big Data refers to large, diverse sets of information with dimensions that go beyond the capabilities of widely used database management systems or standard data processing software tools to manage within a given limit. Almost every big dataset is dirty and may contain missing data, mistyping, inaccuracies, and many more issues that impact Big Data analytics performance. One of the biggest challenges in Big Data analytics is to discover and repair dirty data; failure to do this can lead to inaccurate analytics results and unpredictable conclusions. We experimented with different missing value imputation techniques and compared machine learning (ML) model performances under different imputation methods. We propose a hybrid model for missing value imputation combining ML and sample-based statistical techniques. Furthermore, we continued with the best imputed dataset, chosen based on ML model performance, for feature engineering and hyperparameter tuning. We used k-means clustering and principal component analysis. The evaluated outcome improved dramatically, with the XGBoost model achieving very high accuracy at around 0.125 root mean squared logarithmic error (RMSLE). To overcome overfitting, we used k-fold cross-validation.
    Matched MeSH terms: Machine Learning
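    Two pieces of this entry are small enough to sketch directly: the simplest statistical imputation baseline (mean substitution, one candidate among the techniques compared) and the RMSLE metric the XGBoost score is quoted in. Plain Python on made-up values, not the study's pipeline:

    ```python
    import math

    def impute_mean(values):
        """Replace None entries with the mean of the observed entries:
        the baseline statistical imputation technique."""
        observed = [v for v in values if v is not None]
        mean = sum(observed) / len(observed)
        return [mean if v is None else v for v in values]

    def rmsle(actual, predicted):
        """Root mean squared logarithmic error; log1p keeps zeros valid."""
        sq = [(math.log1p(p) - math.log1p(a)) ** 2
              for a, p in zip(actual, predicted)]
        return math.sqrt(sum(sq) / len(sq))

    column = [3.0, None, 5.0, None, 4.0]
    print(impute_mean(column))            # [3.0, 4.0, 5.0, 4.0, 4.0]
    print(rmsle([100, 200], [100, 200]))  # 0.0 for a perfect prediction
    ```

    Because RMSLE compares log-transformed values, it penalizes relative rather than absolute errors, which is why it is a common choice for targets spanning several orders of magnitude.
    
    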