Biomedical engineering applies the principles and problem-solving methods of engineering to biology and medicine. Malaria is a life-threatening illness that has gained significant attention among researchers. Since the manual diagnosis of malaria in a clinical setting is tedious, automated tools based on computational intelligence (CI) have gained considerable interest. Though earlier studies focused on handcrafted features, diagnostic accuracy can be boosted through deep learning (DL) methods. This study introduces a new Barnacles Mating Optimizer with Deep Transfer Learning Enabled Biomedical Malaria Parasite Detection and Classification (BMODTL-BMPC) model. The presented BMODTL-BMPC model involves the design of intelligent models for the recognition and classification of malaria parasites. Initially, the Gaussian filtering (GF) approach is employed to remove noise from blood smear images. Then, the graph-cuts (GC) segmentation technique is applied to determine the affected regions in the blood smear images. Moreover, the barnacles mating optimizer (BMO) algorithm with the NASNetLarge model is employed for the feature extraction process. Furthermore, the extreme learning machine (ELM) classification model is employed for the identification and classification of malaria parasites. To validate the enhanced outcomes of the BMODTL-BMPC technique, a wide-ranging experimental analysis is performed using a benchmark dataset. The experimental results show that the BMODTL-BMPC technique outperforms other recent approaches.
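To make the pipeline concrete, the following is a minimal sketch of two of its stages, Gaussian-filter denoising and ELM classification over CNN features; the graph-cuts segmentation and the BMO hyperparameter search are omitted, and all data here are random placeholders.

```python
# Minimal sketch of two BMODTL-BMPC stages: Gaussian filtering (GF) for denoising
# and an extreme learning machine (ELM) over CNN features. Graph-cut segmentation
# and the BMO hyperparameter search are omitted; all data are random placeholders.
import numpy as np
from scipy.ndimage import gaussian_filter

smear = np.random.rand(128, 128)
denoised = gaussian_filter(smear, sigma=1.0)   # GF step on a toy blood-smear image

class ELM:
    """Basic ELM: random hidden layer, ridge-solved output weights."""
    def __init__(self, n_hidden=512, alpha=1e-3, seed=0):
        self.n_hidden, self.alpha = n_hidden, alpha
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)           # hidden-layer activations
        Y = np.eye(y.max() + 1)[y]                 # one-hot targets
        self.beta = np.linalg.solve(               # regularized least squares
            H.T @ H + self.alpha * np.eye(self.n_hidden), H.T @ Y)
        return self

    def predict(self, X):
        return np.argmax(np.tanh(X @ self.W + self.b) @ self.beta, axis=1)

# X would hold deep features (e.g., NASNetLarge embeddings) of segmented cells
X, y = np.random.rand(200, 1000), np.random.randint(0, 2, 200)
print("train accuracy:", (ELM().fit(X, y).predict(X) == y).mean())
```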
The exponential increase in published articles makes a thorough and expedient review of the literature increasingly challenging. This review delineated automated tools and platforms that employ artificial intelligence (AI) approaches and evaluated the reported benefits and challenges of using such methods. A search was conducted in four databases (Medline, Embase, CDSR, and Epistemonikos) up to April 2021 for systematic reviews and other related reviews implementing AI methods. To be included, a review had to use some form of AI method, including machine learning, deep learning, neural networks, or any other application enabling the full or semi-autonomous performance of one or more stages of evidence synthesis. Twelve reviews were included, using nine different tools to implement 15 different AI methods. Eleven methods were used in the screening stages of the review process (73%); the rest were divided between data extraction (two methods, 13%) and risk-of-bias assessment (two methods, 13%). The ambiguous benefits of the data-extraction methods, combined with the advantages reported by 10 reviews, indicate that AI platforms have taken hold, with varying success, in evidence synthesis. However, the results are qualified by the reliance on self-reporting by the review authors. Extensive human validation still appears to be required at this stage of implementing AI methods, though further evaluation is needed to define the overall contribution of such platforms to efficiency and quality in evidence synthesis.
Acute myocardial infarction (AMI), or heart attack, is a significant global health threat and one of the leading causes of death. The evolution of machine learning has greatly advanced risk stratification and mortality prediction for AMI. In this study, an integrated feature selection and machine learning approach was used to identify potential biomarkers for early detection and treatment of AMI. First, feature selection was conducted and evaluated before all machine learning classification tasks. Full classification models (using all 62 features) and reduced classification models (using various feature selection methods yielding 5 to 30 features) were built and evaluated using six machine learning classification algorithms. The results showed that the reduced models generally performed better than the full models (mean AUPRC via RF: 0.8044): the mean AUPRC via the random forest (RF) algorithm ranged from 0.8048 to 0.8260 for the recursive feature elimination (RFE) method and from 0.8301 to 0.8505 for the random forest importance (RFI) method. The most notable finding of this study was the identification of a five-feature model, comprising cardiac troponin I, HDL cholesterol, HbA1c, anion gap, and albumin, which achieved results (mean AUPRC via RF: 0.8462) comparable to those of models containing more features. These five features have been identified in previous studies as significant risk factors for AMI or cardiovascular disease and could serve as potential biomarkers for predicting the prognosis of AMI patients. From a medical point of view, using fewer features for diagnosis or prognosis could reduce cost and time for patients, as fewer clinical and pathological tests are needed.
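As an illustration of the reduced-model evaluation, the sketch below pairs RFE with a random forest and scores each feature-subset size by AUPRC (average precision); the synthetic 62-feature dataset is a stand-in for the clinical variables.

```python
# Hedged sketch of reduced-model evaluation: RFE + random forest, scored by AUPRC
# (average precision). The dataset is synthetic, not the study's clinical data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=500, n_features=62, n_informative=10,
                           random_state=0)

for k in (5, 10, 20, 30):
    pipe = Pipeline([
        ("rfe", RFE(RandomForestClassifier(n_estimators=100, random_state=0),
                    n_features_to_select=k, step=5)),
        ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ])
    auprc = cross_val_score(pipe, X, y, cv=5, scoring="average_precision")
    print(f"{k:2d} features: mean AUPRC = {auprc.mean():.4f}")
```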
In recent times, knee joint pain has become severe enough to make daily tasks difficult. Knee osteoarthritis is a type of arthritis and a leading cause of disability worldwide. The middle of the knee contains a vital structure, the anterior cruciate ligament (ACL). Diagnosing ACL tears early is necessary to avoid surgery. This study aimed to perform a comparative analysis of machine learning models for identifying the condition of three classes of ACL tears. In contrast to previous studies, this study also considers imbalanced data distributions, a problem that machine learning techniques struggle to handle. The paper applied and analyzed four machine learning classification models, namely random forest (RF), categorical boosting (CatBoost), light gradient boosting machine (LGBM), and the extremely randomized trees classifier (ETC), on the balanced, structured ACL dataset. After oversampling and hyperparameter tuning, these four models achieved average accuracies of 95.72%, 94.98%, 94.98%, and 98.26%, respectively. After oversampling, the dataset contains 2070 observations and eight features across the three ACL diagnosis classes. The area under the curve value was approximately 0.998. Experiments were performed using twelve machine learning algorithms on both the imbalanced and balanced datasets; however, accuracy on the imbalanced dataset remained under 76% for all twelve models. With oversampling, the proposed model may contribute to the efficient and automatic investigation of ACL tears, and of other knee ligaments, on magnetic resonance imaging without involving radiologists.
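A hedged sketch of the oversampling-plus-ETC step is shown below, using SMOTE from the imbalanced-learn package as an assumed oversampler and a synthetic stand-in for the 8-feature, 3-class ACL dataset.

```python
# Illustrative sketch: oversample an imbalanced ACL-style dataset, then train the
# extremely randomized trees classifier (ETC). Requires imbalanced-learn; SMOTE is
# an assumed choice of oversampler, and the data are synthetic placeholders.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1200, n_features=8, n_informative=6,
                           n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0)
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)          # balance classes
X_tr, X_te, y_tr, y_te = train_test_split(X_res, y_res, random_state=0)

etc = ExtraTreesClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, etc.predict(X_te)))
print("AUC (ovr):", roc_auc_score(y_te, etc.predict_proba(X_te), multi_class="ovr"))
```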
Based on data for Chinese A-share firms listed on the Shanghai and Shenzhen Stock Exchanges from 2014 to 2021, this article explores the relationship between common institutional investors and the quality of management earnings forecasts. Using a multiple linear regression model, the study empirically found that common institutional investors positively impact the precision of earnings forecasts. This article also uses graph neural networks to predict the precision of earnings forecasts. Our findings show that common institutional investors exert external supervision that restricts management from releasing wide-range earnings forecasts, which helps to improve the risk-warning function of earnings forecasts and promotes the sustainable development of management information disclosure in the Chinese capital market. One marginal contribution of this paper is that it enriches the literature on the economic consequences of common institutional shareholding. Second, the neural network method used to predict the quality of management forecasts extends the research methods applied to institutional investors and management earnings-forecast behavior. Third, this paper calls for strengthening information sharing and circulation among institutional investors to reduce information asymmetry between investors and management.
This work focuses on data sampling in cancer-gene association prediction. Currently, researchers use machine learning methods to predict genes that are more likely to harbor cancer-causing mutations. Various methods have been proposed to improve the performance of such models, one of which is to improve the quality of the training data. Existing methods focus mainly on screening positive data, i.e., cancer driver genes. This paper proposes a screening method for genes with low cancer relevance, based on gene networks and graph-theory algorithms, to improve negative-sample selection. Genes with low cancer correlation are used as negative training samples. Experimental verification shows that training the cancer-gene classification model with negative samples screened by this method improves prediction performance. The biggest advantage of this method is that it can easily be combined with other methods that focus on enhancing the quality of positive training samples. Significant improvement was demonstrated by combining this method with three state-of-the-art cancer-gene prediction methods.
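One plausible reading of such a graph-theoretic screen, sketched below purely as an assumption, is to rank candidate genes by network distance from known driver genes and keep the most distant ones as negatives; the gene names and edges are toy placeholders, not the paper's actual network.

```python
# Assumed sketch of a graph-theoretic negative-sample screen: genes far from known
# cancer drivers in a gene network are taken as low-cancer-related negatives.
# The graph and driver set below are toy placeholders.
import networkx as nx

G = nx.Graph([("TP53", "MDM2"), ("MDM2", "CDK4"), ("CDK4", "RB1"),
              ("RB1", "E2F1"), ("E2F1", "GENE_X"), ("GENE_X", "GENE_Y")])
drivers = {"TP53", "RB1"}  # positive (cancer driver) genes

def min_driver_distance(gene):
    """Shortest-path distance from a candidate gene to its nearest driver."""
    return min((nx.shortest_path_length(G, gene, d) for d in drivers
                if nx.has_path(G, gene, d)), default=float("inf"))

candidates = [g for g in G if g not in drivers]
negatives = sorted(candidates, key=min_driver_distance, reverse=True)[:2]
print("selected negatives:", negatives)  # most network-distant genes
```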
Breast cancer continues to be a prominent cause of substantial loss of life among women globally. Despite established treatment approaches, the rising prevalence of breast cancer is a concerning trend regardless of geographical location. This highlights the need to identify common key genes and explore their biological significance across diverse populations. Our research centered on establishing correlations among common key genes identified in breast cancer patients. While previous studies have reported many of these genes independently, our study delved into the unexplored realm of their mutual interactions, which may establish a foundational network contributing to breast cancer development. Machine learning algorithms were employed for sample classification and key gene selection. The best-performing model further selected candidate genes through expression-pattern recognition. Subsequently, the genes common to all breast cancer patients from India, China, the Czech Republic, Germany, Malaysia, and Saudi Arabia were selected for further study. We found that among ten classifiers, CatBoost exhibited superior performance with an average accuracy of 92%. Functional enrichment and pathway analyses revealed that the calcium signaling pathway, the regulation of actin cytoskeleton pathway, and other cancer-associated pathways were highly enriched with the identified genes. Notably, we observed that these genes regulate each other, forming a complex network. Additionally, we identified the PALMD gene as a novel potential biomarker of breast cancer progression. Our study revealed key gene modules forming a complex network that were consistently expressed across different populations, affirming their critical role and biological significance in breast cancer. The identified genes hold promise as prospective biomarkers of breast cancer prognosis irrespective of country of origin or ethnicity. Future investigations will examine these genes in a larger population and validate their biological functions through in vivo analysis.
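The sample-classification step might look like the following sketch with CatBoost; the expression matrix is synthetic, and the final comment hints at how candidate key genes could be ranked by feature importance.

```python
# Minimal sketch of the sample-classification step with CatBoost. The expression
# matrix here is synthetic; real inputs would be per-patient expression profiles.
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=100, n_informative=15,
                           random_state=0)
clf = CatBoostClassifier(iterations=200, depth=4, verbose=False, random_seed=0)
print("mean CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
# Candidate key genes could then be ranked via clf.fit(X, y).get_feature_importance()
```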
Digital image processing has witnessed a significant transformation owing to the adoption of deep learning (DL) algorithms, which have proven vastly superior to conventional methods for crop detection. These DL algorithms have recently found successful applications across various domains, translating input data, such as images of afflicted plants, into valuable insights, like the identification of specific crop diseases. This innovation has spurred the development of cutting-edge techniques for early detection and diagnosis of crop diseases, leveraging tools such as convolutional neural networks (CNN), K-nearest neighbours (KNN), support vector machines (SVM), and artificial neural networks (ANN). This paper offers an all-encompassing exploration of the contemporary literature on methods for diagnosing, categorizing, and gauging the severity of crop diseases. The review examines the performance analysis of the latest machine learning (ML) and DL techniques outlined in these studies. It also scrutinizes the methodologies and datasets and outlines the prevalent recommendations and identified gaps across different research investigations. In conclusion, the review offers insights into potential solutions and outlines the direction for future research in this field. The review underscores that while most studies have concentrated on traditional ML algorithms and CNNs, there has been a noticeable dearth of focus on emerging DL algorithms such as capsule neural networks and vision transformers. Furthermore, it sheds light on the fact that several datasets employed for training and evaluating DL models have been tailored to specific crop types, emphasizing the pressing need for a comprehensive and expansive image dataset encompassing a wider array of crop varieties. Moreover, the survey draws attention to the prevailing trend whereby most research efforts have concentrated on individual plant diseases and on single ML or DL algorithms. In light of this, it advocates the development of a unified framework that harnesses an ensemble of ML and DL algorithms to address the complexities of multiple plant diseases effectively.
Groundwater, the world's most abundant source of freshwater, is rapidly depleting in many regions due to a variety of factors. Accurate forecasting of groundwater level (GWL) is essential for effective management of this vital resource, but it remains a complex and challenging task. In recent years, there has been a notable increase in the use of machine learning (ML) techniques to model GWL, with many studies reporting exceptional results. In this paper, we present a comprehensive review of 142 relevant articles indexed by the Web of Science from 2017 to 2023, focusing on key ML models, including artificial neural networks (ANN), adaptive neuro-fuzzy inference systems (ANFIS), support vector regression (SVR), evolutionary computing (EC), deep learning (DL), ensemble learning (EN), and hybrid modeling (HM). We also discuss key modeling concepts such as dataset size, data splitting, input variable selection, forecasting time-step, performance metrics (PM), study zones, and aquifers, highlighting best practices for optimal GWL forecasting with ML. This review provides valuable insights and recommendations for researchers and water management agencies working in groundwater management and hydrology.
Dental caries is highly prevalent among children and adults and has thus become a global health concern. Modern dentistry focuses on preventive measures to reduce the number of dental caries cases. Machine learning coupled with UV spectroscopy plays a crucial role in detecting early-stage caries. An artificial neural network (ANN) with hyperparameter tuning was trained on spectral data for classification according to the International Caries Detection and Assessment System (ICDAS). Spectral preprocessing methods, namely mean centering (MC), autoscaling (AS), and Savitzky-Golay smoothing (SG), were applied to the data for spectra correction. The best ANN model achieved an accuracy of 0.85 with a precision of 1.00. A convolutional neural network (CNN) combined with Savitzky-Golay smoothing of the spectral data achieved validation accuracy, precision, sensitivity, and specificity of 1.00. These results show that ANN and CNN can produce robust models for early screening of dental caries.
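The preprocessing and ANN steps can be sketched as follows, with Savitzky-Golay smoothing plus mean centering on synthetic UV spectra feeding a small multilayer-perceptron classifier; the window length, polynomial order, and network size are placeholder choices.

```python
# Hedged sketch of the spectral preprocessing + ANN step: Savitzky-Golay smoothing
# and mean centering on synthetic UV spectra, then a small MLP for ICDAS-style
# class labels. All data and hyperparameters are placeholders.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
spectra = rng.random((150, 400))                   # placeholder UV spectra
labels = rng.integers(0, 3, 150)                   # placeholder ICDAS classes

smoothed = savgol_filter(spectra, window_length=11, polyorder=2, axis=1)  # SG
smoothed -= smoothed.mean(axis=0)                  # mean centering (MC)

X_tr, X_te, y_tr, y_te = train_test_split(smoothed, labels, random_state=0)
ann = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                    random_state=0).fit(X_tr, y_tr)
print("validation accuracy:", ann.score(X_te, y_te))
```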
A real-time Bangla Sign Language interpreter could bring more than 200,000 hearing- and speech-impaired people into the mainstream workforce in Bangladesh. Bangla Sign Language (BdSL) recognition and detection is a challenging topic in computer vision and deep learning research because recognition accuracy may vary with skin tone, hand orientation, and background. This research used deep learning models for accurate and reliable recognition of BdSL alphabets and numerals using two well-suited and robust datasets. The dataset prepared in this study comprises the largest image database for BdSL alphabets and numerals, built to reduce inter-class similarity while covering diverse image data with various backgrounds and skin tones. The paper compared classification with and without background images to determine the best-performing model for BdSL alphabet and numeral interpretation. The CNN model trained on images with backgrounds was found to be more effective than the one trained without. Hand detection in the segmentation approach must be more accurate to boost overall sign-recognition accuracy. ResNet18 performed best, with 99.99% accuracy, precision, F1 score, and sensitivity, and 100% specificity, outperforming previous work on BdSL alphabet and numeral recognition. This dataset is made publicly available to support and encourage further research on Bangla Sign Language interpretation so that hearing- and speech-impaired individuals can benefit from this research.
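A typical transfer-learning setup consistent with this result is sketched below: a pretrained ResNet18 with its final layer replaced for BdSL classes. The class count, hyperparameters, and dummy batch are assumptions, not the paper's configuration.

```python
# Illustrative transfer-learning setup for BdSL classification with ResNet18.
# The class count (alphabets + numerals) and learning rate are assumed values.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 49                       # assumed: BdSL alphabet + numeral classes
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new classifier head

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of 224x224 RGB sign images
images = torch.randn(8, 3, 224, 224)
targets = torch.randint(0, num_classes, (8,))
optimizer.zero_grad()
loss = criterion(model(images), targets)
loss.backward()
optimizer.step()
print("loss:", loss.item())
```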
Arrhythmia is a life-threatening cardiac condition characterized by irregular heart rhythm. Early and accurate detection is crucial for effective treatment. However, single-lead electrocardiogram (ECG) methods have limited sensitivity and specificity. This study proposes an improved ensemble learning approach for arrhythmia detection using multi-lead ECG data. The proposed method, based on a boosting algorithm named the Fine-Tuned Boosting (FTBO) model, detects multiple arrhythmia classes. For feature extraction, a new technique is introduced that utilizes a sliding window spanning five R-peaks. The method was compared with other ensemble models, including bagging and stacking, and the impact of parameter tuning was assessed. Rigorous experiments were performed on the MIT-BIH arrhythmia database, focusing on premature ventricular contraction (PVC), premature atrial contraction (PAC), and atrial fibrillation (AF). The results showed that the proposed method achieved high sensitivity, specificity, and accuracy for all three arrhythmia classes. It detected AF with 100% sensitivity and specificity. For PVC detection, it achieved 99% sensitivity and specificity in both leads. Similarly, for PAC detection, it achieved almost 96% sensitivity and specificity in both leads. The proposed method shows great potential for early arrhythmia detection using multi-lead ECG data.
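The sliding-window feature extraction might take the following form (an assumed interpretation): each window of five consecutive R-peaks is reduced to simple RR-interval statistics that a boosting model could consume.

```python
# Sketch of the sliding-window feature extraction (assumed form): each window of
# five consecutive R-peaks yields simple RR-interval statistics for one sample.
import numpy as np

def rr_window_features(r_peaks, window=5):
    """r_peaks: sample indices of detected R-peaks in an ECG lead."""
    feats = []
    for i in range(len(r_peaks) - window + 1):
        rr = np.diff(r_peaks[i:i + window])        # RR intervals inside the window
        feats.append([rr.mean(), rr.std(), rr.min(), rr.max()])
    return np.array(feats)

r_peaks = np.array([100, 460, 830, 1190, 1555, 1920, 2290])  # toy peak locations
print(rr_window_features(r_peaks))                # one feature row per window
```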
Recently, the increasing prevalence of solar energy in power and energy systems around the world has dramatically increased the importance of accurately predicting solar irradiance. However, the lack of access to data in many regions, and the privacy concerns that can arise when collecting and transmitting data from distributed points to a central server, pose challenges to current predictive techniques. This study proposes a global solar radiation forecasting approach based on federated learning (FL) and a convolutional neural network (CNN). In addition to maintaining input data privacy, the proposed procedure can also serve as a global supermodel. In this paper, data from eight regions of Iran with different climatic features are used as CNN inputs for network training at each client. To test the effectiveness of the global supermodel, data from three further Iranian regions, Abadeh, Jarqavieh, and Arak, are used. The global forecasting supermodel forecast solar radiation for the Abadeh, Jarqavieh, and Arak regions with accuracy coefficients of 95%, 92%, and 90%, respectively. Finally, in a comparative scenario, various conventional machine learning and deep learning models were employed to forecast solar radiation in each of the study regions, and their results were compared and evaluated against those of the proposed FL-based method. The results show that, since no training data were available for the Abadeh, Jarqavieh, and Arak regions, the conventional methods were unable to forecast solar radiation there. This evaluation confirms the strong ability of the presented FL approach to make acceptable predictions while preserving privacy and eliminating the model's reliance on region-specific training data.
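The aggregation at the heart of such an FL setup can be sketched with the standard FedAvg rule, shown below for a toy CNN; the paper's actual architecture, local training loop, and communication protocol are not reproduced here.

```python
# Minimal FedAvg sketch (assumed aggregation rule): each regional client trains a
# local copy of the CNN, and the server averages the weights into a global model.
import copy
import torch
import torch.nn as nn

def fed_avg(client_states):
    """Element-wise average of client state_dicts into one global state_dict."""
    global_state = copy.deepcopy(client_states[0])
    for key in global_state:
        global_state[key] = torch.stack(
            [s[key].float() for s in client_states]).mean(dim=0)
    return global_state

# Toy 1-D CNN standing in for the forecasting network (24-step input window)
model = nn.Sequential(nn.Conv1d(1, 8, 3), nn.ReLU(), nn.Flatten(),
                      nn.Linear(8 * 22, 1))

# Eight clients (one per region); local training is omitted for brevity
clients = [copy.deepcopy(model).state_dict() for _ in range(8)]
model.load_state_dict(fed_avg(clients))        # global solar-forecasting supermodel
print(model(torch.randn(2, 1, 24)).shape)      # forecast for a dummy batch
```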
Accurate prediction of inlet chemical oxygen demand (COD) is vital for better planning and management of wastewater treatment plants. COD values at the inlet follow a complex nonstationary pattern, making their prediction challenging. This study compared the performance of several novel machine learning models developed by hybridizing kernel-based extreme learning machines (KELMs) with intelligent optimization algorithms for reliable prediction of real-time COD values. A combined time-series learning method and consumer behaviours estimated from water-use data (hour/day) were used as supplementary inputs to the hybrid KELM models. Comparing model performances across different input combinations revealed the best performance when using up to 2-day lagged COD values together with the other wastewater properties. The results also showed that the KELM-salp swarm algorithm (KELM-SSA) model performed best among all the hybrid models, with a minimum root mean square error of 0.058 and a mean absolute error of 0.044.
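A kernel ELM itself admits a compact closed-form sketch, shown below on a toy lagged time series; the salp swarm algorithm that would tune the regularization and kernel parameters is replaced here by fixed placeholder values.

```python
# Hedged sketch of a kernel extreme learning machine (KELM) regressor. The SSA
# step that tunes (C, gamma) is replaced by fixed placeholder values.
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KELM:
    def __init__(self, C=10.0, gamma=0.1):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X = X
        K = rbf_kernel(X, X, self.gamma)
        # Closed-form output weights: beta = (K + I/C)^-1 y
        self.beta = np.linalg.solve(K + np.eye(len(X)) / self.C, y)
        return self

    def predict(self, X):
        return rbf_kernel(X, self.X, self.gamma) @ self.beta

# Toy example: predict COD(t) from 1- and 2-day lagged values of a synthetic series
series = np.sin(np.linspace(0, 12, 200)) \
         + 0.05 * np.random.default_rng(0).standard_normal(200)
X = np.column_stack([series[:-2], series[1:-1]])   # lag-2 and lag-1 inputs
y = series[2:]
model = KELM().fit(X, y)
print("RMSE:", np.sqrt(((model.predict(X) - y) ** 2).mean()))
```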
Knee osteoarthritis (OA) is a debilitating joint disorder characterized by cartilage loss, which can be captured by imaging modalities and translated into imaging features. Observing imaging features is a well-established objective assessment for knee OA. However, the variety of imaging features is rarely discussed. This study reviews knee OA imaging features with respect to different imaging modalities for traditional OA diagnosis and surveys recent image-based machine learning approaches for knee OA diagnosis and prognosis. Although most studies recognize X-ray as the standard imaging option for knee OA diagnosis, its imaging features are limited to bony changes and are less sensitive to short-term OA changes. Researchers have recommended MRI for studying hidden OA-related radiomic features in soft tissues and bony structures. Furthermore, ultrasound imaging features should be explored to make point-of-care diagnosis more feasible. Traditional knee OA diagnosis relies mainly on manual interpretation of medical images based on the Kellgren-Lawrence (KL) grading scheme, but this approach is prone to human-resource and time constraints and is less effective for OA prevention. Recent studies have revealed the capability of machine learning approaches to automate knee OA diagnosis and prognosis through three major tasks: knee joint localization (detection and segmentation), classification of OA severity, and prediction of disease progression. AI-aided diagnostic models have improved the quality of knee OA diagnosis significantly in terms of time taken, reproducibility, and accuracy. Prognostic ability was demonstrated by several prediction models in terms of estimating possible OA onset, OA deterioration, progressive pain, progressive structural change, progressive structural change with pain, and time to total knee replacement (TKR). Despite research gaps, machine learning techniques show great potential for demanding tasks such as early knee OA detection and estimation of future disease events, as well as fundamental tasks such as discovering new imaging features and establishing novel OA status measures. Continuous enhancement of machine learning models may favour the discovery of new OA treatments in the future.
Recently, fake news has spread widely through the Internet due to the increased use of social media for communication. Fake news has become a significant concern due to its harmful impact on individual attitudes and community behavior. In recent years, researchers and social media service providers have commonly utilized artificial intelligence techniques to rein in fake news propagation. However, fake news detection is challenging due to the use of political language and the high linguistic similarity between real and fake news. In addition, most news sentences are short, so finding valuable representative features that machine learning classifiers can use to distinguish between fake and authentic news is difficult. Existing fake news solutions suffer from low detection performance due to improper representation and model design. This study aims to improve detection accuracy by proposing a deep ensemble fake news detection model built on sequential deep learning. The proposed model was constructed in three phases. In the first phase, features were extracted from news content, preprocessed using natural language processing techniques, enriched using n-grams, and represented using the term frequency-inverse document frequency (TF-IDF) technique. In the second phase, an ensemble model based on deep learning was constructed: multiple binary classifiers were trained using sequential deep learning networks to extract the representative hidden features that can accurately classify news types. In the third phase, a multi-class classifier based on a multilayer perceptron (MLP) was trained on the features extracted from the aggregated outputs of the deep learning-based binary classifiers for final classification. Two popular and well-known datasets (LIAR and ISOT) were used with different classifiers to benchmark the proposed model. Compared with state-of-the-art models that use deep contextualized representations with convolutional neural networks (CNN), the proposed model shows a significant improvement (2.41%) in overall F1 score on the LIAR dataset, which is more challenging than other datasets. Meanwhile, the proposed model achieves 100% accuracy on ISOT. The study demonstrates that traditional features extracted from news content, with a proper model design, outperform existing models constructed on text-embedding techniques.
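The first and third phases can be illustrated with the simplified sketch below: TF-IDF over word n-grams feeding a stacked ensemble whose meta-learner is an MLP. Logistic regressions stand in for the paper's sequential deep-learning base classifiers, and the four toy headlines are invented.

```python
# Simplified sketch of phases 1 and 3: TF-IDF over word n-grams, binary base
# classifiers (logistic stand-ins for the paper's sequential deep networks), and
# an MLP meta-classifier over their aggregated outputs. Headlines are invented.
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

texts = ["economy shrank by two percent", "aliens endorse the candidate",
         "senate passes budget bill", "miracle cure hidden by doctors"]
labels = [1, 0, 1, 0]                            # 1 = real, 0 = fake (toy labels)

vec = TfidfVectorizer(ngram_range=(1, 2))        # unigram + bigram features
X = vec.fit_transform(texts)

stack = StackingClassifier(
    estimators=[("lr1", LogisticRegression()),
                ("lr2", LogisticRegression(C=0.1))],
    final_estimator=MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                  random_state=0),
    cv=2,                                        # tiny toy set, so 2-fold stacking
)
stack.fit(X, labels)
print(stack.predict(vec.transform(["budget bill passes senate"])))
```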
This paper describes a discrete wavelet transform-based feature extraction scheme for the classification of EEG signals. In this scheme, the discrete wavelet transform is applied to EEG signals, and the relative wavelet energy is calculated from the detail coefficients and the approximation coefficients of the last decomposition level. The extracted relative wavelet energy features are passed to classifiers for classification. The EEG dataset employed for validation of the proposed method consisted of two classes: (1) EEG signals recorded during a complex cognitive task (Raven's Advanced Progressive Matrices test) and (2) EEG signals recorded in the resting, eyes-open condition. The performance of four different classifiers was evaluated with four performance measures: accuracy, sensitivity, specificity, and precision. Accuracy above 98% was achieved by the support vector machine, multi-layer perceptron, and K-nearest neighbor classifiers with the approximation (A4) and detail (D4) coefficients, which represent the frequency ranges of 0.53-3.06 Hz and 3.06-6.12 Hz, respectively. The findings of this study demonstrate that the proposed feature extraction approach can classify EEG signals recorded during a complex cognitive task with a high accuracy rate.
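The relative wavelet energy computation is compact enough to sketch directly, as below with PyWavelets; the db4 mother wavelet and epoch length are assumptions, since the abstract does not state them.

```python
# Sketch of the relative wavelet energy features: 4-level DWT of an EEG epoch,
# per-sub-band energy normalized by total energy. Wavelet choice is an assumption.
import numpy as np
import pywt

def relative_wavelet_energy(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)   # [A4, D4, D3, D2, D1]
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    return energies / energies.sum()                      # relative energies

eeg_epoch = np.random.default_rng(0).standard_normal(1024)  # placeholder EEG
print(relative_wavelet_energy(eeg_epoch))                 # features for A4..D1
```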
Coronaviruses (CoVs) are a large family of viruses that are common in many animal species, including camels, cattle, cats, and bats. Animal CoVs, such as Middle East respiratory syndrome (MERS)-CoV, severe acute respiratory syndrome (SARS)-CoV, and the new virus named SARS-CoV-2, rarely infect and spread among humans. On January 30, 2020, the International Health Regulations Emergency Committee of the World Health Organisation declared the outbreak of the disease caused by this new CoV, 'COVID-19', a 'public health emergency of international concern'. This global pandemic has affected almost the whole planet and had caused the death of more than 315,131 patients as of the date of this article. In this context, publishers, journals, and researchers are urged to pursue research across different domains to help stop the spread of this deadly virus. The increasing interest in developing artificial intelligence (AI) applications has addressed several medical problems. However, such applications remain insufficient given the high potential threat posed by this virus to global public health. This systematic review addresses automated AI applications based on data mining and machine learning (ML) algorithms for detecting and diagnosing COVID-19. We aimed to obtain an overview of this critical virus, address the limitations of utilising data mining and ML algorithms, and provide the health sector with the benefits of these techniques. We used five databases, namely IEEE Xplore, Web of Science, PubMed, ScienceDirect, and Scopus, and performed three sequences of search queries between 2010 and 2020. Precise exclusion criteria and a selection strategy were applied to screen the 1305 articles obtained. Only eight articles were fully evaluated and included in this review, a number that only emphasises the insufficiency of research in this important area. After analysing all included studies, the results were summarised by year of publication and by the data mining and ML algorithms commonly used. The results found in all papers were discussed to identify the gaps in the reviewed literature. Characteristics such as motivations, challenges, limitations, recommendations, case studies, and the features and classes used were analysed in detail. This study reviewed state-of-the-art techniques for CoV prediction based on data mining and ML assessment. The reliability and acceptability of the information and datasets extracted from the technologies implemented in the literature were considered. The findings showed that researchers must build on the insights they gain, focus on identifying solutions for CoV problems, and introduce new improvements. The growing emphasis on data mining and ML techniques in medical fields can provide the right environment for change and improvement.
Hypertension is a potentially dangerous health condition that can be indicated directly by blood pressure (BP). Hypertension frequently leads to other health complications, so continuous monitoring of BP is very important; however, cuff-based BP measurements are discrete and uncomfortable for the user. To address this need, a cuff-less, continuous, and noninvasive BP measurement system is proposed using the photoplethysmogram (PPG) signal and demographic features with machine learning (ML) algorithms. PPG signals were acquired from 219 subjects and underwent preprocessing and feature extraction steps. Time-, frequency-, and time-frequency-domain features were extracted from the PPG signals and their derivatives. Feature selection techniques were used to reduce computational complexity and decrease the chance of over-fitting the ML algorithms. The features were then used to train and evaluate ML algorithms, and the best regression models were selected separately for systolic BP (SBP) and diastolic BP (DBP) estimation. Gaussian process regression (GPR) combined with the ReliefF feature selection algorithm outperformed the other algorithms, estimating SBP and DBP with root mean square errors (RMSE) of 6.74 and 3.59, respectively. This ML model can be implemented in hardware systems to continuously monitor BP and help avert critical health conditions due to sudden changes.
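The regression stage can be sketched with scikit-learn's Gaussian process regressor, as below; the PPG features and SBP targets are synthetic, and the ReliefF selection step (available in, e.g., the skrebate package) is omitted.

```python
# Hedged sketch of the regression stage: Gaussian process regression mapping
# selected PPG features to SBP. Features and targets are synthetic placeholders;
# the ReliefF feature-selection step is omitted.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((219, 10))                                 # placeholder PPG features
sbp = 110 + 30 * X[:, 0] + 2 * rng.standard_normal(219)  # synthetic SBP targets

X_tr, X_te, y_tr, y_te = train_test_split(X, sbp, random_state=0)
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X_tr, y_tr)
rmse = np.sqrt(((gpr.predict(X_te) - y_te) ** 2).mean())
print("SBP RMSE:", rmse)
```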
Training a machine learning algorithm on an imbalanced data set is an inherently challenging task. It becomes more demanding with limited samples but a massive number of features (high dimensionality). High-dimensional and imbalanced data sets pose severe challenges in many real-world applications, such as biomedical data sets. Numerous researchers have investigated either imbalanced classes or high-dimensional data sets and proposed various methods. Nonetheless, few approaches reported in the literature have addressed the intersection of the high-dimensionality and class-imbalance problems, owing to their complicated interactions. Lately, feature selection has become a well-known technique for overcoming this problem by selecting discriminative features that represent both the minority and majority classes. This paper proposes a new method called Robust Correlation-Based Redundancy and Binary Grasshopper Optimisation Algorithm (rCBR-BGOA). rCBR-BGOA employs an ensemble of multi-filters coupled with the correlation-based redundancy method to select optimal feature subsets. A binary grasshopper optimisation algorithm (BGOA) is used to formulate feature selection as an optimisation problem and to select the best (near-optimal) combination of features from the majority and minority classes. The obtained results, supported by proper statistical analysis, indicate that rCBR-BGOA can improve classification performance for high-dimensional and imbalanced datasets in terms of the G-mean and area under the curve (AUC) performance metrics.
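The wrapper formulation can be illustrated with the simplified search below, where a random bit-flip update stands in for the grasshopper position update and candidate feature masks are scored by G-mean on a held-out split.

```python
# Simplified sketch of wrapper-based feature selection on imbalanced data: a
# random bit-flip search over feature masks (a stand-in for the BGOA update rule),
# with candidates scored by G-mean. Data and classifier are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

def g_mean(y_true, y_pred):
    sens = recall_score(y_true, y_pred)               # minority-class recall
    spec = recall_score(y_true, y_pred, pos_label=0)  # majority-class recall
    return np.sqrt(sens * spec)

X, y = make_classification(n_samples=300, n_features=50, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rng = np.random.default_rng(0)
best_mask, best_fit = rng.random(50) < 0.5, 0.0
for _ in range(200):                                  # random bit-flip search
    mask = best_mask.copy()
    mask[rng.integers(50)] ^= True                    # flip one feature in/out
    if not mask.any():
        continue
    clf = GaussianNB().fit(X_tr[:, mask], y_tr)
    fit = g_mean(y_te, clf.predict(X_te[:, mask]))
    if fit > best_fit:
        best_mask, best_fit = mask, fit
print(f"selected {best_mask.sum()} features, G-mean = {best_fit:.3f}")
```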