MyMedR

Displaying all 3 publications

Optimal feature selection using novel flamingo search algorithm for classification of COVID-19 patients from clinical text

Mahdi AY, Yuhaniz SS

Math Biosci Eng, 2023 Jan 11;20(3):5268-5297.
PMID: 36896545 DOI: 10.3934/mbe.2023244

Though several AI-based models have been established for COVID-19 diagnosis, a machine-based diagnostic gap persists, making further efforts to combat this epidemic imperative. We therefore set out to create a new feature selection (FS) method, motivated by the persistent need for a reliable system to choose features, and to develop a model to predict the COVID-19 virus from clinical texts. This study employs a newly developed methodology inspired by the flamingo's behavior to find a near-ideal feature subset for accurate diagnosis of COVID-19 patients. The best features are selected using a two-stage process. In the first stage, we implemented a term weighting technique, RTF-C-IEF, to quantify the significance of the extracted features. The second stage uses a newly developed feature selection approach called the improved binary flamingo search algorithm (IBFSA), which chooses the most important and relevant features for COVID-19 patients. The proposed multi-strategy improvement process, which improves the search algorithm, is at the heart of this study. Its primary objective is to broaden the algorithm's capabilities by increasing diversity and supporting exploration of the algorithm's search space. Additionally, a binary mechanism was used to adapt the traditional flamingo search algorithm (FSA) to binary FS problems. Two datasets, containing 3053 and 1446 cases, were used to evaluate the suggested model based on the Support Vector Machine (SVM) and other classifiers. The results showed that IBFSA has the best performance compared to numerous previous swarm algorithms. It was also noted that the number of feature subsets chosen was drastically reduced, by 88%, and that the best global optimal features were obtained.
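To illustrate the binary mechanism mentioned in the abstract above, here is a minimal sketch of one common way a swarm optimizer's continuous position can be turned into a 0/1 feature mask. The abstract does not say which transfer function IBFSA uses, so the sigmoid rule, the function names (`sigmoid`, `binarize`, `selected_features`), and the example feature names are all assumptions for illustration, not the authors' implementation.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position: list[float], rng: random.Random) -> list[int]:
    """Map a continuous position vector to a 0/1 feature mask.

    Assumption: an S-shaped (sigmoid) transfer function, a common choice
    in binary swarm optimizers; the paper's actual rule may differ.
    """
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


def selected_features(mask: list[int], names: list[str]) -> list[str]:
    """Return the feature names whose mask bit is 1."""
    return [name for bit, name in zip(mask, names) if bit == 1]


rng = random.Random(0)                          # fixed seed for repeatability
position = [50.0, -50.0, 50.0, -50.0]           # hypothetical agent position
names = ["fever", "cough", "fatigue", "age"]    # hypothetical text features
mask = binarize(position, rng)
print(mask, selected_features(mask, names))
```

Large positive coordinates are almost always kept and large negative ones almost always dropped, so the mask follows the continuous search while still allowing random exploration near zero.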
Forecasting stock prices changes using long-short term memory neural network with symbolic genetic programming

Li Q, Kamaruddin N, Yuhaniz SS, Al-Jaifi HAA

Sci Rep, 2024 Jan 03;14(1):422.
PMID: 38172568 DOI: 10.1038/s41598-023-50783-0

This study introduces an augmented Long-Short Term Memory (LSTM) neural network architecture that integrates Symbolic Genetic Programming (SGP) to forecast cross-sectional price returns across a dataset of 4500 listed stocks in the Chinese market from 2014 to 2022. Using the S&P Alpha Pool Dataset for China as basic input, this architecture incorporates data augmentation and feature extraction techniques. The results show improvements in the Rank Information Coefficient (Rank IC) and the IC information ratio (ICIR) of 1128% and 5360%, respectively, when the model is applied to fundamental indicators. For technical indicators, the hybrid model achieves a 206% increase in Rank IC and a 2752% increase in ICIR. Furthermore, the proposed hybrid SGP-LSTM model outperforms major Chinese stock indexes, generating average annualized excess returns of 31.00%, 24.48%, and 16.38% compared to the CSI 300 index, CSI 500 index, and the average portfolio, respectively. These findings highlight the effectiveness of the SGP-LSTM model in improving the accuracy of cross-sectional stock return predictions and provide valuable insights for fund managers, traders, and financial analysts.
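The two metrics named in the abstract above can be sketched as follows, under the usual definitions: Rank IC as the Spearman rank correlation between predicted and realized cross-sectional returns in one period, and ICIR as the mean of the per-period ICs divided by their standard deviation. The abstract does not state the paper's exact conventions (for example sample versus population standard deviation, or any annualization), so those choices and all function names here are assumptions.

```python
import math
import statistics


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def rank_ic(predicted: list[float], realized: list[float]) -> float:
    """Spearman rank correlation for one cross-section of stocks."""
    return pearson(average_ranks(predicted), average_ranks(realized))


def icir(ics: list[float]) -> float:
    """Mean IC over its sample standard deviation (assumed convention)."""
    return statistics.mean(ics) / statistics.stdev(ics)


# Hypothetical numbers: three stocks in one period, then three period ICs.
print(rank_ic([0.1, 0.2, 0.3], [0.01, 0.02, 0.05]))
print(icir([0.1, 0.2, 0.3]))
```

Because ICIR divides by the variability of the IC, a model whose IC is modest but stable can score far higher than one with a larger but erratic IC, which is one reason percentage changes in ICIR can be much larger than those in Rank IC.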
Genome assembly composition of the String "ACGT" array: a review of data structure accuracy and performance challenges

Magdy Mohamed Abdelaziz Barakat S, Sallehuddin R, Yuhaniz SS, R Khairuddin RF, Mahmood Y

PeerJ Comput Sci, 2023;9:e1180.
PMID: 37547391 DOI: 10.7717/peerj-cs.1180

BACKGROUND: The development of sequencing technology increases the number of genomes being sequenced. However, obtaining a quality genome sequence remains a challenge in genome assembly, which must assemble a massive number of short strings (reads) in the presence of repetitive sequences (repeats). Computer algorithms for genome assembly construct the entire genome from reads using one of two approaches. The de novo approach concatenates the reads based on the exact match between their suffixes and prefixes (overlapping). The reference-guided approach orders the reads based on their offsets in a well-known reference genome (reads alignment). The presence of repeats increases the technical ambiguity, making the algorithm unable to distinguish the reads, which results in misassembly and reduces the accuracy of the assembly approach. In addition, the massive number of reads poses a major assembly performance challenge.
METHOD: The repeat identification method was introduced to address misassembly by identifying repetitive sequences in advance, creating a repeat knowledge base to reduce ambiguity during the assembly process and thus enhancing the accuracy of the assembled genome. Also, hybridization between assembly approaches resulted in a lower degree of misassembly with the aid of the reference genome. The assembly performance is optimized through data structure indexing and parallelization. This article's primary aim and contribution is to support researchers through an extensive review that eases their search for genome assembly studies. The study also highlights the most recent developments and limitations in genome assembly accuracy and performance optimization.
RESULTS: Our findings show the limitations of the available repeat identification methods, which can only detect repeats of specific lengths and may not perform well when various types of repeats are present in a genome. We also found that most of the hybrid assembly approaches, whether starting with de novo or reference-guided assembly, have some limitations in handling repetitive sequences, as they are more computationally costly and time-intensive. Although the hybrid approach was found to outperform individual assembly approaches, optimizing its performance remains a challenge. Also, the usage of parallelization in overlapping and reads alignment for genome assembly is yet to be fully implemented in the hybrid assembly approach.
CONCLUSION: We suggest combining multiple repeat identification methods to enhance the accuracy of identifying the repeats as an initial step to the hybrid assembly approach and combining genome indexing with parallelization for better optimization of its performance.
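As a small illustration of the suffix-prefix overlapping described in the Background paragraph above, the sketch below finds the longest exact overlap between two reads and merges them. It is a naive quadratic check for clarity only; the function names (`overlap`, `merge`) and the example reads are hypothetical and do not come from any assembler reviewed in the article.

```python
def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` characters exists.
    """
    min_len = max(min_len, 1)  # a zero-length overlap is not meaningful
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def merge(a: str, b: str, min_len: int = 1) -> str:
    """Concatenate two reads, collapsing their shared overlap once."""
    return a + b[overlap(a, b, min_len):]


read = "ACGTAC"
candidates = ["TACGGA", "TACTTT"]  # hypothetical reads sharing the prefix "TAC"

for other in candidates:
    print(other, overlap(read, other), merge(read, other))
```

Both candidate reads overlap `ACGTAC` by the same three characters, so the overlap alone cannot decide which read follows. This is the kind of ambiguity that repeats introduce and that prior repeat identification is meant to reduce.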
