Affiliations 

  • 1 Business Studies Division, National Institute of Public Administration, Lusaka, Zambia
  • 2 Department of Parasitology, Faculty of Medicine, University Malaya, Kuala Lumpur, Malaysia
  • 3 Institute of Biological Sciences, Faculty of Science, University Malaya, Kuala Lumpur, Malaysia
  • 4 Department of Medical Microbiology, Faculty of Medicine, University Malaya, Kuala Lumpur, Malaysia
  • 5 Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
  • 6 School of Business, Monash University Malaysia, Selangor, Malaysia
  • 7 Department of Computer Science, Faculty of Computing, Federal University of Lafia, Lafia, Nasarawa State, Nigeria
PLoS One, 2025;20(1):e0316493.
PMID: 39879257 DOI: 10.1371/journal.pone.0316493

Abstract

Next Generation Sequencing (NGS) technology has transformed clinical diagnostics and personalized medicine by providing access to high-throughput microbiome data. However, the high dimensionality, noise, and variability of microbiome data pose substantial obstacles to conventional statistical methods and machine learning techniques, and deep learning (DL) methods are not immune to these challenges. This paper introduces a feature engineering method that addresses these limitations by combining two feature sets derived from the input data into a new dataset, which is then subjected to feature selection. The approach improves the Area Under the Curve (AUC) of a Deep Neural Network (DNN) for colorectal cancer (CRC) detection from gut microbiome data from 0.800 to 0.923. The proposed method offers a robust way to handle the complexity of microbiome data and strengthens the potential of DL methods in disease detection.
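
The pipeline described in the abstract (derive two feature sets from the input, combine them, apply feature selection, train a neural network, report AUC) can be illustrated with a minimal sketch. The abstract does not say which two feature sets are combined, which selection method is used, or how the DNN is configured, so everything specific here is an assumption made for illustration: relative abundance and log-transformed counts as the two feature sets, scikit-learn's `SelectKBest` with `f_classif` for selection, a small `MLPClassifier` standing in for the DNN, and synthetic Poisson counts in place of real gut microbiome data. The helper names `build_combined_features` and `run_sketch` are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_combined_features(counts):
    """Derive two feature sets from raw counts and concatenate them.

    ASSUMPTION: relative abundance and log1p counts are illustrative
    choices; the abstract does not name the two feature sets.
    """
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1  # avoid division by zero for empty samples
    rel_abundance = counts / totals
    log_counts = np.log1p(counts)
    return np.hstack([rel_abundance, log_counts])


def run_sketch(seed=0, n_samples=200, n_taxa=50, k=20):
    """Run the sketch on synthetic data; returns (feature shape, test AUC)."""
    rng = np.random.default_rng(seed)

    # Synthetic stand-in for a taxa count table (NOT real microbiome data).
    counts = rng.poisson(lam=5.0, size=(n_samples, n_taxa)).astype(float)
    labels = rng.integers(0, 2, size=n_samples)
    # Inject a weak signal: the first five taxa are enriched in class 1.
    counts[labels == 1, :5] += rng.poisson(lam=5.0, size=(int(labels.sum()), 5))

    X = build_combined_features(counts)
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.3, random_state=seed, stratify=labels
    )

    # Scaling and selection sit inside the pipeline so they are fitted on
    # the training split only, which avoids leaking test information.
    model = make_pipeline(
        StandardScaler(),
        SelectKBest(f_classif, k=k),
        MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=seed),
    )
    model.fit(X_train, y_train)

    scores = model.predict_proba(X_test)[:, 1]
    return X.shape, roc_auc_score(y_test, scores)


if __name__ == "__main__":
    shape, auc = run_sketch()
    print(f"combined feature matrix: {shape}, test AUC: {auc:.3f}")
```

Because the data are synthetic, the AUC printed by this sketch says nothing about the 0.800 and 0.923 figures reported in the abstract; it only shows where each step of the described workflow would sit.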
