Breast cancer remains a prominent cause of death among women globally. Despite established treatment approaches, the rising prevalence of breast cancer is a concerning trend regardless of geographical location. This highlights the need to identify common key genes and explore their biological significance across diverse populations. Our research centered on establishing a correlation between common key genes identified in breast cancer patients. While previous studies have reported many of these genes independently, our study examined their unexplored mutual interactions, which may establish a foundational network contributing to breast cancer development. Machine learning algorithms were employed for sample classification and key gene selection. The best-performing model then selected the candidate genes through expression pattern recognition. Subsequently, the genes common to the breast cancer patients from India, China, the Czech Republic, Germany, Malaysia and Saudi Arabia were selected for further study. Among the ten classifiers evaluated, CatBoost exhibited superior performance, with an average accuracy of 92%. Functional enrichment and pathway analyses revealed that the calcium signaling pathway, the regulation of actin cytoskeleton pathway and other cancer-associated pathways were highly enriched with the identified genes. Notably, we observed that these genes regulate each other, forming a complex network. Additionally, we identified the PALMD gene as a novel potential biomarker for breast cancer progression. Our study revealed key gene modules, forming a complex network, that were consistently expressed in different populations, affirming their critical role and biological significance in breast cancer. The identified genes hold promise as prospective biomarkers of breast cancer prognosis irrespective of country of origin or ethnicity.
Future investigations will expand upon these genes in a larger population and validate their biological functions through in vivo analysis.
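
The cross-population step described above, keeping only the genes that appear among the candidates for every country, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `common_genes` and the gene labels (`GENE_A`, `GENE_B`, and so on) are hypothetical placeholders, and the per-country candidate sets are assumed to have already been produced by a classifier-based selection step.

```python
def common_genes(candidates_by_cohort):
    """Return the genes present in every cohort's candidate set, sorted by name."""
    gene_sets = [set(genes) for genes in candidates_by_cohort.values()]
    if not gene_sets:
        return []
    # Intersect all per-cohort sets at once.
    return sorted(set.intersection(*gene_sets))


# Placeholder gene labels for illustration only; these are NOT results from the study.
candidates_by_cohort = {
    "India": {"GENE_A", "GENE_B", "GENE_C"},
    "China": {"GENE_A", "GENE_B", "GENE_D"},
    "Czech Republic": {"GENE_A", "GENE_B", "GENE_C", "GENE_E"},
    "Germany": {"GENE_A", "GENE_B"},
    "Malaysia": {"GENE_A", "GENE_B", "GENE_F"},
    "Saudi Arabia": {"GENE_A", "GENE_B", "GENE_C"},
}

shared = common_genes(candidates_by_cohort)
print(shared)
```

A strict intersection is the simplest reading of "common in all"; a study might instead require presence in most cohorts, which would call for a count threshold rather than a set intersection.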