Over-the-Counter Breast Cancer Classification Using Machine Learning and Patient Registration Records

Hanis TM; Ruhaiyem NIR; Arifin WN; Haron J; Wan Abdul Rahman WF; Abdullah R; Musa KI

doi:10.3390/diagnostics12112826

Hanis TM¹, Ruhaiyem NIR², Arifin WN³, Haron J⁴, Wan Abdul Rahman WF⁵, Abdullah R², Musa KI¹

Affiliations

¹ Department of Community Medicine, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian 16150, Kelantan, Malaysia
² School of Computer Sciences, Universiti Sains Malaysia, Gelugor 11800, Penang, Malaysia
³ Biostatistics and Research Methodology Unit, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian 16150, Kelantan, Malaysia
⁴ Department of Radiology, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian 16150, Kelantan, Malaysia
⁵ Breast Cancer Awareness and Research Unit, Hospital Universiti Sains Malaysia, Kubang Kerian 16150, Kelantan, Malaysia

Diagnostics (Basel), 2022 Nov 16;12(11).

PMID: 36428886 DOI: 10.3390/diagnostics12112826

Abstract

This study aims to determine the feasibility of using machine learning (ML) and patient registration records to develop an over-the-counter (OTC) screening model for breast cancer risk estimation. Data were retrospectively collected from women who presented to Hospital Universiti Sains Malaysia, Malaysia, with breast-related problems. Eight ML models were used: k-nearest neighbour (kNN), elastic-net logistic regression, multivariate adaptive regression splines, artificial neural network, partial least squares, random forest, support vector machine (SVM), and extreme gradient boosting. Features used to develop the screening models were limited to the information in the patient registration form. The performance of the final model was evaluated across mammographic density groups. Additionally, the feature importance of the final model was assessed using a model-agnostic approach. kNN had the highest Youden's J index, precision, and area under the precision-recall curve (PR-AUC), while SVM had the highest F2 score. The kNN model was selected as the final model. The model showed balanced performance in terms of sensitivity, specificity, and PR-AUC across the mammographic density groups. The most important feature was age at examination. In conclusion, this study showed that ML and patient registration information can feasibly be used to develop an OTC screening model for breast cancer.
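The abstract names the model-selection metrics (Youden's J index, precision, PR-AUC, and F2 score) but does not define them. The Python sketch below uses their standard textbook definitions and assumes binary labels with 1 coded as the positive (breast cancer) class. The function names and the toy labels and scores are hypothetical and do not come from the study. PR-AUC is approximated here by average precision, which is one common estimator; the authors may have computed it differently.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn), assuming binary 0/1 labels with 1 as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0: a missed positive
            fn += 1
    return tp, fp, tn, fn


def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 2.0) -> Dict[str, float]:
    """Threshold-based metrics computed from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Youden's J = sensitivity + specificity - 1 (0 = no better than chance, 1 = perfect)
    youden_j = sensitivity + specificity - 1.0
    # F-beta score; beta = 2 gives the F2 score, which weights recall above precision
    b2 = beta ** 2
    denom = b2 * precision + sensitivity
    f_beta = (1 + b2) * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "youden_j": youden_j,
        "f_beta": f_beta,
    }


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise PR-AUC estimate: mean precision at the rank of each true positive.

    Tied scores are not grouped; they keep their input order after sorting.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos


if __name__ == "__main__":
    # Hypothetical toy data for illustration only, not from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
    print(screening_metrics(y_true, y_pred))
    print(average_precision(y_true, scores))
```

On these toy data the sketch gives a sensitivity of 2/3, a specificity of 0.8, a Youden's J of about 0.467, an F2 of about 0.667, and an average precision of about 0.867. Because beta = 2 weights sensitivity more heavily than precision, the F2 score favours models that miss fewer positive cases.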
