An intrusion detection model to detect zero-day attacks in unseen data using machine learning

Dai Z; Por LY; Chen YL; Yang J; Ku CS; Alizadehsani R; Pławiak P

doi:10.1371/journal.pone.0308469


Dai Z¹, Por LY¹, Chen YL², Yang J¹, Ku CS³, Alizadehsani R⁴, Pławiak P⁵

Affiliations

¹ Department of Computer System and Technology, Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia
² Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan
³ Department of Computer Science, Universiti Tunku Abdul Rahman, Kampar, Malaysia
⁴ Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Waurn Ponds, Australia
⁵ Department of Computer Science, Faculty of Computer Science and Telecommunications, Cracow University of Technology, Warszawska, Krakow, Poland

PLoS One, 2024;19(9):e0308469.

PMID: 39259729

Abstract

In an era of pervasive digital connectivity, cybersecurity concerns have escalated. The rapid evolution of technology has produced a wide spectrum of cyber threats, including sophisticated zero-day attacks. This research addresses the difficulty that existing intrusion detection systems have in identifying zero-day attacks. It uses the CIC-MalMem-2022 dataset and trains an autoencoder for anomaly detection. The trained autoencoder is then integrated with XGBoost and Random Forest, yielding two hybrid models: XGBoost-AE and Random Forest-AE. The study demonstrates that incorporating an anomaly detector into traditional models significantly enhances their performance. The Random Forest-AE model achieved 100% accuracy, precision, recall, F1 score, and Matthews Correlation Coefficient (MCC), outperforming the methods proposed by Balasubramanian et al., Khan, Mezina et al., Smith et al., and Dener et al. When tested on unseen data, the Random Forest-AE model achieved an accuracy of 99.9892%, precision of 100%, recall of 99.9803%, F1 score of 99.9901%, and MCC of 99.8313%. These results indicate that the proposed model maintains high accuracy even on previously unseen data.
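The abstract reports five evaluation metrics. As a rough illustration of how they relate, the sketch below computes all five from binary labels using the standard confusion-matrix definitions. It assumes label 1 means "malicious" and 0 means "benign"; the function name `binary_metrics` is hypothetical and is not taken from the paper. The toy example has one missed attack and no false alarms, which shows how precision can stay at 100% while recall, F1, and MCC drop, the same pattern seen in the unseen-data results.

```python
import math

def binary_metrics(y_true, y_pred):
    """Hypothetical helper: accuracy, precision, recall, F1 and MCC.

    Assumes binary labels where 1 = malicious (positive) and 0 = benign.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # MCC uses all four confusion-matrix cells, so it stays informative
    # even when the classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}

# Toy data: five attacks, five benign samples; one attack is missed.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
m = binary_metrics(y_true, y_pred)
# accuracy 0.9, precision 1.0, recall 0.8, f1 ~0.889, mcc ~0.816
```

Because MCC penalizes every error cell, it tends to fall faster than accuracy when a rare class is misclassified, which is consistent with MCC being the lowest of the unseen-data scores.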
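The abstract does not say exactly how the autoencoder is combined with XGBoost or Random Forest. One common pattern for autoencoder-based anomaly detection is to score each sample by its reconstruction error, set a threshold from errors on benign training data, and flag anything above that threshold. The sketch below shows that pattern together with a hypothetical combination rule: a sample is labelled malicious if the autoencoder flags it as anomalous, and otherwise the classifier's prediction is kept. The function names, the mean-plus-`k`-standard-deviations threshold, and the OR-style rule are all illustrative assumptions rather than the authors' method. The autoencoder that produces the reconstructions `x_hat` and the trained classifier are assumed to exist elsewhere.

```python
import statistics

def reconstruction_error(x, x_hat):
    """Mean squared error between a feature vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def fit_threshold(benign_errors, k=3.0):
    """Assumed rule: threshold = mean + k * population std of benign errors."""
    mu = statistics.mean(benign_errors)
    sigma = statistics.pstdev(benign_errors)
    return mu + k * sigma

def hybrid_predict(recon_errors, clf_preds, threshold):
    """Hypothetical combination rule (not from the paper):
    predict 1 (malicious) if the autoencoder sees an anomaly,
    otherwise keep the classifier's own prediction."""
    return [1 if err > threshold else pred
            for err, pred in zip(recon_errors, clf_preds)]

# Benign samples reconstruct well, so their errors are small.
benign_errors = [0.01, 0.02, 0.015, 0.012, 0.018]
threshold = fit_threshold(benign_errors)  # roughly 0.026

# The second sample is missed by the classifier but reconstructs poorly,
# so the anomaly detector overrides the classifier and flags it.
preds = hybrid_predict([0.01, 0.5, 0.02], [0, 0, 1], threshold)
```

The intuition behind this kind of design is that an autoencoder trained only on known traffic reconstructs unfamiliar inputs poorly, so it can catch novel samples that a supervised classifier, trained on known attack labels, would pass as benign.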
