Affiliations 

  • 1 School of Computer Science, VIT-AP University, Vijayawada, 522241, Andhra Pradesh, India
  • 2 Department of Electronics and Communication Engineering, SRM University, Amaravati, 522240, Andhra Pradesh, India
  • 3 Professor of Computer Science and Engineering (AI&ML), St. Peter's Engineering College, Hyderabad, India
  • 4 Department of Electronics and Communication Engineering, NRI Institute of Technology, Agiripalli, Eluru, 521212, Andhra Pradesh, India. k_krishna2k7@yahoo.co.in
  • 5 Department of Electronics and Communication Engineering, University Institute of Engineering, Chandigarh University, Gharuan, Mohali, India. shonakk@gmail.com
  • 6 Space Science Centre (ANGKASA), Universiti Kebangsaan Malaysia, Bangi, 43600 UKM, Selangor D.E, Malaysia. rashed@ukm.edu.my
Sci Rep, 2025 Jan 21;15(1):2692.
PMID: 39837915 DOI: 10.1038/s41598-025-85822-5

Abstract

Efficient anomaly detection in camera surveillance systems is increasingly important for improving public safety in complex environments. Most existing methods fail to capture long-term temporal dependencies and spatial correlations, especially in dynamic multi-camera settings. Many traditional methods also rely heavily on large labeled datasets and generalize poorly to unseen anomalies. We introduce a new framework that addresses these challenges by incorporating state-of-the-art deep learning models that improve temporal and spatial context modeling. We combine recurrent neural networks (RNNs) with graph attention networks (GATs) to model long-term dependencies across spatially distributed cameras. A Transformer-augmented RNN improves on standard RNNs by using self-attention mechanisms for more robust temporal modeling. We employ a Multimodal Variational Autoencoder (MVAE) that fuses video, audio, and motion-sensor information in a manner resistant to noise and missing samples. To address the scarcity of labeled anomalies, we apply Prototypical Networks for few-shot learning, enabling generalization from only a few examples. Finally, a spatiotemporal autoencoder performs unsupervised anomaly detection by learning normal behavior patterns and flagging deviations from them as anomalies. The proposed methods yield improvements of about 10% to 15% in precision, recall, and F1-score over traditional models. Furthermore, the framework generalizes to unseen anomalies, with gains of up to +20% on novel-event detection, representing a major advance for real-world surveillance systems.
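The few-shot component relies on the standard Prototypical Networks mechanism: each class is summarized by the mean of its support embeddings, and a query is labeled by its nearest prototype. The paper's actual embedding network and data are not given here; the sketch below uses toy 2-D embeddings and illustrative function names (`build_prototypes`, `classify`) to show only the nearest-prototype step.

```python
import math
from collections import defaultdict

def build_prototypes(support):
    """support: list of (embedding, label) pairs.
    A class prototype is the per-dimension mean of its support embeddings."""
    grouped = defaultdict(list)
    for emb, label in support:
        grouped[label].append(emb)
    return {
        label: tuple(sum(dim) / len(embs) for dim in zip(*embs))
        for label, embs in grouped.items()
    }

def classify(query, protos):
    """Label a query embedding by its nearest prototype (Euclidean distance)."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))

# Toy example: two support embeddings per class stand in for features
# produced by a learned encoder.
support = [
    ((0.1, 0.0), "normal"), ((0.0, 0.2), "normal"),
    ((5.0, 4.9), "loitering"), ((5.1, 5.2), "loitering"),
]
protos = build_prototypes(support)
print(classify((4.8, 5.0), protos))  # → loitering
```

Because prototypes are just mean embeddings, a new anomaly class can be added from a handful of labeled examples without retraining, which is the property the abstract leans on for generalizing to unseen anomalies.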

* Title and MeSH Headings from MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine.