Conventional convolutional neural networks (CNNs) incur a high computational workload and memory access cost (CMC). Spectral domain CNNs (SpCNNs) offer a computationally efficient approach to CNN training and inference. This paper analytically investigates the CMC of SpCNNs and its contributing components, and then proposes a methodology to optimize CMC, under three strategies, to enhance inference performance. In this methodology, the output feature map (OFM) size, the OFM depth, or both are progressively reduced under an accuracy constraint to obtain performance-optimized CNN inference. Before any training or testing is conducted, the methodology can give designers guidelines and preliminary insights regarding techniques for optimum performance, least degradation in accuracy, and a balanced performance-accuracy trade-off. The methodology was evaluated on the MNIST and Fashion MNIST datasets using the LeNet-5 and AlexNet architectures. When compared to state-of-the-art SpCNN models, LeNet-5 achieves up to 4.2× (batch inference) and 4.1× (single-image inference) higher throughput and 10.5× (batch inference) and 4.2× (single-image inference) greater energy efficiency at a maximum loss of 3% in test accuracy. When compared to the baseline model used in this study, AlexNet delivers 11.6× (batch inference) and 5× (single-image inference) higher throughput and 25× (batch inference) and 8.8× (single-image inference) more energy-efficient inference with only a 4.4% reduction in accuracy.
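The core idea behind spectral domain computation can be illustrated with the convolution theorem: a circular convolution in the spatial domain equals an element-wise product in the frequency domain. The sketch below is only an illustration of that general principle using NumPy's FFT routines; it is not the SpCNN implementation evaluated in the paper, and the function names, the 8×8 input, and the 3×3 kernel are assumptions chosen for the example.

```python
import numpy as np


def spatial_circular_conv2d(x, k):
    """Reference 2D circular convolution computed directly in the spatial domain."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for m in range(kh):
                for n in range(kw):
                    # Indices wrap around, which is what makes this "circular".
                    acc += k[m, n] * x[(i - m) % H, (j - n) % W]
            out[i, j] = acc
    return out


def spectral_conv2d(x, k):
    """Same convolution via FFT: transform, multiply element-wise, transform back."""
    H, W = x.shape
    X = np.fft.rfft2(x)
    K = np.fft.rfft2(k, s=(H, W))  # zero-pad the kernel to the input size
    return np.fft.irfft2(X * K, s=(H, W))


# Hypothetical input feature map and kernel (random values, illustration only).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k = rng.standard_normal((3, 3))

y_spatial = spatial_circular_conv2d(x, k)
y_spectral = spectral_conv2d(x, k)
max_abs_diff = float(np.max(np.abs(y_spatial - y_spectral)))
print("max |spatial - spectral| =", max_abs_diff)
```

The two results agree up to floating-point rounding. In the spectral form, the per-element cost of the multiplication does not depend on the kernel size, which is one reason feature map size and depth become the natural quantities to reduce when optimizing cost.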
Human action recognition (HAR) is one of the most active research topics in the field of computer vision. Even though this area is well researched, HAR algorithms such as 3D Convolutional Neural Networks (CNNs), Two-stream Networks, and CNN-LSTM (Long Short-Term Memory) suffer from highly complex models. These algorithms involve a very large number of weight adjustments during the training phase and, as a consequence, require high-end machines for real-time HAR applications. Therefore, this paper presents an extraneous frame scrapping technique that employs 2D skeleton features with a Fine-KNN classifier-based HAR system to overcome the dimensionality problems. To illustrate the efficacy of the proposed method, two contemporary datasets, i.e., the Multi-Camera Action Dataset (MCAD) and the INRIA Xmas Motion Acquisition Sequences (IXMAS) dataset, were used in the experiments. We used the OpenPose technique to extract the 2D skeleton information. The proposed method was compared with CNN-LSTM and other state-of-the-art methods. The results obtained confirm the potential of our technique. The proposed OpenPose-FineKNN with Extraneous Frame Scrapping Technique achieved an accuracy of 89.75% on the MCAD dataset and 90.97% on the IXMAS dataset, better than existing techniques.
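As a rough illustration of classifying 2D skeleton features with a nearest-neighbour rule, the sketch below runs a k-NN classifier over flattened joint coordinates. Everything in it is an assumption made for the example: the 18-joint layout, the centroid-and-scale normalization, the choice k = 1 (taking "Fine-KNN" to mean a small-k nearest-neighbour classifier), and the synthetic poses. It does not reproduce the paper's feature pipeline, and the extraneous frame scrapping step is not shown because the abstract does not describe it.

```python
import numpy as np


def normalize_pose(keypoints):
    """Centre a (J, 2) joint array on its centroid, scale to unit norm, and flatten."""
    centered = keypoints - keypoints.mean(axis=0)
    scale = np.linalg.norm(centered) or 1.0  # guard against a degenerate pose
    return (centered / scale).ravel()


def knn_predict(train_X, train_y, query, k=1):
    """Return the majority label among the k training vectors closest to the query."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]


# Synthetic stand-in data: two hypothetical action classes, each a noisy
# copy of one random template pose (not real OpenPose output).
rng = np.random.default_rng(0)
NUM_JOINTS = 18  # assumed joint count for illustration
templates = {0: rng.uniform(0, 1, (NUM_JOINTS, 2)),
             1: rng.uniform(0, 1, (NUM_JOINTS, 2))}

features, labels = [], []
for label, pose in templates.items():
    for _ in range(20):
        noisy = pose + rng.normal(0, 0.01, (NUM_JOINTS, 2))
        features.append(normalize_pose(noisy))
        labels.append(label)
train_X = np.array(features)
train_y = np.array(labels)

# A query pose from class 1, shifted and rescaled as if the subject stood
# elsewhere in the frame; the normalization removes both effects.
query = normalize_pose(templates[1] * 3.0 + 5.0)
predicted = int(knn_predict(train_X, train_y, query, k=1))
print("predicted class:", predicted)
```

A classifier of this kind has no weights to train, which is consistent with the motivation above of avoiding the heavy training cost of deep HAR models.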