Camera attribution plays an important role in digital image forensics by providing evidence of, and distinguishing characteristics for, the origin of a digital image. It allows the forensic analyst to identify the possible source camera that captured the image under investigation. In real-world applications, however, these approaches face many challenges because the large body of multimedia data publicly available through photo-sharing and social-network sites is captured under uncontrolled conditions and has undergone a variety of hardware and software post-processing operations. Moreover, the legal system only accepts forensic analysis of digital image evidence if the applied camera attribution techniques are unbiased, reliable, nondestructive and widely accepted by experts in the field. The aim of this paper is to investigate the evolutionary trend of image source camera attribution approaches from fundamentals to practice, in particular with the application of image processing and data mining techniques. Extracting implicit knowledge from images using intrinsic image artifacts for source camera attribution requires a structured image mining process. In this paper, we attempt to provide an introductory tutorial on the image processing pipeline and to determine a general classification of the features corresponding to its different components for source camera attribution. The article also reviews source camera attribution techniques more comprehensively within the domain of image forensics and classifies ongoing developments in the area. The classification of existing source camera attribution approaches is based on specific parameters, such as the colour image processing pipeline, hardware- and software-related artifacts, and the methods used to extract such artifacts.
The more recent source camera attribution approaches, which have not yet gained sufficient attention among image forensics researchers, are also critically analysed and further categorised into four classes, namely optical-aberration-based, sensor-fingerprint-based, processing-statistics-based and processing-regularity-based approaches. Furthermore, this paper investigates the challenging problems and proposed strategies of such schemes based on the suggested taxonomy, plotting the evolution of source camera attribution approaches with respect to subjective optimisation criteria over the last decade. The optimisation criteria were determined from the strategies proposed to increase the detection accuracy, robustness and computational efficiency of source camera brand, model or device attribution.
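As a concrete illustration of the sensor-fingerprint family of approaches, the sketch below estimates a camera fingerprint as the average noise residual of several images and scores a query image by normalised correlation against candidate fingerprints. It is a minimal, synthetic demonstration of the PRNU idea; the Gaussian denoiser, image sizes and noise levels are illustrative assumptions, not any surveyed method's exact pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_residual(img, sigma=1.0):
    """Approximate the sensor noise residual as image minus a denoised version."""
    return img - gaussian_filter(img, sigma)

def estimate_fingerprint(images):
    """Average the noise residuals of several images from the same camera."""
    return np.mean([noise_residual(im) for im in images], axis=0)

def correlation(a, b):
    """Normalised cross-correlation used as the attribution score."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Synthetic demo: two "cameras" with distinct fixed pattern noise.
rng = np.random.default_rng(0)
pattern_a = rng.normal(0, 0.05, (64, 64))
pattern_b = rng.normal(0, 0.05, (64, 64))
shots_a = [rng.normal(0.5, 0.1, (64, 64)) + pattern_a for _ in range(10)]
shots_b = [rng.normal(0.5, 0.1, (64, 64)) + pattern_b for _ in range(10)]
fp_a = estimate_fingerprint(shots_a)
fp_b = estimate_fingerprint(shots_b)

query = rng.normal(0.5, 0.1, (64, 64)) + pattern_a  # taken with camera A
score_a = correlation(noise_residual(query), fp_a)
score_b = correlation(noise_residual(query), fp_b)
```

The query image correlates markedly more strongly with the fingerprint of the camera that produced it, which is the decision rule sensor-fingerprint attribution relies on.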
Audio forgery is any act of tampering with, illegally copying or falsifying the quality of audio for criminal purposes. In the last decade, audio forgery detection has received increasing attention due to a significant rise in the number of forgeries across different types of audio. Among the available detection methods, electric network frequency (ENF) analysis is one of the most powerful in terms of accuracy. Despite the suitable accuracy of ENF for most plug-in powered devices, its weak accuracy in audio forgery detection for battery-powered devices, especially laptops and mobile phones, can be considered one of its main obstacles. To solve the ENF accuracy problem for battery-powered devices, a method combining ENF and a phase feature is proposed. In the experiments conducted, ENF alone gives 50% and 60% forgery detection accuracy for mobile phones and laptops respectively, while the proposed method achieves 88% and 92% accuracy respectively for battery-powered devices. The results show that combining ENF with the phase feature leads to higher forgery detection accuracy.
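The core of ENF-based detection can be illustrated with a short sketch: track the dominant mains-frequency component of the recording over time and flag abrupt jumps in the track as splice candidates. The 50 Hz nominal grid frequency, window length, tolerance and synthetic spliced signal below are illustrative assumptions, not the proposed method's parameters.

```python
import numpy as np

def enf_track(signal, fs, win_sec=1.0):
    """Estimate the dominant mains frequency in consecutive windows via FFT peak picking."""
    n = int(fs * win_sec)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    track = []
    for start in range(0, len(signal) - n + 1, n):
        win = signal[start:start + n] * np.hanning(n)
        spectrum = np.abs(np.fft.rfft(win))
        # Restrict the search to a band around the nominal 50 Hz mains frequency.
        band = (freqs > 45) & (freqs < 55)
        track.append(freqs[band][np.argmax(spectrum[band])])
    return np.array(track)

def splice_points(track, tol=0.5):
    """Indices where the ENF track jumps by more than tol Hz -- a tampering cue."""
    return np.where(np.abs(np.diff(track)) > tol)[0] + 1

fs = 400
t = np.arange(0, 5, 1 / fs)
# Simulated splice: the second half was recorded when the grid ran at 51 Hz.
audio = np.concatenate([np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 51 * t)])
track = enf_track(audio, fs)
cuts = splice_points(track)
```

A consistent ENF track indicates an unedited recording; a discontinuity such as the one detected here marks a likely insertion or deletion point.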
VoIP services provide fertile ground for criminal activity, so identifying the transmitting computer device from a recorded VoIP call may help the forensic investigator reveal useful information. It also proves the authenticity of a call recording submitted to the court as evidence. This paper extends a previous study on the use of recorded VoIP calls for blind source computer device identification. Although the initial results were promising, a theoretical explanation for them was yet to be found. The study suggested computing the entropy of mel-frequency cepstrum coefficients (entropy-MFCC) from near-silent segments as an intrinsic feature set that captures the device response function arising from tolerances in the electronic components of individual computer devices. By applying the supervised learning techniques of naïve Bayes, linear logistic regression, neural networks and support vector machines to the entropy-MFCC features, state-of-the-art identification accuracy of nearly 99.9% was achieved on different sets of computer devices for both call recording and microphone recording scenarios. Furthermore, unsupervised learning techniques, including simple k-means, expectation-maximization and density-based spatial clustering of applications with noise (DBSCAN), provided promising results on the call recording dataset by assigning the majority of instances to their correct clusters.
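The entropy-MFCC feature set can be sketched as follows: compute MFCCs frame by frame over a near-silent segment, then take the Shannon entropy of each coefficient's distribution across frames. The mel filterbank below is deliberately simplified and the near-silent input is synthetic noise; this illustrates the feature construction, not the paper's exact implementation.

```python
import numpy as np
from scipy.fft import dct

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced on the mel scale (simplified)."""
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_inv = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = mel_inv(np.linspace(mel(0), mel(fs / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def entropy_mfcc(signal, fs, n_fft=512, n_filters=20, n_coeffs=12, n_bins=16):
    """Shannon entropy of each MFCC coefficient across frames (entropy-MFCC)."""
    frames = np.array([signal[i:i + n_fft] * np.hamming(n_fft)
                       for i in range(0, len(signal) - n_fft, n_fft // 2)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mfcc = dct(np.log(power @ mel_filterbank(n_filters, n_fft, fs).T + 1e-10),
               axis=1, norm='ortho')[:, :n_coeffs]
    ent = []
    for c in mfcc.T:
        hist, _ = np.histogram(c, bins=n_bins)
        p = hist / hist.sum()
        p = p[p > 0]
        ent.append(float(-(p * np.log2(p)).sum()))
    return np.array(ent)

rng = np.random.default_rng(1)
near_silence = rng.normal(0, 1e-3, 16000)  # one second of simulated near-silence
features = entropy_mfcc(near_silence, fs=16000)
```

The resulting 12-dimensional vector is what the supervised and unsupervised learners in the study would consume, one vector per near-silent segment.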
Since its inception, YouTube has been a source of entertainment and education. Every day, millions of videos are uploaded to the platform, and researchers have been using YouTube as a source of information in their work. However, there is a lack of bibliometric reports on research carried out on this platform and on the patterns in the published works. This study aims to fill this gap by providing a bibliometric analysis of YouTube as a source of information. Specifically, this paper analyzes 1781 articles collected from the Scopus database spanning fifteen years. The analysis revealed that 2006-2007 was the initial stage of YouTube research, followed by 2008-2017, the decade of rapid growth, while 2017-2021 is considered the stage of consolidation and stabilization of this research topic. We also discovered that the most relevant papers were published in a small number of journals, such as New Media and Society, Convergence, the Journal of Medical Internet Research, Computers in Human Behavior and The Physics Teacher, which confirms Bradford's law. The USA, Turkey, and the UK are the countries with the highest number of publications. We also present network analyses between countries, sources, and authors. Analyzing the keywords revealed research trends such as "video sharing" (2010-2018), "web-based learning" (2012-2014), and "COVID-19" (2020 onward). Finally, we used Multiple Correspondence Analysis (MCA) to find the conceptual clusters of research on YouTube. The first cluster relates to user-generated content, the second to health and medical issues, and the third to information quality.
An immense volume of digital documents exists online and offline with content that can offer useful information and insights. Topic modeling enhances the analysis and understanding of such documents by discovering latent semantic structures, or topics, within a set of digital textual documents. Internet of Things, blockchain, recommender system, and search engine optimization applications use topic modeling to handle data mining tasks such as classification and clustering. The usefulness of topic models depends on the quality of the resulting term patterns and topics, and topic coherence is the standard metric for measuring this quality. Previous studies built topic models that generally work on conventional documents; these models are insufficient and underperform when applied to web content data because of differences in structure between conventional and HTML documents. Neglecting the unique structure of web content leads to missing otherwise coherent topics and, therefore, low topic quality. This study proposes an innovative topic model for learning coherent topics from web content data. We present the HTML Topic Model (HTM), a web content topic model that takes the HTML tags into consideration to understand the structure of web pages. We conducted two series of experiments to demonstrate the limitations of existing topic models and to examine the topic coherence of the HTM against the widely used Latent Dirichlet Allocation (LDA) model and its variants, namely the Correlated Topic Model, Dirichlet Multinomial Regression, the Hierarchical Dirichlet Process, Hierarchical Latent Dirichlet Allocation, the pseudo-document-based Topic Model, and Supervised Latent Dirichlet Allocation. The first experiment demonstrates the limitations of existing topic models when applied to web content data and, therefore, the essential need for a web content topic model.
When applied to web data, the overall performance of these models dropped by an average factor of five and, in some cases, was up to approximately 20 times lower than on conventional data. The second experiment then evaluates the effectiveness of the HTM model in discovering topics and term patterns in web content data. The HTM model achieved an overall 35% improvement in topic coherence compared to LDA.
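Topic coherence, the quality metric used throughout these experiments, can be computed from document-level co-occurrence counts. The sketch below implements the UMass variant: top words of a coherent topic tend to co-occur in the same documents, pushing the score up. The toy corpus and word lists are illustrative only.

```python
import numpy as np

def umass_coherence(topic_words, docs, eps=1.0):
    """UMass topic coherence: sum over word pairs of the log conditional
    co-occurrence probability, estimated from document co-occurrence counts."""
    doc_sets = [set(d.split()) for d in docs]
    def df(*words):
        # Number of documents containing all the given words.
        return sum(all(w in s for w in words) for s in doc_sets)
    score = 0.0
    for i in range(1, len(topic_words)):
        for j in range(i):
            score += np.log((df(topic_words[i], topic_words[j]) + eps)
                            / df(topic_words[j]))
    return float(score)

docs = ["price tag div span html page",
        "price sale offer shop page",
        "team match goal score win",
        "match win score page fans"]
coherent = umass_coherence(["match", "score", "win"], docs)      # words co-occur
incoherent = umass_coherence(["price", "goal", "html"], docs)    # words scattered
```

A higher (less negative) score means the topic's top words genuinely hang together, which is exactly the property the HTM experiments measure against LDA and its variants.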
Cloud computing is a significant shift of computational paradigm in which computing as a utility and remote data storage have great potential. Enterprises and businesses are now more interested in outsourcing their data to the cloud to lessen the burden of local data storage and maintenance. However, the outsourced data and the computation outcomes are not always trustworthy, because data owners lack control and physical possession of their data. To address this issue, researchers have focused on designing remote data auditing (RDA) techniques. The majority of these techniques, however, are only applicable to static archival data and cannot audit dynamically updated outsourced data. We propose an effective RDA technique based on algebraic signature properties for cloud storage systems, and also present a new data structure capable of efficiently supporting dynamic data operations such as append, insert, modify, and delete. Moreover, this data structure enables our method to handle large-scale data with minimal computation cost. A comparative analysis with state-of-the-art RDA schemes shows that the proposed scheme is secure and highly efficient in terms of computation and communication overhead on the auditor and server.
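The algebraic-signature property that such auditing relies on is homomorphism: the signature of a combination of blocks equals the combination of the blocks' signatures, so an auditor can verify an aggregated server response without downloading the raw data. The sketch below works modulo a prime for readability; real constructions operate in a Galois field, and the constants are illustrative.

```python
P = 2_147_483_647  # Mersenne prime, standing in for the field modulus
G = 7              # stands in for the field generator alpha

def algebraic_signature(block):
    """sig(b) = sum_i b_i * alpha^i (mod P) -- a compact digest of one block."""
    sig, power = 0, 1
    for b in block:
        sig = (sig + b * power) % P
        power = (power * G) % P
    return sig

def combine(*blocks):
    """Element-wise sum of blocks (the server-side aggregation step)."""
    return [sum(col) % P for col in zip(*blocks)]

b1 = [10, 20, 30, 40]
b2 = [5, 6, 7, 8]
# Key property the auditor relies on: the signature of a sum of blocks
# equals the sum of the blocks' signatures.
lhs = algebraic_signature(combine(b1, b2))
rhs = (algebraic_signature(b1) + algebraic_signature(b2)) % P
```

Because of this linearity, the server can return one combined block plus one combined proof per challenge, keeping the communication overhead on the auditor and server low.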
Product reviews are individuals' opinions, judgements or beliefs about a certain product or service provided by a company. Such reviews serve as guides for these companies to plan and monitor their business ventures, whether by increasing productivity or enhancing product or service quality. Product reviews can also increase business profits by convincing prospective customers about products they are interested in. In mobile application marketplaces such as the Google Play Store, reviews and star ratings are used as indicators of application quality. However, among these reviews, hereafter also referred to as opinions, spam also exists and disrupts the online business balance. Previous studies used time series and neural network approaches, which require considerable computational power, to detect such opinion spam. However, their detection performance can be limited in accuracy because they focus only on basic, discrete, document-level features that capture few statistical relationships. Aiming to improve the detection of opinion spam in mobile application marketplaces, this study proposes statistical features modelled through supervised boosting approaches, namely Extreme Gradient Boosting (XGBoost) and the Generalized Boosted Regression Model (GBM), evaluated on two multilingual datasets (English and Malay). The evaluation found that XGBoost is most suitable for detecting opinion spam in the English dataset, while the GBM Gaussian model is most suitable for the Malay dataset. The comparative analysis also indicates that the proposed statistical features achieved a detection accuracy of 87.43 per cent on the English dataset and 86.13 per cent on the Malay dataset.
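To make the idea of statistical review features concrete, the sketch below computes a few document-level statistics for a single review. These particular features (length, capitalisation, exclamation density, rating deviation) are hypothetical illustrations of the feature type, not the study's actual feature set.

```python
import numpy as np

def review_features(text, rating, product_ratings):
    """A few illustrative statistical features for one review.
    (Hypothetical feature set, not the paper's exact one.)"""
    words = text.split()
    return {
        "length": len(words),
        "avg_word_len": float(np.mean([len(w) for w in words])) if words else 0.0,
        "exclaim_density": text.count("!") / max(len(text), 1),
        "caps_ratio": sum(w.isupper() for w in words) / max(len(words), 1),
        # How far this review's star rating sits from the product's average.
        "rating_deviation": abs(rating - float(np.mean(product_ratings))),
    }

spammy = review_features("BEST app EVER!!! Download NOW!!!", 5, [2, 2, 3, 2])
honest = review_features("Useful app, though it drains the battery a bit.", 3, [2, 2, 3, 2])
```

Vectors like these, one per review, are what a boosted classifier such as XGBoost or GBM would be trained on.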
To deal with the large number of malicious mobile applications (e.g. mobile malware), a number of malware detection systems have been proposed in the literature. In this paper, we propose a hybrid method to find the optimum parameters that can be used to facilitate mobile malware identification. We also present a multi-agent system architecture comprising three system agents (i.e. sniffer, extraction and selection agents) to capture and manage the pcap file for the data preparation phase. In our hybrid approach, we combine an adaptive neuro-fuzzy inference system (ANFIS) with particle swarm optimization (PSO). Evaluations using data captured on a real-world Android device and the MalGenome dataset demonstrate the effectiveness of our approach in comparison with two other hybrid optimization methods, differential evolution (ANFIS-DE) and ant colony optimization (ANFIS-ACO).
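The PSO half of the hybrid can be sketched generically: each particle tracks its personal best position and is pulled toward the swarm-wide best. In ANFIS-PSO the objective would be the ANFIS training error over candidate parameters; here a simple shifted quadratic stands in, and all hyperparameters are illustrative.

```python
import numpy as np

def pso(objective, dim, n_particles=30, iters=200, seed=0,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Minimal particle swarm optimisation minimising `objective`."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()                                   # personal bests
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()           # swarm-wide best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())

# Stand-in objective; in ANFIS-PSO this would be the ANFIS training error.
best, best_val = pso(lambda x: float(np.sum((x - 1.0) ** 2)), dim=3)
```

Swapping the stand-in objective for a model's training error is all that is needed to use this loop for parameter tuning.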
Wireless sensor networks (WSNs) consist of hundreds or thousands of sensor nodes distributed over a wide area and used as Internet of Things (IoT) devices benefiting many home users and autonomous-system industries. With many users adopting WSN-based IoT technology, it is essential to ensure that the sensors' information is protected from attacks. Many attacks disrupt WSNs, such as Quality of Service (QoS) attacks, malicious nodes, and routing attacks. To combat routing attacks in particular, attacker nodes must be detected and prevented from any access to the WSN. Although some survey studies on routing attacks have been published, the literature lacks systematic studies on detecting WSN routing attacks. This study enhances the topic with a taxonomy of current and emerging detection techniques for routing attacks in wireless sensor networks to improve QoS. Using a PRISMA flow diagram, this article systematically reviews 87 articles from 2016 to 2022 covering eight routing attacks: wormhole, Sybil, grayhole/selective forwarding, blackhole, sinkhole, replay, spoofing, and hello flood attacks. The review also evaluates the metrics and criteria used to assess performance. Researchers can use this article to fill information gaps within the WSN routing attack detection domain.
Routing protocols transmit vast amounts of sensor data between the Wireless Sensor Network (WSN) and the Internet of Things (IoT) gateway. One of these protocols is the Routing Protocol for Low-Power and Lossy Networks (RPL), which the Internet Engineering Task Force (IETF) defined in March 2012 as a de facto distance-vector routing protocol for low-energy wireless communications. Although RPL messages use a cryptographic algorithm for security protection, this does not prevent internal attacks, which drop some or all packets (blackhole or selective forwarding attacks) or alter data packets (grayhole attacks). The RPL protocol needs to be strengthened to address this issue, as only a limited number of studies have considered detecting internal attacks, and earlier research has rarely considered mobility, a vital feature of the IoT. This article presents a novel lightweight system for anomaly detection of grayhole, blackhole, and selective forwarding attacks. The study uses a trust model in the RPL protocol and considers attack detection under mobility frameworks. The proposed system, anomaly detection of three RPL attacks (RPLAD3), is designed in four layers and starts operating immediately after the initial state of the network. The experiments demonstrate that RPLAD3 outperforms the RPL protocol in defeating attacks, achieving high accuracy and a high true positive ratio while lowering power and energy consumption. In addition, it significantly improves the packet delivery ratio and decreases the false positive ratio to zero.
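The trust idea behind this style of detection can be sketched as a smoothed delivery-ratio score maintained per neighbour: blackhole nodes that drop everything and grayhole nodes that drop selectively both see their trust decay below an alarm threshold. The smoothing factor, threshold and traffic figures below are illustrative assumptions, not the RPLAD3 parameters.

```python
def update_trust(trust, node, delivered, expected, alpha=0.3):
    """Exponentially smoothed delivery-ratio trust for one node.
    (Illustrative model, not the RPLAD3 implementation.)"""
    ratio = delivered / expected if expected else 1.0
    trust[node] = (1 - alpha) * trust.get(node, 1.0) + alpha * ratio
    return trust[node]

def suspicious(trust, threshold=0.7):
    """Nodes whose trust fell below threshold -- candidate attacker nodes."""
    return sorted(n for n, t in trust.items() if t < threshold)

trust = {}
for _ in range(10):  # ten observation rounds of forwarding behaviour
    update_trust(trust, "n1", delivered=10, expected=10)  # honest node
    update_trust(trust, "n2", delivered=0, expected=10)   # blackhole: drops all
    update_trust(trust, "n3", delivered=5, expected=10)   # grayhole: drops half
flagged = suspicious(trust)
```

Once flagged, such nodes would be excluded from parent selection in the RPL topology, which is what restores the packet delivery ratio.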
The COVID-19 pandemic introduced unprecedented challenges for people and governments. Vaccines are an available solution to this pandemic, and their recipients are of different ages, genders, and religions. Muslims follow specific Islamic guidelines that prohibit them from taking a vaccine containing certain ingredients. This study analyzes Facebook and Twitter data to understand the discourse related to halal vaccines using aspect-based sentiment analysis and text emotion analysis. We searched for the term "halal vaccine", limited the timeline to the period between 1 January 2020 and 30 April 2021, and collected 6037 tweets and 3918 Facebook posts. We preprocessed the tweets and Facebook posts and built a Latent Dirichlet Allocation (LDA) model to identify topics, then calculated the sentiment for each topic. Finally, the study further investigates emotions in the data using the National Research Council of Canada (NRC) Emotion Lexicon. Our analysis identified four topics in each of the Twitter and Facebook datasets. Two topics, "COVID-19 vaccine" and "halal vaccine", are shared between the two datasets. The other two topics in the tweets are "halal certificate" and "must halal", while "sinovac vaccine" and "ulema council" are the other two topics in the Facebook dataset. The sentiment analysis shows that sentiment toward halal vaccines is mostly neutral in the Twitter data, whereas it is positive in the Facebook data. The emotion analysis indicates that trust is the most present of the top three emotions in both datasets, followed by anticipation and fear.
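Lexicon-based emotion analysis of this kind reduces to counting lexicon hits per emotion over the preprocessed posts. The sketch below uses a tiny stand-in dictionary; the real NRC Emotion Lexicon maps thousands of English words to eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust) plus two sentiments, and the sample posts are invented.

```python
from collections import Counter

# Tiny illustrative stand-in for the NRC Emotion Lexicon.
LEXICON = {
    "safe": {"trust"}, "approved": {"trust", "anticipation"},
    "hope": {"anticipation", "joy"}, "worried": {"fear"},
    "risk": {"fear"}, "good": {"joy", "trust"},
}

def emotion_counts(posts):
    """Tally lexicon emotions over a collection of (pre-processed) posts."""
    counts = Counter()
    for post in posts:
        for word in post.lower().split():
            counts.update(LEXICON.get(word, ()))
    return counts

posts = ["halal vaccine approved and safe",
         "worried about vaccine risk",
         "good news brings hope"]
top = emotion_counts(posts).most_common()
```

Ranking the tallies is how a dominant emotion such as trust is identified for a dataset or topic.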
Cloud computing (CC) has recently been receiving tremendous attention from the IT industry and academic researchers. CC delivers its services to cloud customers on a pay-as-you-go, anytime, anywhere basis, and cloud services are dynamically scalable and provided through the Internet on demand. Therefore, service provisioning plays a key role in CC, and cloud customers must be able to select the services appropriate to their needs. Several approaches have been proposed to solve the service selection problem, including multicriteria decision analysis (MCDA), which enables the user to choose from among a number of available options. In this paper, we analyze the application of MCDA to service selection in CC. We identify and synthesize several MCDA techniques and provide a comprehensive analysis of this technology for general readers. In addition, we present a taxonomy derived from a survey of the current literature. Finally, we highlight several state-of-the-art practical aspects of MCDA implementation in cloud computing service selection. The contributions of this study are four-fold: (a) focusing on state-of-the-art MCDA techniques, (b) highlighting the comparative analysis and suitability of several MCDA methods, (c) presenting a taxonomy through an extensive literature review, and (d) analyzing and summarizing cloud computing service selection in different scenarios.
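As one concrete MCDA technique of the kind covered by such a taxonomy, TOPSIS ranks alternatives by closeness to an ideal solution. The sketch below applies it to a hypothetical matrix of cloud services; the criteria, weights and scores are invented for illustration.

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """TOPSIS, a common MCDA method: rank alternatives by closeness to the
    ideal solution. benefit[j] is True if criterion j is better when larger."""
    m = matrix / np.linalg.norm(matrix, axis=0)     # vector-normalise columns
    v = m * weights                                 # apply criterion weights
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_pos = np.linalg.norm(v - ideal, axis=1)       # distance to ideal
    d_neg = np.linalg.norm(v - anti, axis=1)        # distance to anti-ideal
    return d_neg / (d_pos + d_neg)                  # higher is better

# Hypothetical cloud services scored on (uptime %, throughput, price $):
services = np.array([[99.9, 500.0, 120.0],
                     [99.5, 700.0, 100.0],
                     [98.0, 400.0, 60.0]])
scores = topsis(services, weights=np.array([0.5, 0.3, 0.2]),
                benefit=np.array([True, True, False]))  # price is a cost
best = int(np.argmax(scores))
```

Changing the weight vector captures different customer priorities, which is how MCDA methods model service selection in different scenarios.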
In this study, the environmental impacts of consolidated rice farms (CFs), farms that have been integrated to increase the mechanization index, and traditional farms (TFs), small farms with a lower mechanization index, in Guilan Province, Iran, were evaluated and compared using life cycle assessment (LCA) methodology and an adaptive neuro-fuzzy inference system (ANFIS). Foreground data were collected from farmers using face-to-face questionnaires, and background information about the production process and inventory data was taken from the EcoInvent®2.0 database. The system boundary was confined to within the farm gate (cradle to farm gate) and two functional units (land- and mass-based) were chosen. The study also compared the input-output energy flows of the farms. The results revealed that the average amount of energy consumed by the CFs was 57 GJ compared to 74.2 GJ for the TFs, and the energy ratios for CFs and TFs were 1.6 and 0.9, respectively. The LCA results indicated that CFs produced fewer environmental burdens per ton of produced rice, and the same result was obtained for the land-based FU. This indicates that the differences between the two types of farms were caused not by a difference in production level but by improved management on the CFs. The analysis also showed that electricity accounted for the greatest share of the impact for both types of farms, followed by P-based and N-based chemical fertilizers. These findings suggest that the CFs had superior overall environmental performance compared to the TFs in the study area. The performance metrics of the ANFIS-based model show that it can predict the environmental burdens of rice production with high accuracy and minimal error.
This study surveys researchers' efforts in response to the new and disruptive technology of smartphone medical apps, mapping the research landscape from the literature into a coherent taxonomy and identifying the basic characteristics of this emerging field, namely: the motivations for using smartphone apps in medicine and healthcare, the open challenges that hinder their utility, and the recommendations in the literature for improving the acceptance and use of medical apps.
The increasing demand for Android mobile devices and blockchain has motivated malware writers to develop mobile malware that compromises the blockchain. Although the blockchain is secure, attackers have managed to gain access to it as legitimate users, thereby compromising important and crucial information. Examples of mobile malware include root exploits, botnets, and Trojans, with root exploits among the most dangerous. A root exploit compromises the operating system kernel to gain root privileges, which attackers then use to bypass security mechanisms, gain complete control of the operating system, install other types of malware on the device, and finally steal the victims' private keys linked to the blockchain. To maximize the security of blockchain-based medical data management (BMDM), it is crucial to investigate the novel features and approaches contained in root exploit malware. This study proposes using the bio-inspired method of particle swarm optimization (PSO), which automatically selects exclusive features, including those of the novel Android Debug Bridge (ADB). This study also adopts boosting (AdaBoost, Real AdaBoost, LogitBoost, and MultiBoost) to enhance the machine learning prediction that detects unknown root exploits, and scrutinizes three categories of features: (1) system commands, (2) directory paths, and (3) code-based features. The evaluation yields a marked accuracy of 93% with LogitBoost in simulation; LogitBoost also correctly predicted all root exploit samples in our developed system, the root exploit detection system (RODS).
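The boosting step shared by the AdaBoost family can be illustrated with discrete AdaBoost over threshold stumps: each round fits the best stump on reweighted data, then upweights the examples it got wrong. This is a simplification (the study's LogitBoost fits a logistic model instead), and the toy feature matrix below merely stands in for extracted malware features.

```python
import numpy as np

def train_adaboost(X, y, rounds=25):
    """Discrete AdaBoost over threshold stumps; labels y in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)               # example weights, reweighted each round
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(d):                # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] >= thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)          # stump's vote weight
        pred = sign * np.where(X[:, j] >= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)                 # upweight mistakes
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    score = sum(alpha * sign * np.where(X[:, j] >= thr, 1, -1)
                for alpha, j, thr, sign in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy stand-in for extracted features; label = benign (-1) / root exploit (+1).
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)
model = train_adaboost(X, y)
acc = float((predict(model, X) == y).mean())
```

Each weak stump alone is a poor detector, but the weighted vote of the ensemble concentrates on previously misclassified samples, which is the principle that lets boosted classifiers separate unknown root exploits from benign applications.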