MyMedR

Displaying publications 1 - 20 of 65 in total

Text mining in mosquito-borne disease: A systematic review

Ong SQ, Pauzi MBM, Gan KH

Acta Trop, 2022 Jul;231:106447.
PMID: 35430265 DOI: 10.1016/j.actatropica.2022.106447

Mosquito-borne diseases are emerging and re-emerging across the globe, especially after the COVID-19 pandemic. Recent advances in text mining for infectious diseases have the potential to provide timely access to explicit and implicit associations among information in text. In the past few years, the availability of online text data in unstructured or semi-structured form, rich in information from this domain, has enabled many studies to provide solutions in this area, e.g., disease-related knowledge discovery, disease surveillance, and early detection systems. However, to the best of our knowledge, no recent review of text mining in the domain of mosquito-borne disease was available. In this review, we survey recent work on the text mining techniques used in combating mosquito-borne diseases. We highlight the corpus sources, technologies, applications, and challenges faced by the studies, followed by possible future directions in this domain. We present a bibliometric analysis of the 294 scientific articles published in Scopus and PubMed in the domain of text mining in mosquito-borne diseases from 2016 to 2021. The papers were further filtered and reviewed based on the techniques used to analyze text related to mosquito-borne diseases. Of the corpus of 158 selected articles, we found that 27 were relevant and used text mining in mosquito-borne diseases. These articles mostly covered Zika (38.70%), dengue (32.26%), and malaria (29.03%), with very few or none covering other crucial mosquito-borne diseases such as chikungunya, yellow fever, and West Nile fever. Twitter was the dominant corpus resource for text mining in mosquito-borne diseases, followed by the PubMed and LexisNexis databases. Sentiment analysis was the most popular text mining technique for understanding discourse about the disease, followed by information extraction, which used dependency-relation and co-occurrence-based approaches to extract relations and events. Surveillance was the main application in most of the reviewed studies, followed by treatment, which focused on drug-disease or symptom-disease associations. Advances in text mining could improve the management of mosquito-borne diseases. However, the techniques and applications pose many limitations and challenges, including biases such as user authentication and language, and real-world implementation. We discuss future directions that could help expand this area and domain. This review serves mainly as a library for text mining in mosquito-borne diseases, and the approach could be further explored for other neglected diseases.

Matched MeSH terms: Data Mining

Visualizing disaster attitudes resulting from terrorist activities

Khalid HM, Helander MG, Hood NA

Appl Ergon, 2013 Sep;44(5):671-9.
PMID: 22944486 DOI: 10.1016/j.apergo.2012.06.005

The purpose of this study was to analyze people's attitudes to disasters by investigating how people feel, behave and think during disasters. We focused on disasters induced by humans, such as terrorist attacks. Two types of textual information were collected: from Internet blogs and from research papers. The analysis enabled forecasting of attitudes for the design of a proactive disaster advisory scheme. Text was analyzed using a text mining tool, Leximancer. The outcome of this analysis revealed core themes and concepts in the text concerning people's attitudes. The themes and concepts were sorted into three broad categories: Affect, Behaviour, and Cognition (ABC), and the data were visualized in semantic maps. The maps reveal several knowledge pathways of ABC for developing attitudinal ontologies, which describe the relations between affect, behaviour and cognition, and the sequence in which they develop. Clearly, terrorist attacks induced trauma and people became highly vulnerable.

Matched MeSH terms: Data Mining

Suicidal behaviour prediction models using machine learning techniques: A systematic review

Nordin N, Zainol Z, Mohd Noor MH, Chan LF

Artif Intell Med, 2022 Oct;132:102395.
PMID: 36207078 DOI: 10.1016/j.artmed.2022.102395

BACKGROUND: Early detection and prediction of suicidal behaviour are key factors in suicide control. In conjunction with recent advances in the field of artificial intelligence, there is increasing research into how machine learning can assist in the detection, prediction and treatment of suicidal behaviour. Therefore, this study aims to provide a comprehensive review of the literature exploring machine learning techniques in the study of suicidal behaviour prediction.
METHODS: A search of four databases was conducted: Web of Science, PubMed, Dimensions, and Scopus, for research papers dated between January 2016 and September 2021. The search keywords were 'data mining' and 'machine learning' in combination with 'suicidal behaviour', 'suicide', 'suicide attempt', 'suicidal ideation', 'suicide plan' and 'self-harm'. The studies that used machine learning techniques were synthesized according to the countries of the articles, sample description, sample size, classification tasks, number of features used to develop the models, types of machine learning techniques, and performance evaluation metrics.
RESULTS: Thirty-five empirical articles met the criteria to be included in the current review. We provide a general overview of machine learning techniques, examine the feature categories, describe methodological challenges, and suggest areas for improvement and research directions. Ensemble prediction models have been shown to be more accurate and useful than single prediction models.
CONCLUSIONS: Machine learning has great potential for improving estimates of future suicidal behaviour and monitoring changes in risk over time. Further research can address important challenges and potential opportunities that may contribute to significant advances in suicide prediction.

Matched MeSH terms: Data Mining

Pooling and expanding registries of familial hypercholesterolaemia to assess gaps in care and improve disease management and outcomes: Rationale and design of the global EAS Familial Hypercholesterolaemia Studies Collaboration

EAS Familial Hypercholesterolaemia Studies Collaboration, Vallejo-Vaz AJ, Akram A, Kondapally Seshasai SR, Cole D, Watts GF, et al.

Atheroscler Suppl, 2016 Dec;22:1-32.
PMID: 27939304 DOI: 10.1016/j.atherosclerosissup.2016.10.001

The potential for global collaborations to better inform public health policy regarding major non-communicable diseases has been successfully demonstrated by several large-scale international consortia. However, the true public health impact of familial hypercholesterolaemia (FH), a common genetic disorder associated with premature cardiovascular disease, is yet to be reliably ascertained using similar approaches. The European Atherosclerosis Society FH Studies Collaboration (EAS FHSC) is a new initiative of international stakeholders which will help establish a global FH registry to generate large-scale, robust data on the burden of FH worldwide.

Matched MeSH terms: Data Mining

Using fuzzy association rule mining in cancer classification

Mahmoodian H, Hamiruce Marhaban M, Abdulrahim R, Rosli R, Saripan I

Australas Phys Eng Sci Med, 2011 Apr;34(1):41-54.
PMID: 21327594 DOI: 10.1007/s13246-011-0054-8

The classification of cancer tumors based on gene expression profiles has been extensively studied. A wide variety of cancer datasets have been analyzed with various gene selection and classification methods to identify the behavior of genes in tumors and the relationships between those genes and disease outcomes. Model interpretability, which this study addresses by developing the model with fuzzy rules and linguistic variables, has rarely been considered. Another goal of this study is to create a high-performance fuzzy classifier that uses a subset of significant genes selected by different types of gene selection methods. A new algorithm has been developed to identify fuzzy rules and significant genes based on fuzzy association rule mining. First, different subsets of genes, selected by different methods, were used separately to generate primary fuzzy classifiers; the proposed algorithm was then applied to combine the genes associated in the primary classifiers and generate a new classifier. The results show that the fuzzy classifier can classify tumors with high performance while presenting the relationships between genes through linguistic variables.

Matched MeSH terms: Data Mining/methods

In-vitro diagnosis of single and poly microbial species targeted for diabetic foot infection using e-nose technology

Yusuf N, Zakaria A, Omar MI, Shakaff AY, Masnan MJ, Kamarudin LM, et al.

BMC Bioinformatics, 2015;16:158.
PMID: 25971258 DOI: 10.1186/s12859-015-0601-5

Effective management of patients with diabetic foot infection is a crucial concern. A delay in prescribing an appropriate antimicrobial agent can lead to amputation or life-threatening complications. Thus, this electronic nose (e-nose) technique will provide a diagnostic tool that allows rapid and accurate identification of a pathogen.

Matched MeSH terms: Data Mining

A phylogenomic approach to bacterial subspecies classification: proof of concept in Mycobacterium abscessus

Tan JL, Khang TF, Ngeow YF, Choo SW

BMC Genomics, 2013;14:879.
PMID: 24330254 DOI: 10.1186/1471-2164-14-879

Mycobacterium abscessus is a rapidly growing mycobacterium that is often associated with human infections. The taxonomy of this species has undergone several revisions and is still being debated. In this study, we sequenced the genomes of 12 M. abscessus strains and used phylogenomic analysis to perform subspecies classification.

Matched MeSH terms: Data Mining

A novel approach for heart disease prediction using strength scores with significant predictors

Yazdani A, Varathan KD, Chiam YK, Malik AW, Wan Ahmad WA

BMC Med Inform Decis Mak, 2021 Jun 21;21(1):194.
PMID: 34154576 DOI: 10.1186/s12911-021-01527-5

BACKGROUND: Cardiovascular disease is the leading cause of death in many countries. Physicians often diagnose cardiovascular disease based on current clinical tests and previous experience of diagnosing patients with similar symptoms. Patients who suffer from heart disease require quick diagnosis, early treatment and constant observation. To address their needs, many data mining approaches have been used in diagnosing and predicting heart diseases. Previous research also focused on identifying the significant features contributing to heart disease prediction; however, less attention was given to identifying the strength of these features.
METHOD: Motivated by this gap in the literature, this paper proposes an algorithm that measures the strength of the significant features that contribute to heart disease prediction. The study aims to predict heart disease based on the scores of significant features using Weighted Associative Rule Mining.
RESULTS: A set of important feature scores and rules for diagnosing heart disease was identified, and cardiologists were consulted to confirm the validity of these rules. Experiments on the UCI open dataset, which is widely used in heart disease research, yielded a highest confidence score of 98% in predicting heart disease.
CONCLUSION: This study contributes a method for computing the strength scores of significant predictors in heart disease prediction. From the evaluation results, we obtained important rules and achieved the highest confidence score by using the computed strength scores of significant predictors in Weighted Associative Rule Mining to predict heart disease.

Matched MeSH terms: Data Mining

Development and mining of a volatile organic compound database

Abdullah AA, Altaf-Ul-Amin M, Ono N, Sato T, Sugiura T, Morita AH, et al.

Biomed Res Int, 2015;2015:139254.
PMID: 26495281 DOI: 10.1155/2015/139254

Volatile organic compounds (VOCs) are small molecules that exhibit high vapor pressure under ambient conditions and have low boiling points. Although VOCs contribute only a small proportion of the total metabolites produced by living organisms, they play an important role in chemical ecology, specifically in the biological interactions between organisms and ecosystems. VOCs are also important in the health care field, as they are currently used as biomarkers to detect various human diseases. Information on VOCs remains scattered throughout the literature, and until now no database describing VOCs and their biological activities has been available. To address this gap, we developed the KNApSAcK Metabolite Ecology Database, which contains information on the relationships between VOCs and their emitting organisms. KNApSAcK Metabolite Ecology is also linked with the KNApSAcK Core and KNApSAcK Metabolite Activity Database to provide further information on the metabolites and their biological activities. The VOC database can be accessed online.

Matched MeSH terms: Data Mining/methods*

Risk prediction analysis for classifying type 2 diabetes occurrence using local dataset

M. Hafiz Fazren Abd Rahman, Wan Wardatul Amani Wan Salim, M. Firdaus Abd-Wahab

Biological and Natural Resources Engineering Journal, 2020;3(1):48-61.
MyJurnal

The steep global rise in Diabetes Mellitus (DM) cases has encouraged extensive research on DM, leading to a vast accumulation of DM-related data. Data mining and machine learning have proven to be powerful tools for transforming such data into meaningful deductions, and several machine learning tools have shown great promise in diabetes classification. However, challenges remain in obtaining an accurate model suitable for real-world application. Most disease risk-prediction models are specific to a local population. Moreover, real-world data are likely to be complex, incomplete and unorganized, which complicates efforts to develop models around them. This research aims to develop a robust prediction model for classifying type 2 diabetes mellitus (T2DM), focusing on a Malaysian population, using three machine learning algorithms: Decision Tree, Support Vector Machine and Naïve Bayes. Data pre-processing methods are applied to the raw data to improve model performance. This study uses datasets obtained from the IIUM Medical Centre for classification and modelling. Ultimately, the performance of each model is validated, evaluated and compared based on several statistical metrics that measure accuracy, precision, sensitivity and efficiency. This study shows that the random forest model provides the best overall prediction performance in terms of accuracy (0.87), sensitivity (0.9), specificity (0.8), precision (0.9), F1-score (0.9) and AUC value (0.93) (Normal).

Matched MeSH terms: Data Mining

Current advance methods for the identification of blast resistance genes in rice

Tanweer FA, Rafii MY, Sijam K, Rahim HA, Ahmed F, Latif MA

C. R. Biol., 2015 May;338(5):321-34.
PMID: 25843222 DOI: 10.1016/j.crvi.2015.03.001

Rice blast, caused by Magnaporthe oryzae, is one of the most devastating diseases of rice around the world, and crop losses due to blast are considerably high. Many blast-resistant rice varieties have been developed by classical plant breeding and adopted by farmers in various rice-growing countries. However, environment-dependent variability in the pathogenicity of the blast fungus has made blast disease a major concern for farmers, and it remains a threat to the rice industry. With the use of molecular techniques, plant breeders have improved rice production systems and minimized yield losses. In this article, we summarize the current advanced molecular techniques used for controlling blast disease. With the advent of new technologies such as marker-assisted selection, molecular mapping, map-based cloning, marker-assisted backcrossing and allele mining, breeders have identified more than 100 Pi loci and 350 QTLs in the rice genome responsible for blast disease. These Pi genes and QTLs can be introgressed into a blast-susceptible cultivar through marker-assisted backcross breeding. These molecular techniques provide time-saving, environmentally friendly and labour-saving ways to control blast disease. Knowledge of host-plant interactions in the context of blast disease will lead to the development of resistant varieties in the future.

Matched MeSH terms: Data Mining

First comprehensive in silico analysis of the functional and structural consequences of SNPs in human GalNAc-T1 gene

Mohamoud HS, Hussain MR, El-Harouni AA, Shaik NA, Qasmi ZU, Merican AF, et al.

Comput Math Methods Med, 2014;2014:904052.
PMID: 24723968 DOI: 10.1155/2014/904052

GalNAc-T1, a key member of the GalNAc-transferase gene family involved in the mucin-type O-linked glycosylation pathway, is expressed in most biological tissues and cell types. Despite the reported association of GalNAc-T1 gene mutations with human disease susceptibility, a comprehensive computational analysis of its coding, noncoding and regulatory SNPs, and of their functional impacts at the protein level, has been lacking. Therefore, sequence- and structure-based computational tools were employed to screen all listed coding SNPs of the GalNAc-T1 gene in order to identify and characterize them. Our concordant in silico analysis with SIFT, PolyPhen-2, PANTHER-cSNP, and SNPeffect identified the potential nsSNPs (S143P, G258V, and Y414D variants) from 18 nsSNPs of GalNAc-T1. Additionally, 2 regulatory SNPs (rs72964406 and rs34304568) were identified in GalNAc-T1 using the FastSNP tool. Using multiple computational approaches, we have systematically classified the functional mutations in regulatory and coding regions that can modify the expression and function of the GalNAc-T1 enzyme. These genetic variants may further assist in understanding the wide range of disease susceptibility associated with mucin-based cell signalling and pathogenic binding, and may help to develop novel therapeutic elements for associated diseases.

Matched MeSH terms: Data Mining/methods

An Efficient Optimization Method for Solving Unsupervised Data Classification Problems

Shabanzadeh P, Yusof R

Comput Math Methods Med, 2015;2015:802754.
PMID: 26336509 DOI: 10.1155/2015/802754

Unsupervised data classification (or clustering) analysis is one of the most useful tools and a descriptive task in data mining that seeks to classify homogeneous groups of objects based on similarity; it is used in many medical disciplines and various applications. In general, no single algorithm is suitable for all types of data, conditions, and applications. Each algorithm has its own advantages, limitations, and deficiencies. Hence, research into novel and effective approaches for unsupervised data classification is still active. In this paper, a heuristic algorithm, the Biogeography-Based Optimization (BBO) algorithm, which is inspired by the natural biogeographic distribution of different species, was adapted for data clustering problems by modifying its main operators. Like other population-based algorithms, the BBO algorithm starts with an initial population of candidate solutions to an optimization problem and an objective function that is calculated for them. The performance of the proposed algorithm was assessed on six medical and real-life datasets and compared with eight well-known and recent unsupervised data classification algorithms. Numerical results demonstrate that the proposed evolutionary optimization algorithm is efficient for unsupervised data classification.

Matched MeSH terms: Data Mining/methods*; Data Mining/statistics & numerical data

A new machine learning technique for an accurate diagnosis of coronary artery disease

Abdar M, Książek W, Acharya UR, Tan RS, Makarenkov V, Pławiak P

Comput Methods Programs Biomed, 2019 Oct;179:104992.
PMID: 31443858 DOI: 10.1016/j.cmpb.2019.104992

BACKGROUND AND OBJECTIVE: Coronary artery disease (CAD) is one of the most common diseases around the world. An early and accurate diagnosis of CAD allows timely administration of appropriate treatment and helps reduce mortality. Herein, we describe an innovative machine learning methodology that enables accurate detection of CAD and apply it to data collected from Iranian patients.
METHODS: We first tested ten traditional machine learning algorithms, and then used the three best-performing algorithms (three types of SVM) in the rest of the study. To improve the performance of these algorithms, data preprocessing with normalization was carried out. Moreover, a genetic algorithm and particle swarm optimization, coupled with stratified 10-fold cross-validation, were used twice: for optimization of classifier parameters and for parallel selection of features.
RESULTS: The presented approach enhanced the performance of all traditional machine learning algorithms used in this study. We also introduced a new optimization technique called the N2Genetic optimizer (a new genetic training). Our experiments demonstrated that N2Genetic-nuSVM achieved an accuracy of 93.08% and an F1-score of 91.51% when predicting CAD outcomes among the patients included in the well-known Z-Alizadeh Sani dataset. These results are competitive with and comparable to the best results in the field.
CONCLUSIONS: We showed that machine learning techniques optimized by the proposed approach can lead to highly accurate models intended for both clinical and research use.

Matched MeSH terms: Data Mining/statistics & numerical data

Machine learning-based coronary artery disease diagnosis: A comprehensive review

Alizadehsani R, Abdar M, Roshanzamir M, Khosravi A, Kebria PM, Khozeimeh F, et al.

Comput Biol Med, 2019 Aug;111:103346.
PMID: 31288140 DOI: 10.1016/j.compbiomed.2019.103346

Coronary artery disease (CAD) is the most common cardiovascular disease (CVD) and often leads to a heart attack. It annually causes millions of deaths and billions of dollars in financial losses worldwide. Angiography, which is invasive and risky, is the standard procedure for diagnosing CAD. Alternatively, machine learning (ML) techniques have been widely used in the literature as fast, affordable, and noninvasive approaches for CAD detection. The results that have been published on ML-based CAD diagnosis differ substantially in terms of the analyzed datasets, sample sizes, features, location of data collection, performance metrics, and applied ML techniques. Due to these fundamental differences, achievements in the literature cannot be generalized. This paper conducts a comprehensive and multifaceted review of all relevant studies that were published between 1992 and 2019 for ML-based CAD diagnosis. The impacts of various factors, such as dataset characteristics (geographical location, sample size, features, and the stenosis of each coronary artery) and applied ML techniques (feature selection, performance metrics, and method) are investigated in detail. Finally, the important challenges and shortcomings of ML-based CAD diagnosis are discussed.

Matched MeSH terms: Data Mining

Allele Mining Strategies: Principles and Utilisation for Blast Resistance Genes in Rice (Oryza sativa L.)

Ashkani S, Yusop MR, Shabanimofrad M, Azady A, Ghasemzadeh A, Azizi P, et al.

Curr Issues Mol Biol, 2015;17:57-73.
PMID: 25706446

Allele mining is a promising way to dissect naturally occurring allelic variants of candidate genes with essential agronomic qualities. With the identification, isolation and characterisation of blast resistance genes in rice, it is now possible to dissect the actual allelic variants of these genes within an array of rice cultivars via allele mining. Multiple alleles from the complex locus serve as a reservoir of variation to generate functional genes. Routine sequence exchange is one of the main mechanisms of R gene evolution and development. Allele mining for resistance genes can be an important method to identify additional resistance alleles and new haplotypes, along with the development of allele-specific markers for use in marker-assisted selection. Allele mining can be viewed as a vital link in the effective utilisation of genetic and genomic resources in genomics-driven modern plant breeding. This review examines the concepts and potential of mining approaches for the discovery of alleles and their utilisation for blast resistance genes in rice. The details provided here offer rice breeders a worthwhile introduction to allele mining and its methodology for the discovery of new alleles hidden in hereditary diversity, which is vital for crop improvement.

Matched MeSH terms: Data Mining/methods*

Natural Sirtuin Modulators in Drug Discovery: A Review (2010-2020)

Chang Y, Yeong KY

Curr Med Chem, 2021 Mar 29.
PMID: 33781187 DOI: 10.2174/0929867328666210329124415

There has been intense research interest in sirtuins since their regulatory roles in a myriad of pathological processes were established. In the last two decades, much research effort has been dedicated to the development of sirtuin modulators. Although synthetic sirtuin modulators are the focus, natural modulators remain an integral part of this area that warrants further exploration, as they have been found to possess therapeutic potential in various diseases, including cancers, neurodegenerative diseases, and metabolic disorders. Owing to the importance of this cluster of compounds, this review gives a current account of naturally occurring sirtuin modulators, their associated molecular mechanisms and their therapeutic benefits. Furthermore, comprehensive data mining yielded detailed statistical analyses of the development trend of sirtuin modulators from 2010 to 2020. Lastly, the challenges and future prospects of natural sirtuin modulators in drug discovery are also discussed.

Matched MeSH terms: Data Mining

Application of a GIS-/remote sensing-based approach for predicting groundwater potential zones using a multi-criteria data mining methodology

Mogaji KA, Lim HS

Environ Monit Assess, 2017 Jul;189(7):321.
PMID: 28593561 DOI: 10.1007/s10661-017-5990-7

This study integrates the Dempster-Shafer-driven evidential belief function (DS-EBF) methodology with remote sensing and geographic information system techniques to analyze surface and subsurface data sets for the spatial prediction of groundwater potential in Perak Province, Malaysia. The study used additional data obtained from the records of the groundwater yield rate of approximately 28 bore well locations. The processed surface and subsurface data produced sets of groundwater potential conditioning factors (GPCFs), from which multiple surface hydrologic and subsurface hydrogeologic parameter thematic maps were generated. The bore well location inventories were partitioned randomly in a ratio of 70% (19 wells) for model training to 30% (9 wells) for model testing. Applying the DS-EBF relationship model algorithms to the surface- and subsurface-based GPCF thematic maps and the bore well locations produced two groundwater potential prediction (GPP) maps, based on surface hydrologic and subsurface hydrogeologic characteristics, which established that more than 60% of the study area falls within moderate-high groundwater potential zones and less than 35% within low potential zones. Uncertainty values for the predicted potential zones, quantified using the model's uncertainty algorithm, ranged from 0 to 17%. Validation of the GPP maps using the relative operating characteristic curve method yielded success rates of 80 and 68% and prediction rates of 89 and 53% for the subsurface hydrogeologic factor (SUHF)- and surface hydrologic factor (SHF)-based GPP maps, respectively. The study results revealed that the SUHF-based GPP map delineated groundwater potential zones more accurately than the SHF-based GPP map. However, the low degree of uncertainty of the predicted potential zones established the suitability of both GPP maps for future development of groundwater resources in the area. The overall results demonstrated the efficacy of the data mining model and geospatial technology in groundwater potential mapping.

Matched MeSH terms: Data Mining*

BioHackathon 2015: Semantics of data for life sciences and reproducible research

Vos RA, Katayama T, Mishima H, Kawano S, Kawashima S, Kim JD, et al.

F1000Res, 2020;9:136.
PMID: 32308977 DOI: 10.12688/f1000research.18236.1

We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.

Matched MeSH terms: Data Mining

An evolution of image source camera attribution approaches

Jahanirad M, Wahab AW, Anuar NB

Forensic Sci Int, 2016 May;262:242-75.
PMID: 27060542 DOI: 10.1016/j.forsciint.2016.03.035

Camera attribution plays an important role in digital image forensics by providing evidence of, and distinguishing characteristics for, the origin of a digital image. It allows the forensic analyser to find the possible source camera that captured the image under investigation. However, in real-world applications, these approaches face many challenges due to the large set of multimedia data publicly available through photo sharing and social network sites, captured under uncontrolled conditions and subjected to a variety of hardware and software post-processing operations. Moreover, the legal system only accepts forensic analysis of digital image evidence if the applied camera attribution techniques are unbiased, reliable, nondestructive and widely accepted by experts in the field. The aim of this paper is to investigate the evolutionary trend of image source camera attribution approaches from fundamentals to practice, in particular with the application of image processing and data mining techniques. Extracting implicit knowledge from images using intrinsic image artifacts for source camera attribution requires a structured image mining process. In this paper, we attempt to provide an introductory tutorial on the image processing pipeline, to determine the general classification of the features corresponding to different components for source camera attribution. The article also reviews source camera attribution techniques in image forensics more comprehensively, together with a classification of ongoing developments within the area. The classification of existing source camera attribution approaches is presented based on specific parameters, such as the colour image processing pipeline, hardware- and software-related artifacts, and the methods used to extract such artifacts. More recent source camera attribution approaches, which have not yet gained sufficient attention among image forensics researchers, are also critically analysed and further categorised into four classes: those based on optical aberrations, sensor camera fingerprints, processing statistics, and processing regularities. Furthermore, this paper investigates the challenging problems and the proposed strategies of such schemes, based on the suggested taxonomy, to plot the evolution of source camera attribution approaches with respect to subjective optimisation criteria over the last decade. The optimisation criteria were determined based on the strategies proposed to increase the detection accuracy, robustness and computational efficiency of source camera brand, model or device attribution.

Matched MeSH terms: Data Mining
