MyMedR

Feature selection algorithms for Malaysian dengue outbreak detection model

Husam I.S. Abuhamad, Azuraliza Abu Bakar, Suhaila Zainudin, Mazrura Sahani, Zainudin Mohd Ali

Sains Malaysiana, 2017;46:255-265.

Dengue fever is considered one of the most common mosquito-borne diseases worldwide. Dengue outbreak detection can be very useful in terms of practical efforts to overcome the rapid spread of the disease by providing the knowledge to predict the next outbreak occurrence. Many studies have been conducted to model and predict dengue outbreaks using different data mining techniques. This research aimed to identify the best features that lead to better predictive accuracy of dengue outbreaks using three different feature selection algorithms: particle swarm optimization (PSO), genetic algorithm (GA) and rank search (RS). Based on the selected features, three predictive modeling techniques (J48, DTNB and Naive Bayes) were applied for dengue outbreak detection. The dataset used in this research was obtained from the Public Health Department, Seremban, Negeri Sembilan, Malaysia. The experimental results showed that predictive accuracy was improved by applying a feature selection process before the predictive modeling process. The study also showed the set of features to represent dengue outbreak detection for Malaysian health agencies.

Matched MeSH terms: Data Mining
An Efficient Optimization Method for Solving Unsupervised Data Classification Problems

Shabanzadeh P, Yusof R

Comput Math Methods Med, 2015;2015:802754.
PMID: 26336509 DOI: 10.1155/2015/802754

Unsupervised data classification (or clustering) analysis is one of the most useful tools and a descriptive task in data mining that seeks to classify homogeneous groups of objects based on similarity and is used in many medical disciplines and various applications. In general, there is no single algorithm that is suitable for all types of data, conditions, and applications. Each algorithm has its own advantages, limitations, and deficiencies. Hence, research for novel and effective approaches for unsupervised data classification is still active. In this paper a heuristic algorithm, the Biogeography-Based Optimization (BBO) algorithm, which is inspired by the natural biogeography distribution of different species, was adapted for data clustering problems by modifying its main operators. Similar to other population-based algorithms, the BBO algorithm starts with an initial population of candidate solutions to an optimization problem and an objective function that is calculated for them. To evaluate the performance of the proposed algorithm, an assessment was carried out on six medical and real life datasets, and the algorithm was compared with eight well known and recent unsupervised data classification algorithms. Numerical results demonstrate that the proposed evolutionary optimization algorithm is efficient for unsupervised data classification.

Matched MeSH terms: Data Mining/methods*; Data Mining/statistics & numerical data
Natural Sirtuin Modulators in Drug Discovery: A Review (2010-2020)

Chang Y, Yeong KY

Curr Med Chem, 2021 Mar 29.
PMID: 33781187 DOI: 10.2174/0929867328666210329124415

There has been intense research interest in sirtuins since the establishment of their regulatory roles in a myriad of pathological processes. In the last two decades, much research effort has been dedicated to the development of sirtuin modulators. Although synthetic sirtuin modulators are the focus, natural modulators remain an integral part to be further explored in this area as they are found to possess therapeutic potential in various diseases including cancers, neurodegenerative diseases, and metabolic disorders. Owing to the importance of this cluster of compounds, this review gives a current stand on the naturally occurring sirtuin modulators, associated molecular mechanisms and their therapeutic benefits. Furthermore, comprehensive data mining resulted in detailed statistical data analyses pertaining to the development trend of sirtuin modulators from 2010-2020. Lastly, the challenges and future prospects of natural sirtuin modulators in drug discovery will also be discussed.

Matched MeSH terms: Data Mining
In-vitro diagnosis of single and poly microbial species targeted for diabetic foot infection using e-nose technology

Yusuf N, Zakaria A, Omar MI, Shakaff AY, Masnan MJ, Kamarudin LM, et al.

BMC Bioinformatics, 2015;16:158.
PMID: 25971258 DOI: 10.1186/s12859-015-0601-5

Effective management of patients with diabetic foot infection is a crucial concern. A delay in prescribing appropriate antimicrobial agent can lead to amputation or life threatening complications. Thus, this electronic nose (e-nose) technique will provide a diagnostic tool that will allow for rapid and accurate identification of a pathogen.

Matched MeSH terms: Data Mining
Order batching in warehouses by minimizing total tardiness: a hybrid approach of weighted association rule mining and genetic algorithms

Azadnia AH, Taheri S, Ghadimi P, Saman MZ, Wong KY

ScientificWorldJournal, 2013;2013:246578.
PMID: 23864823 DOI: 10.1155/2013/246578

One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed in order to minimize traveling distance in the process of order picking. However, in practice, customer orders have to be completed by certain due dates in order to avoid tardiness, which is neglected in most of the related scientific papers. Consequently, we proposed a novel solution approach, consisting of four phases, in order to minimize tardiness. First of all, weighted association rule mining has been used to calculate associations between orders with respect to their due date. Next, a batching model based on binary integer programming has been formulated to maximize the associations between orders within each batch. Subsequently, the order picking phase used a Genetic Algorithm integrated with the Traveling Salesman Problem in order to identify the most suitable travel path. Finally, the Genetic Algorithm has been applied for sequencing the constructed batches in order to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach.

Matched MeSH terms: Data Mining/methods*
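
The sequencing objective named in the abstract above can be illustrated with a minimal sketch. It assumes the usual definition of tardiness (completion time minus due date, floored at zero) and replaces the paper's Genetic Algorithm with an exhaustive search that is only practical for a handful of batches. The batch names, processing times and due dates are hypothetical.

```python
from itertools import permutations


def total_tardiness(sequence, processing_time, due_date):
    """Sum of max(0, completion time - due date) over batches picked in order."""
    clock = 0
    total = 0
    for batch in sequence:
        clock += processing_time[batch]          # batch finishes at this time
        total += max(0, clock - due_date[batch])
    return total


def best_sequence(batches, processing_time, due_date):
    """Exhaustive search over all orderings (a stand-in for the GA, small inputs only)."""
    return min(
        permutations(batches),
        key=lambda seq: total_tardiness(seq, processing_time, due_date),
    )


# Hypothetical data: three batches with picking times and due dates.
processing_time = {"A": 4, "B": 2, "C": 3}
due_date = {"A": 9, "B": 2, "C": 5}

print(total_tardiness(("A", "B", "C"), processing_time, due_date))
print(best_sequence(["A", "B", "C"], processing_time, due_date))
```

A metaheuristic such as a genetic algorithm becomes relevant once the number of batches makes enumerating every ordering infeasible.
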
A hybrid interpretable deep structure based on adaptive neuro-fuzzy inference system, decision tree, and K-means for intrusion detection

Liu J, Yinchai W, Siong TC, Li X, Zhao L, Wei F

Sci Rep, 2022 Dec 01;12(1):20770.
PMID: 36456582 DOI: 10.1038/s41598-022-23765-x

To generate an interpretable deep architecture for identifying deep intrusion patterns, this study proposes an approach that combines ANFIS (Adaptive Network-based Fuzzy Inference System) and DT (Decision Tree) for interpreting the deep pattern of intrusion detection. Meanwhile, to improve the efficiency of training and predicting, Pearson Correlation analysis, standard deviation, and a new adaptive K-means are used to select attributes and make fuzzy interval decisions. The proposed algorithm was trained, validated, and tested on the NSL-KDD (National security lab-knowledge discovery and data mining) dataset. Using 22 attributes that are highly related to the target, the proposed method achieves a 99.86% detection rate and 0.14% false alarm rate on the KDDTrain+ dataset and a 77.46% detection rate on the KDDTest+ dataset, which is better than many classifiers. Besides, the interpretable model can help us demonstrate the complex and overlapped pattern of intrusions and analyze the pattern of various intrusions.

Matched MeSH terms: Data Mining
A novel approach for heart disease prediction using strength scores with significant predictors

Yazdani A, Varathan KD, Chiam YK, Malik AW, Wan Ahmad WA

BMC Med Inform Decis Mak, 2021 06 21;21(1):194.
PMID: 34154576 DOI: 10.1186/s12911-021-01527-5

BACKGROUND: Cardiovascular disease is the leading cause of death in many countries. Physicians often diagnose cardiovascular disease based on current clinical tests and previous experience of diagnosing patients with similar symptoms. Patients who suffer from heart disease require quick diagnosis, early treatment and constant observations. To address their needs, many data mining approaches have been used in the past in diagnosing and predicting heart diseases. Previous research also focused on identifying the significant contributing features to heart disease prediction; however, less importance was given to identifying the strength of these features.
METHOD: This paper is motivated by the gap in the literature and thus proposes an algorithm that measures the strength of the significant features that contribute to heart disease prediction. The study is aimed at predicting heart disease based on the scores of significant features using Weighted Associative Rule Mining.
RESULTS: A set of important feature scores and rules were identified in diagnosing heart disease and cardiologists were consulted to confirm the validity of these rules. The experiments performed on the UCI open dataset, widely used for heart disease research, yielded the highest confidence score of 98% in predicting heart disease.
CONCLUSION: This study managed to provide a significant contribution in computing the strength scores with significant predictors in heart disease prediction. From the evaluation results, we obtained important rules and achieved highest confidence score by utilizing the computed strength scores of significant predictors on Weighted Associative Rule Mining in predicting heart disease.

Matched MeSH terms: Data Mining
A Comparison Study on Similarity and Dissimilarity Measures in Clustering Continuous Data

Shirkhorshidi AS, Aghabozorgi S, Wah TY

PLoS One, 2015;10(12):e0144059.
PMID: 26658987 DOI: 10.1371/journal.pone.0144059

Similarity or distance measures are core components used by distance-based clustering algorithms to cluster similar data points into the same clusters, while dissimilar or distant data points are placed into different clusters. The performance of similarity measures is mostly addressed in two or three-dimensional spaces, beyond which, to the best of our knowledge, there is no empirical study that has revealed the behavior of similarity measures when dealing with high-dimensional datasets. To fill this gap, a technical framework is proposed in this study to analyze, compare and benchmark the influence of different similarity measures on the results of distance-based clustering algorithms. For reproducibility purposes, fifteen publicly available datasets were used for this study, and consequently, future distance measures can be evaluated and compared with the results of the measures discussed in this work. These datasets were classified as low and high-dimensional categories to study the performance of each measure against each category. This research should help the research community to identify suitable distance measures for datasets and also to facilitate a comparison and evaluation of the newly proposed similarity or distance measures with traditional ones.

Matched MeSH terms: Data Mining/statistics & numerical data*
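
For readers unfamiliar with the kind of measures the abstract above compares, here is a minimal sketch of three widely used distance measures for continuous data. The choice of Euclidean, Manhattan and cosine distance is an assumption made for illustration; the abstract does not list which measures the study benchmarks.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity; undefined when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q), manhattan(p, q))
print(cosine_distance((1.0, 0.0), (0.0, 1.0)))
```

Euclidean and Manhattan distance respond to the magnitude of the vectors, whereas cosine distance depends only on their direction, which is one reason a measure can behave differently across datasets.
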
Gaining Insights on Nasopharyngeal Carcinoma Treatment Outcome Using Clinical Data Mining Techniques

Ghaibeh AA, Kasem A, Ng XJ, Nair HLK, Hirose J, Thiruchelvam V

Stud Health Technol Inform, 2018;247:386-390.
PMID: 29677988

The analysis of Electronic Health Records (EHRs) is attracting a lot of research attention in the medical informatics domain. Hospitals and medical institutes started to use data mining techniques to gain new insights from the massive amounts of data that can be made available through EHRs. Researchers in the medical field have often used descriptive statistics and classical statistical methods to prove assumed medical hypotheses. However, discovering new insights from large amounts of data solely based on experts' observations is difficult. Using data mining techniques and visualizations, practitioners can find hidden knowledge, identify interesting patterns, or formulate new hypotheses to be further investigated. This paper describes a work in progress on using data mining methods to analyze clinical data of Nasopharyngeal Carcinoma (NPC) cancer patients. NPC is the fifth most common cancer among Malaysians, and the data analyzed in this study was collected from three states in Malaysia (Kuala Lumpur, Sabah and Sarawak), and is considered to be the largest up-to-date dataset of its kind. This research is addressing the issue of cancer recurrence after the completion of radiotherapy and chemotherapy treatment. We describe the procedure, problems, and insights gained during the process.

Matched MeSH terms: Data Mining*
A review of subsequence time series clustering

Zolhavarieh S, Aghabozorgi S, Teh YW

ScientificWorldJournal, 2014;2014:312521.
PMID: 25140332 DOI: 10.1155/2014/312521

Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and backgrounds related to subsequence time series clustering. The categorization of the literature reviews is divided into three groups: preproof, interproof, and postproof period. Moreover, various state-of-the-art approaches in performing subsequence time series clustering are discussed under each of the following categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies.

Matched MeSH terms: Data Mining*
BioHackathon 2015: Semantics of data for life sciences and reproducible research

Vos RA, Katayama T, Mishima H, Kawano S, Kawashima S, Kim JD, et al.

F1000Res, 2020;9:136.
PMID: 32308977 DOI: 10.12688/f1000research.18236.1

We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.

Matched MeSH terms: Data Mining
A data mining approach to analyze the role of biomacromolecules-based nanocomposites in sustainable packaging

Paul J, Jacob J, Mahmud M, Vaka M, Krishnan SG, Arifutzzaman A, et al.

Int J Biol Macromol, 2024 Apr;265(Pt 2):130850.
PMID: 38492706 DOI: 10.1016/j.ijbiomac.2024.130850

Recent decades have witnessed a surge in research interest in bio-nanocomposite-based packaging materials, but still, a lack of systematic analysis exists in this domain. Bio-based packaging materials pose a sustainable alternative to petroleum-based packaging materials. The current work employs bibliometric analysis to deliver a comprehensive outline on the role of bio nanocomposites in packaging. India, Iran, and China were revealed to be the top three nations actively engaged in this domain in total publications. Islamic Azad University in Iran and Universiti Putra Malaysia in Malaysia are among the world's best institutions in active research and publications in this field. The extensive collaboration between nations and institutions highlights the significance of a holistic approach towards bio-nanocomposite. The National Natural Science Foundation of China is the leading funding body in this field of research. Among authors, Jong whan Rhim secured the topmost citations (2234) in this domain (13 publications). Among journals, Carbohydrate Polymers secured the maximum citation count (4629) from 36 articles; the initial one was published in 2011. Bio nanocomposite is the most frequently used keyword. Researchers and policymakers focussing on sustainable packaging solutions will gain crucial insights on the current research status on packaging solutions using bio-nanocomposites from the conclusions.

Matched MeSH terms: Data Mining
Using fuzzy association rule mining in cancer classification

Mahmoodian H, Hamiruce Marhaban M, Abdulrahim R, Rosli R, Saripan I

Australas Phys Eng Sci Med, 2011 Apr;34(1):41-54.
PMID: 21327594 DOI: 10.1007/s13246-011-0054-8

The classification of cancer tumors based on gene expression profiles has been extensively investigated in a number of studies. A wide variety of cancer datasets have been analyzed with various methods of gene selection and classification to identify the behavior of the genes in tumors and find the relationships between them and the outcome of diseases. Interpretability of the model, which is developed by fuzzy rules and linguistic variables in this study, has rarely been considered. In addition, creating a fuzzy classifier with high classification performance that uses a subset of significant genes selected by different types of gene selection methods is another goal of this study. A new algorithm has been developed to identify the fuzzy rules and significant genes based on fuzzy association rule mining. At first, different subsets of genes, which had been selected by different methods, were used to generate primary fuzzy classifiers separately, and then the proposed algorithm was implemented to mix the genes which had been associated in the primary classifiers and generate a new classifier. The results show that the fuzzy classifier can classify the tumors with high performance while presenting the relationships between the genes by linguistic variables.

Matched MeSH terms: Data Mining/methods
Chemical named entities recognition: a review on approaches and applications

Eltyeb S, Salim N

J Cheminform, 2014;6:17.
PMID: 24834132 DOI: 10.1186/1758-2946-6-17

The rapid increase in the flow rate of published digital information in all disciplines has resulted in a pressing need for techniques that can simplify the use of this information. The chemistry literature is very rich with information about chemical entities. Extracting molecules and their related properties and activities from the scientific literature to "text mine" these extracted data and determine contextual relationships helps research scientists, particularly those in drug development. One of the most important challenges in chemical text mining is the recognition of chemical entities mentioned in the texts. In this review, the authors briefly introduce the fundamental concepts of chemical literature mining, the textual contents of chemical documents, and the methods of naming chemicals in documents. We sketch out dictionary-based, rule-based and machine learning, as well as hybrid chemical named entity recognition approaches with their applied solutions. We end with an outlook on the pros and cons of these approaches and the types of chemical entities extracted.

Matched MeSH terms: Data Mining
Bioactive Molecule Prediction Using Extreme Gradient Boosting

Babajide Mustapha I, Saeed F

Molecules, 2016 Jul 28;21(8).
PMID: 27483216 DOI: 10.3390/molecules21080983

Following the explosive growth in chemical and biological data, the shift from traditional methods of drug discovery to computer-aided means has made data mining and machine learning methods integral parts of today's drug discovery process. In this paper, extreme gradient boosting (Xgboost), which is an ensemble of Classification and Regression Tree (CART) and a variant of the Gradient Boosting Machine, was investigated for the prediction of biological activity based on quantitative description of the compound's molecular structure. Seven datasets, well known in the literature were used in this paper and experimental results show that Xgboost can outperform machine learning algorithms like Random Forest (RF), Support Vector Machines (LSVM), Radial Basis Function Neural Network (RBFN) and Naïve Bayes (NB) for the prediction of biological activities. In addition to its ability to detect minority activity classes in highly imbalanced datasets, it showed remarkable performance on both high and low diversity datasets.

Matched MeSH terms: Data Mining/methods*
A novel association rule mining approach using TID intermediate itemset

Aqra I, Herawan T, Abdul Ghani N, Akhunzada A, Ali A, Bin Razali R, et al.

PLoS One, 2018;13(1):e0179703.
PMID: 29351287 DOI: 10.1371/journal.pone.0179703

Designing an efficient association rule mining (ARM) algorithm for multilevel knowledge-based transactional databases that is appropriate for real-world deployments is of paramount concern. However, dynamic decision making that needs to modify the threshold, either to minimize or maximize the output knowledge, forces extant state-of-the-art algorithms to rescan the entire database. The process therefore incurs heavy computation cost and is not feasible for real-time applications. The paper efficiently addresses the problem of dynamically updating the threshold for a given purpose. It contributes a novel ARM approach that creates an intermediate itemset and applies a threshold to extract categorical frequent itemsets with diverse threshold values, thus improving overall efficiency because the whole database no longer needs to be scanned. After the entire itemset is built, the real support can be obtained without rebuilding the itemset (e.g. the itemset list is intersected to obtain the actual support). Moreover, the algorithm supports extracting many frequent itemsets according to a pre-determined minimum support with an independent purpose. Additionally, the experimental results of the proposed approach demonstrate the capability to be deployed in any mining system in a fully parallel mode, consequently increasing the efficiency of the real-time association rules discovery process. The proposed approach outperforms the extant state-of-the-art and shows promising results that reduce computation cost, increase accuracy, and produce all possible itemsets.

Matched MeSH terms: Data Mining/methods*
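
The idea of obtaining support by intersecting transaction-ID (TID) lists, mentioned in the abstract above, can be sketched as follows. This is a generic vertical-layout illustration and not the authors' algorithm; the function names and the toy transactions are hypothetical. Once the TID lists are built in a single pass, the support threshold can be changed without rescanning the transactions.

```python
from itertools import combinations


def build_tid_lists(transactions):
    """One pass over the data: map each item to the set of transaction IDs containing it."""
    tid_lists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tid_lists.setdefault(item, set()).add(tid)
    return tid_lists


def support(itemset, tid_lists):
    """Support count of a non-empty itemset, obtained by intersecting TID sets."""
    tid_sets = [tid_lists.get(item, set()) for item in itemset]
    return len(set.intersection(*tid_sets))


def frequent_itemsets(tid_lists, min_support, max_size=2):
    """Enumerate itemsets up to max_size whose support meets the threshold."""
    items = sorted(tid_lists)
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            count = support(combo, tid_lists)
            if count >= min_support:
                result[combo] = count
    return result


# Hypothetical toy database of four transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
tid_lists = build_tid_lists(transactions)

# The same TID lists serve two different thresholds; no rescan is needed.
print(frequent_itemsets(tid_lists, min_support=2))
print(frequent_itemsets(tid_lists, min_support=3))
```

The brute-force enumeration over item combinations is kept only for brevity; a practical miner would prune candidates rather than test every combination.
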
Pooling and expanding registries of familial hypercholesterolaemia to assess gaps in care and improve disease management and outcomes: Rationale and design of the global EAS Familial Hypercholesterolaemia Studies Collaboration

EAS Familial Hypercholesterolaemia Studies Collaboration, Vallejo-Vaz AJ, Akram A, Kondapally Seshasai SR, Cole D, Watts GF, et al.

Atheroscler Suppl, 2016 Dec;22:1-32.
PMID: 27939304 DOI: 10.1016/j.atherosclerosissup.2016.10.001

The potential for global collaborations to better inform public health policy regarding major non-communicable diseases has been successfully demonstrated by several large-scale international consortia. However, the true public health impact of familial hypercholesterolaemia (FH), a common genetic disorder associated with premature cardiovascular disease, is yet to be reliably ascertained using similar approaches. The European Atherosclerosis Society FH Studies Collaboration (EAS FHSC) is a new initiative of international stakeholders which will help establish a global FH registry to generate large-scale, robust data on the burden of FH worldwide.

Matched MeSH terms: Data Mining
Fast decision tree-based method to index large DNA-protein sequence databases using hybrid distributed-shared memory programming model

Jaber KM, Abdullah R, Rashid NA

Int J Bioinform Res Appl, 2014;10(3):321-40.
PMID: 24794073 DOI: 10.1504/IJBRA.2014.060765

In recent times, the size of biological databases has increased significantly, with continuous growth in the number of users and the rate of queries, such that some databases have reached terabyte size. There is, therefore, an increasing need to access databases at the fastest rates possible. In this paper, the decision tree indexing model (PDTIM) was parallelised, using a hybrid of distributed and shared memory on a resident database, with horizontal and vertical growth through Message Passing Interface (MPI) and POSIX Thread (PThread), to accelerate the index building time. The PDTIM was implemented using 1, 2, 4 and 5 processors on 1, 2, 3 and 4 threads respectively. The results show that the hybrid technique improved the speedup compared to a sequential version. It can be concluded from the results that the proposed PDTIM is appropriate for large data sets in terms of index building time.

Matched MeSH terms: Data Mining/methods*
Comparative analysis of statistical tools for oil palm phytochemical research

Ishak NA, Tahir NI, Mohd Sa'id SN, Gopal K, Othman A, Ramli US

Heliyon, 2021 Feb;7(2):e06048.
PMID: 33553773 DOI: 10.1016/j.heliyon.2021.e06048

Recent advances in phytochemical analysis have allowed the accumulation of data for crop researchers due to its capacity to footprint and distinguish metabolites that are present within organisms, tissues or cells. Apart from genotypic traits, slight changes caused by either biotic or abiotic stimuli will have a significant impact on metabolite abundances and will eventually be observed through physicochemical characteristics. Apposite data mining to interpret the mounds of phytochemical information from such a dynamic system is thus incumbent. In this investigation, exploratory and confirmatory techniques of multivariate data analysis from four different statistical tools (COVAIN, SIMCA-P+, MetaboAnalyst and RIKEN Excel Macro) were appraised using an oil palm phytochemical data set. As each software tool has its own advantages and limitations, the insights gained from this assessment were documented to shed light on several aspects of the functions and suitability of the tools for adaptation into the oil palm phytochemistry pipeline. This comparative analysis will provide scientists with salient notes on data assessment and data mining that will later allow the depiction of the overall oil palm status in-situ and ex-situ.

Matched MeSH terms: Data Mining
An efficient data mining framework for the characterization of symptomatic and asymptomatic carotid plaque using bidimensional empirical mode decomposition technique

Molinari F, Raghavendra U, Gudigar A, Meiburger KM, Rajendra Acharya U

Med Biol Eng Comput, 2018 Sep;56(9):1579-1593.
PMID: 29473126 DOI: 10.1007/s11517-018-1792-5

Atherosclerosis is a type of cardiovascular disease which may cause stroke. It is due to the deposition of fatty plaque in the artery walls resulting in the reduction of elasticity gradually and hence restricting the blood flow to the heart. Hence, an early prediction of carotid plaque deposition is important, as it can save lives. This paper proposes a novel data mining framework for the assessment of atherosclerosis in its early stage using ultrasound images. In this work, we are using 1353 symptomatic and 420 asymptomatic carotid plaque ultrasound images. Our proposed method classifies the symptomatic and asymptomatic carotid plaques using bidimensional empirical mode decomposition (BEMD) and entropy features. The unbalanced data samples are compensated using adaptive synthetic sampling (ADASYN), and the developed method yielded a promising accuracy of 91.43%, sensitivity of 97.26%, and specificity of 83.22% using fourteen features. Hence, the proposed method can be used as an assisting tool during the regular screening of carotid arteries in hospitals. Graphical abstract: Outline for our efficient data mining framework for the characterization of symptomatic and asymptomatic carotid plaques.

Matched MeSH terms: Data Mining*
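
The three figures reported in the abstract above (accuracy, sensitivity, specificity) follow from a binary confusion matrix. The sketch below uses the standard definitions. The counts are hypothetical and are not taken from the study, whose confusion matrix is not given in the abstract.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary metrics from confusion-matrix counts.

    tp/fn: positive (e.g. symptomatic) cases classified correctly/incorrectly.
    tn/fp: negative (e.g. asymptomatic) cases classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fn=10, tn=40, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Reporting sensitivity and specificity alongside accuracy matters when classes are unbalanced, since accuracy alone can be dominated by the majority class.
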
