MyMedR

Displaying publications 1 - 20 of 64 in total

Using fuzzy association rule mining in cancer classification

Mahmoodian H, Hamiruce Marhaban M, Abdulrahim R, Rosli R, Saripan I

Australas Phys Eng Sci Med, 2011 Apr;34(1):41-54.
PMID: 21327594 DOI: 10.1007/s13246-011-0054-8

The classification of cancer tumors based on gene expression profiles has been studied extensively. A wide variety of cancer datasets have been analyzed with various methods of gene selection and classification to identify the behavior of the genes in tumors and to find the relationships between them and the outcome of diseases. Interpretability of the model, which is developed by fuzzy rules and linguistic variables in this study, has rarely been considered. In addition, creating a high-performance fuzzy classifier that uses a subset of significant genes selected by different types of gene selection methods is another goal of this study. A new algorithm has been developed to identify the fuzzy rules and significant genes based on fuzzy association rule mining. First, different subsets of genes, selected by different methods, were used to generate primary fuzzy classifiers separately; the proposed algorithm was then applied to combine the genes associated in the primary classifiers and generate a new classifier. The results show that the fuzzy classifier can classify the tumors with high performance while presenting the relationships between the genes through linguistic variables.

Matched MeSH terms: Data Mining/methods
Development of expressed sequence tag resources for Vanda Mimi Palmer and data mining for EST-SSR

Teh SL, Chan WS, Abdullah JO, Namasivayam P

Mol Biol Rep, 2011 Aug;38(6):3903-9.
PMID: 21116862 DOI: 10.1007/s11033-010-0506-3

Vanda Mimi Palmer (VMP) is a highly sought-after fragrant orchid hybrid in Malaysia. It is economically important in the cosmetic and beauty industries and is also a popular potted ornamental plant. To date, no work on fragrance-related genes of vandaceous orchids has been reported by other research groups, although floral fragrance and volatiles have been studied extensively. An expressed sequence tag (EST) resource was developed for VMP principally to mine any potential fragrance-related expressed sequence tag-simple sequence repeats (EST-SSRs) for future development as markers in the identification of fragrant vandaceous orchids endemic to Malaysia. Clustering, annotation and assembly of the ESTs identified 1,196 unigenes, comprising 966 singletons and 230 contigs. The VMP dbEST was functionally classified by gene ontology (GO) into three groups: molecular functions (51.2%), cellular components (16.4%) and biological processes (24.6%), while the remaining 7.8% showed no hits with a GO identifier. A total of 112 EST-SSRs (9.4%) were mined, in which at least five units of di-, tri-, tetra-, penta-, or hexa-nucleotide repeats were predicted. Di-nucleotide motif repeats were the most frequent among the detected SSRs, with the AT/TA types the most abundant among the dimerics, while AAG/TTC and AGA/TCT types were the most frequent trimerics. The mined EST-SSRs are expected to be useful in the development of EST-SSR markers applicable to the screening and characterization of fragrance-related transcripts in closely related species.
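
The repeat criterion described in this abstract (at least five units of a di- to hexa-nucleotide motif) can be illustrated with a short regular-expression scan. This is only a minimal sketch under stated assumptions, not the tool or pipeline the authors used: the function name `find_ssrs`, the `min_units` parameter, and the rule that skips motifs built from a shorter repeated unit are all hypothetical choices made for illustration.

```python
import re


def find_ssrs(seq, min_units=5):
    """Return sorted (start, motif, units) tuples for di- to hexa-nucleotide repeats.

    Hypothetical helper for illustration; not taken from the cited study.
    """
    seq = seq.upper()
    hits = []
    for k in range(2, 7):
        # A k-mer followed by at least (min_units - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (for example ATAT, which is really the dinucleotide AT).
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


example = "GGC" + "AT" * 6 + "CCT" + "AAG" * 5 + "T"
print(find_ssrs(example))
```

A scan like this reports the leftmost frame of each repeat, so the same tract may be labeled AAG or AGA depending on where the match starts; grouping such equivalent motifs is left out of the sketch.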

Matched MeSH terms: Data Mining*
Improved GART neural network model for pattern classification and rule extraction with application to power systems

Yap KS, Lim CP, Au MT

IEEE Trans Neural Netw, 2011 Dec;22(12):2310-23.
PMID: 22067292 DOI: 10.1109/TNN.2011.2173502

Generalized adaptive resonance theory (GART) is a neural network model that is capable of online learning and is effective in tackling pattern classification tasks. In this paper, we propose an improved GART model (IGART), and demonstrate its applicability to power systems. IGART enhances the dynamics of GART in several aspects, which include the use of the Laplacian likelihood function, a new vigilance function, a new match-tracking mechanism, an ordering algorithm for determining the sequence of training data, and a rule extraction capability to elicit if-then rules from the network. To assess the effectiveness of IGART and to compare its performances with those from other methods, three datasets that are related to power systems are employed. The experimental results demonstrate the usefulness of IGART with the rule extraction capability in undertaking classification problems in power systems engineering.

Matched MeSH terms: Data Mining/methods*
Characterization of spatial patterns in river water quality using chemometric pattern recognition techniques

Gazzaz NM, Yusoff MK, Ramli MF, Aris AZ, Juahir H

Mar Pollut Bull, 2012 Apr;64(4):688-98.
PMID: 22330076 DOI: 10.1016/j.marpolbul.2012.01.032

This study employed three chemometric data mining techniques (factor analysis (FA), cluster analysis (CA), and discriminant analysis (DA)) to identify the latent structure of a water quality (WQ) dataset pertaining to Kinta River (Malaysia) and to classify eight WQ monitoring stations along the river into groups of similar WQ characteristics. FA identified the WQ parameters responsible for variations in Kinta River's WQ and accentuated the roles of weathering and surface runoff in determining the river's WQ. CA grouped the monitoring locations into a cluster of low levels of water pollution (the two uppermost monitoring stations) and another of relatively high levels of river pollution (the mid-, and down-stream stations). DA confirmed these clusters and produced a discriminant function which can predict the cluster membership of new and/or unknown samples. These chemometric techniques highlight the potential for reasonably reducing the number of WQVs and monitoring stations for long-term monitoring purposes.

Matched MeSH terms: Data Mining
Development of ESTs and data mining of pineapple EST-SSRs

Ong WD, Voo CL, Kumar SV

Mol Biol Rep, 2012 May;39(5):5889-96.
PMID: 22207174 DOI: 10.1007/s11033-011-1400-3

Improving the quality of the non-climacteric fruit, pineapple, is possible with information on the expression of genes that occur during the process of fruit ripening. This can be made known through the generation of partial mRNA transcript sequences known as expressed sequence tags (ESTs). ESTs are useful not only for gene discovery but also function as a resource for the identification of molecular markers, such as simple sequence repeats (SSRs). This paper reports on, firstly, the construction of a normalized library of the mature green pineapple fruit and, secondly, the mining of EST-SSR markers using the newly obtained pineapple ESTs as well as publicly available pineapple ESTs deposited in GenBank. Sequencing of the clones from the EST library resulted in 282 good sequences. Assembly of sequences generated 168 unique transcripts (UTs) consisting of 34 contigs and 134 singletons with an average length of ≈500 bp. Annotation of the UTs categorized the known proteins transcripts into the three ontologies as: molecular function (34.88%), biological process (38.43%), and cellular component (26.69%). Approximately 7% (416) of the pineapple ESTs contained SSRs with an abundance of trinucleotide SSRs (48.3%) being identified. This was followed by dinucleotide and tetranucleotide SSRs with frequency of 46 and 57%, respectively. From these EST-containing SSRs, 355 (85.3%) matched to known proteins while 133 contained flanking regions for primer design. Both the ESTs were sequenced and the mined EST-SSRs will be useful in the understanding of non-climacteric ripening and the screening of biomarkers linked to fruit quality traits.

Matched MeSH terms: Data Mining*
Visualizing disaster attitudes resulting from terrorist activities

Khalid HM, Helander MG, Hood NA

Appl Ergon, 2013 Sep;44(5):671-9.
PMID: 22944486 DOI: 10.1016/j.apergo.2012.06.005

The purpose of this study was to analyze people's attitudes to disasters by investigating how people feel, behave and think during disasters. We focused on disasters induced by humans, such as terrorist attacks. Two types of textual information were collected - from Internet blogs and from research papers. The analysis enabled forecasting of attitudes for the design of proactive disaster advisory scheme. Text was analyzed using a text mining tool, Leximancer. The outcome of this analysis revealed core themes and concepts in the text concerning people's attitudes. The themes and concepts were sorted into three broad categories: Affect, Behaviour, and Cognition (ABC), and the data was visualized in semantic maps. The maps reveal several knowledge pathways of ABC for developing attitudinal ontologies, which describe the relations between affect, behaviour and cognition, and the sequence in which they develop. Clearly, terrorist attacks induced trauma and people became highly vulnerable.

Matched MeSH terms: Data Mining
A phylogenomic approach to bacterial subspecies classification: proof of concept in Mycobacterium abscessus

Tan JL, Khang TF, Ngeow YF, Choo SW

BMC Genomics, 2013;14:879.
PMID: 24330254 DOI: 10.1186/1471-2164-14-879

Mycobacterium abscessus is a rapidly growing mycobacterium that is often associated with human infections. The taxonomy of this species has undergone several revisions and is still being debated. In this study, we sequenced the genomes of 12 M. abscessus strains and used phylogenomic analysis to perform subspecies classification.

Matched MeSH terms: Data Mining
Order batching in warehouses by minimizing total tardiness: a hybrid approach of weighted association rule mining and genetic algorithms

Azadnia AH, Taheri S, Ghadimi P, Saman MZ, Wong KY

ScientificWorldJournal, 2013;2013:246578.
PMID: 23864823 DOI: 10.1155/2013/246578

One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed to minimize traveling distance in the process of order picking. In practice, however, customer orders have to be completed by certain due dates to avoid tardiness, which is neglected in most of the related scientific papers. Consequently, we proposed a novel four-phase solution approach to minimize tardiness. First, weighted association rule mining is used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming is formulated to maximize the associations between orders within each batch. Subsequently, the order picking phase uses a Genetic Algorithm integrated with the Traveling Salesman Problem to identify the most suitable travel path. Finally, the Genetic Algorithm is applied to sequence the constructed batches in order to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach.

Matched MeSH terms: Data Mining/methods*
Modified multi-class classification using association rule mining

Yuhanis Yusof, Mohammed Hayel Refai

Pertanika Journal of Science & Technology, 2013;21(1):205-216.
MyJurnal

As the number of documents increases, automation of classification that aids the analysis and management of documents receives growing attention. Classification based on association rules that are generated from a collection of documents is a recent data mining approach that integrates association rule mining and classification. The existing approaches produce either high accuracy with a large number of rules or a small number of association rules with low accuracy. This work presents an association rule mining method that employs a new item production algorithm, generating a small number of rules while producing an acceptable accuracy rate. The proposed method is evaluated on UCI datasets and measured by prediction accuracy and the number of generated association rules. It is then compared against an existing classifier, Multi-class Classification based on Association Rule (MCAR). The experiments show that the proposed method produces an accuracy rate similar to that of MCAR while using fewer rules.

Matched MeSH terms: Data Mining
Intelligent bar chart plagiarism detection in documents

Al-Dabbagh MM, Salim N, Rehman A, Alkawaz MH, Saba T, Al-Rodhaan M, et al.

ScientificWorldJournal, 2014;2014:612787.
PMID: 25309952 DOI: 10.1155/2014/612787

This paper presents a novel features mining approach from documents that could not be mined via optical character recognition (OCR). By identifying the intimate relationship between the text and graphical components, the proposed technique pulls out the Start, End, and Exact values for each bar. Furthermore, the word 2-gram and Euclidean distance methods are used to accurately detect and determine plagiarism in bar charts.
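
The abstract names word 2-grams and Euclidean distance as the comparison tools. The sketch below shows how those two generic techniques fit together; it makes no claim about the paper's actual feature extraction or thresholds. The function names and the sample chart labels are hypothetical, and whitespace tokenization is an assumption.

```python
from collections import Counter
from math import sqrt


def word_bigrams(text):
    """Count adjacent word pairs (word 2-grams) in a lowercased text."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def euclidean_distance(a, b):
    """Euclidean distance between two bigram count vectors."""
    keys = set(a) | set(b)
    # Counter returns 0 for a bigram that is absent from one text.
    return sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


# Hypothetical labels from two bar charts.
first = word_bigrams("sales rose in 2013")
second = word_bigrams("sales fell in 2013")
print(euclidean_distance(first, second))
```

A distance of zero means the two texts share exactly the same bigram counts; larger values indicate less overlap.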

Matched MeSH terms: Data Mining/methods*
A review of subsequence time series clustering

Zolhavarieh S, Aghabozorgi S, Teh YW

ScientificWorldJournal, 2014;2014:312521.
PMID: 25140332 DOI: 10.1155/2014/312521

Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and backgrounds related to subsequence time series clustering. The categorization of the literature reviews is divided into three groups: preproof, interproof, and postproof period. Moreover, various state-of-the-art approaches in performing subsequence time series clustering are discussed under each of the following categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies.

Matched MeSH terms: Data Mining*
First comprehensive in silico analysis of the functional and structural consequences of SNPs in human GalNAc-T1 gene

Mohamoud HS, Hussain MR, El-Harouni AA, Shaik NA, Qasmi ZU, Merican AF, et al.

Comput Math Methods Med, 2014;2014:904052.
PMID: 24723968 DOI: 10.1155/2014/904052

GalNAc-T1, a key candidate of GalNac-transferases genes family that is involved in mucin-type O-linked glycosylation pathway, is expressed in most biological tissues and cell types. Despite the reported association of GalNAc-T1 gene mutations with human disease susceptibility, the comprehensive computational analysis of coding, noncoding and regulatory SNPs, and their functional impacts on protein level, still remains unknown. Therefore, sequence- and structure-based computational tools were employed to screen the entire listed coding SNPs of GalNAc-T1 gene in order to identify and characterize them. Our concordant in silico analysis by SIFT, PolyPhen-2, PANTHER-cSNP, and SNPeffect tools, identified the potential nsSNPs (S143P, G258V, and Y414D variants) from 18 nsSNPs of GalNAc-T1. Additionally, 2 regulatory SNPs (rs72964406 and rs34304568) were also identified in GalNAc-T1 by using FastSNP tool. Using multiple computational approaches, we have systematically classified the functional mutations in regulatory and coding regions that can modify expression and function of GalNAc-T1 enzyme. These genetic variants can further assist in better understanding the wide range of disease susceptibility associated with the mucin-based cell signalling and pathogenic binding, and may help to develop novel therapeutic elements for associated diseases.

Matched MeSH terms: Data Mining/methods
Fast decision tree-based method to index large DNA-protein sequence databases using hybrid distributed-shared memory programming model

Jaber KM, Abdullah R, Rashid NA

Int J Bioinform Res Appl, 2014;10(3):321-40.
PMID: 24794073 DOI: 10.1504/IJBRA.2014.060765

In recent times, the size of biological databases has increased significantly, with the continuous growth in the number of users and rate of queries; such that some databases have reached the terabyte size. There is therefore, the increasing need to access databases at the fastest rates possible. In this paper, the decision tree indexing model (PDTIM) was parallelised, using a hybrid of distributed and shared memory on resident database; with horizontal and vertical growth through Message Passing Interface (MPI) and POSIX Thread (PThread), to accelerate the index building time. The PDTIM was implemented using 1, 2, 4 and 5 processors on 1, 2, 3 and 4 threads respectively. The results show that the hybrid technique improved the speedup, compared to a sequential version. It could be concluded from results that the proposed PDTIM is appropriate for large data sets, in terms of index building time.

Matched MeSH terms: Data Mining/methods*
Effect of temporal relationships in associative rule mining for web log data

Khairudin NM, Mustapha A, Ahmad MH

ScientificWorldJournal, 2014;2014:813983.
PMID: 24587757 DOI: 10.1155/2014/813983

The advent of web-based applications and services has created diverse and voluminous web log data stored in web servers, proxy servers, client machines, or organizational databases. This paper investigates the effect of the temporal attribute in relational rule mining for web log data. We incorporated the characteristics of time in the rule mining process and analysed the effect of various temporal parameters. The rules generated from temporal relational rule mining are then compared against the rules generated from classical rule mining approaches such as the Apriori and FP-Growth algorithms. The results showed that by incorporating the temporal attribute via time, the number of rules generated is smaller but comparable in terms of quality.
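
Classical rule mining of the kind named in this abstract ranks candidate rules by support and confidence. The sketch below computes only those two standard measures on a toy set of sessions; it is not an implementation of Apriori, FP-Growth, or the paper's temporal method. The session data and function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent); assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical web sessions, each reduced to the set of pages visited.
sessions = [
    {"home", "cart"},
    {"home", "cart", "pay"},
    {"home"},
    {"cart", "pay"},
]
print(support(sessions, {"home", "cart"}))
print(confidence(sessions, {"home"}, {"cart"}))
```

A temporal variant could, for example, restrict `sessions` to a time window before computing the same measures, though how the paper defines its temporal parameters is not stated in the abstract.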

Matched MeSH terms: Data Mining/methods*
Mining personal data using smartphones and wearable devices: a survey

Habib ur Rehman M, Liew CS, Wah TY, Shuja J, Daghighi B

Sensors (Basel), 2015 Feb 13;15(2):4430-69.
PMID: 25688592 DOI: 10.3390/s150204430

The staggering growth in smartphone and wearable device use has led to a massive scale generation of personal (user-specific) data. To explore, analyze, and extract useful information and knowledge from the deluge of personal data, one has to leverage these devices as the data-mining platforms in ubiquitous, pervasive, and big data environments. This study presents the personal ecosystem where all computational resources, communication facilities, storage and knowledge management systems are available in user proximity. An extensive review on recent literature has been conducted and a detailed taxonomy is presented. The performance evaluation metrics and their empirical evidences are sorted out in this paper. Finally, we have highlighted some future research directions and potentially emerging application areas for personal data mining using smartphones and wearable devices.

Matched MeSH terms: Data Mining
Current advance methods for the identification of blast resistance genes in rice

Tanweer FA, Rafii MY, Sijam K, Rahim HA, Ahmed F, Latif MA

C. R. Biol., 2015 May;338(5):321-34.
PMID: 25843222 DOI: 10.1016/j.crvi.2015.03.001

Rice blast caused by Magnaporthe oryzae is one of the most devastating diseases of rice around the world, and crop losses due to blast are considerable. Many blast-resistant rice varieties have been developed by classical plant breeding and adopted by farmers in various rice-growing countries. However, the variability in the pathogenicity of the blast fungus according to environment has made blast disease a major concern for farmers, and it remains a threat to the rice industry. With the utilization of molecular techniques, plant breeders have improved rice production systems and minimized yield losses. In this article, we have summarized the current advanced molecular techniques used for controlling blast disease. With the advent of new technologies like marker-assisted selection, molecular mapping, map-based cloning, marker-assisted backcrossing and allele mining, breeders have identified more than 100 Pi loci and 350 QTL in the rice genome responsible for blast disease. These Pi genes and QTLs can be introgressed into a blast-susceptible cultivar through marker-assisted backcross breeding. These molecular techniques provide timesaving, environment-friendly and labour-cost-saving ways to control blast disease. The knowledge of host-plant interactions in the frame of blast disease will support the development of resistant varieties in the future.

Matched MeSH terms: Data Mining
A Comparison Study on Similarity and Dissimilarity Measures in Clustering Continuous Data

Shirkhorshidi AS, Aghabozorgi S, Wah TY

PLoS One, 2015;10(12):e0144059.
PMID: 26658987 DOI: 10.1371/journal.pone.0144059

Similarity or distance measures are core components used by distance-based clustering algorithms to cluster similar data points into the same clusters, while dissimilar or distant data points are placed into different clusters. The performance of similarity measures is mostly addressed in two or three-dimensional spaces, beyond which, to the best of our knowledge, there is no empirical study that has revealed the behavior of similarity measures when dealing with high-dimensional datasets. To fill this gap, a technical framework is proposed in this study to analyze, compare and benchmark the influence of different similarity measures on the results of distance-based clustering algorithms. For reproducibility purposes, fifteen publicly available datasets were used for this study, and consequently, future distance measures can be evaluated and compared with the results of the measures discussed in this work. These datasets were classified as low and high-dimensional categories to study the performance of each measure against each category. This research should help the research community to identify suitable distance measures for datasets and also to facilitate a comparison and evaluation of the newly proposed similarity or distance measures with traditional ones.

Matched MeSH terms: Data Mining/statistics & numerical data*
Development and mining of a volatile organic compound database

Abdullah AA, Altaf-Ul-Amin M, Ono N, Sato T, Sugiura T, Morita AH, et al.

Biomed Res Int, 2015;2015:139254.
PMID: 26495281 DOI: 10.1155/2015/139254

Volatile organic compounds (VOCs) are small molecules that exhibit high vapor pressure under ambient conditions and have low boiling points. Although VOCs contribute only a small proportion of the total metabolites produced by living organisms, they play an important role in chemical ecology, specifically in the biological interactions between organisms and ecosystems. VOCs are also important in the health care field, as they are presently used as biomarkers to detect various human diseases. Information on VOCs has so far been scattered across the literature, and no database describing VOCs and their biological activities has been available. To fill this gap, we have developed the KNApSAcK Metabolite Ecology Database, which contains information on the relationships between VOCs and their emitting organisms. The KNApSAcK Metabolite Ecology Database is also linked with the KNApSAcK Core and KNApSAcK Metabolite Activity Database to provide further information on the metabolites and their biological activities. The VOC database can be accessed online.

Matched MeSH terms: Data Mining/methods*
An Efficient Optimization Method for Solving Unsupervised Data Classification Problems

Shabanzadeh P, Yusof R

Comput Math Methods Med, 2015;2015:802754.
PMID: 26336509 DOI: 10.1155/2015/802754

Unsupervised data classification (or clustering) analysis is one of the most useful tools and a descriptive task in data mining that seeks to classify homogeneous groups of objects based on similarity and is used in many medical disciplines and various applications. In general, there is no single algorithm that is suitable for all types of data, conditions, and applications. Each algorithm has its own advantages, limitations, and deficiencies. Hence, research for novel and effective approaches for unsupervised data classification is still active. In this paper, a heuristic algorithm, the Biogeography-Based Optimization (BBO) algorithm, which is inspired by the natural biogeographic distribution of different species, was adapted for data clustering problems by modifying its main operators. Similar to other population-based algorithms, the BBO algorithm starts with an initial population of candidate solutions to an optimization problem and an objective function that is calculated for them. To evaluate the performance of the proposed algorithm, an assessment was carried out on six medical and real-life datasets, and the algorithm was compared with eight well-known and recent unsupervised data classification algorithms. Numerical results demonstrate that the proposed evolutionary optimization algorithm is efficient for unsupervised data classification.

Matched MeSH terms: Data Mining/methods*; Data Mining/statistics & numerical data
Application of an Effective Statistical Technique for an Accurate and Powerful Mining of Quantitative Trait Loci for Rice Aroma Trait

Golestan Hashemi FS, Rafii MY, Ismail MR, Mohamed MT, Rahim HA, Latif MA, et al.

PLoS One, 2015;10(6):e0129069.
PMID: 26061689 DOI: 10.1371/journal.pone.0129069

When a phenotype of interest is associated with an external/internal covariate, covariate inclusion in quantitative trait loci (QTL) analyses can diminish residual variation and subsequently enhance the ability of QTL detection. In the in vitro synthesis of 2-acetyl-1-pyrroline (2AP), the main fragrance compound in rice, the thermal processing during the Maillard-type reaction between proline and carbohydrate reduction produces a roasted, popcorn-like aroma. Hence, for the first time, we included the proline amino acid, an important precursor of 2AP, as a covariate in our QTL mapping analyses to precisely explore the genetic factors affecting natural variation for rice scent. Consequently, two QTLs were traced on chromosomes 4 and 8. They explained from 20% to 49% of the total aroma phenotypic variance. Additionally, by saturating the interval harboring the major QTL using gene-based primers, a putative allele of fgr (major genetic determinant of fragrance) was mapped in the QTL on the 8th chromosome in the interval RM223-SCU015RM (1.63 cM). These loci supported previous studies of different accessions. Such QTLs can be widely used by breeders in crop improvement programs and for further fine mapping. Moreover, no previous studies and findings were found on simultaneous assessment of the relationship among 2AP, proline and fragrance QTLs. Therefore, our findings can help further our understanding of the metabolomic and genetic basis of 2AP biosynthesis in aromatic rice.

Matched MeSH terms: Data Mining
