Displaying publications 1 - 20 of 65 in total

  1. Zolhavarieh S, Aghabozorgi S, Teh YW
    ScientificWorldJournal, 2014;2014:312521.
    PMID: 25140332 DOI: 10.1155/2014/312521
    Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and backgrounds related to subsequence time series clustering. The categorization of the literature reviews is divided into three groups: preproof, interproof, and postproof period. Moreover, various state-of-the-art approaches in performing subsequence time series clustering are discussed under each of the following categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies.
    Matched MeSH terms: Data Mining*
  2. Yusuf N, Zakaria A, Omar MI, Shakaff AY, Masnan MJ, Kamarudin LM, et al.
    BMC Bioinformatics, 2015;16:158.
    PMID: 25971258 DOI: 10.1186/s12859-015-0601-5
    Effective management of patients with diabetic foot infection is a crucial concern. A delay in prescribing an appropriate antimicrobial agent can lead to amputation or life-threatening complications. Thus, this electronic nose (e-nose) technique will provide a diagnostic tool that will allow for rapid and accurate identification of a pathogen.
    Matched MeSH terms: Data Mining
  3. Yuhanis Yusof, Mohammed Hayel Refai
    MyJurnal
    As the number of documents increases, automation of classification that aids the analysis and management of documents receives focal attention. Classification based on association rules that are generated from a collection of documents is a recent data mining approach that integrates association rule mining and classification. The existing approaches produce either high accuracy with a large number of rules or a small number of association rules that yield low accuracy. This work presents an association rule mining method that employs a new item production algorithm, which generates a small number of rules and produces an acceptable accuracy rate. The proposed method is evaluated on UCI datasets and measured based on prediction accuracy and the number of generated association rules. A comparison is then made against an existing classifier, Multi-class Classification based on Association Rule (MCAR). The experiments show that the proposed method achieves an accuracy rate similar to that of MCAR while using fewer rules.
    Matched MeSH terms: Data Mining
  4. Yeo JG, Wasser M, Kumar P, Pan L, Poh SL, Ally F, et al.
    Nat Biotechnol, 2020 06;38(6):679-684.
    PMID: 32440006 DOI: 10.1038/s41587-020-0532-1
    Matched MeSH terms: Data Mining
  5. Yazdani A, Varathan KD, Chiam YK, Malik AW, Wan Ahmad WA
    BMC Med Inform Decis Mak, 2021 06 21;21(1):194.
    PMID: 34154576 DOI: 10.1186/s12911-021-01527-5
    BACKGROUND: Cardiovascular disease is the leading cause of death in many countries. Physicians often diagnose cardiovascular disease based on current clinical tests and previous experience of diagnosing patients with similar symptoms. Patients who suffer from heart disease require quick diagnosis, early treatment and constant observations. To address their needs, many data mining approaches have been used in the past in diagnosing and predicting heart diseases. Previous research was also focused on identifying the significant contributing features to heart disease prediction, however, less importance was given to identifying the strength of these features.

    METHOD: This paper is motivated by the gap in the literature, thus proposes an algorithm that measures the strength of the significant features that contribute to heart disease prediction. The study is aimed at predicting heart disease based on the scores of significant features using Weighted Associative Rule Mining.

    RESULTS: A set of important feature scores and rules was identified in diagnosing heart disease, and cardiologists were consulted to confirm the validity of these rules. The experiments performed on the UCI open dataset, widely used for heart disease research, yielded the highest confidence score of 98% in predicting heart disease.

    CONCLUSION: This study makes a significant contribution in computing the strength scores of significant predictors in heart disease prediction. From the evaluation results, we obtained important rules and achieved the highest confidence score by utilizing the computed strength scores of significant predictors in Weighted Associative Rule Mining for predicting heart disease.

    Matched MeSH terms: Data Mining
  6. Yap KS, Lim CP, Au MT
    IEEE Trans Neural Netw, 2011 Dec;22(12):2310-23.
    PMID: 22067292 DOI: 10.1109/TNN.2011.2173502
    Generalized adaptive resonance theory (GART) is a neural network model that is capable of online learning and is effective in tackling pattern classification tasks. In this paper, we propose an improved GART model (IGART), and demonstrate its applicability to power systems. IGART enhances the dynamics of GART in several aspects, which include the use of the Laplacian likelihood function, a new vigilance function, a new match-tracking mechanism, an ordering algorithm for determining the sequence of training data, and a rule extraction capability to elicit if-then rules from the network. To assess the effectiveness of IGART and to compare its performances with those from other methods, three datasets that are related to power systems are employed. The experimental results demonstrate the usefulness of IGART with the rule extraction capability in undertaking classification problems in power systems engineering.
    Matched MeSH terms: Data Mining/methods*
  7. Vos RA, Katayama T, Mishima H, Kawano S, Kawashima S, Kim JD, et al.
    F1000Res, 2020;9:136.
    PMID: 32308977 DOI: 10.12688/f1000research.18236.1
    We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.
    Matched MeSH terms: Data Mining
  8. Uddin J, Ghazali R, Deris MM
    PLoS One, 2017;12(1):e0164803.
    PMID: 28068344 DOI: 10.1371/journal.pone.0164803
    Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been paid to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering, rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) have outperformed their predecessors, such as Bi-Clustering (BC), Total Roughness (TR) and Min-Min Roughness (MMR). This paper presents the limitations and issues of the MDA and MSA techniques on special types of data sets where both techniques fail to select, or face difficulty in selecting, their best clustering attribute. This analysis motivates the need for a better and more generalized rough set theory approach that can cope with the issues of MDA and MSA. Hence, an alternative technique named Maximum Indiscernible Attribute (MIA) for clustering categorical data using rough set indiscernibility relations is proposed. The novelty of the proposed approach is that, unlike other rough set theory techniques, it uses the domain knowledge of the data set. It is based on the concept of the indiscernibility relation combined with a number of clusters. To show the significance of the proposed approach, the effect of the number of clusters on rough accuracy, purity and entropy is described in the form of propositions. Moreover, ten different data sets from previously utilized research cases and the UCI repository are used for experiments. The results, produced in tabular and graphical forms, show that the proposed MIA technique provides better performance in selecting the clustering attribute in terms of purity, entropy, iterations, time, accuracy and rough accuracy.
    Matched MeSH terms: Data Mining
  9. Teng S, Khong KW, Pahlevan Sharif S, Ahmed A
    JMIR Public Health Surveill, 2020 10 01;6(4):e19618.
    PMID: 33001036 DOI: 10.2196/19618
    BACKGROUND: Poor nutrition and food selection lead to health issues such as obesity, cardiovascular disease, diabetes, and cancer. This study of YouTube comments aims to uncover patterns of food choices and the factors driving them, in addition to exploring the sentiments of healthy eating in networked communities.

    OBJECTIVE: The objectives of the study are to explore the determinants, motives, and barriers to healthy eating behaviors in online communities and provide insight into YouTube video commenters' perceptions and sentiments of healthy eating through text mining techniques.

    METHODS: This paper applied text mining techniques to identify and categorize meaningful healthy eating determinants. These determinants were then incorporated into hypothetically defined constructs that reflect their thematic and sentimental nature in order to test our proposed model using a variance-based structural equation modeling procedure.

    RESULTS: With a dataset of 4654 comments extracted from YouTube videos in the context of Malaysia, we apply a text mining method to analyze the perceptions and behavior of healthy eating. There were 10 clusters identified with regard to food ingredients, food price, food choice, food portion, well-being, cooking, and culture in the concept of healthy eating. The structural equation modeling results show that clusters are positively associated with healthy eating with all P values less than .001, indicating a statistical significance of the study results. People hold complex and multifaceted beliefs about healthy eating in the context of YouTube videos. Fruits and vegetables are the epitome of healthy foods. Despite having a favorable perception of healthy eating, people may not purchase commonly recognized healthy food if it has a premium price. People associate healthy eating with weight concerns. Food taste, variety, and availability are identified as reasons why Malaysians cannot act on eating healthily.

    CONCLUSIONS: This study offers significant value to the existing literature of health-related studies by investigating the rich and diverse social media data gleaned from YouTube. This research integrated text mining analytics with predictive modeling techniques to identify thematic constructs and analyze the sentiments of healthy eating.

    Matched MeSH terms: Data Mining
  10. Teh SL, Chan WS, Abdullah JO, Namasivayam P
    Mol Biol Rep, 2011 Aug;38(6):3903-9.
    PMID: 21116862 DOI: 10.1007/s11033-010-0506-3
    Vanda Mimi Palmer (VMP) is a highly sought-after fragrant orchid hybrid in Malaysia. It is economically important in the cosmetic and beauty industries and is also a famous potted ornamental plant. To date, no work on fragrance-related genes of vandaceous orchids has been reported by other research groups, although floral fragrance and volatiles have been extensively analysed. An expressed sequence tag (EST) resource was developed for VMP principally to mine any potential fragrance-related expressed sequence tag-simple sequence repeats (EST-SSRs) for future development as markers in the identification of fragrant vandaceous orchids endemic to Malaysia. Clustering, annotation and assembly of the ESTs identified 1,196 unigenes, which comprised 966 singletons and 230 contigs. The VMP dbEST was functionally classified by gene ontology (GO) into three groups: molecular functions (51.2%), cellular components (16.4%) and biological processes (24.6%), while the remaining 7.8% showed no hits with a GO identifier. A total of 112 EST-SSRs (9.4%) were mined, in which at least five units of di-, tri-, tetra-, penta-, or hexa-nucleotide repeats were predicted. The di-nucleotide motif repeats appeared to be the most frequent repeats among the detected SSRs, with the AT/TA types the most abundant among the dimerics, while the AAG/TTC and AGA/TCT types were the most frequent trimerics. The mined EST-SSRs are believed to be useful in the development of EST-SSR markers that are applicable in the screening and characterization of fragrance-related transcripts in closely related species.
    Matched MeSH terms: Data Mining*
  11. Tanweer FA, Rafii MY, Sijam K, Rahim HA, Ahmed F, Latif MA
    C. R. Biol., 2015 May;338(5):321-34.
    PMID: 25843222 DOI: 10.1016/j.crvi.2015.03.001
    Rice blast caused by Magnaporthe oryzae is one of the most devastating diseases of rice around the world, and crop losses due to blast are considerably high. Many blast-resistant rice varieties have been developed by classical plant breeding and adopted by farmers in various rice-growing countries. However, the variability in the pathogenicity of the blast fungus according to environment has made blast disease a major concern for farmers, and it remains a threat to the rice industry. With the utilization of molecular techniques, plant breeders have improved rice production systems and minimized yield losses. In this article, we summarize the current advanced molecular techniques used for controlling blast disease. With the advent of new technologies like marker-assisted selection, molecular mapping, map-based cloning, marker-assisted backcrossing and allele mining, breeders have identified more than 100 Pi loci and 350 QTL in the rice genome responsible for blast disease. These Pi genes and QTLs can be introgressed into a blast-susceptible cultivar through marker-assisted backcross breeding. These molecular techniques provide time-saving, environmentally friendly and labour-cost-saving ways to control blast disease. Knowledge of host-plant interactions in the context of blast disease will lead to the development of resistant varieties in the future.
    Matched MeSH terms: Data Mining
  12. Tan WM, Ng WL, Ganggayah MD, Hoe VCW, Rahmat K, Zaini HS, et al.
    Health Informatics J, 2023;29(3):14604582231203763.
    PMID: 37740904 DOI: 10.1177/14604582231203763
    Radiology reporting is narrative, and its content depends on the clinician's ability to interpret the images accurately. A tertiary hospital, such as anonymous institute, focuses on writing reports narratively as part of training for medical personnel. Nevertheless, free-text reports make it inconvenient to extract information for clinical audits and data mining. Therefore, we aim to convert unstructured breast radiology reports into structured formats using a natural language processing (NLP) algorithm. This study used 327 de-identified breast radiology reports from the anonymous institute. The radiologist identified the significant data elements to be extracted. Our NLP algorithm achieved 97% and 94.9% accuracy on training and testing data, respectively. The structured information was then used to build a predictive model for the value of the BIRADS category. The model based on random forest generated the highest accuracy of 92%. Our study not only fulfilled the demands of clinicians by enhancing communication between medical personnel, but also demonstrated the usefulness of mineable structured data in yielding significant insights.
    Matched MeSH terms: Data Mining
  13. Tan JL, Khang TF, Ngeow YF, Choo SW
    BMC Genomics, 2013;14:879.
    PMID: 24330254 DOI: 10.1186/1471-2164-14-879
    Mycobacterium abscessus is a rapidly growing mycobacterium that is often associated with human infections. The taxonomy of this species has undergone several revisions and is still being debated. In this study, we sequenced the genomes of 12 M. abscessus strains and used phylogenomic analysis to perform subspecies classification.
    Matched MeSH terms: Data Mining
  14. Shirkhorshidi AS, Aghabozorgi S, Wah TY
    PLoS One, 2015;10(12):e0144059.
    PMID: 26658987 DOI: 10.1371/journal.pone.0144059
    Similarity or distance measures are core components used by distance-based clustering algorithms to cluster similar data points into the same clusters, while dissimilar or distant data points are placed into different clusters. The performance of similarity measures is mostly addressed in two or three-dimensional spaces, beyond which, to the best of our knowledge, there is no empirical study that has revealed the behavior of similarity measures when dealing with high-dimensional datasets. To fill this gap, a technical framework is proposed in this study to analyze, compare and benchmark the influence of different similarity measures on the results of distance-based clustering algorithms. For reproducibility purposes, fifteen publicly available datasets were used for this study, and consequently, future distance measures can be evaluated and compared with the results of the measures discussed in this work. These datasets were classified as low and high-dimensional categories to study the performance of each measure against each category. This research should help the research community to identify suitable distance measures for datasets and also to facilitate a comparison and evaluation of the newly proposed similarity or distance measures with traditional ones.
    Matched MeSH terms: Data Mining/statistics & numerical data*
  15. Shabanzadeh P, Yusof R
    Comput Math Methods Med, 2015;2015:802754.
    PMID: 26336509 DOI: 10.1155/2015/802754
    Unsupervised data classification (or clustering) analysis is one of the most useful tools and a descriptive task in data mining that seeks to classify homogeneous groups of objects based on similarity and is used in many medical disciplines and various applications. In general, there is no single algorithm that is suitable for all types of data, conditions, and applications. Each algorithm has its own advantages, limitations, and deficiencies. Hence, research for novel and effective approaches for unsupervised data classification is still active. In this paper, a heuristic algorithm, the Biogeography-Based Optimization (BBO) algorithm, which is inspired by the natural biogeographic distribution of different species, was adapted for data clustering problems by modifying its main operators. Similar to other population-based algorithms, the BBO algorithm starts with an initial population of candidate solutions to an optimization problem and an objective function that is calculated for them. To evaluate the performance of the proposed algorithm, an assessment was carried out on six medical and real-life datasets, and the algorithm was compared with eight well-known and recent unsupervised data classification algorithms. Numerical results demonstrate that the proposed evolutionary optimization algorithm is efficient for unsupervised data classification.
    Matched MeSH terms: Data Mining/methods*; Data Mining/statistics & numerical data
  16. Sarahani Harun, Nurulisa Zulkifle
    Sains Malaysiana, 2018;47:2933-2940.
    Laryngeal cancer is the most common head and neck cancer in the world and its incidence is on the rise. However, the molecular mechanism underlying laryngeal cancer pathogenesis is poorly understood. The goal of this study was to develop a protein-protein interaction (PPI) network for laryngeal cancer to predict the biological pathways that underlie the molecular complexes in the network. Genes involved in laryngeal cancer were extracted from the OMIM database and their interaction partners were identified via text and data mining using Agilent Literature Search, STRING and GeneMANIA. PPI network was then integrated and visualised using Cytoscape ver3.6.0. Molecular complexes in the network were predicted by MCODE plugin and functional enrichment analyses of the molecular complexes were performed using BiNGO. 28 laryngeal cancer-related genes were present in the OMIM database. The PPI network associated with laryngeal cancer contained 161 nodes, 661 edges and five molecular complexes. Some of the complexes were related to the biological behaviour of cancer, providing the foundation for further understanding of the mechanism of laryngeal cancer development and progression.
    Matched MeSH terms: Data Mining
  17. Salih SQ, Alsewari AA, Wahab HA, Mohammed MKA, Rashid TA, Das D, et al.
    PLoS One, 2023;18(7):e0288044.
    PMID: 37406006 DOI: 10.1371/journal.pone.0288044
    The retrieval of important information from a dataset requires applying a special data mining technique known as data clustering (DC). DC classifies similar objects into groups with similar characteristics. Clustering involves grouping the data around k cluster centres that are typically selected randomly. Recently, the issues behind DC have called for a search for an alternative solution. A nature-based optimization algorithm named the Black Hole Algorithm (BHA) was recently developed to address several well-known optimization problems. The BHA is a population-based metaheuristic that mimics events around the natural phenomenon of black holes, whereby each individual star represents a potential solution revolving around the solution space. The original BHA showed better performance than other algorithms when applied to a benchmark dataset, despite its poor exploration capability. Hence, this paper presents a multi-population version of BHA, called MBHA, as a generalization of the BHA, wherein the performance of the algorithm depends not on the single best-found solution but on a set of generated best solutions. The formulated method was tested using a set of nine widespread and popular benchmark test functions. The experimental outcomes indicated that the method generated highly precise results compared to BHA and comparable algorithms in the study, as well as excellent robustness. Furthermore, the proposed MBHA achieved a high rate of convergence on six real datasets (collected from the UCL machine learning lab), making it suitable for DC problems. Lastly, the evaluations conclusively indicated the appropriateness of the proposed algorithm for resolving DC issues.
    Matched MeSH terms: Data Mining/methods
  18. Saeed, Sana, Ong, Hong Choon
    MyJurnal
    Support vector machine (SVM) is one of the most popular algorithms in machine learning and data mining. However, its reduced efficiency is usually observed for imbalanced datasets. To improve the performance of SVM for binary imbalanced datasets, a new scheme based on oversampling and the hybrid algorithm were introduced. Besides the use of a single kernel function, SVM was applied with multiple kernel learning (MKL). A weighted linear combination was defined based on the linear kernel function, radial basis function (RBF kernel), and sigmoid kernel function for MKL. By generating the synthetic samples in the minority class, searching the best choices of the SVM parameters and identifying the weights of MKL by minimizing the objective function, the improved performance of SVM was observed. To prove the strength of the proposed scheme, an experimental study, including noisy borderline and real imbalanced datasets was conducted. SVM was applied with linear kernel function, RBF kernel, sigmoid kernel function and MKL on all datasets. The performance of SVM with all kernel functions was evaluated by using sensitivity, G Mean, and F measure. A significantly improved performance of SVM with MKL was observed by applying the proposed scheme.
    Matched MeSH terms: Data Mining
  19. Paul J, Jacob J, Mahmud M, Vaka M, Krishnan SG, Arifutzzaman A, et al.
    Int J Biol Macromol, 2024 Apr;265(Pt 2):130850.
    PMID: 38492706 DOI: 10.1016/j.ijbiomac.2024.130850
    Recent decades have witnessed a surge in research interest in bio-nanocomposite-based packaging materials, but still, a lack of systematic analysis exists in this domain. Bio-based packaging materials pose a sustainable alternative to petroleum-based packaging materials. The current work employs bibliometric analysis to deliver a comprehensive outline on the role of bio nanocomposites in packaging. India, Iran, and China were revealed to be the top three nations actively engaged in this domain in total publications. Islamic Azad University in Iran and Universiti Putra Malaysia in Malaysia are among the world's best institutions in active research and publications in this field. The extensive collaboration between nations and institutions highlights the significance of a holistic approach towards bio-nanocomposite. The National Natural Science Foundation of China is the leading funding body in this field of research. Among authors, Jong whan Rhim secured the topmost citations (2234) in this domain (13 publications). Among journals, Carbohydrate Polymers secured the maximum citation count (4629) from 36 articles; the initial one was published in 2011. Bio nanocomposite is the most frequently used keyword. Researchers and policymakers focussing on sustainable packaging solutions will gain crucial insights on the current research status on packaging solutions using bio-nanocomposites from the conclusions.
    Matched MeSH terms: Data Mining
  20. Ong WD, Voo CL, Kumar SV
    Mol Biol Rep, 2012 May;39(5):5889-96.
    PMID: 22207174 DOI: 10.1007/s11033-011-1400-3
    Improving the quality of the non-climacteric fruit, pineapple, is possible with information on the expression of genes that occurs during the process of fruit ripening. This can be made known through the generation of partial mRNA transcript sequences known as expressed sequence tags (ESTs). ESTs are useful not only for gene discovery but also as a resource for the identification of molecular markers, such as simple sequence repeats (SSRs). This paper reports on, firstly, the construction of a normalized library of the mature green pineapple fruit and, secondly, the mining of EST-SSR markers using the newly obtained pineapple ESTs as well as publicly available pineapple ESTs deposited in GenBank. Sequencing of the clones from the EST library resulted in 282 good sequences. Assembly of the sequences generated 168 unique transcripts (UTs) consisting of 34 contigs and 134 singletons with an average length of ≈500 bp. Annotation of the UTs categorized the known protein transcripts into the three ontologies: molecular function (34.88%), biological process (38.43%), and cellular component (26.69%). Approximately 7% (416) of the pineapple ESTs contained SSRs with an abundance of trinucleotide SSRs (48.3%) being identified. This was followed by dinucleotide and tetranucleotide SSRs with frequency of 46 and 57%, respectively. From these SSR-containing ESTs, 355 (85.3%) matched to known proteins while 133 contained flanking regions for primer design. Both the sequenced ESTs and the mined EST-SSRs will be useful in the understanding of non-climacteric ripening and the screening of biomarkers linked to fruit quality traits.
    Matched MeSH terms: Data Mining*
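
Entries 10 and 20 both describe mining simple sequence repeats (SSRs) from EST sequences, and entry 10 states the criterion as at least five units of di-, tri-, tetra-, penta-, or hexa-nucleotide repeats. The following is a minimal Python sketch of that criterion only. It is not the pipeline used in either paper: the function names `is_primitive` and `find_ssrs` and the example sequence are hypothetical, and skipping motifs that reduce to a shorter repeat unit (for example, ATAT counted as AT) is an assumption made here to avoid double counting.

```python
import re

MIN_REPEATS = 5             # threshold quoted in entry 10
MOTIF_LENGTHS = range(2, 7)  # di- to hexa-nucleotide motifs


def is_primitive(motif):
    """Return True if the motif is not a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = sequence.upper()
    hits = []
    for k in MOTIF_LENGTHS:
        # A k-mer followed by at least (min_repeats - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):  # assumption: ignore reducible motifs
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Hypothetical sequence: six AT units, then five AAG units.
example = "GGC" + "AT" * 6 + "CCT" + "AAG" * 5 + "T"
print(find_ssrs(example))  # [(3, 'AT', 6), (18, 'AAG', 5)]
```

The sketch reports only perfect, uninterrupted repeats; dedicated SSR tools typically also handle compound and imperfect repeats, which are out of scope here.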