Displaying all 16 publications

Abstract:
Sort:
  1. Hakak S, Kamsin A, Shivakumara P, Idna Idris MY, Gilkar GA
    PLoS One, 2018;13(7):e0200912.
    PMID: 30048486 DOI: 10.1371/journal.pone.0200912
    Exact pattern matching algorithms are popular and used widely in several applications, such as molecular biology, text processing, image processing, web search engines, network intrusion detection systems and operating systems. The focus of these algorithms is to achieve time efficiency according to applications but not memory consumption. In this work, we propose a novel idea to achieve both time efficiency and memory consumption by splitting query string for searching in Corpus. For a given text, the proposed algorithm split the query pattern into two equal halves and considers the second (right) half as a query string for searching in Corpus. Once the match is found with second halves, the proposed algorithm applies brute force procedure to find remaining match by referring the location of right half. Experimental results on different S1 Dataset, namely Arabic, English, Chinese, Italian and French text databases show that the proposed algorithm outperforms the existing S1 Algorithm in terms of time efficiency and memory consumption as the length of the query pattern increases.
    Matched MeSH terms: Data Mining/methods*
  2. Aqra I, Herawan T, Abdul Ghani N, Akhunzada A, Ali A, Bin Razali R, et al.
    PLoS One, 2018;13(1):e0179703.
    PMID: 29351287 DOI: 10.1371/journal.pone.0179703
    Designing an efficient association rule mining (ARM) algorithm for multilevel knowledge-based transactional databases that is appropriate for real-world deployments is of paramount concern. However, dynamic decision making that needs to modify the threshold either to minimize or maximize the output knowledge certainly necessitates the extant state-of-the-art algorithms to rescan the entire database. Subsequently, the process incurs heavy computation cost and is not feasible for real-time applications. The paper addresses efficiently the problem of threshold dynamic updation for a given purpose. The paper contributes by presenting a novel ARM approach that creates an intermediate itemset and applies a threshold to extract categorical frequent itemsets with diverse threshold values. Thus, improving the overall efficiency as we no longer needs to scan the whole database. After the entire itemset is built, we are able to obtain real support without the need of rebuilding the itemset (e.g. Itemset list is intersected to obtain the actual support). Moreover, the algorithm supports to extract many frequent itemsets according to a pre-determined minimum support with an independent purpose. Additionally, the experimental results of our proposed approach demonstrate the capability to be deployed in any mining system in a fully parallel mode; consequently, increasing the efficiency of the real-time association rules discovery process. The proposed approach outperforms the extant state-of-the-art and shows promising results that reduce computation cost, increase accuracy, and produce all possible itemsets.
    Matched MeSH terms: Data Mining/methods*
  3. Babajide Mustapha I, Saeed F
    Molecules, 2016 Jul 28;21(8).
    PMID: 27483216 DOI: 10.3390/molecules21080983
    Following the explosive growth in chemical and biological data, the shift from traditional methods of drug discovery to computer-aided means has made data mining and machine learning methods integral parts of today's drug discovery process. In this paper, extreme gradient boosting (Xgboost), which is an ensemble of Classification and Regression Tree (CART) and a variant of the Gradient Boosting Machine, was investigated for the prediction of biological activity based on quantitative description of the compound's molecular structure. Seven datasets, well known in the literature were used in this paper and experimental results show that Xgboost can outperform machine learning algorithms like Random Forest (RF), Support Vector Machines (LSVM), Radial Basis Function Neural Network (RBFN) and Naïve Bayes (NB) for the prediction of biological activities. In addition to its ability to detect minority activity classes in highly imbalanced datasets, it showed remarkable performance on both high and low diversity datasets.
    Matched MeSH terms: Data Mining/methods*
  4. Salih SQ, Alsewari AA, Wahab HA, Mohammed MKA, Rashid TA, Das D, et al.
    PLoS One, 2023;18(7):e0288044.
    PMID: 37406006 DOI: 10.1371/journal.pone.0288044
    The retrieval of important information from a dataset requires applying a special data mining technique known as data clustering (DC). DC classifies similar objects into a groups of similar characteristics. Clustering involves grouping the data around k-cluster centres that typically are selected randomly. Recently, the issues behind DC have called for a search for an alternative solution. Recently, a nature-based optimization algorithm named Black Hole Algorithm (BHA) was developed to address the several well-known optimization problems. The BHA is a metaheuristic (population-based) that mimics the event around the natural phenomena of black holes, whereby an individual star represents the potential solutions revolving around the solution space. The original BHA algorithm showed better performance compared to other algorithms when applied to a benchmark dataset, despite its poor exploration capability. Hence, this paper presents a multi-population version of BHA as a generalization of the BHA called MBHA wherein the performance of the algorithm is not dependent on the best-found solution but a set of generated best solutions. The method formulated was subjected to testing using a set of nine widespread and popular benchmark test functions. The ensuing experimental outcomes indicated the highly precise results generated by the method compared to BHA and comparable algorithms in the study, as well as excellent robustness. Furthermore, the proposed MBHA achieved a high rate of convergence on six real datasets (collected from the UCL machine learning lab), making it suitable for DC problems. Lastly, the evaluations conclusively indicated the appropriateness of the proposed algorithm to resolve DC issues.
    Matched MeSH terms: Data Mining/methods
  5. Al-Dabbagh MM, Salim N, Rehman A, Alkawaz MH, Saba T, Al-Rodhaan M, et al.
    ScientificWorldJournal, 2014;2014:612787.
    PMID: 25309952 DOI: 10.1155/2014/612787
    This paper presents a novel features mining approach from documents that could not be mined via optical character recognition (OCR). By identifying the intimate relationship between the text and graphical components, the proposed technique pulls out the Start, End, and Exact values for each bar. Furthermore, the word 2-gram and Euclidean distance methods are used to accurately detect and determine plagiarism in bar charts.
    Matched MeSH terms: Data Mining/methods*
  6. Jaber KM, Abdullah R, Rashid NA
    Int J Bioinform Res Appl, 2014;10(3):321-40.
    PMID: 24794073 DOI: 10.1504/IJBRA.2014.060765
    In recent times, the size of biological databases has increased significantly, with the continuous growth in the number of users and rate of queries; such that some databases have reached the terabyte size. There is therefore, the increasing need to access databases at the fastest rates possible. In this paper, the decision tree indexing model (PDTIM) was parallelised, using a hybrid of distributed and shared memory on resident database; with horizontal and vertical growth through Message Passing Interface (MPI) and POSIX Thread (PThread), to accelerate the index building time. The PDTIM was implemented using 1, 2, 4 and 5 processors on 1, 2, 3 and 4 threads respectively. The results show that the hybrid technique improved the speedup, compared to a sequential version. It could be concluded from results that the proposed PDTIM is appropriate for large data sets, in terms of index building time.
    Matched MeSH terms: Data Mining/methods*
  7. Khairudin NM, Mustapha A, Ahmad MH
    ScientificWorldJournal, 2014;2014:813983.
    PMID: 24587757 DOI: 10.1155/2014/813983
    The advent of web-based applications and services has created such diverse and voluminous web log data stored in web servers, proxy servers, client machines, or organizational databases. This paper attempts to investigate the effect of temporal attribute in relational rule mining for web log data. We incorporated the characteristics of time in the rule mining process and analysed the effect of various temporal parameters. The rules generated from temporal relational rule mining are then compared against the rules generated from the classical rule mining approach such as the Apriori and FP-Growth algorithms. The results showed that by incorporating the temporal attribute via time, the number of rules generated is subsequently smaller but is comparable in terms of quality.
    Matched MeSH terms: Data Mining/methods*
  8. Azadnia AH, Taheri S, Ghadimi P, Saman MZ, Wong KY
    ScientificWorldJournal, 2013;2013:246578.
    PMID: 23864823 DOI: 10.1155/2013/246578
    One of the cost-intensive issues in managing warehouses is the order picking problem which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed in order to minimize traveling distance in the process of order picking. However, in practice, customer orders have to be completed by certain due dates in order to avoid tardiness which is neglected in most of the related scientific papers. Consequently, we proposed a novel solution approach in order to minimize tardiness which consists of four phases. First of all, weighted association rule mining has been used to calculate associations between orders with respect to their due date. Next, a batching model based on binary integer programming has been formulated to maximize the associations between orders within each batch. Subsequently, the order picking phase will come up which used a Genetic Algorithm integrated with the Traveling Salesman Problem in order to identify the most suitable travel path. Finally, the Genetic Algorithm has been applied for sequencing the constructed batches in order to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach.
    Matched MeSH terms: Data Mining/methods*
  9. Albahri AS, Hamid RA, Alwan JK, Al-Qays ZT, Zaidan AA, Zaidan BB, et al.
    J Med Syst, 2020 May 25;44(7):122.
    PMID: 32451808 DOI: 10.1007/s10916-020-01582-x
    Coronaviruses (CoVs) are a large family of viruses that are common in many animal species, including camels, cattle, cats and bats. Animal CoVs, such as Middle East respiratory syndrome-CoV, severe acute respiratory syndrome (SARS)-CoV, and the new virus named SARS-CoV-2, rarely infect and spread among humans. On January 30, 2020, the International Health Regulations Emergency Committee of the World Health Organisation declared the outbreak of the resulting disease from this new CoV called 'COVID-19', as a 'public health emergency of international concern'. This global pandemic has affected almost the whole planet and caused the death of more than 315,131 patients as of the date of this article. In this context, publishers, journals and researchers are urged to research different domains and stop the spread of this deadly virus. The increasing interest in developing artificial intelligence (AI) applications has addressed several medical problems. However, such applications remain insufficient given the high potential threat posed by this virus to global public health. This systematic review addresses automated AI applications based on data mining and machine learning (ML) algorithms for detecting and diagnosing COVID-19. We aimed to obtain an overview of this critical virus, address the limitations of utilising data mining and ML algorithms, and provide the health sector with the benefits of this technique. We used five databases, namely, IEEE Xplore, Web of Science, PubMed, ScienceDirect and Scopus and performed three sequences of search queries between 2010 and 2020. Accurate exclusion criteria and selection strategy were applied to screen the obtained 1305 articles. Only eight articles were fully evaluated and included in this review, and this number only emphasised the insufficiency of research in this important area. After analysing all included studies, the results were distributed following the year of publication and the commonly used data mining and ML algorithms. The results found in all papers were discussed to find the gaps in all reviewed papers. Characteristics, such as motivations, challenges, limitations, recommendations, case studies, and features and classes used, were analysed in detail. This study reviewed the state-of-the-art techniques for CoV prediction algorithms based on data mining and ML assessment. The reliability and acceptability of extracted information and datasets from implemented technologies in the literature were considered. Findings showed that researchers must proceed with insights they gain, focus on identifying solutions for CoV problems, and introduce new improvements. The growing emphasis on data mining and ML techniques in medical fields can provide the right environment for change and improvement.
    Matched MeSH terms: Data Mining/methods*
  10. Yap KS, Lim CP, Au MT
    IEEE Trans Neural Netw, 2011 Dec;22(12):2310-23.
    PMID: 22067292 DOI: 10.1109/TNN.2011.2173502
    Generalized adaptive resonance theory (GART) is a neural network model that is capable of online learning and is effective in tackling pattern classification tasks. In this paper, we propose an improved GART model (IGART), and demonstrate its applicability to power systems. IGART enhances the dynamics of GART in several aspects, which include the use of the Laplacian likelihood function, a new vigilance function, a new match-tracking mechanism, an ordering algorithm for determining the sequence of training data, and a rule extraction capability to elicit if-then rules from the network. To assess the effectiveness of IGART and to compare its performances with those from other methods, three datasets that are related to power systems are employed. The experimental results demonstrate the usefulness of IGART with the rule extraction capability in undertaking classification problems in power systems engineering.
    Matched MeSH terms: Data Mining/methods*
  11. Mahmoodian H, Hamiruce Marhaban M, Abdulrahim R, Rosli R, Saripan I
    Australas Phys Eng Sci Med, 2011 Apr;34(1):41-54.
    PMID: 21327594 DOI: 10.1007/s13246-011-0054-8
    The classification of the cancer tumors based on gene expression profiles has been extensively studied in numbers of studies. A wide variety of cancer datasets have been implemented by the various methods of gene selection and classification to identify the behavior of the genes in tumors and find the relationships between them and outcome of diseases. Interpretability of the model, which is developed by fuzzy rules and linguistic variables in this study, has been rarely considered. In addition, creating a fuzzy classifier with high performance in classification that uses a subset of significant genes which have been selected by different types of gene selection methods is another goal of this study. A new algorithm has been developed to identify the fuzzy rules and significant genes based on fuzzy association rule mining. At first, different subset of genes which have been selected by different methods, were used to generate primary fuzzy classifiers separately and then proposed algorithm was implemented to mix the genes which have been associated in the primary classifiers and generate a new classifier. The results show that fuzzy classifier can classify the tumors with high performance while presenting the relationships between the genes by linguistic variables.
    Matched MeSH terms: Data Mining/methods
  12. Abdullah AA, Altaf-Ul-Amin M, Ono N, Sato T, Sugiura T, Morita AH, et al.
    Biomed Res Int, 2015;2015:139254.
    PMID: 26495281 DOI: 10.1155/2015/139254
    Volatile organic compounds (VOCs) are small molecules that exhibit high vapor pressure under ambient conditions and have low boiling points. Although VOCs contribute only a small proportion of the total metabolites produced by living organisms, they play an important role in chemical ecology specifically in the biological interactions between organisms and ecosystems. VOCs are also important in the health care field as they are presently used as a biomarker to detect various human diseases. Information on VOCs is scattered in the literature until now; however, there is still no available database describing VOCs and their biological activities. To attain this purpose, we have developed KNApSAcK Metabolite Ecology Database, which contains the information on the relationships between VOCs and their emitting organisms. The KNApSAcK Metabolite Ecology is also linked with the KNApSAcK Core and KNApSAcK Metabolite Activity Database to provide further information on the metabolites and their biological activities. The VOC database can be accessed online.
    Matched MeSH terms: Data Mining/methods*
  13. Shabanzadeh P, Yusof R
    Comput Math Methods Med, 2015;2015:802754.
    PMID: 26336509 DOI: 10.1155/2015/802754
    Unsupervised data classification (or clustering) analysis is one of the most useful tools and a descriptive task in data mining that seeks to classify homogeneous groups of objects based on similarity and is used in many medical disciplines and various applications. In general, there is no single algorithm that is suitable for all types of data, conditions, and applications. Each algorithm has its own advantages, limitations, and deficiencies. Hence, research for novel and effective approaches for unsupervised data classification is still active. In this paper a heuristic algorithm, Biogeography-Based Optimization (BBO) algorithm, was adapted for data clustering problems by modifying the main operators of BBO algorithm, which is inspired from the natural biogeography distribution of different species. Similar to other population-based algorithms, BBO algorithm starts with an initial population of candidate solutions to an optimization problem and an objective function that is calculated for them. To evaluate the performance of the proposed algorithm assessment was carried on six medical and real life datasets and was compared with eight well known and recent unsupervised data classification algorithms. Numerical results demonstrate that the proposed evolutionary optimization algorithm is efficient for unsupervised data classification.
    Matched MeSH terms: Data Mining/methods*
  14. Ashkani S, Yusop MR, Shabanimofrad M, Azady A, Ghasemzadeh A, Azizi P, et al.
    Curr Issues Mol Biol, 2015;17:57-73.
    PMID: 25706446
    Allele mining is a promising way to dissect naturally occurring allelic variants of candidate genes with essential agronomic qualities. With the identification, isolation and characterisation of blast resistance genes in rice, it is now possible to dissect the actual allelic variants of these genes within an array of rice cultivars via allele mining. Multiple alleles from the complex locus serve as a reservoir of variation to generate functional genes. The routine sequence exchange is one of the main mechanisms of R gene evolution and development. Allele mining for resistance genes can be an important method to identify additional resistance alleles and new haplotypes along with the development of allele-specific markers for use in marker-assisted selection. Allele mining can be visualised as a vital link between effective utilisation of genetic and genomic resources in genomics-driven modern plant breeding. This review studies the actual concepts and potential of mining approaches for the discovery of alleles and their utilisation for blast resistance genes in rice. The details provided here will be important to provide the rice breeder with a worthwhile introduction to allele mining and its methodology for breakthrough discovery of fresh alleles hidden in hereditary diversity, which is vital for crop improvement.
    Matched MeSH terms: Data Mining/methods*
  15. Kalid N, Zaidan AA, Zaidan BB, Salman OH, Hashim M, Muzammil H
    J Med Syst, 2017 Dec 29;42(2):30.
    PMID: 29288419 DOI: 10.1007/s10916-017-0883-4
    The growing worldwide population has increased the need for technologies, computerised software algorithms and smart devices that can monitor and assist patients anytime and anywhere and thus enable them to lead independent lives. The real-time remote monitoring of patients is an important issue in telemedicine. In the provision of healthcare services, patient prioritisation poses a significant challenge because of the complex decision-making process it involves when patients are considered 'big data'. To our knowledge, no study has highlighted the link between 'big data' characteristics and real-time remote healthcare monitoring in the patient prioritisation process, as well as the inherent challenges involved. Thus, we present comprehensive insights into the elements of big data characteristics according to the six 'Vs': volume, velocity, variety, veracity, value and variability. Each of these elements is presented and connected to a related part in the study of the connection between patient prioritisation and real-time remote healthcare monitoring systems. Then, we determine the weak points and recommend solutions as potential future work. This study makes the following contributions. (1) The link between big data characteristics and real-time remote healthcare monitoring in the patient prioritisation process is described. (2) The open issues and challenges for big data used in the patient prioritisation process are emphasised. (3) As a recommended solution, decision making using multiple criteria, such as vital signs and chief complaints, is utilised to prioritise the big data of patients with chronic diseases on the basis of the most urgent cases.
    Matched MeSH terms: Data Mining/methods*
  16. Mohamoud HS, Hussain MR, El-Harouni AA, Shaik NA, Qasmi ZU, Merican AF, et al.
    Comput Math Methods Med, 2014;2014:904052.
    PMID: 24723968 DOI: 10.1155/2014/904052
    GalNAc-T1, a key candidate of GalNac-transferases genes family that is involved in mucin-type O-linked glycosylation pathway, is expressed in most biological tissues and cell types. Despite the reported association of GalNAc-T1 gene mutations with human disease susceptibility, the comprehensive computational analysis of coding, noncoding and regulatory SNPs, and their functional impacts on protein level, still remains unknown. Therefore, sequence- and structure-based computational tools were employed to screen the entire listed coding SNPs of GalNAc-T1 gene in order to identify and characterize them. Our concordant in silico analysis by SIFT, PolyPhen-2, PANTHER-cSNP, and SNPeffect tools, identified the potential nsSNPs (S143P, G258V, and Y414D variants) from 18 nsSNPs of GalNAc-T1. Additionally, 2 regulatory SNPs (rs72964406 and #x26; rs34304568) were also identified in GalNAc-T1 by using FastSNP tool. Using multiple computational approaches, we have systematically classified the functional mutations in regulatory and coding regions that can modify expression and function of GalNAc-T1 enzyme. These genetic variants can further assist in better understanding the wide range of disease susceptibility associated with the mucin-based cell signalling and pathogenic binding, and may help to develop novel therapeutic elements for associated diseases.
    Matched MeSH terms: Data Mining/methods
Filters
Contact Us

Please provide feedback to Administrator (afdal@afpm.org.my)

External Links