MyMedR

Displaying publications 21 - 40 of 65 in total

Diabetes disease prediction system using HNB classifier based on discretization method

Al-Hameli BA, Alsewari AA, Basurra SS, Bhogal J, Ali MAH

J Integr Bioinform, 2023 Mar 01;20(1).
PMID: 36810102 DOI: 10.1515/jib-2021-0037

Diagnosing diabetes early is critical because it helps patients live with the disease in a healthy way: through healthy eating, appropriate medication doses, and greater vigilance in their movements and activities to avoid wounds, which are difficult to heal in diabetic patients. Data mining techniques are typically used to detect diabetes with high confidence and to avoid confusing it with other chronic diseases whose symptoms are similar. Hidden Naïve Bayes (HNB) is a classification algorithm that extends the traditional Naïve Bayes model by relaxing its assumption of conditional independence between attributes. The results of this study, conducted on the Pima Indian Diabetes (PID) dataset, show that the HNB classifier achieved a prediction accuracy of 82%, indicating that the discretization method increases the performance and accuracy of the HNB classifier.

Matched MeSH terms: Data Mining
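The abstract does not say which discretization method was paired with HNB. As a minimal sketch, assume simple equal-width binning. This turns a continuous attribute, such as plasma glucose, into a small number of categorical bins that a Bayesian classifier can treat as discrete values. The function name and the glucose values are hypothetical, and HNB itself is not implemented here.

```python
from typing import List, Sequence

def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Map each continuous value to a bin index in 0..n_bins-1 using equal-width intervals.

    Hypothetical helper: equal-width binning is an assumed stand-in, not the paper's method.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be >= 1")
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant attribute: everything in a single bin
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value in the last bin instead of an extra bin n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

glucose = [85.0, 99.0, 120.0, 148.0, 183.0, 197.0]  # illustrative values, not from the PID dataset
print(equal_width_bins(glucose, 3))  # -> [0, 0, 0, 1, 2, 2]
```

In practice the cut-points would be computed on the training split only and then reused on the test split, so that test data does not influence the bins.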
A data mining approach to analyze the role of biomacromolecules-based nanocomposites in sustainable packaging

Paul J, Jacob J, Mahmud M, Vaka M, Krishnan SG, Arifutzzaman A, et al.

Int J Biol Macromol, 2024 Apr;265(Pt 2):130850.
PMID: 38492706 DOI: 10.1016/j.ijbiomac.2024.130850

Recent decades have witnessed a surge in research interest in bio-nanocomposite-based packaging materials, yet a systematic analysis of this domain is still lacking. Bio-based packaging materials offer a sustainable alternative to petroleum-based packaging materials. The current work employs bibliometric analysis to deliver a comprehensive outline of the role of bio-nanocomposites in packaging. India, Iran, and China were the top three nations by total publications in this domain. Islamic Azad University in Iran and Universiti Putra Malaysia in Malaysia are among the world's leading institutions in active research and publication in this field. The extensive collaboration between nations and institutions highlights the significance of a holistic approach towards bio-nanocomposites. The National Natural Science Foundation of China is the leading funding body in this field of research. Among authors, Jong-Whan Rhim secured the most citations (2234) in this domain (13 publications). Among journals, Carbohydrate Polymers secured the highest citation count (4629) from 36 articles, the first of which was published in 2011. "Bio nanocomposite" is the most frequently used keyword. From these conclusions, researchers and policymakers focusing on sustainable packaging solutions will gain crucial insights into the current status of research on bio-nanocomposite packaging.

Matched MeSH terms: Data Mining
A Comparison Study on Similarity and Dissimilarity Measures in Clustering Continuous Data

Shirkhorshidi AS, Aghabozorgi S, Wah TY

PLoS One, 2015;10(12):e0144059.
PMID: 26658987 DOI: 10.1371/journal.pone.0144059

Similarity or distance measures are core components of distance-based clustering algorithms, which place similar data points in the same cluster and dissimilar or distant data points in different clusters. The performance of similarity measures has mostly been studied in two- or three-dimensional spaces. Beyond these, to the best of our knowledge, no empirical study has examined how similarity measures behave on high-dimensional datasets. To fill this gap, this study proposes a technical framework to analyze, compare and benchmark the influence of different similarity measures on the results of distance-based clustering algorithms. For reproducibility, fifteen publicly available datasets were used, so that future distance measures can be evaluated and compared with the results of the measures discussed in this work. These datasets were classified into low- and high-dimensional categories to study the performance of each measure in each category. This research should help the research community identify suitable distance measures for datasets and facilitate the comparison and evaluation of newly proposed similarity or distance measures against traditional ones.

Matched MeSH terms: Data Mining/statistics & numerical data*
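The abstract does not list the measures that were benchmarked. To show what "different similarity measures" means concretely, the following sketch implements a few widely used ones for continuous vectors: Euclidean, Manhattan, Chebyshev, Minkowski and cosine distance. The selection is an assumption and is not necessarily the set compared in the study.

```python
import math
from typing import Sequence

def _check(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute difference along any single dimension."""
    _check(a, b)
    return max(abs(x - y) for x, y in zip(a, b))

def minkowski(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """Generalises Manhattan (p = 1) and Euclidean (p = 2)."""
    _check(a, b)
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: depends only on the angle between vectors, not their length."""
    _check(a, b)
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
```

Swapping one of these functions for another inside a distance-based clustering algorithm, while keeping everything else fixed, is the kind of comparison the framework describes.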
Intelligent bar chart plagiarism detection in documents

Al-Dabbagh MM, Salim N, Rehman A, Alkawaz MH, Saba T, Al-Rodhaan M, et al.

ScientificWorldJournal, 2014;2014:612787.
PMID: 25309952 DOI: 10.1155/2014/612787

This paper presents a novel feature-mining approach for documents whose content cannot be mined via optical character recognition (OCR). By identifying the close relationship between the text and graphical components, the proposed technique extracts the Start, End, and Exact values of each bar. Furthermore, word 2-gram and Euclidean distance methods are used to detect and determine plagiarism in bar charts accurately.

Matched MeSH terms: Data Mining/methods*
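The abstract names word 2-grams and Euclidean distance but does not give the exact pipeline. The sketch below assumes that the text recovered from two charts (for example, their labels) is turned into word-bigram count vectors, which are then compared by Euclidean distance. The function names are hypothetical, and no plagiarism threshold is given because the abstract does not state one.

```python
import math
from collections import Counter

def word_bigrams(text: str) -> Counter:
    """Count word 2-grams (pairs of consecutive words) in lower-cased text."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

def bigram_distance(text_a: str, text_b: str) -> float:
    """Euclidean distance between the bigram count vectors of two texts."""
    a, b = word_bigrams(text_a), word_bigrams(text_b)
    # Counter returns 0 for missing keys, so iterating over the union covers both vectors
    return math.sqrt(sum((a[g] - b[g]) ** 2 for g in set(a) | set(b)))
```

A distance of 0 means that the two texts contain exactly the same bigrams with the same counts. Larger values indicate less textual overlap.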
Fast decision tree-based method to index large DNA-protein sequence databases using hybrid distributed-shared memory programming model

Jaber KM, Abdullah R, Rashid NA

Int J Bioinform Res Appl, 2014;10(3):321-40.
PMID: 24794073 DOI: 10.1504/IJBRA.2014.060765

In recent times, the size of biological databases has increased significantly, along with continuous growth in the number of users and the rate of queries, such that some databases have reached terabyte size. There is therefore an increasing need to access databases at the fastest possible rates. In this paper, the decision tree indexing model (PDTIM) was parallelised using a hybrid of distributed and shared memory on a resident database, with horizontal and vertical growth through the Message Passing Interface (MPI) and POSIX Threads (PThread), to accelerate index building. The PDTIM was implemented using 1, 2, 4 and 5 processors on 1, 2, 3 and 4 threads, respectively. The results show that the hybrid technique improved the speedup compared with a sequential version. The results suggest that the proposed PDTIM is appropriate for large data sets in terms of index building time.

Matched MeSH terms: Data Mining/methods*
Effect of temporal relationships in associative rule mining for web log data

Khairudin NM, Mustapha A, Ahmad MH

ScientificWorldJournal, 2014;2014:813983.
PMID: 24587757 DOI: 10.1155/2014/813983

The advent of web-based applications and services has produced diverse and voluminous web log data stored in web servers, proxy servers, client machines, or organizational databases. This paper investigates the effect of a temporal attribute in relational rule mining for web log data. We incorporated time characteristics into the rule mining process and analysed the effect of various temporal parameters. The rules generated by temporal relational rule mining were then compared against those generated by classical rule mining approaches such as the Apriori and FP-Growth algorithms. The results showed that incorporating the temporal attribute yields fewer rules, which are comparable in quality.

Matched MeSH terms: Data Mining/methods*
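The abstract compares temporal relational rule mining with the classical Apriori and FP-Growth algorithms, but it does not describe how time enters the process. The sketch below shows classic Apriori itemset mining with support and confidence. It also includes a hypothetical `mine_time_window` helper that restricts mining to sessions within an hour-of-day window. The session format (hour, set of pages) and the windowing idea are illustrative assumptions, not the authors' temporal method.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Itemset = FrozenSet[str]

def frequent_itemsets(transactions: Sequence[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Level-wise Apriori search: return every itemset whose support >= min_support."""
    n = len(transactions)
    if n == 0:
        return {}
    current = [frozenset([item]) for item in {i for t in transactions for i in t}]
    result: Dict[Itemset, float] = {}
    k = 1
    while current:
        # Support = fraction of transactions that contain the candidate itemset
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        result.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep only candidates whose (k-1)-subsets are all frequent
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return result

def association_rules(freq: Dict[Itemset, float],
                      min_confidence: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Rules lhs -> rhs with confidence = support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, support in freq.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = support / freq[lhs]  # lhs is frequent by downward closure
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules

def mine_time_window(sessions: Sequence[Tuple[int, Itemset]], start_hour: int, end_hour: int,
                     min_support: float, min_confidence: float):
    """Hypothetical temporal variant: mine only sessions whose hour is in [start_hour, end_hour)."""
    window = [pages for hour, pages in sessions if start_hour <= hour < end_hour]
    return association_rules(frequent_itemsets(window, min_support), min_confidence)
```

FP-Growth would return the same frequent itemsets without generating candidates explicitly. Apriori is shown here because its join-and-prune steps are short to write out.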
Order batching in warehouses by minimizing total tardiness: a hybrid approach of weighted association rule mining and genetic algorithms

Azadnia AH, Taheri S, Ghadimi P, Saman MZ, Wong KY

ScientificWorldJournal, 2013;2013:246578.
PMID: 23864823 DOI: 10.1155/2013/246578

One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with retrieving items from their storage locations to meet customer requests. Many solution approaches have been proposed to minimize travel distance during order picking. In practice, however, customer orders must be completed by certain due dates to avoid tardiness, a requirement neglected in most related scientific papers. We therefore proposed a novel four-phase solution approach to minimize tardiness. First, weighted association rule mining was used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming was formulated to maximize the associations between orders within each batch. Then, in the order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem was used to identify the most suitable travel path. Finally, a Genetic Algorithm was applied to sequence the constructed batches so as to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach.

Matched MeSH terms: Data Mining/methods*
Development of expressed sequence tag resources for Vanda Mimi Palmer and data mining for EST-SSR

Teh SL, Chan WS, Abdullah JO, Namasivayam P

Mol Biol Rep, 2011 Aug;38(6):3903-9.
PMID: 21116862 DOI: 10.1007/s11033-010-0506-3

Vanda Mimi Palmer (VMP) is a highly sought-after fragrant orchid hybrid in Malaysia. It is economically important in the cosmetic and beauty industries and is also a popular potted ornamental plant. Floral fragrance and volatiles have been extensively analysed, but to date no other research group has reported work on fragrance-related genes of vandaceous orchids. An expressed sequence tag (EST) resource was developed for VMP, principally to mine potential fragrance-related expressed sequence tag-simple sequence repeats (EST-SSRs) for future development as markers to identify fragrant vandaceous orchids endemic to Malaysia. Clustering, annotation and assembly of the ESTs identified 1,196 unigenes, comprising 966 singletons and 230 contigs. The VMP dbEST was functionally classified by gene ontology (GO) into three groups: molecular functions (51.2%), cellular components (16.4%) and biological processes (24.6%), while the remaining 7.8% showed no hits to a GO identifier. A total of 112 EST-SSRs (9.4%) were mined, each predicted to contain at least five repeat units of a di-, tri-, tetra-, penta-, or hexa-nucleotide motif. Di-nucleotide motifs were the most frequent among the detected SSRs, with AT/TA the most abundant dimeric type, while AAG/TTC and AGA/TCT were the most frequent trimeric types. The mined EST-SSRs are expected to be useful in developing EST-SSR markers for screening and characterizing fragrance-related transcripts in closely related species.

Matched MeSH terms: Data Mining*
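The SSR criterion in this abstract is at least five repeat units of a di- to hexa-nucleotide motif. It can be illustrated with a small regular-expression scan. This is a simplified sketch that makes several assumptions. It scans each motif length separately without overlaps. It discards motifs that are themselves repeats of a shorter unit, such as "AA" or "ATAT". It does not merge complementary motif classes such as AT/TA. It is not the tool or settings the authors used.

```python
import re
from typing import List, Tuple

def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))

def find_ssrs(seq: str, min_repeats: int = 5) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat count) for di- to hexa-nucleotide simple sequence repeats."""
    seq = seq.upper()
    hits = []
    for k in range(2, 7):
        # A k-base motif followed by at least (min_repeats - 1) further copies of itself
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{min_repeats - 1},}}")
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

print(find_ssrs("GGATATATATATCC"))  # -> [(2, 'AT', 5)]
```

Dedicated SSR-mining tools typically also handle compound and imperfect repeats and may use different minimum repeat counts for each motif length.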
Role of biological Data Mining and Machine Learning Techniques in Detecting and Diagnosing the Novel Coronavirus (COVID-19): A Systematic Review

Albahri AS, Hamid RA, Alwan JK, Al-Qays ZT, Zaidan AA, Zaidan BB, et al.

J Med Syst, 2020 May 25;44(7):122.
PMID: 32451808 DOI: 10.1007/s10916-020-01582-x

Coronaviruses (CoVs) are a large family of viruses that are common in many animal species, including camels, cattle, cats and bats. Rarely, animal CoVs infect and spread among humans, as with Middle East respiratory syndrome (MERS)-CoV, severe acute respiratory syndrome (SARS)-CoV, and the new virus named SARS-CoV-2. On January 30, 2020, the International Health Regulations Emergency Committee of the World Health Organisation declared the outbreak of the disease caused by this new CoV, called 'COVID-19', a 'public health emergency of international concern'. This global pandemic has affected almost the whole planet and had caused more than 315,131 deaths as of the date of this article. In this context, publishers, journals and researchers are urged to conduct research across different domains to help stop the spread of this deadly virus. Growing interest in developing artificial intelligence (AI) applications has addressed several medical problems. However, such applications remain insufficient given the high potential threat this virus poses to global public health. This systematic review addresses automated AI applications based on data mining and machine learning (ML) algorithms for detecting and diagnosing COVID-19. We aimed to obtain an overview of this critical virus, address the limitations of utilising data mining and ML algorithms, and provide the health sector with the benefits of this technique. We used five databases, namely IEEE Xplore, Web of Science, PubMed, ScienceDirect and Scopus, and performed three sequences of search queries between 2010 and 2020. Accurate exclusion criteria and a selection strategy were applied to screen the 1305 articles obtained. Only eight articles were fully evaluated and included in this review, a number that emphasises the insufficiency of research in this important area.
After analysing all included studies, the results were organised by year of publication and by the commonly used data mining and ML algorithms. The results reported in all papers were discussed to identify gaps in the reviewed literature. Characteristics such as motivations, challenges, limitations, recommendations, case studies, and the features and classes used were analysed in detail. This study reviewed state-of-the-art techniques for CoV prediction algorithms based on data mining and ML assessment. The reliability and acceptability of the information and datasets extracted from technologies implemented in the literature were considered. The findings indicate that researchers should build on the insights they gain, focus on identifying solutions for CoV problems, and introduce new improvements. The growing emphasis on data mining and ML techniques in medical fields can provide the right environment for change and improvement.

Matched MeSH terms: Data Mining/methods*
Application of a GIS-/remote sensing-based approach for predicting groundwater potential zones using a multi-criteria data mining methodology

Mogaji KA, Lim HS

Environ Monit Assess, 2017 Jul;189(7):321.
PMID: 28593561 DOI: 10.1007/s10661-017-5990-7

This study integrates the Dempster-Shafer-driven evidential belief function (DS-EBF) methodology with remote sensing and geographic information system techniques to analyze surface and subsurface data sets for the spatial prediction of groundwater potential in Perak Province, Malaysia. The study also used records of the groundwater yield rate at approximately 28 bore well locations. The processed surface and subsurface data produced sets of groundwater potential conditioning factors (GPCFs), from which multiple surface hydrologic and subsurface hydrogeologic parameter thematic maps were generated. The bore well location inventories were randomly partitioned in a 70:30 ratio, with 19 wells for model training and 9 wells for model testing. Applying the DS-EBF relationship model algorithms to the surface- and subsurface-based GPCF thematic maps and the bore well locations produced two groundwater potential prediction (GPP) maps, one based on surface hydrologic and one on subsurface hydrogeologic characteristics. These maps showed that more than 60% of the study area falls within moderate-to-high groundwater potential zones and less than 35% falls within low potential zones. Uncertainty values for the predicted potential zones, ranging from 0 to 17%, were quantified using the model's uncertainty algorithm. Validation of the GPP maps using the relative operating characteristic curve method yielded success rates of 80 and 68% and prediction rates of 89 and 53% for the subsurface hydrogeologic factor (SUHF)- and surface hydrologic factor (SHF)-based GPP maps, respectively. The results revealed that the SUHF-based GPP map delineated groundwater potential zones more accurately than the SHF-based GPP map.
Nevertheless, the low degree of uncertainty in the predicted potential zones established the suitability of both GPP maps for future development of groundwater resources in the area. Overall, the results demonstrated the efficacy of the data mining model and geospatial technology in groundwater potential mapping.

Matched MeSH terms: Data Mining*
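The abstract does not give the DS-EBF integration formulas. As background, the following sketch shows the generic Dempster rule of combination on a two-hypothesis frame {high, low} groundwater potential. It also shows how belief (Bel), disbelief (Dis) and uncertainty (Unc = Pls - Bel) are read off the combined masses. The two evidence sources and their mass values are hypothetical, and the sketch is not the authors' model.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]

def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses of intersecting focal sets, renormalise by 1 - conflict."""
    joint: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            joint[common] = joint.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass the two sources assign to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict and cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in joint.items()}

def belief(m: Mass, h: FrozenSet[str]) -> float:
    """Total mass committed to subsets of h."""
    return sum(w for s, w in m.items() if s <= h)

def plausibility(m: Mass, h: FrozenSet[str]) -> float:
    """Total mass not contradicting h."""
    return sum(w for s, w in m.items() if s & h)

HIGH, LOW = frozenset({"high"}), frozenset({"low"})
THETA = HIGH | LOW  # the whole frame: mass here expresses "don't know"

# Hypothetical evidence from two thematic maps for a single map cell
map_a = {HIGH: 0.6, LOW: 0.1, THETA: 0.3}
map_b = {HIGH: 0.5, LOW: 0.2, THETA: 0.3}
m = combine(map_a, map_b)
bel = belief(m, HIGH)                  # belief in "high potential"
dis = belief(m, LOW)                   # disbelief = belief in the complement
unc = plausibility(m, HIGH) - bel      # uncertainty; bel + dis + unc == 1
```

With these made-up inputs, 0.17 of the joint mass is conflicting, so the surviving masses are divided by 0.83.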
An efficient data mining framework for the characterization of symptomatic and asymptomatic carotid plaque using bidimensional empirical mode decomposition technique

Molinari F, Raghavendra U, Gudigar A, Meiburger KM, Rajendra Acharya U

Med Biol Eng Comput, 2018 Sep;56(9):1579-1593.
PMID: 29473126 DOI: 10.1007/s11517-018-1792-5

Atherosclerosis is a type of cardiovascular disease that may cause stroke. It results from the deposition of fatty plaque in the artery walls, which gradually reduces their elasticity and restricts blood flow to the heart. Early prediction of carotid plaque deposition is therefore important, as it can save lives. This paper proposes a novel data mining framework for assessing atherosclerosis at an early stage using ultrasound images. In this work, we used 1353 symptomatic and 420 asymptomatic carotid plaque ultrasound images. The proposed method classifies symptomatic and asymptomatic carotid plaques using bidimensional empirical mode decomposition (BEMD) and entropy features. The class imbalance is compensated using adaptive synthetic sampling (ADASYN), and the method yielded a promising accuracy of 91.43%, sensitivity of 97.26%, and specificity of 83.22% using fourteen features. Hence, the proposed method can be used as an assisting tool during regular screening of carotid arteries in hospitals.
Graphical abstract: Outline of our efficient data mining framework for the characterization of symptomatic and asymptomatic carotid plaques.

Matched MeSH terms: Data Mining*
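Accuracy, sensitivity and specificity, as reported in this abstract, all come from the four cells of a binary confusion matrix. The sketch below computes them, assuming that "symptomatic" is the positive class. The function name is hypothetical. With 1353 symptomatic and 420 asymptomatic images, a trivial classifier that labels every image symptomatic would already reach about 76.3% accuracy (1353/1773) while having 0% specificity. This is one reason sensitivity and specificity are informative alongside accuracy on imbalanced data.

```python
import math
from typing import Dict, Hashable, Sequence

def binary_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                   positive: Hashable = 1) -> Dict[str, float]:
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num: int, den: int) -> float:
        return num / den if den else math.nan  # undefined when a class is absent

    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }

# Trivial "always symptomatic" classifier on the study's class counts (1 = symptomatic)
y_true = [1] * 1353 + [0] * 420
print(binary_metrics(y_true, [1] * len(y_true)))
```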
Construction and analysis of protein-protein interaction network to identify the molecular mechanism in laryngeal cancer

Sarahani Harun, Nurulisa Zulkifle

Sains Malaysiana, 2018;47:2933-2940.

Laryngeal cancer is the most common head and neck cancer in the world, and its incidence is rising. However, the molecular mechanism underlying laryngeal cancer pathogenesis is poorly understood. The goal of this study was to develop a protein-protein interaction (PPI) network for laryngeal cancer to predict the biological pathways that underlie the molecular complexes in the network. Genes involved in laryngeal cancer were extracted from the OMIM database, and their interaction partners were identified via text and data mining using Agilent Literature Search, STRING and GeneMANIA. The PPI network was then integrated and visualised using Cytoscape version 3.6.0. Molecular complexes in the network were predicted with the MCODE plugin, and functional enrichment analyses of the molecular complexes were performed using BiNGO. Twenty-eight laryngeal cancer-related genes were present in the OMIM database. The PPI network associated with laryngeal cancer contained 161 nodes, 661 edges and five molecular complexes. Some of the complexes were related to the biological behaviour of cancer, providing a foundation for further understanding of the mechanism of laryngeal cancer development and progression.

Matched MeSH terms: Data Mining
Natural Sirtuin Modulators in Drug Discovery: A Review (2010-2020)

Chang Y, Yeong KY

Curr Med Chem, 2021 Mar 29.
PMID: 33781187 DOI: 10.2174/0929867328666210329124415

There has been intense research interest in sirtuins since their regulatory roles in a myriad of pathological processes were established. In the last two decades, much research effort has been dedicated to the development of sirtuin modulators. Although synthetic sirtuin modulators are the main focus, natural modulators remain an integral area for further exploration, as they have been found to possess therapeutic potential in various diseases, including cancers, neurodegenerative diseases, and metabolic disorders. Owing to the importance of this class of compounds, this review gives an up-to-date account of naturally occurring sirtuin modulators, their associated molecular mechanisms and their therapeutic benefits. Furthermore, comprehensive data mining yielded detailed statistical analyses of the development trend of sirtuin modulators from 2010 to 2020. Lastly, the challenges and future prospects of natural sirtuin modulators in drug discovery are discussed.

Matched MeSH terms: Data Mining
Suicidal behaviour prediction models using machine learning techniques: A systematic review

Nordin N, Zainol Z, Mohd Noor MH, Chan LF

Artif Intell Med, 2022 Oct;132:102395.
PMID: 36207078 DOI: 10.1016/j.artmed.2022.102395

BACKGROUND: Early detection and prediction of suicidal behaviour are key factors in suicide control. In conjunction with recent advances in the field of artificial intelligence, there is increasing research into how machine learning can assist in the detection, prediction and treatment of suicidal behaviour. Therefore, this study aims to provide a comprehensive review of the literature exploring machine learning techniques in the study of suicidal behaviour prediction.
METHODS: Four databases (Web of Science, PubMed, Dimensions, and Scopus) were searched for research papers dated between January 2016 and September 2021. The search keywords were 'data mining' and 'machine learning' in combination with 'suicidal behaviour', 'suicide', 'suicide attempt', 'suicidal ideation', 'suicide plan' and 'self-harm'. The studies that used machine learning techniques were synthesized according to the article's country, sample description, sample size, classification tasks, number of features used to develop the models, types of machine learning techniques, and performance evaluation metrics.
RESULTS: Thirty-five empirical articles met the criteria to be included in the current review. We provide a general overview of machine learning techniques, examine the feature categories, describe methodological challenges, and suggest areas for improvement and research directions. Ensemble prediction models have been shown to be more accurate and useful than single prediction models.
CONCLUSIONS: Machine learning has great potential for improving estimates of future suicidal behaviour and monitoring changes in risk over time. Further research can address important challenges and potential opportunities that may contribute to significant advances in suicide prediction.

Matched MeSH terms: Data Mining
Natural language processing in narrative breast radiology reporting in University Malaya Medical Centre

Tan WM, Ng WL, Ganggayah MD, Hoe VCW, Rahmat K, Zaini HS, et al.

Health Informatics J, 2023;29(3):14604582231203763.
PMID: 37740904 DOI: 10.1177/14604582231203763

Radiology reporting is narrative, and its content depends on the clinician's ability to interpret the images accurately. A tertiary hospital such as the anonymous institute emphasizes writing reports narratively as part of training for medical personnel. However, free-text reports make it inconvenient to extract information for clinical audits and data mining. We therefore aimed to convert unstructured breast radiology reports into structured formats using a natural language processing (NLP) algorithm. This study used 327 de-identified breast radiology reports from the anonymous institute. A radiologist identified the significant data elements to be extracted. Our NLP algorithm achieved 97% and 94.9% accuracy on the training and testing data, respectively. The structured information was then used to build a predictive model for the BI-RADS category. The random forest model achieved the highest accuracy, 92%. Our study not only fulfilled clinicians' demands by enhancing communication between medical personnel but also demonstrated the usefulness of mineable structured data in yielding significant insights.

Matched MeSH terms: Data Mining
An Efficient Optimization Method for Solving Unsupervised Data Classification Problems

Shabanzadeh P, Yusof R

Comput Math Methods Med, 2015;2015:802754.
PMID: 26336509 DOI: 10.1155/2015/802754

Unsupervised data classification (or clustering) analysis is one of the most useful tools in data mining. It is a descriptive task that seeks to group objects into homogeneous classes based on similarity, and it is used in many medical disciplines and various applications. In general, no single algorithm is suitable for all types of data, conditions, and applications; each algorithm has its own advantages, limitations, and deficiencies. Hence, research into novel and effective approaches for unsupervised data classification remains active. In this paper, a heuristic algorithm, the Biogeography-Based Optimization (BBO) algorithm, was adapted to data clustering problems by modifying its main operators. BBO is inspired by the natural biogeographic distribution of species. Like other population-based algorithms, BBO starts with an initial population of candidate solutions to an optimization problem and an objective function evaluated for each of them. The proposed algorithm was assessed on six medical and real-life datasets and compared with eight well-known and recent unsupervised data classification algorithms. Numerical results demonstrate that the proposed evolutionary optimization algorithm is efficient for unsupervised data classification.

Matched MeSH terms: Data Mining/methods*; Data Mining/statistics & numerical data
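The abstract describes BBO as a population-based search over candidate solutions, scored by an objective function, but it does not detail the modified operators. The following is a minimal, textbook-style BBO sketch for clustering under stated assumptions:

- Each habitat (candidate solution) encodes k centroids.
- The objective is the sum of squared distances from each point to its nearest centroid.
- Migration rates are linear and based on rank.
- Mutation is uniform.
- Elitism is applied.

All names and parameter defaults are hypothetical, and the sketch does not reproduce the paper's modifications.

```python
import random
from typing import List, Sequence, Tuple

Centroids = List[List[float]]

def sse(centroids: Centroids, data: Sequence[Sequence[float]]) -> float:
    """Clustering objective: sum of squared distances from each point to its nearest centroid."""
    return sum(min(sum((x - c) ** 2 for x, c in zip(p, cen)) for cen in centroids)
               for p in data)

def bbo_cluster(data: Sequence[Sequence[float]], k: int, pop_size: int = 20,
                generations: int = 100, p_mutate: float = 0.05, n_elite: int = 2,
                seed: int = 0) -> Tuple[Centroids, List[float]]:
    """Return the best centroid set found and the best cost per generation."""
    if not 1 <= n_elite < pop_size:
        raise ValueError("need 1 <= n_elite < pop_size")
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dim)]
    hi = [max(p[d] for p in data) for d in range(dim)]

    def cost(h: Centroids) -> float:
        return sse(h, data)

    def random_habitat() -> Centroids:
        # k random centroids inside the bounding box of the data
        return [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(k)]

    pop = sorted((random_habitat() for _ in range(pop_size)), key=cost)
    # Linear rank-based rates: the best habitat (rank 0) has the highest emigration
    # rate mu and the lowest immigration rate lam
    mu = [(pop_size - i) / (pop_size + 1) for i in range(pop_size)]
    lam = [1.0 - m for m in mu]
    history = [cost(pop[0])]
    for _ in range(generations):
        old = [[c[:] for c in h] for h in pop]        # migrate from a frozen snapshot
        elites = [[c[:] for c in h] for h in old[:n_elite]]
        for i in range(pop_size):
            for j in range(k):
                for d in range(dim):
                    if rng.random() < lam[i]:
                        # Roulette-wheel choice of the emigrating habitat, weighted by mu
                        src = rng.choices(range(pop_size), weights=mu)[0]
                        pop[i][j][d] = old[src][j][d]
                    if rng.random() < p_mutate:
                        pop[i][j][d] = rng.uniform(lo[d], hi[d])
        pop.sort(key=cost)
        pop[-n_elite:] = elites                       # elitism: previous best replace the worst
        pop.sort(key=cost)
        history.append(cost(pop[0]))
    return pop[0], history
```

The previous best habitats are copied back each generation, so the best cost recorded in `history` never increases.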
An evolution of image source camera attribution approaches

Jahanirad M, Wahab AW, Anuar NB

Forensic Sci Int, 2016 May;262:242-75.
PMID: 27060542 DOI: 10.1016/j.forsciint.2016.03.035

Camera attribution plays an important role in digital image forensics by providing evidence of the origin of a digital image and the characteristics that distinguish it. It allows the forensic analyst to identify the possible source camera that captured the image under investigation. In real-world applications, however, these approaches face many challenges. Large volumes of multimedia data are publicly available through photo-sharing and social network sites, captured under uncontrolled conditions and subjected to a variety of hardware and software post-processing operations. Moreover, the legal system accepts forensic analysis of digital image evidence only if the applied camera attribution techniques are unbiased, reliable, nondestructive and widely accepted by experts in the field. The aim of this paper is to investigate the evolutionary trend of image source camera attribution approaches from fundamentals to practice, in particular with the application of image processing and data mining techniques. Extracting implicit knowledge from images using intrinsic image artifacts for source camera attribution requires a structured image mining process. In this paper, we provide an introductory tutorial on the image processing pipeline to determine the general classification of the features corresponding to different components for source camera attribution. The article also reviews source camera attribution techniques in image forensics more comprehensively and classifies ongoing developments within this area. The existing source camera attribution approaches are classified by specific parameters, such as the colour image processing pipeline, hardware- and software-related artifacts, and the methods used to extract such artifacts.
More recent source camera attribution approaches, which have not yet gained sufficient attention among image forensics researchers, are also critically analysed and categorised into four classes: optical aberrations based, sensor camera fingerprints based, processing statistics based and processing regularities based. Furthermore, this paper investigates the challenging problems of such schemes and the strategies they propose, using the suggested taxonomy to plot the evolution of source camera attribution approaches over the last decade with respect to subjective optimisation criteria. The optimisation criteria were determined based on the strategies proposed to increase the detection accuracy, robustness and computational efficiency of source camera brand, model or device attribution.

Matched MeSH terms: Data Mining
A predictive method for hepatitis disease diagnosis using ensembles of neuro-fuzzy technique

Nilashi M, Ahmadi H, Shahmoradi L, Ibrahim O, Akbari E

J Infect Public Health, 2018 Oct 04;12(1):13-20.
PMID: 30293875 DOI: 10.1016/j.jiph.2018.09.009

BACKGROUND: Hepatitis is an inflammation of the liver, most commonly caused by a viral infection. Supervised data mining techniques have been applied successfully to hepatitis diagnosis on a range of datasets. Many methods for hepatitis diagnosis have been developed with the aid of data mining techniques, but most rely on a single learning technique and do not support ensemble learning. Combining the outputs of several predictors can improve accuracy in classification problems. This study aims to propose an accurate method for hepatitis diagnosis that takes advantage of ensemble learning.
METHODS: We use Non-linear Iterative Partial Least Squares for data dimensionality reduction, the Self-Organizing Map technique for clustering, and ensembles of Neuro-Fuzzy Inference Systems for predicting hepatitis. We also use decision trees to select the most important features in the experimental dataset. We test our method on a real-world dataset and compare our results with the latest results of previous studies.
RESULTS: Our analyses on the dataset demonstrated that our method outperforms Neural Network, ANFIS, K-Nearest Neighbors and Support Vector Machine classifiers.
CONCLUSIONS: The method has the potential to be used as an intelligent learning system for hepatitis diagnosis in healthcare.

Matched MeSH terms: Data Mining
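The abstract says that combining the outputs of several predictors can improve accuracy, but it does not say how the ensemble members' outputs are combined. One of the simplest combination rules is majority voting over class labels, sketched below as a hypothetical helper rather than the authors' combination scheme.

```python
from collections import Counter
from typing import Hashable, List, Sequence

def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """predictions[m][i] is the label that model m assigns to sample i."""
    if not predictions:
        raise ValueError("need at least one model")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("every model must label the same samples")
    # Count labels per sample across models; Counter.most_common breaks ties in
    # favour of the label encountered first, i.e. the one from the earliest model
    return [Counter(p[i] for p in predictions).most_common(1)[0][0] for i in range(n)]

votes = [["pos", "neg", "pos"],   # model 1
         ["pos", "pos", "neg"],   # model 2
         ["neg", "neg", "neg"]]   # model 3
print(majority_vote(votes))  # -> ['pos', 'neg', 'neg']
```

Ensembles that output membership degrees or probabilities, as neuro-fuzzy models can, might instead average those scores before thresholding. The abstract does not say which approach was used.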
Feature selection algorithms for Malaysian dengue outbreak detection model

Husam IS Abuhamad, Azuraliza Abu Bakar, Suhaila Zainudin, Mazrura Sahani, Zainudin Mohd Ali

Sains Malaysiana, 2017;46:255-265.

Dengue fever is considered one of the most common mosquito-borne diseases worldwide. Dengue outbreak detection can be very useful in practical efforts to curb the rapid spread of the disease, because it provides the knowledge needed to predict the next outbreak. Many studies have modelled and predicted dengue outbreaks using different data mining techniques. This research aimed to identify the features that yield better predictive accuracy for dengue outbreaks using three feature selection algorithms: particle swarm optimization (PSO), genetic algorithm (GA) and rank search (RS). Based on the selected features, three predictive modelling techniques (J48, DTNB and Naive Bayes) were applied for dengue outbreak detection. The dataset used in this research was obtained from the Public Health Department, Seremban, Negeri Sembilan, Malaysia. The experimental results showed that applying feature selection before predictive modelling improved predictive accuracy. The study also identified a set of features that represents dengue outbreak detection for Malaysian health agencies.

Matched MeSH terms: Data Mining
BioHackathon 2015: Semantics of data for life sciences and reproducible research

Vos RA, Katayama T, Mishima H, Kawano S, Kawashima S, Kim JD, et al.

F1000Res, 2020;9:136.
PMID: 32308977 DOI: 10.12688/f1000research.18236.1

We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.

Matched MeSH terms: Data Mining
