MyMedR

Displaying all 3 publications

Abstract:

Sort:

Fulltext A Comparison Study on Similarity and Dissimilarity Measures in Clustering Continuous Data

Shirkhorshidi AS, Aghabozorgi S, Wah TY

PLoS One, 2015;10(12):e0144059.
PMID: 26658987 DOI: 10.1371/journal.pone.0144059

Similarity or distance measures are core components used by distance-based clustering algorithms to cluster similar data points into the same clusters, while dissimilar or distant data points are placed into different clusters. The performance of similarity measures is mostly addressed in two or three-dimensional spaces, beyond which, to the best of our knowledge, there is no empirical study that has revealed the behavior of similarity measures when dealing with high-dimensional datasets. To fill this gap, a technical framework is proposed in this study to analyze, compare and benchmark the influence of different similarity measures on the results of distance-based clustering algorithms. For reproducibility purposes, fifteen publicly available datasets were used for this study, and consequently, future distance measures can be evaluated and compared with the results of the measures discussed in this work. These datasets were classified as low and high-dimensional categories to study the performance of each measure against each category. This research should help the research community to identify suitable distance measures for datasets and also to facilitate a comparison and evaluation of the newly proposed similarity or distance measures with traditional ones.

Matched MeSH terms: Data Mining/statistics & numerical data*
Fulltext An Efficient Optimization Method for Solving Unsupervised Data Classification Problems

Shabanzadeh P, Yusof R

Comput Math Methods Med, 2015;2015:802754.
PMID: 26336509 DOI: 10.1155/2015/802754

Unsupervised data classification (or clustering) analysis is one of the most useful tools and a descriptive task in data mining that seeks to classify homogeneous groups of objects based on similarity and is used in many medical disciplines and various applications. In general, there is no single algorithm that is suitable for all types of data, conditions, and applications. Each algorithm has its own advantages, limitations, and deficiencies. Hence, research for novel and effective approaches for unsupervised data classification is still active. In this paper a heuristic algorithm, Biogeography-Based Optimization (BBO) algorithm, was adapted for data clustering problems by modifying the main operators of BBO algorithm, which is inspired from the natural biogeography distribution of different species. Similar to other population-based algorithms, BBO algorithm starts with an initial population of candidate solutions to an optimization problem and an objective function that is calculated for them. To evaluate the performance of the proposed algorithm assessment was carried on six medical and real life datasets and was compared with eight well known and recent unsupervised data classification algorithms. Numerical results demonstrate that the proposed evolutionary optimization algorithm is efficient for unsupervised data classification.

Matched MeSH terms: Data Mining/statistics & numerical data
A new machine learning technique for an accurate diagnosis of coronary artery disease

Abdar M, Książek W, Acharya UR, Tan RS, Makarenkov V, Pławiak P

Comput Methods Programs Biomed, 2019 Oct;179:104992.
PMID: 31443858 DOI: 10.1016/j.cmpb.2019.104992

BACKGROUND AND OBJECTIVE: Coronary artery disease (CAD) is one of the commonest diseases around the world. An early and accurate diagnosis of CAD allows a timely administration of appropriate treatment and helps to reduce the mortality. Herein, we describe an innovative machine learning methodology that enables an accurate detection of CAD and apply it to data collected from Iranian patients.
METHODS: We first tested ten traditional machine learning algorithms, and then the three-best performing algorithms (three types of SVM) were used in the rest of the study. To improve the performance of these algorithms, a data preprocessing with normalization was carried out. Moreover, a genetic algorithm and particle swarm optimization, coupled with stratified 10-fold cross-validation, were used twice: for optimization of classifier parameters and for parallel selection of features.
RESULTS: The presented approach enhanced the performance of all traditional machine learning algorithms used in this study. We also introduced a new optimization technique called N2Genetic optimizer (a new genetic training). Our experiments demonstrated that N2Genetic-nuSVM provided the accuracy of 93.08% and F1-score of 91.51% when predicting CAD outcomes among the patients included in a well-known Z-Alizadeh Sani dataset. These results are competitive and comparable to the best results in the field.
CONCLUSIONS: We showed that machine-learning techniques optimized by the proposed approach, can lead to highly accurate models intended for both clinical and research use.

Matched MeSH terms: Data Mining/statistics & numerical data

Filters

Please provide feedback to Administrator (afdal@afpm.org.my)

External Links