Displaying publications 21 - 40 of 417 in total

  1. Fawaz Al-badaii, Azhar Abdul Halim, Mohammad Shuhaimi-othman
    Sains Malaysiana, 2016;45:841-852.
    A study was carried out to determine the concentrations of dissolved heavy metals in the Sungai Semenyih and to use
    environmetric methods to evaluate the influence of different pollution sources on heavy metal concentrations. Cluster
    analysis (CA) classified the 8 sampling stations into two clusters based on the similarity of their characteristics:
    cluster 1 included stations 1, 2, 3 and 4 (low pollution area), whereas cluster 2 comprised stations 5, 6, 7 and 8
    (high pollution area). Principal component analysis (PCA) of the two datasets yielded two factors for the low pollution
    area and three factors for the high pollution area at eigenvalues >1, representing 92.544% and 100% of the total variance
    in the respective heavy metal datasets, and allowed the selected heavy metals to be grouped according to anthropogenic
    and lithologic sources of contamination.
    Matched MeSH terms: Cluster Analysis
  2. Harun S, Rohani ER, Ohme-Takagi M, Goh HH, Mohamed-Hussein ZA
    J Plant Res, 2021 Mar;134(2):327-339.
    PMID: 33558947 DOI: 10.1007/s10265-021-01257-9
    Glucosinolates (GSLs) are plant secondary metabolites consisting of sulfur and nitrogen, commonly found in Brassicaceae crops, such as Arabidopsis thaliana. These compounds are known for their roles in plant defense mechanisms against pests and pathogens. The 'guilt-by-association' (GBA) approach predicts that genes encoding proteins with similar functions tend to share gene expression patterns generated from high-throughput sequencing data. Recent studies have successfully identified GSL genes using the GBA approach, followed by targeted verification of gene expression and metabolite data. Therefore, a GSL co-expression network was constructed using known GSL genes obtained from our in-house database, SuCComBase. DPClusO was used to identify subnetworks of the GSL co-expression network, followed by Fisher's exact test, leading to the discovery of a potential gene that encodes the ARIA-interacting double AP2-domain protein (ADAP) transcription factor (TF). Further functional analysis was performed using an effective gene silencing system known as CRES-T. By applying CRES-T, the ADAP TF gene was fused to a plant-specific EAR-motif repressor domain (SRDX), which suppresses the expression of ADAP target genes. In this study, ADAP was proposed as a negative regulator in aliphatic GSL biosynthesis due to the over-expression of downstream aliphatic GSL genes (UGT74C1 and IPMI1) in the ADAP-SRDX line. The significant over-expression of the ADAP gene in the ADAP-SRDX line also suggests that the TF negatively affects the expression of UGT74C1 and IPMI1 via a feedback mechanism in A. thaliana.
    Matched MeSH terms: Cluster Analysis
  3. Safuan S, Edinur HA
    Acta Biomed, 2020 11 10;91(4):e2020154.
    PMID: 33525245 DOI: 10.23750/abm.v91i4.10345
    Matched MeSH terms: Cluster Analysis
  4. Honar Pajooh H, Rashid M, Alam F, Demidenko S
    Sensors (Basel), 2021 Jan 24;21(3).
    PMID: 33498860 DOI: 10.3390/s21030772
    The proliferation of smart devices in Internet of Things (IoT) networks creates significant security challenges for the communications between such devices. Blockchain is a decentralized and distributed technology that can potentially tackle the security problems within 5G-enabled IoT networks. This paper proposes a Multi-layer Blockchain Security model to protect IoT networks while simplifying the implementation. The concept of clustering is utilized in order to facilitate the multi-layer architecture. The K-unknown clusters are defined within the IoT network by applying techniques that utilize a hybrid Evolutionary Computation Algorithm while using Simulated Annealing and Genetic Algorithms. The chosen cluster heads are responsible for local authentication and authorization. Local private blockchain implementation facilitates communications between the cluster heads and relevant base stations. Such a blockchain enhances credibility assurance and security while also providing a network authentication mechanism. The open-source Hyperledger Fabric Blockchain platform is deployed for the proposed model development. Base stations adopt a global blockchain approach to communicate with each other securely. The simulation results demonstrate that the proposed clustering algorithm performs well when compared to the earlier reported approaches. The proposed lightweight blockchain model is also shown to be better suited to balance network latency and throughput as compared to a traditional global blockchain.
    Matched MeSH terms: Cluster Analysis
  5. Danial M, Arulappen AL, Ch'ng ASH, Looi I
    J Glob Health, 2020 Dec;10(2):0203105.
    PMID: 33403108 DOI: 10.7189/jogh.10.0203105
    Matched MeSH terms: Cluster Analysis
  6. Siddiqui FS, Nerali JT, Telang LA
    J Educ Health Promot, 2021;10:105.
    PMID: 34084852 DOI: 10.4103/jehp.jehp_758_20
    BACKGROUND: Stress and low psychological well-being among students in higher education impact their academic performance. The purpose of this study was to determine the relationship between SOC, SDLR, and academic performance in year 3, 4, and 5 undergraduate dental students.

    MATERIALS AND METHODS: Two hundred and ten students completed a validated questionnaire on SOC and SDLR. The percentage of marks obtained by these students in their year-end examination was used as their academic performance. The SOC scores were further divided into three hierarchical clusters using cluster analysis. The data were analyzed to determine the difference in the SDLR scores and academic performance among the three clusters. Furthermore, the relationship between SOC scores, SDLR scores, and academic performance was assessed.

    RESULTS: The SDLR scores significantly increased from the low SOC cluster to the high SOC cluster (P = 0.026). However, there was no significant change in academic performance. A positive relationship was found between the SOC and the academic performance (R = +0.025; P > 0.05). The SDLR had a significant positive relationship with both SOC and academic performance (R = +0.27; P < 0.001).

    CONCLUSION: Although SOC may not have a direct influence on academic performance, SDLR can play an intermediary role. Early identification and timely intervention in students with a weak SOC and low SDLR can have a beneficial influence on their academic life.

    Matched MeSH terms: Cluster Analysis
  7. Uddin J, Ghazali R, Deris MM
    PLoS One, 2017;12(1):e0164803.
    PMID: 28068344 DOI: 10.1371/journal.pone.0164803
    Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been paid to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering, rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) have outperformed predecessor approaches such as Bi-Clustering (BC), Total Roughness (TR) and Min-Min Roughness (MMR). This paper presents the limitations and issues of the MDA and MSA techniques on special types of data sets where both techniques fail to select, or face difficulty in selecting, their best clustering attribute. This analysis motivates the need for a better and more general rough set theory approach that can cope with the issues of MDA and MSA. Hence, an alternative technique named Maximum Indiscernible Attribute (MIA) for clustering categorical data using rough set indiscernibility relations is proposed. The novelty of the proposed approach is that, unlike other rough set theory techniques, it uses the domain knowledge of the data set. It is based on the concept of the indiscernibility relation combined with a number of clusters. To show the significance of the proposed approach, the effects of the number of clusters on rough accuracy, purity and entropy are described in the form of propositions. Moreover, ten different data sets from previously utilized research cases and the UCI repository are used for experiments. The results, presented in tabular and graphical forms, show that the proposed MIA technique provides better performance in selecting the clustering attribute in terms of purity, entropy, iterations, time, accuracy and rough accuracy.
    Matched MeSH terms: Cluster Analysis
  8. Nuryazmin Ahmat Zainuri, Abdul Aziz Jemain, Nora Muda
    Sains Malaysiana, 2015;44:449-456.
    This paper presents various imputation methods for air quality data, specifically in Malaysia. The main objective was to
    select the best method of imputation and to compare whether the best method differed between stations
    in Peninsular Malaysia. Missing data for various cases were randomly simulated with 5, 10, 15, 20, 25 and 30% missing.
    The six methods used in this paper were mean and median substitution, the expectation-maximization (EM) method, singular
    value decomposition (SVD), the K-nearest neighbour (KNN) method and the sequential K-nearest neighbour (SKNN) method. The
    performance of the imputations was compared using three performance indicators: the correlation coefficient (R), the index
    of agreement (d) and the mean absolute error (MAE). Based on the results obtained, it can be concluded that EM, KNN
    and SKNN are the three best methods. The same result was obtained for all eight monitoring stations used in this study.
    Matched MeSH terms: Cluster Analysis
  9. Nur Riza M. Suradi, Teh SL
    This paper discusses the multilevel approach to constructing a model for estimating hierarchically structured data on students' performance. Multilevel models that take into account variation arising from the clustering of data at different levels are compared to linear regression models using the least squares method. This study also estimates the contributions of gender and ethnic factors to students' performance. Academic performance data for 866 students in a science faculty at an institution of higher learning were obtained and analyzed. The data are hierarchically structured with two levels, namely students and departments. The findings show different parameter estimates for the two models. In addition, the multilevel model, which incorporates variability from different levels and predictors from higher levels, is found to provide a better fit for explaining students' performance.
    Matched MeSH terms: Cluster Analysis
  10. Chin WC, Zaidi Isa, Abu Hassan Shaari Mohd. Nor
    Sains Malaysiana, 2008;37:233-237.
    This article studies the influence of structural breaks on the fractionally integrated time-varying volatility model in Malaysian stock markets from 1996 to 2006. A fractionally integrated autoregressive conditional heteroscedastic (FIGARCH) model combined with sudden changes in volatility is developed to study the possibility of structural change during the Asian financial crisis and currency crisis. Our empirical results show a substantial reduction in long-memory volatility clustering after the inclusion of sudden changes in the volatility. Finally, the estimation, diagnostic and model selection evaluations indicate that the fractionally integrated model with structural change outperforms the standard model.
    Matched MeSH terms: Cluster Analysis
  11. Kalafi EY, Anuar MK, Sakharkar MK, Dhillon SK
    Folia Biol. (Praha), 2018;64(4):137-143.
    PMID: 30724159
    The process of manual species identification is a daunting task, so much so that the number of taxonomists is seen to be declining. In order to assist taxonomists, many methods and algorithms have been proposed to develop semi-automated and fully automated systems for species identification. While semi-automated tools would require manual intervention by a domain expert, fully automated tools are assumed to be not as reliable as manual or semi-automated identification tools. Hence, in this study we investigate the accuracy of fully automated and semi-automated models for species identification. We have built fully automated and semi-automated species classification models using the monogenean species image dataset. Monogeneans are differentiated based on the morphological characteristics of haptoral bars, anchors, marginal hooks and reproductive organs (male and female copulatory organs). Landmarks (in the semi-automated model) and shape morphometric features (in the fully automated model) were extracted from images of four monogenean species, which were then classified using k-nearest neighbour and artificial neural network. In semi-automated models, a classification accuracy of 96.67% was obtained using the k-nearest neighbour and 97.5% using the artificial neural network, whereas in fully automated models, a classification accuracy of 90% was obtained using the k-nearest neighbour and 98.8% using the artificial neural network. As for the cross-validation, semi-automated models performed at 91.2%, whereas fully automated models performed slightly higher at 93.75%.
    Matched MeSH terms: Cluster Analysis
  12. Dalatu, Paul Inuwa, Habshah Midi
    MyJurnal
    Clustering is one of the primary data mining tools. It helps
    researchers understand the natural grouping of attributes in datasets. Clustering is an
    unsupervised classification method whose major aim is partitioning, such that objects in the
    same cluster are similar, and objects which belong to different clusters vary significantly,
    with respect to their attributes. However, the classical Standardized Euclidean distance,
    which uses the standard deviation to down-weight the maximum points of the ith features in
    distance-based clustering, has been criticized by many scholars because the method produces outliers,
    lacks robustness, and has a 0% breakdown point. It also has low efficiency under the normal
    distribution. Therefore, to remedy the problem, we suggest two statistical estimators
    which have 50% breakdown points, namely the Sn and Qn estimators, with 58% and 82%
    efficiency, respectively. The proposed methods evidently outperformed the existing methods
    in down-weighting the maximum points of the ith features in distance-based clustering
    analysis.
    Matched MeSH terms: Cluster Analysis
  13. Chin WC, Nadira Mohamed Isa, Lee MC, Poo KH
    Sains Malaysiana, 2017;46:107-116.
    The heterogeneous autoregressive (HAR) models are used in modeling high-frequency multipower realized volatility of the
    S&P 500 index. Extended from the standard realized volatility, the multipower realized volatility representations have
    the advantage of handling possible abrupt jumps by smoothing the consecutive volatility. In order to accommodate
    the volatility clustering and asymmetry of multipower realized volatility, the HAR model is extended by the threshold
    autoregressive conditional heteroscedastic (GJR-GARCH) component. In addition, the innovations of the multipower realized
    volatility are characterized by skewed Student-t distributions. The extended model provides the best performing in-sample
    and out-of-sample forecast evaluations.
    Matched MeSH terms: Cluster Analysis
  14. Soffian SSS, Nawi AM, Hod R, Chan HK, Hassan MRA
    PMID: 34639786 DOI: 10.3390/ijerph181910486
    The increasing incidence of colorectal cancer (CRC) in specific geographic regions, compounded by the interaction of multifactorial determinants, shows a tendency to cluster. The review aimed to identify and synthesize available evidence on clustering patterns of CRC incidence, specifically related to the associated determinants. Articles were systematically searched from four databases: Scopus, Web of Science, PubMed, and EBSCOHost. The approach for identification of the final articles followed PRISMA guidelines. Selected full-text articles were English-language spatial studies published between 2016 and 2021 that focused on CRC cluster identification. Systematic reviews, conference proceedings, book chapters, and reports were excluded. From the final 12 articles, data on the spatial statistics used and associated factors were extracted. Identified factors linked with CRC clusters were further classified into ecology (health care accessibility, urbanicity, dirty streets, tree coverage), biology (age, sex, ethnicity, overweight and obesity, daily consumption of milk and fruit), and social determinants (median income level, smoking status, health cost, employment status, housing violations, and domestic violence). Future spatial studies that incorporate the physical environment related to CRC clusters and the potential interaction between the ecological, biological and social determinants are warranted to provide more insight into the complex mechanism of CRC cluster patterns.
    Matched MeSH terms: Cluster Analysis
  15. Zulkepli NFS, Noorani MSM, Razak FA, Ismail M, Alias MA
    J Environ Manage, 2022 Mar 15;306:114434.
    PMID: 35065362 DOI: 10.1016/j.jenvman.2022.114434
    Haze has been a major issue afflicting Southeast Asian countries, including Malaysia, for the past few decades. Hierarchical agglomerative cluster analysis (HACA) is commonly used to evaluate the spatial behavior between areas in which pollutants interact. Typically, in HACA, the Euclidean distance acts as the dissimilarity measure and air quality monitoring stations are grouped according to this measure, thus revealing the most polluted areas. In this study, a framework for the hybridization of the HACA technique is proposed by considering the topological similarity (Wasserstein distance) between stations to evaluate the spatial patterns of the areas affected by haze episodes. For this, a tool from topological data analysis (TDA), namely persistent homology, is used to extract essential topological features hidden in the dataset. The performance of the proposed method is compared with that of traditional HACA and evaluated based on its ability to categorize areas according to the exceedance level of particulate matter (PM10). Results show that the additional topological features yield better accuracy than the case that does not consider topological features. The cluster validity indices are computed to verify the results, and the proposed method outperforms the traditional method, suggesting a practical alternative approach for assessing the similarity in air pollution behaviors based on topological characterizations.
    Matched MeSH terms: Cluster Analysis
  16. Chan KW, Tan GH, Wong RC
    Sci Justice, 2012 Sep;52(3):136-41.
    PMID: 22841136 DOI: 10.1016/j.scijus.2012.04.006
    Statistical classification remains the most useful statistical tool for forensic chemists to assess the relationships between samples. Many clustering techniques such as principal component analysis and hierarchical cluster analysis have been employed to analyze chemical data for pattern recognition. Because novice drug chemists often have a weak foundation in statistics, a tetrahedron method was designed to simulate how advanced chemometrics operates. In this paper, the development of the graphical tetrahedron and computational matrices derived from the possible tetrahedrons are discussed. The tetrahedron method was applied to four selected parameters obtained from nine illicit heroin samples. Pattern analysis and mathematical computation of the differences in areas for assessing the dissimilarity between the nine tetrahedrons were found to be user-convenient and straightforward for novice cluster analysts.
    Matched MeSH terms: Cluster Analysis
  17. Zheng P, Belaton B, Liao IY, Rajion ZA
    PLoS One, 2017;12(11):e0187558.
    PMID: 29121077 DOI: 10.1371/journal.pone.0187558
    Landmarks, also known as feature points, are one of the important geometry primitives that describe the predominant characteristics of a surface. In this study we propose a self-contained framework to generate landmarks on surfaces extracted from volumetric data. The framework is designed as a three-phase pipeline comprising surface construction, crest line extraction and landmark identification. With volumetric data as input and landmarks as output, the pipeline takes in 3D raw data and produces a 0D geometry feature. In each phase we investigate existing methods, then extend and tailor them to fit the pipeline design. The pipeline is modularised so that each phase has a dedicated function. We extended the implicit surface polygonizer for surface construction in the first phase, developed an alternative way to compute the gradient of maximal curvature for crest line extraction in the second phase, and finally combined curvature information and the K-means clustering method to identify the landmarks in the third phase. The implementations are first carried out in a controlled environment, i.e. on synthetic data, for proof of concept. The method is then tested on a small-scale data set and subsequently on a large data set. Issues and justifications are addressed accordingly for each phase.
    Matched MeSH terms: Cluster Analysis
  18. Nurul Adzlyana, M.S., Rosma, M.D., Nurazzah, A.R.
    MyJurnal
    Data mining processes such as clustering, classification, regression and outlier detection are developed based on the similarity between two objects. Data mining of categorical data is found to be the most challenging. Earlier similarity measures are context-free. In recent years, researchers have come up with context-sensitive similarity measures based on the relationships between objects. This paper provides an in-depth review of context-based similarity measures. The algorithms for four context-based similarity measures, namely the Association-based similarity measure, DILCA, CBDL and the hybrid context-based similarity measure, are described. Advantages and limitations of each context-based similarity measure are identified and explained. Context-based similarity measures are highly recommended for data mining tasks on categorical data. The findings of this paper will help data miners in choosing appropriate similarity measures to achieve more accurate classification or clustering results.
    Matched MeSH terms: Cluster Analysis
  19. Abdul Rahman M, Sani NS, Hamdan R, Ali Othman Z, Abu Bakar A
    PLoS One, 2021;16(8):e0255312.
    PMID: 34339480 DOI: 10.1371/journal.pone.0255312
    The Multidimensional Poverty Index (MPI) is an income-based poverty index which measures multiple deprivations alongside other relevant factors to determine and classify poverty. The implementation of a reliable MPI is one of the significant efforts by the Malaysian government to improve measures in alleviating poverty, in line with the recent policy for the Bottom 40 Percent (B40) group. However, using this measurement, only 0.86% of Malaysians are regarded as multidimensionally poor, and this measurement was claimed to be irrelevant for Malaysia as a country that has rapid economic development. Therefore, this study proposes a B40 clustering-based K-Means with cosine similarity architecture to identify the right indicators and dimensions that will provide a data-driven MPI measurement. In order to evaluate the approach, this study conducted extensive experiments on the Malaysian Census dataset. A series of data preprocessing steps were implemented, including data integration, attribute generation, data filtering, data cleaning, data transformation and attribute selection. The clustering model produced eight clusters of the B40 group. The study included a comprehensive clustering analysis to meaningfully understand each of the clusters. The analysis discovered seven indicators of multidimensional poverty from three dimensions encompassing education, living standard and employment. Out of the seven indicators, this study proposed six indicators to be added to the current MPI to establish a more meaningful picture of the current poverty trend in Malaysia. The outcomes from this study may help the government in properly identifying members of the B40 group who suffer from financial burden and who may currently be misclassified.
    Matched MeSH terms: Cluster Analysis
  20. Mohd Romlay MR, Mohd Ibrahim A, Toha SF, De Wilde P, Venkat I
    PLoS One, 2021;16(8):e0256665.
    PMID: 34432855 DOI: 10.1371/journal.pone.0256665
    Low-end LiDAR sensors provide an alternative for depth measurement and object recognition on lightweight devices. However, due to low computing capacity, complicated algorithms cannot be run on the device, and sparse information further limits the features available for extraction. Therefore, a classification method is required that can accept sparse input while giving the classification process enough leverage to accurately differentiate objects within limited computing capability. To achieve reliable feature extraction from a sparse LiDAR point cloud, this paper proposes a novel Clustered Extraction and Centroid Based Clustered Extraction (CE-CBCE) method for feature extraction, followed by a convolutional neural network (CNN) object classifier. The integration of the CE-CBCE and CNN methods enables us to utilize lightweight actuated LiDAR input and provides a low-computing means of classification while maintaining accurate detection. Based on genuine LiDAR data, the final result shows a reliable accuracy of 97% with the proposed method.
    Matched MeSH terms: Cluster Analysis