A knowledge graph (KG) publishes a machine-readable representation of knowledge on the Web. Structured data in a knowledge graph is published using the Resource Description Framework (RDF), where knowledge is represented as a triple (subject, predicate, object). Due to the presence of erroneous, outdated or conflicting data in the knowledge graph, the quality of facts cannot be guaranteed. The trustworthiness of facts in a knowledge graph can be enhanced by the addition of metadata such as the source of the information and the location and time of the fact's occurrence. Since RDF does not support metadata for providing provenance and contextualization, an alternative method, RDF reification, is employed by most knowledge graphs. RDF reification increases the magnitude of the data, as several statements are required to represent a single fact. Another limitation for applications that use provenance data, such as those in the medical domain and in cyber security, is that not all facts in these knowledge graphs are annotated with provenance data. In this paper, we provide an overview of prominent reification approaches together with an analysis of the popular, general knowledge graphs Wikidata and YAGO4 with regard to the representation of provenance and context data. Wikidata employs qualifiers to attach metadata to facts, while YAGO4 collects metadata from Wikidata qualifiers. However, facts in Wikidata and YAGO4 can be fetched without using reification, to cater for applications that do not require metadata. To the best of our knowledge, this is the first paper that investigates the method and the extent of metadata coverage in two prominent KGs, Wikidata and YAGO4.
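To make the reification overhead concrete, the following minimal Python sketch (using the rdflib library, with a hypothetical example namespace and source URL) reifies a single fact so that provenance metadata can be attached to it; four additional statements are needed before the source annotation can be added.

```python
from rdflib import Graph, URIRef, BNode, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")  # hypothetical namespace for illustration
g = Graph()

# The original fact: a single triple.
fact = (EX.BarackObama, EX.bornIn, EX.Honolulu)
g.add(fact)

# Standard RDF reification: four extra triples describe the statement itself,
# so that metadata (here, a source) can be attached to it.
stmt = BNode()
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, fact[0]))
g.add((stmt, RDF.predicate, fact[1]))
g.add((stmt, RDF.object, fact[2]))
g.add((stmt, EX.source, Literal("https://en.wikipedia.org/wiki/Barack_Obama")))

print(g.serialize(format="turtle"))
```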
Background: A recommender system captures user preferences and behaviour to provide relevant recommendations to the user. A hybrid model-based recommender system requires a pre-trained data model to generate recommendations for a user. An ontology helps to represent semantic information and relationships, modelling the expressivity of and linkage among the data. Methods: We enhanced the accuracy of the matrix factorization model by utilizing an ontology to enrich the information in the user-item matrix, integrating item-based and user-based collaborative filtering techniques. In particular, the enriched data, which combines semantic similarity with rating patterns, helps to reduce the cold start problem in the model-based recommender system. When a new user or item first enters the system, its user demographics or item profile are already linked to our ontology, so semantic similarity can be calculated during the item-based and user-based collaborative filtering process. The item-based and user-based filtering processes are used to predict the unknown ratings of the original matrix. Results: Experimental evaluations were carried out on the MovieLens 100k dataset to compare the accuracy of our proposed approach against baselines using (i) Singular Value Decomposition (SVD) and (ii) a combination of the item-based collaborative filtering technique with SVD. Experimental results demonstrated that our proposed method reduced the data sparsity from 0.9542 to 0.8435. In addition, our proposed method achieved better accuracy, with a Root Mean Square Error (RMSE) of 0.9298, compared to the baseline method (RMSE: 0.9642) and the existing method (RMSE: 0.9492). Conclusions: Our proposed method enriches the dataset by integrating user-based and item-based collaborative filtering techniques. The experimental results show that our system reduces data sparsity and achieves better accuracy than both the baseline method and the existing method.
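As a rough illustration of the idea (not the paper's implementation), the following Python sketch fills unknown entries of a toy rating matrix with an item-based estimate, where plain cosine similarity between item columns stands in for the ontology-derived semantic similarity, and then applies truncated SVD to predict ratings.

```python
import numpy as np

# Toy user-item rating matrix (0 = unknown rating); a stand-in for MovieLens data.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

# Step 1 (enrichment): fill unknown ratings with an item-based estimate.
# Cosine similarity here stands in for the ontology-derived semantic similarity.
def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

n_items = R.shape[1]
sim = np.array([[cosine(R[:, i], R[:, j]) for j in range(n_items)]
                for i in range(n_items)])

R_filled = R.copy()
for u, i in zip(*np.where(R == 0)):
    rated = np.where(R[u] > 0)[0]
    weights = sim[i, rated]
    if weights.sum() > 0:
        R_filled[u, i] = weights @ R[u, rated] / weights.sum()

# Step 2 (model): truncated SVD on the densified matrix predicts ratings.
U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rmse = np.sqrt(np.mean((R_hat[R > 0] - R[R > 0]) ** 2))
print("Predicted ratings:\n", np.round(R_hat, 2))
print("RMSE on known ratings:", round(rmse, 4))
```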
In recent years, Recommender System (RS) research has covered a wide variety of Artificial Intelligence techniques, ranging from traditional Matrix Factorization (MF) to complex Deep Neural Networks (DNN). Traditional Collaborative Filtering (CF) recommendation methods such as MF have limited learning capability, as they only consider the linear combination between user and item vectors. For learning non-linear relationships, methods like Neural Collaborative Filtering (NCF) incorporate DNNs into CF. However, CF methods still suffer from cold start and data sparsity. This paper proposes an improved hybrid RS, namely Neural Matrix Factorization++ (NeuMF++), for effectively learning user and item features to improve recommendation accuracy and alleviate cold start and data sparsity. NeuMF++ incorporates effective latent representations into NeuMF via Stacked Denoising Autoencoders (SDAE), and can also be seen as the fusion of GMF++ and MLP++. NeuMF is an NCF framework that combines GMF (Generalized Matrix Factorization) and MLP (Multilayer Perceptron), and achieves state-of-the-art results by integrating the linearity of GMF with the non-linearity of MLP. Incorporating latent representations has likewise shown tremendous improvement in GMF and MLP, resulting in GMF++ and MLP++. The latent representations obtained through the SDAEs' latent space allow NeuMF++ to effectively learn user and item features, significantly enhancing its learning capability. However, sharing feature extraction between GMF++ and MLP++ in NeuMF++ might hinder its performance; allowing GMF++ and MLP++ to learn separate features provides more flexibility and greatly improves performance. Experiments performed on a real-world dataset demonstrate that NeuMF++ achieves a test root-mean-square error of 0.8681. In future work, NeuMF++ can be extended by introducing other auxiliary information such as text or images, and different neural network building blocks can be integrated into it to form a more robust recommendation model.
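The following Keras sketch shows a simplified NeuMF-style architecture with separate GMF and MLP embedding tables fused for rating prediction; the SDAE feature extractors that turn it into NeuMF++ are omitted for brevity, and all layer sizes are illustrative rather than the paper's configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_neumf(n_users, n_items, gmf_dim=8, mlp_dim=16):
    user_in = layers.Input(shape=(1,), name="user_id")
    item_in = layers.Input(shape=(1,), name="item_id")

    # GMF branch: element-wise product of embeddings captures linear interactions.
    gmf_u = layers.Flatten()(layers.Embedding(n_users, gmf_dim)(user_in))
    gmf_i = layers.Flatten()(layers.Embedding(n_items, gmf_dim)(item_in))
    gmf_out = layers.Multiply()([gmf_u, gmf_i])

    # MLP branch: separate embeddings, concatenated and passed through dense
    # layers to capture non-linear interactions.
    mlp_u = layers.Flatten()(layers.Embedding(n_users, mlp_dim)(user_in))
    mlp_i = layers.Flatten()(layers.Embedding(n_items, mlp_dim)(item_in))
    x = layers.Concatenate()([mlp_u, mlp_i])
    for units in (32, 16, 8):
        x = layers.Dense(units, activation="relu")(x)

    # Fusion of both branches into a single rating prediction.
    fused = layers.Concatenate()([gmf_out, x])
    rating = layers.Dense(1)(fused)

    model = Model([user_in, item_in], rating)
    model.compile(optimizer="adam", loss="mse",
                  metrics=[tf.keras.metrics.RootMeanSquaredError()])
    return model

model = build_neumf(n_users=943, n_items=1682)  # MovieLens 100k dimensions
model.summary()
```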
Color blindness is a common disability whereby affected individuals cannot benefit from the various functions provided by color, which affect humans both physically and psychologically. Although this disability is not fatal, it brings considerable disruption to affected individuals' daily activities. This paper aims to develop a system for recognizing and detecting the colors of clothes in images, to improve accuracy by using advanced algorithms that handle lighting variations, and to provide color matching recommendations that assist color-blind individuals in making informed choices when purchasing shirts. The proposed methodology for color recognition involves:
• retrieving the RGB values of a given point from the input image and converting them into HSV values;
• creating a web application integrated with a machine learning model that classifies and predicts the corresponding color based on the HSV values;
• displaying the predicted color name, along with suggestions of matching colors, on the interface.
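A minimal Python sketch of this pipeline follows; the RGB-to-HSV conversion uses the standard colorsys module, and a nearest-hue lookup over a small hypothetical palette stands in for the trained classifier described above.

```python
import colorsys

# Hypothetical reference palette: hue in degrees -> color name.
REFERENCE_HUES = {
    "red": 0, "orange": 30, "yellow": 60, "green": 120,
    "cyan": 180, "blue": 240, "purple": 280, "pink": 330,
}

def rgb_to_hsv(r, g, b):
    # colorsys works on [0, 1] channels; rescale hue to degrees.
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v

def classify_color(r, g, b):
    h, s, v = rgb_to_hsv(r, g, b)
    if v < 0.2:
        return "black"
    if s < 0.15:
        return "white" if v > 0.8 else "grey"
    # Circular distance on the hue wheel to the nearest reference hue.
    return min(REFERENCE_HUES,
               key=lambda name: min(abs(h - REFERENCE_HUES[name]),
                                    360 - abs(h - REFERENCE_HUES[name])))

print(classify_color(220, 40, 35))   # -> "red"
print(classify_color(30, 90, 200))   # -> "blue"
```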
This study aimed to assess the development and usability of the Visually Impaired Masseur Assistance Application (VIMAA), designed to respond to signs of danger or instances of sexual harassment experienced by Visually Impaired Masseurs (VIMs). It combines the Rapid Application Development (RAD) method with qualitative in-depth interviews. RAD was implemented with emphasis on four core stages: requirement identification, design workshop, construction, and implementation, while qualitative in-depth interviews were conducted using thematic analysis for usability testing. Functionality testing also verified the effectiveness of the VIMAA features for requesting help, notification, and feedback. The pre-test identified four themes, including traumatic experiences and the need for protection. The post-test revealed themes such as ease of requesting assistance and switching to speech mode. VIMs perceive VIMAA as user-friendly, practical, and acceptable, and the help request, notification, and feedback features work well. This study demonstrates the effectiveness of VIMAA in establishing a framework that is accessible to a diverse spectrum of VIMs. The insights derived from this research also furnish valuable perspectives on the preferences of users who rely on mobile applications designed for VIMs, providing significant impetus for future research and development in this domain.
Background: As the eXtensible Markup Language (XML) is the standard for the exchange of data over the World Wide Web, it is important to ensure that an XML database is capable not only of supporting efficient query processing but also of enduring frequent data update operations under the dynamic changes of Web content. Most existing XML annotation is based on a labeling scheme that identifies the hierarchical position of each XML node. This computation is costly, as any update causes the whole XML tree to be re-labeled; the impact is especially visible on large datasets. Therefore, a robust labeling scheme that avoids re-labeling is crucial. Method: Here, we present ORD-GAP (named after Order Gap), a robust and persistent XML labeling scheme that supports dynamic updates. ORD-GAP assigns unique identifiers with gaps between XML nodes, from which the level, Parent-Child (P-C), Ancestor-Descendant (A-D) and sibling relationships can easily be identified. ORD-GAP adopts the ORDPath labeling scheme for any future insertion. Results: We demonstrate that ORD-GAP is robust under dynamic updates and have implemented it for three use cases: (i) left-most, (ii) in-between and (iii) right-most insertion. Experimental evaluations on the DBLP dataset demonstrated that ORD-GAP outperformed existing approaches such as ORDPath and ME Labeling in terms of database storage size, data loading time and query retrieval. On average, ORD-GAP has the best storage and query retrieval times. Conclusion: The main contributions of this paper are: (i) a robust labeling scheme named ORD-GAP that assigns a certain gap between nodes to support future insertion, and (ii) an efficient mapping scheme, built upon the ORD-GAP labeling scheme, to transform XML into a relational database (RDB) effectively.
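The toy Python sketch below illustrates only the gap principle behind ORD-GAP (the actual labels also encode level and parent information): siblings receive gap-spaced ordinals, so an in-between insertion can take an unused ordinal without re-labeling any existing node.

```python
from xml.etree import ElementTree as ET

GAP = 100  # room reserved between consecutive sibling labels

def label_siblings(parent):
    """Give the children of `parent` gap-spaced ordinals in document order."""
    return {child: (i + 1) * GAP for i, child in enumerate(parent)}

doc = ET.fromstring("<dblp><article/><article/><article/></dblp>")
labels = label_siblings(doc)
print(list(labels.values()))          # [100, 200, 300]

# An in-between insertion takes an unused ordinal inside the gap (here the
# midpoint), so none of the existing siblings are re-labeled.
new = ET.Element("article")
doc.insert(1, new)
labels[new] = (labels[doc[0]] + labels[doc[2]]) // 2
print("label for inserted node:", labels[new])  # -> 150
```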
Introduction: Air pollution in urban cities across the world has been steadily increasing in recent years. The increasing trend in particulate matter, PM 2.5, is a threat because it can lead to serious consequences such as the worsening of asthma and cardiovascular disease. The metric used to measure air quality is the air pollutant index (API). In Malaysia, machine learning (ML) techniques for PM 2.5 have received less attention, as the concentration has been on predicting other air pollutants. To fill this research gap, this study focuses on correctly predicting PM 2.5 concentrations in the smart cities of Malaysia by comparing supervised ML techniques, which helps to mitigate its adverse effects. Methods: ML models for forecasting PM 2.5 concentrations were investigated on Malaysian air quality datasets from 2017 to 2018. The dataset was preprocessed by data cleaning and normalization. Next, it was reduced to an informative dataset with location and time factors during the feature extraction process. The dataset was fed into three supervised ML classifiers: random forest (RF), artificial neural network (ANN) and long short-term memory (LSTM). Finally, their output was evaluated using the confusion matrix and compared to identify the best model for the accurate prediction of PM 2.5. Results: Overall, the experimental results show that an accuracy of 97.7% was obtained by the RF model, compared with ANN (61.14%) and LSTM (61.77%), in predicting PM 2.5. Discussion: RF performed well compared with ANN and LSTM for the given data with minimum features. RF was able to reach good accuracy because the model learns from random samples, using decision trees with a majority vote on the predictions.
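A minimal sketch of such an RF classifier is shown below using scikit-learn; the data is synthetic and the feature names are illustrative, standing in for the preprocessed location, time and pollutant features described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed air quality data: columns such as
# location code, hour, month and a lagged PM2.5 reading (names illustrative),
# with a categorical air quality class as the target.
rng = np.random.default_rng(42)
X = rng.random((1000, 4))               # [location, hour, month, lag_pm25]
y = (X[:, 3] > 0.5).astype(int)         # 0 = good/moderate, 1 = unhealthy

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
pred = rf.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print("confusion matrix:\n", confusion_matrix(y_test, pred))
```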
Introduction: Unauthorized access to data is one of the most significant privacy issues that hinder most industries from adopting big data technologies. Even though specific processes and structures have been put in place to deal with access authorization and identity management for large databases, the scalability requirements are far beyond the capabilities of traditional databases. Hence, most researchers are looking into other solutions, such as big data management. Methods: In this paper, we first study the strengths and weaknesses of implementing cryptography and blockchain for identity management and authorization control in big data, focusing on the healthcare domain. Subsequently, we propose a decentralized, privacy-preserving data access and sharing system to ensure adequate data access management on the blockchain. In addition, we designed a blockchain framework to resolve the privacy issues of the decentralized data access and sharing system by implementing a public key infrastructure model that utilizes a signature cryptography algorithm (elliptic curve and signcryption). Lastly, we compared the proposed blockchain model to previous techniques to see how well it performed. Results: We evaluated the blockchain on four performance metrics: throughput, latency, scalability, and security. The proposed blockchain model was tested using a sample of 5,000 patients and 500,000 observations. The performance evaluation results further showed that the proposed model achieves higher throughput and lower latency than existing approaches when the workload varies up to 10,000 transactions. Discussion: This research reviews the importance of blockchains, as they provide vast possibilities to individuals, companies, and governments.
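As an illustration of the signature step only, the sketch below uses the Python cryptography package to sign and verify a hypothetical data access request with an elliptic-curve key pair; the signcryption scheme used in the paper additionally encrypts the payload in the same operation, which is not shown here.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.exceptions import InvalidSignature

# Each participant holds an elliptic-curve key pair registered with the PKI.
requester_key = ec.generate_private_key(ec.SECP256R1())
requester_pub = requester_key.public_key()

# A data access request is signed by the requester before being submitted as a
# blockchain transaction; validators verify it against the registered public key.
request = b'{"action": "read", "record_id": "obs-12345", "requester": "clinician-77"}'
signature = requester_key.sign(request, ec.ECDSA(hashes.SHA256()))

try:
    requester_pub.verify(signature, request, ec.ECDSA(hashes.SHA256()))
    print("signature valid: access request accepted for consensus")
except InvalidSignature:
    print("signature invalid: transaction rejected")
```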