MyMedR

Displaying all 5 publications

An alternative data filling approach for prediction of missing data in soft sets (ADFIS)

Sadiq Khan M, Al-Garadi MA, Wahab AW, Herawan T

Springerplus, 2016;5(1):1348.
PMID: 27588241 DOI: 10.1186/s40064-016-2797-x

Soft set theory is a mathematical approach that provides a solution for dealing with uncertain data. A standard soft set can be represented as a Boolean-valued information system, and hence it has been used in hundreds of useful applications. However, these applications become worthless if the Boolean information system contains missing data due to error, security or mishandling. Few studies have focused on handling partially incomplete soft sets, and none of them achieves a high accuracy rate in predicting missing data. The data filling approach for incomplete soft sets (DFIS) has been shown to perform best among all previous approaches. However, on review, accuracy remains the main problem of DFIS. In this paper, we propose an alternative data filling approach for prediction of missing data in soft sets, namely ADFIS. The novelty of ADFIS is that, unlike the previous approach that used probability, we focus more on the reliability of association among parameters in a soft set. Experimental results on small data, four UCI benchmark data sets, and the causality workbench lung cancer (LUCAP2) data show that ADFIS achieves better accuracy than DFIS.
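
The abstract above does not spell out the ADFIS procedure, so the following is only a minimal sketch of the general idea it names: filling a missing Boolean entry from the parameter it is most strongly associated with. The function names, the agreement-rate measure, and the toy table are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Optional, Tuple

# Rows are objects, columns are parameters; None marks a missing entry.
Table = List[List[Optional[int]]]


def agreement(table: Table, a: int, b: int) -> Tuple[float, int]:
    """Return (share of rows where columns a and b agree, rows compared).

    Only rows in which both columns are known are counted.
    """
    pairs = [(r[a], r[b]) for r in table if r[a] is not None and r[b] is not None]
    if not pairs:
        return 0.5, 0
    same = sum(1 for x, y in pairs if x == y)
    return same / len(pairs), len(pairs)


def fill_missing(table: Table) -> Table:
    """Fill each missing cell from its most strongly associated known column.

    Association strength is |agreement - 0.5| (an illustrative choice):
    columns that nearly always agree or nearly always disagree are both
    informative. Cells with no usable partner column stay None.
    """
    filled = [row[:] for row in table]
    n_cols = len(table[0])
    for i, row in enumerate(table):
        for j, value in enumerate(row):
            if value is not None:
                continue
            best = None  # (strength, partner column, columns tend to agree?)
            for k in range(n_cols):
                if k == j or row[k] is None:
                    continue
                rate, n = agreement(table, j, k)
                if n == 0:
                    continue
                strength = abs(rate - 0.5)
                if best is None or strength > best[0]:
                    best = (strength, k, rate >= 0.5)
            if best is not None:
                _, k, agrees = best
                filled[i][j] = row[k] if agrees else 1 - row[k]
    return filled


# Toy table (made up): column 1 always matches column 0 in the known rows.
toy = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, None, 1],
]
completed = fill_missing(toy)
```

In this toy table the missing cell is filled by copying column 0, because the two columns agree in every row where both are known.
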

Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection

Mujtaba G, Shuib L, Raj RG, Rajandram R, Shaikh K, Al-Garadi MA

PLoS One, 2017;12(2):e0170242.
PMID: 28166263 DOI: 10.1371/journal.pone.0170242

OBJECTIVES: Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models.
METHODS: Accident-related autopsy reports were obtained from one of the largest hospitals in Kuala Lumpur. These reports belong to nine different accident-related causes of death. A master feature vector was prepared by extracting features from the collected autopsy reports using unigrams with lexical categorization. This master feature vector was used to detect the cause of death [according to the International Classification of Diseases, version 10 (ICD-10) classification system] through five automated feature selection schemes, the proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using precisionM, recallM, F-measureM, accuracy, and area under the ROC curve. Four baselines were used to compare the results with the proposed system.
RESULTS: Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measures, approaching 85% to 90% for most metrics, with a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in overall accuracy compared with the existing techniques and four baselines.
CONCLUSION: The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system helps pathologists accurately and rapidly determine the underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports.
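
The METHODS above list precisionM, recallM and F-measureM without defining them. Assuming the subscript M denotes macro-averaging over classes (a common notation, but an assumption here), the sketch below shows one way such scores can be computed. The function name and the class labels are hypothetical and do not come from the study. The F-measure here is the harmonic mean of macro precision and macro recall; some libraries instead average per-class F1 scores, which can give a slightly different number.

```python
from collections import Counter
from typing import Dict, Sequence


def macro_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Macro-averaged precision, recall, F-measure, plus plain accuracy."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls = [], []
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precisions.append(tp[c] / predicted if predicted else 0.0)
        recalls.append(tp[c] / actual if actual else 0.0)

    # Macro-averaging: every class counts equally, regardless of its size.
    precision_m = sum(precisions) / len(labels)
    recall_m = sum(recalls) / len(labels)
    denom = precision_m + recall_m
    f_m = 2 * precision_m * recall_m / denom if denom else 0.0
    accuracy = sum(tp.values()) / len(y_true)
    return {
        "precision_M": precision_m,
        "recall_M": recall_m,
        "f_measure_M": f_m,
        "accuracy": accuracy,
    }


# Made-up labels purely for illustration.
scores = macro_scores(
    ["fall", "fall", "burn", "burn"],
    ["fall", "burn", "burn", "burn"],
)
```
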

Classification of forensic autopsy reports through conceptual graph-based document representation model

Mujtaba G, Shuib L, Raj RG, Rajandram R, Shaikh K, Al-Garadi MA

J Biomed Inform, 2018 06;82:88-105.
PMID: 29738820 DOI: 10.1016/j.jbi.2018.04.013

Text categorization has been used extensively in recent years to classify plain-text clinical reports. This study employs text categorization techniques for the classification of open narrative forensic autopsy reports. One of the key steps in text classification is document representation. In document representation, a clinical report is transformed into a format that is suitable for classification. The traditional document representation technique for text categorization is the bag-of-words (BoW) technique. In this study, the traditional BoW technique is ineffective in classifying forensic autopsy reports because it merely extracts frequent but discriminative features from clinical reports. Moreover, this technique fails to capture word inversion, as well as word-level synonymy and polysemy, when classifying autopsy reports. Hence, the BoW technique suffers from low accuracy and low robustness unless it is improved with contextual and application-specific information. To overcome the aforementioned limitations of the BoW technique, this research aims to develop an effective conceptual graph-based document representation (CGDR) technique to classify 1500 forensic autopsy reports from four (4) manners of death (MoD) and sixteen (16) causes of death (CoD). Term-based and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) based conceptual features were extracted and represented through graphs. These features were then used to train a two-level text classifier. The first level classifier was responsible for predicting MoD. In addition, the second level classifier was responsible for predicting CoD using the proposed conceptual graph-based document representation technique. To demonstrate the significance of the proposed technique, its results were compared with those of six (6) state-of-the-art document representation techniques. Lastly, this study compared the effects of one-level classification and two-level classification on the experimental results. 
The experimental results indicated that the CGDR technique achieved 12% to 15% improvement in accuracy compared with fully automated document representation baseline techniques. Moreover, two-level classification obtained better results compared with one-level classification. The promising results of the proposed conceptual graph-based document representation technique suggest that pathologists can adopt the proposed system as their basis for second opinion, thereby supporting them in effectively determining CoD.
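
The limitation noted above, that bag-of-words fails to capture word inversion, can be illustrated with a short sketch. The tokenizer and both sentences are made up for illustration and are not drawn from the study's data.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count each word, discarding word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


# Two invented findings with opposite meanings but identical vocabulary.
finding_a = "fracture of skull without laceration of brain"
finding_b = "laceration of brain without fracture of skull"

same_representation = bag_of_words(finding_a) == bag_of_words(finding_b)
```

Both sentences map to the same word counts, so a classifier that sees only these counts cannot tell them apart. A graph-based representation that keeps relations between terms is one way to retain this information.
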

Early survey with bibliometric analysis on machine learning approaches in controlling COVID-19 outbreaks

Chiroma H, Ezugwu AE, Jauro F, Al-Garadi MA, Abdullahi IN, Shuib L

PeerJ Comput Sci, 2020;6:e313.
PMID: 33816964 DOI: 10.7717/peerj-cs.313

Background and Objective: The COVID-19 pandemic has caused severe mortality across the globe, with the USA as the current epicenter of the COVID-19 epidemic even though the initial outbreak was in Wuhan, China. Many studies have successfully applied machine learning to fight the COVID-19 pandemic from different perspectives. To the best of the authors' knowledge, no comprehensive survey with bibliometric analysis has yet been conducted on the adoption of machine learning to fight COVID-19. Therefore, the main goal of this study is to bridge this gap by carrying out an in-depth survey, comprising an extensive systematic literature review and a bibliometric analysis, on the adoption of machine learning-based technologies to fight the COVID-19 pandemic from different perspectives.
Methods: We applied a literature survey methodology to retrieve data from academic databases and subsequently employed a bibliometric technique to analyze the accessed records. In addition, a concise summary, sources of COVID-19 datasets, a taxonomy, a synthesis and an analysis are presented in this study. It was found that the Convolutional Neural Network (CNN) is mainly utilized in developing COVID-19 diagnosis and prognosis tools, mostly from chest X-ray and chest CT scan images. We also performed a bibliometric analysis of machine learning-based COVID-19 related publications in the Scopus and Web of Science citation indexes. Finally, we propose a new perspective for solving the challenges identified as a direction for future research. We believe the survey with bibliometric analysis can help researchers easily detect areas that require further development and identify potential collaborators.
Results: The findings of the analysis presented in this article reveal that machine learning-based COVID-19 diagnostic tools received the most attention from researchers. Specifically, the results show that more energy and resources are devoted to automated COVID-19 diagnostic tools, while COVID-19 drug and vaccine development remains grossly underexploited. In addition, the machine learning algorithm predominantly used by researchers in developing the diagnostic tools is the CNN, applied mainly to X-ray and CT scan images.
Conclusions: This article presents the challenges hindering practical work on the application of machine learning-based technologies to fight COVID-19, along with a new perspective for solving the identified problems. Furthermore, we believe that the presented survey with bibliometric analysis could make it easier for researchers to identify areas that need further development and to identify potential collaborators at the author, country and institutional levels, with the overall aim of furthering research on the application of machine learning to disease control.

Using online social networks to track a pandemic: A systematic review

Al-Garadi MA, Khan MS, Varathan KD, Mujtaba G, Al-Kabsi AM

J Biomed Inform, 2016 08;62:1-11.
PMID: 27224846 DOI: 10.1016/j.jbi.2016.05.005

BACKGROUND: The popularity and proliferation of online social networks (OSNs) have created massive social interaction among users, generating an extensive amount of data. OSNs offer a unique opportunity for studying and understanding social interaction and communication among far larger populations than ever before. Recently, OSNs have received considerable attention as a possible tool to track a pandemic because they can provide an almost real-time surveillance system at a lower cost than traditional surveillance systems.
METHODS: A systematic literature search for studies with the primary aim of using OSNs to detect and track a pandemic was conducted. We conducted an electronic literature search for eligible English articles published between 2004 and 2015 using PubMed, IEEE Xplore, ACM Digital Library, Google Scholar, and Web of Science. First, the articles were screened on the basis of titles and abstracts. Second, the full texts were reviewed. All included studies were subjected to quality assessment.
RESULT: OSNs have rich information that can be utilized to develop an almost real-time pandemic surveillance system. The outcomes of OSN surveillance systems have demonstrated high correlations with the findings of official surveillance systems. However, the limitation in using OSNs to track a pandemic lies in collecting representative data with sufficient population coverage. This challenge is related to the characteristics of OSN data. The data are dynamic, large-sized, and unstructured, thus requiring advanced algorithms and computational linguistics.
CONCLUSIONS: OSN data contain significant information that can be used to track a pandemic. Unlike traditional surveys and clinical reports, for which data collection is time consuming and costly, OSN data can be collected almost in real time at a lower cost. Additionally, the geographical and temporal information can support exploratory analysis of the spatiotemporal dynamics of infectious disease spread. However, an OSN-based surveillance system requires comprehensive adoption, an enhanced geographical identification system, and advanced algorithms and computational linguistics to eliminate its limitations and challenges. Moreover, OSNs will probably never replace traditional surveillance, but they can offer complementary data that work best when integrated with traditional data.
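
The review reports high correlations between OSN-based signals and official surveillance findings without naming a specific statistic in this abstract. As a minimal sketch, assuming a Pearson correlation between two equal-length weekly series, the comparison could look like the following. The function name is hypothetical and any input series would have to come from real surveillance data.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)
```

A value near 1 would indicate that weekly counts of illness-related posts rise and fall together with officially reported case counts. It says nothing on its own about population coverage, which the review identifies as the main limitation.
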