  1. Eke CI, Norman AA, Shuib L
    PLoS One, 2021;16(6):e0252918.
    PMID: 34111192 DOI: 10.1371/journal.pone.0252918
    Sarcasm is the main reason behind the faulty classification of tweets. It poses a challenge for natural language processing (NLP) because it obscures people's actual sentiment. Various feature engineering techniques are being investigated for the automatic detection of sarcasm. However, most related techniques have concentrated only on the content-based features of sarcastic expressions, leaving contextual information aside. This leads to a loss of the semantics of words in the sarcastic expression. Another drawback is the sparsity of the training data: because of the length limit of microblog posts, the BoW feature vector constructed for each sample contains mostly null features. To address these problems, a Multi-feature Fusion Framework with two classification stages is proposed. The first-stage classification uses only the lexical feature, extracted with the BoW technique, and is trained with five standard classifiers (SVM, DT, KNN, LR, and RF) to predict the sarcastic tendency. In stage two, the constructed lexical sarcastic tendency feature is fused with eight other proposed features that model context, to obtain a final prediction. The effectiveness of the framework is assessed through various experimental analyses of the classifiers' performance. The evaluation shows that the classification models based on the novel feature fusion achieved a precision of 0.947 with a Random Forest classifier. Finally, the results were compared with those of three baseline approaches, and the comparison shows the significance of the proposed framework.
  2. Mujtaba G, Shuib L, Raj RG, Rajandram R, Shaikh K
    J Forensic Leg Med, 2018 Jul;57:41-50.
    PMID: 29801951 DOI: 10.1016/j.jflm.2017.07.001
    OBJECTIVES: Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free-text forensic autopsy reports by comparing various schemes for feature extraction, term weighting or feature value representation, text classification, and feature reduction.

    METHODS: Autopsy reports belonging to eight different causes of death were collected, preprocessed, and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. Six different text classification techniques were applied to these 43 master feature vectors to construct a classification model that can predict the cause of death. Finally, classification model performance was evaluated using four performance measures: overall accuracy, macro precision, macro recall, and macro F-measure.

    RESULTS: The experiments showed that unigram features obtained the highest performance compared with bigram, trigram, and hybrid-gram features. Among feature representation schemes, term frequency and term frequency with inverse document frequency obtained similar results, both better than binary frequency and normalized term frequency with inverse document frequency. Among feature reduction approaches, chi-square outperformed Pearson correlation and information gain. Finally, among text classification algorithms, the support vector machine classifier outperformed random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifiers.

    CONCLUSION: Our results and comparisons are of practical importance and serve as references for future work. Moreover, the comparison outputs can serve as state-of-the-art benchmarks for comparing future proposals with existing automated text classification techniques.

  3. Koutzampasopoulou Xanthidou O, Shuib L, Xanthidis D, Nicholas D
    Int J Environ Res Public Health, 2018;15(6):1137.
    PMID: 29857585 DOI: 10.3390/ijerph15061137
    An Electronic Medical Record (EMR) is a patient's database record that can be transmitted securely. A diverse range of EMR systems is available for different medical units to choose from. This qualitative study focuses on the structure and value of these systems from a medical professional's standpoint, as well as their economic value and whether EMRs should be shared between health organizations. The study took place in the natural setting of the medical units' environments. A purposive sample of 40 professionals in Greece and Oman was interviewed. The study suggests that: (1) the demographics of the EMR should be divided into categories, not all of which should be accessible and/or visible to everyone; (2) the EMR system should follow an open architecture so that more categories and subcategories can be added as needed, following a possible business plan (an ERD is suggested); (3) the EMR should be implemented gradually, bearing in mind both medical and financial concerns; and (4) sharing should be the patient's decision, as the owner of the record. Once an EMR system reaches a certain level of maturity in its implementation and utilization, it is useful to seek professionals' assessment of its structure and value.
  4. Murtaza G, Abdul Wahab AW, Raza G, Shuib L
    Comput Med Imaging Graph, 2021 Apr;89:101870.
    PMID: 33545489 DOI: 10.1016/j.compmedimag.2021.101870
    Worldwide, the burden of cancer has increased drastically over the past few years. Among all types of cancer in women, breast cancer (BrC) is the leading cause of death. For early diagnosis, histopathology (Hp) imaging is the gold standard for confirmatory and detailed (tissue-level) diagnosis of breast tumors (BrT), compared with mammogram images. Many studies have used BrT Hp images to solve binary or multiclass classification problems, using high computational resources. However, classification performance may be compromised by the high correlation among the various types of BrT in Hp images, which raises the misclassification rate. Thus, this paper aims to develop a tree-based BrT multiclassification model via deep learning (DL) that extracts discriminative features, solving the multiclassification problem with better performance and fewer computational resources. The main contributions of this work are an ensemble, tree-based DL model pre-trained on the BreakHis dataset, and the implementation of a misclassification reduction algorithm. The ensemble, tree-based DL model extracts discriminative BrT features from Hp images. Because the target dataset (the Bioimaging Challenge 2015 breast histology dataset) is small, pretraining is performed on the BreakHis dataset to avoid overfitting the proposed model. The misclassification reduction algorithm is implemented to enhance the performance of the classification model. The experimental results show that the proposed model outperformed existing state-of-the-art baseline studies, with classification accuracy ranging from 87.50% to 100% for four BrT subtypes. Thus, the proposed model can assist doctors as a second opinion in any healthcare centre.
  5. Mujtaba G, Shuib L, Raj RG, Rajandram R, Shaikh K, Al-Garadi MA
    J Biomed Inform, 2018 Jun;82:88-105.
    PMID: 29738820 DOI: 10.1016/j.jbi.2018.04.013
    Text categorization has been used extensively in recent years to classify plain-text clinical reports. This study employs text categorization techniques to classify open narrative forensic autopsy reports. One of the key steps in text classification is document representation, in which a clinical report is transformed into a format suitable for classification. The traditional document representation technique for text categorization is the bag-of-words (BoW) technique. However, the traditional BoW technique is ineffective for classifying forensic autopsy reports because it merely extracts frequent, but not necessarily discriminative, features from clinical reports. Moreover, this technique fails to capture word inversion, as well as word-level synonymy and polysemy, when classifying autopsy reports. Hence, the BoW technique suffers from low accuracy and low robustness unless it is improved with contextual and application-specific information. To overcome these limitations, this research develops an effective conceptual graph-based document representation (CGDR) technique to classify 1500 forensic autopsy reports from four (4) manners of death (MoD) and sixteen (16) causes of death (CoD). Term-based and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT)-based conceptual features were extracted and represented through graphs. These features were then used to train a two-level text classifier: the first-level classifier predicted MoD, and the second-level classifier predicted CoD using the proposed CGDR technique. To demonstrate the significance of the proposed technique, its results were compared with those of six (6) state-of-the-art document representation techniques. Lastly, this study compared the effects of one-level and two-level classification on the experimental results. The CGDR technique achieved a 12% to 15% improvement in accuracy compared with fully automated baseline document representation techniques, and two-level classification obtained better results than one-level classification. These promising results suggest that pathologists can adopt the proposed system as a basis for a second opinion, supporting them in effectively determining CoD.
  6. Mujtaba G, Shuib L, Raj RG, Rajandram R, Shaikh K, Al-Garadi MA
    PLoS One, 2017;12(2):e0170242.
    PMID: 28166263 DOI: 10.1371/journal.pone.0170242
    OBJECTIVES: Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models.

    METHODS: Accident-related autopsy reports were obtained from one of the largest hospitals in Kuala Lumpur. These reports belong to nine different accident-related causes of death. A master feature vector was prepared by extracting features from the collected autopsy reports using unigrams with lexical categorization. This master feature vector was used to detect the cause of death [according to the International Classification of Diseases, 10th Revision (ICD-10) classification system] through five automated feature selection schemes, the proposed expert-driven approach, five feature subset sizes, and five machine learning classifiers. Model performance was evaluated using precisionM, recallM, F-measureM, accuracy, and area under the ROC curve. Four baselines were used to compare the results with the proposed system.

    RESULTS: Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measures, approaching 85% to 90% for most metrics, with a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in overall accuracy compared with the existing techniques and four baselines.

    CONCLUSION: The proposed system is feasible and practical for the automatic classification of ICD-10-related causes of death from autopsy reports. It can assist pathologists in accurately and rapidly determining the underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports.

  7. Chiroma H, Ezugwu AE, Jauro F, Al-Garadi MA, Abdullahi IN, Shuib L
    PeerJ Comput Sci, 2020;6:e313.
    PMID: 33816964 DOI: 10.7717/peerj-cs.313
    Background and Objective: The COVID-19 pandemic has caused severe mortality across the globe, with the USA as the current epicenter of the epidemic even though the initial outbreak was in Wuhan, China. Many studies have successfully applied machine learning to fight the COVID-19 pandemic from different perspectives. To the best of the authors' knowledge, no comprehensive survey with bibliometric analysis has yet been conducted on the adoption of machine learning to fight COVID-19. Therefore, the main goal of this study is to bridge this gap by carrying out an in-depth survey, including an extensive systematic literature review and bibliometric analysis, of the adoption of machine learning-based technologies to fight the COVID-19 pandemic from different perspectives.

    Methods: We applied a literature survey methodology to retrieve data from academic databases and subsequently employed a bibliometric technique to analyze the accessed records. In addition, a concise summary, sources of COVID-19 datasets, a taxonomy, and a synthesis and analysis are presented. It was found that the Convolutional Neural Network (CNN) is mainly used in developing COVID-19 diagnosis and prognosis tools, mostly from chest X-ray and chest CT scan images. We also performed a bibliometric analysis of machine learning-based COVID-19-related publications in the Scopus and Web of Science citation indexes. Finally, we propose a new perspective on solving the identified challenges as a direction for future research. We believe the survey with bibliometric analysis can help researchers easily detect areas that require further development and identify potential collaborators.

    Results: The findings reveal that machine learning-based COVID-19 diagnostic tools received the most attention from researchers. Specifically, the analyses show that effort and resources are directed more toward automated COVID-19 diagnostic tools, while COVID-19 drug and vaccine development remains grossly underexploited. In addition, CNN is the machine learning algorithm predominantly used by researchers to develop diagnostic tools, mainly from X-ray and CT scan images.

    Conclusions: This article presents the challenges hindering practical work on applying machine learning-based technologies to fight COVID-19, together with a new perspective on solving the identified problems. Furthermore, we believe that the presented survey with bibliometric analysis could make it easier for researchers to identify areas that need further development and to identify potential collaborators at the author, country, and institutional levels, with the overall aim of furthering research on machine learning applications in disease control.

  8. Chiroma H, Abdul-kareem S, Khan A, Nawi NM, Gital AY, Shuib L, et al.
    PLoS One, 2015;10(8):e0136140.
    PMID: 26305483 DOI: 10.1371/journal.pone.0136140
    Global warming is attracting attention from policy makers due to its impacts, such as floods, extreme weather, a 0.7°C increase in temperature, heat waves, and storms. These disasters result in loss of human life and billions of dollars in property damage. Global warming is believed to be caused by greenhouse gas emissions from human activities, including carbon dioxide (CO2) emissions from petroleum consumption. The limitations of previous methods for predicting CO2 emissions, and the lack of work on predicting the Organization of the Petroleum Exporting Countries (OPEC) CO2 emissions from petroleum consumption, motivated this research.