Word Sense Disambiguation (WSD) is the task of determining which sense of an ambiguous word (a word with multiple meanings) is intended in a particular use of that word, by considering its context. A sentence is considered ambiguous if it contains one or more ambiguous words. In practice, a sentence classified as ambiguous usually has multiple interpretations, only one of which is correct. We propose an unsupervised method that exploits knowledge-based approaches to word sense disambiguation, using the Harmony Search Algorithm (HSA) based on a Stanford dependencies generator (HSDG). The role of the dependency generator is to parse sentences to obtain their dependency relations, whereas the goal of the HSA is to maximize the overall semantic similarity of the set of parsed words. The HSA fitness function combines semantic similarity and relatedness measures, namely Jiang and Conrath (jcn) and an adapted Lesk algorithm. The proposed method was evaluated on benchmark datasets and yielded results comparable to state-of-the-art WSD methods. To evaluate the effectiveness of the dependency generator, we applied the same methodology without the parser, using a window of words instead. The empirical results demonstrate that the proposed method is able to produce effective solutions for most instances of the datasets used.
Matched MeSH terms: Language*; Natural Language Processing*
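The search loop described in the abstract above can be sketched as a generic harmony search over sense assignments. This is a minimal sketch under stated assumptions, not the authors' implementation: the sense inventory `SENSES`, the toy `SIM` table standing in for the jcn and adapted Lesk scores, and the parameter names `hms`, `hmcr`, and `par` are all hypothetical.

```python
import random

# Hypothetical toy data: candidate senses per word, and a pairwise
# similarity table standing in for the jcn + adapted Lesk scores.
SENSES = {
    "bank": ["bank.river", "bank.money"],
    "deposit": ["deposit.sediment", "deposit.money"],
    "interest": ["interest.curiosity", "interest.money"],
}
SIM = {
    frozenset(["bank.money", "deposit.money"]): 0.9,
    frozenset(["bank.money", "interest.money"]): 0.8,
    frozenset(["deposit.money", "interest.money"]): 0.85,
    frozenset(["bank.river", "deposit.sediment"]): 0.7,
}

def fitness(assignment):
    """Sum of pairwise similarities between the chosen senses."""
    total = 0.0
    for i in range(len(assignment)):
        for j in range(i + 1, len(assignment)):
            total += SIM.get(frozenset([assignment[i], assignment[j]]), 0.0)
    return total

def harmony_search(words, hms=5, hmcr=0.9, par=0.3, iterations=500, seed=0):
    rng = random.Random(seed)
    # Harmony memory: hms random sense assignments.
    memory = [[rng.choice(SENSES[w]) for w in words] for _ in range(hms)]
    for _ in range(iterations):
        new = []
        for k, w in enumerate(words):
            if rng.random() < hmcr:
                # Memory consideration: reuse a value stored in memory.
                value = rng.choice(memory)[k]
                if rng.random() < par:
                    # Pitch adjustment: move to a neighbouring sense.
                    options = SENSES[w]
                    step = rng.choice([-1, 1])
                    value = options[(options.index(value) + step) % len(options)]
            else:
                # Random consideration: draw any sense of the word.
                value = rng.choice(SENSES[w])
            new.append(value)
        # Replace the worst harmony if the new one is strictly better.
        worst = min(memory, key=fitness)
        if fitness(new) > fitness(worst):
            memory[memory.index(worst)] = new
    return max(memory, key=fitness)

best = harmony_search(["bank", "deposit", "interest"])
print(best, fitness(best))
```

In a real system the `SIM` lookup would be replaced by similarity scores computed from a lexical resource, and the word list would come from the dependency parse rather than being hard-coded.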
A review of the questionnaire was carried out to assess the relevance of the questions to the objectives of the study and to identify weaknesses in the questions, particularly in their wording, in order to make them as clear as possible to the respondents and to minimize ambiguity and thus the problems of getting the questions across. Based on the review, a new questionnaire would be proposed. The review focuses on two major aspects, namely the structure and the content of the questionnaire. On the structural side, each question was reviewed in terms of language, wording, sequencing and continuity with the other questions. Few problems were identified, except for certain cases of ambiguity, largely due to the language and words used, and some cases of lack of continuity due to improper sequencing of the questions. In terms of content, the purpose of each question and what was expected of it were thoroughly examined, and its relevance was then assessed. Based on the analysis, three groups were identified: irrelevant questions, partially relevant questions and, most importantly, many relevant questions that were missing. It is recommended that the irrelevant questions be omitted, the partially relevant ones be modified and new questions be added.
Unified Modeling Language is the most popular and widely used Object-Oriented modelling language in the IT industry. This study focuses on investigating the ability to expand UML to some extent to model crosscutting concerns (Aspects) to support AspectJ. Through a comprehensive literature review, we identify and extensively examine all the available Aspect-Oriented UML modelling approaches and find that the existing Aspect-Oriented Design Modelling approaches using UML cannot be considered to provide a framework for a comprehensive Aspectual UML modelling approach and also that there is a lack of adequate Aspect-Oriented tool support. This study also proposes a set of Aspectual UML semantic rules and attempts to generate AspectJ pseudocode from UML diagrams. The proposed Aspectual UML modelling approach is formally evaluated using a focus group to test six hypotheses regarding performance; a "good design" criteria-based evaluation to assess the quality of the design; and an AspectJ-based evaluation as a reference measurement-based evaluation. The results of the focus group evaluation confirm all the hypotheses put forward regarding the proposed approach. The proposed approach provides a comprehensive set of Aspectual UML structural and behavioral diagrams, which are designed and implemented based on a comprehensive and detailed set of AspectJ programming constructs.
Spoken Language Identification (LID) is the process of determining and classifying natural language from a given content and dataset. Typically, data must be processed to extract useful features to perform LID. According to the literature, feature extraction for LID is a mature process: standard features have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC) coefficients, the Gaussian Mixture Model (GMM) and, finally, the i-vector based framework. However, the process of learning from the extracted features remains to be improved (i.e. optimised) to capture all the knowledge embedded in those features. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful for training a single-hidden-layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) because the weights between the input and hidden layers are selected at random. In this study, the ELM is selected as the learning model for LID based on standard feature extraction. One of the optimisation approaches for ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM), is selected as the benchmark and improved by altering the selection phase of the optimisation process. Selection is performed by incorporating both the Split-Ratio and K-Tournament methods, and the improved SA-ELM is named the Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are generated for LID on datasets created from eight different languages. They show that ESA-ELM LID outperforms SA-ELM LID, achieving an accuracy of 96.25% compared with 95.00% for SA-ELM LID.
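K-tournament selection, one of the two selection methods named in the abstract above, is a standard operator in evolutionary optimisation: draw k candidates at random and keep the fittest. The following is a minimal generic sketch; how ESA-ELM combines it with the Split-Ratio method is not described in the abstract, and the candidate names and accuracy values are hypothetical.

```python
import random

def k_tournament(population, fitness, k, rng):
    """Pick k distinct candidates at random and return the fittest one."""
    contenders = rng.sample(population, k)
    return max(contenders, key=fitness)

# Hypothetical population: each candidate is (name, validation accuracy).
population = [("w1", 0.91), ("w2", 0.95), ("w3", 0.88), ("w4", 0.93)]
rng = random.Random(0)
winner = k_tournament(population, fitness=lambda c: c[1], k=3, rng=rng)
print(winner)
```

A larger k increases selection pressure: with k equal to the population size the best candidate always wins, while k = 1 reduces to uniform random selection.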
Paraphrase identification is an important topic in natural language processing, and sequence alignment and matching underlie this task. Traditional alignment methods rely on the attention mechanism. Attention, i.e. a weighting technique, can pick out the most similar or dissimilar parts but is weak at modeling the aligned unmatched parts, which are the crucial evidence for identifying paraphrases. In this paper, we equip a neural architecture with the Hungarian algorithm to extract the aligned unmatched parts. Specifically, our model first applies BiLSTM/BERT to encode the input sentences into hidden representations. A Hungarian layer then uses the hidden representations to extract the aligned unmatched parts. Finally, we apply cosine similarity to measure the aligned unmatched parts for the final discrimination. Extensive experiments show that our model outperforms the baselines substantially and significantly.
Matched MeSH terms: Language; Natural Language Processing
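The alignment step in the abstract above can be illustrated on toy vectors. This sketch finds the optimal one-to-one alignment by exhaustive search over permutations, which is only practical for very short inputs; the Hungarian algorithm solves the same assignment problem in polynomial time. The toy vectors, the `threshold` value, and the reading of "unmatched" as "aligned but below a similarity threshold" are assumptions, not details taken from the paper.

```python
import math
from itertools import permutations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_alignment(src, tgt):
    """Exhaustively find the one-to-one alignment that maximizes total
    cosine similarity. Assumes len(src) <= len(tgt)."""
    best_score, best_pairs = -math.inf, []
    for perm in permutations(range(len(tgt)), len(src)):
        score = sum(cosine(src[i], tgt[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(perm))
    return best_pairs

def unmatched(src, tgt, pairs, threshold=0.8):
    """Aligned pairs whose similarity falls below a hypothetical threshold."""
    return [(i, j) for i, j in pairs if cosine(src[i], tgt[j]) < threshold]

# Toy 3-d "hidden states" for two three-token sentences.
a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
pairs = best_alignment(a, b)
print(pairs, unmatched(a, b, pairs))
```

Here the first two tokens of each sentence align perfectly, while the third pair is aligned but only partially similar, so it is the one that would be passed on as evidence for the final decision.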
Phrase-level prosody serves two essential functions in many languages of the world: chunking information into units (demarcating) and marking important information (highlighting). Recent work suggests that prosody has a mainly demarcating function in the Trade Malay language family. That is, the use of pitch accents in these languages is limited or absent, as the main prosodic events occur on the final two syllables in a phrase. The current study investigates the extent to which Papuan Malay phrase prosody is used for demarcating and highlighting, taking into account the potential influence of word stress. This is done by means of acoustic analyses on a corpus of spontaneous speech. Both the form (F0 movement) and the possible functions (demarcating and highlighting) of the final two syllables in Papuan Malay phrases are investigated. Although most results favor the demarcating function of Papuan Malay phrase prosody, a highlighting function cannot be ruled out. The results suggest that Papuan Malay might hold an exceptional position in the typology of prosodic prominence.
The existence of word stress in Indonesian languages has been controversial. Recent acoustic analyses of Papuan Malay suggest that this language has word stress, counter to other studies and unlike closely related languages. The current study further investigates Papuan Malay by means of lexical (non-acoustic) analyses of two different aspects of word stress. In particular, this paper reports two distribution analyses of a word corpus, 1) investigating the extent to which stress patterns may help word recognition and 2) exploring the phonological factors that predict the distribution of stress patterns. The facilitating role of stress patterns in word recognition was investigated in a lexical analysis of word embeddings. The results show that Papuan Malay word stress (potentially) helps to disambiguate words. As for stress predictors, a random forest analysis investigated the effect of multiple morpho-phonological factors on stress placement. It was found that the mid vowels /ɛ/ and /ɔ/ play a central role in stress placement, refining the conclusions of previous work that mainly focused on /ɛ/. The current study confirms that non-acoustic research on stress can complement acoustic research in important ways. Crucially, the combined findings on stress in Papuan Malay so far give rise to an integrated perspective on word stress, in which phonetic, phonological and cognitive factors are considered.
A real-time Bangla Sign Language interpreter could help bring more than 200 k hearing and speech-impaired people into the mainstream workforce in Bangladesh. Bangla Sign Language (BdSL) recognition and detection is a challenging topic in computer vision and deep learning research because sign language recognition accuracy may vary with skin tone, hand orientation, and background. This research used deep machine learning models for accurate and reliable recognition of BdSL Alphabets and Numerals using two well-suited and robust datasets. The dataset prepared in this study comprises the largest image database for BdSL Alphabets and Numerals, built to reduce inter-class similarity while dealing with diverse image data covering various backgrounds and skin tones. The paper compared classification with and without background images to determine the best working model for BdSL Alphabets and Numerals interpretation. The CNN model trained on images with a background was found to be more effective than the one trained without. Hand detection in the segmentation approach must become more accurate to boost the overall accuracy of sign recognition. ResNet18 performed best, with 99.99% accuracy, precision, F1 score, and sensitivity and 100% specificity, outperforming the works in the literature on BdSL Alphabets and Numerals recognition. The dataset is made publicly available to support and encourage further research on Bangla Sign Language interpretation, so that hearing and speech-impaired individuals can benefit from this research.
Early child multilingual acquisition is under-explored. Using a cross-sectional approach, the present research investigates the rate of multilingual phonological acquisition of English, Mandarin and Malay by 64 ethnic Chinese children aged 2;06-4;05 in Malaysia, a multiracial, multilingual country in Asia. The aims of the study are to provide clinical norms for speech development in multilingual children and to compare multilingual acquisition with monolingual and bilingual acquisition. An innovative multilingual phonological test, which adopts well-defined scoring criteria drawing upon local accents of English, Mandarin and Malay, is proposed and described in this article. This procedure has been neglected in the few existing Chinese bilingual phonological acquisition studies, resulting in peculiar findings. The multilingual children show phonological acquisition milestones comparable to those of monolingual and bilingual peers acquiring the same languages. The implications of the present results are discussed. The present findings contribute to the development of models and theories of child multilingual acquisition.
Purpose: This study introduces a framework to produce very short versions of the MacArthur-Bates Communicative Development Inventories (CDIs) by combining the Bayesian-inspired approach introduced by Mayor and Mani (2019) with item response theory-based computerized adaptive testing that adapts to the ability of each child, in line with Makransky et al. (2016). Method: We evaluated the performance of our approach, which dynamically selects maximally informative words from the CDI and combines parental responses with prior vocabulary data, by conducting real-data simulations using four CDI versions with varying sample sizes on Wordbank, the online repository of digitalized CDIs: American English (a very large data set), Danish (a large data set), Beijing Mandarin (a medium-sized data set), and Italian (a small data set). Results: Real-data simulations revealed that correlations exceeding .95 with full CDI administrations were reached with as few as 15 test items, with high levels of reliability, even when languages (e.g., Italian) had few digitalized administrations on Wordbank. Conclusions: The current approach establishes a generic framework that produces very short (fewer than 20 items) adaptive early vocabulary assessments, considerably reducing their administration time. This approach appears to be robust even when CDIs have smaller samples in online repositories, for example, around 50 samples per month of age.
Matched MeSH terms: Child Language*; Language Development
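The idea of selecting maximally informative words, mentioned in the abstract above, can be sketched with the textbook two-parameter logistic (2PL) item response model, in which an item's Fisher information at ability θ is a²·p·(1 − p). The 2PL choice, the item parameters, and the word list are assumptions made for illustration; the sketch covers only item selection and omits the Bayesian combination with prior vocabulary data.

```python
import math

def p_known(theta, a, b):
    """2PL probability that a child with ability theta knows the word."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_known(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta, items, asked):
    """Return the not-yet-asked word that is most informative at theta."""
    remaining = [w for w in items if w not in asked]
    return max(remaining, key=lambda w: information(theta, *items[w]))

# Hypothetical item parameters: word -> (discrimination a, difficulty b).
items = {
    "ball": (1.2, -1.5),
    "dog": (1.0, -0.5),
    "spoon": (1.5, 0.2),
    "yesterday": (1.1, 2.0),
}
first = next_item(0.0, items, asked=set())
second = next_item(0.0, items, asked={first})
print(first, second)
```

In an adaptive test the ability estimate would be updated after each parental response, so the next word is always chosen near the child's current estimated level.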
The present study explores the viability of using tablets to assess early word comprehension by means of a two-alternative forced-choice task. Forty-nine 18-20-month-old Norwegian toddlers performed a touch-based word recognition task, in which they were prompted to identify the labeled target out of two items displayed on a touchscreen tablet. In each trial, the distractor item was either semantically related (e.g., dog-cat) or unrelated (e.g., dog-airplane) to the target. Our results show that toddlers as young as 18 months can engage meaningfully with a tablet-based assessment, with minimal verbal instruction and child-administrator interaction. Toddlers performed better in the semantically unrelated condition than in the related condition, suggesting that their word representations are still semantically coarse at this age. Furthermore, parental reports of comprehension, using the Norwegian version of the MacArthur-Bates Communicative Development Inventories, predicted toddlers' performance, with parent-child agreement stronger in the semantically unrelated condition, indicating that parents declare a word to be known by their child if it is understood at a coarse representational level. This study provides some of the earliest evidence that remote data collection with 18-20-month-old toddlers is viable, as comparable results were observed from in-laboratory and online administration of the touchscreen recognition task.
Matched MeSH terms: Language Development*; Language Tests
There is a growing need to conduct neuropsychological assessments with bilingual Middle Eastern populations, particularly those who speak Persian (Farsi). Although validated neuropsychological and language tests have emerged in Iran, there remains a shortage of appropriate psychometric tests in the U.S. that have been validated for use with the Iranian-American population. This often leads to the use of an assortment of U.S. tests in English, U.S. tests translated into Farsi, and Iranian tests in Farsi, which can complicate the clinical assessment. To better understand common testing issues when working with bilingual Iranian-American patients, we review the first report of a 62-year-old bilingual (English-Farsi) Iranian-American male with 18 years of education who was tested using U.S.-developed and Iranian-developed tests in both English and Farsi. Assessment data from pre-surgery, 6 months post-surgery, and 1.5 years post-surgery are discussed. We highlight the strengths and limitations of naming tests, tests used in the native country versus U.S. language tests, the importance of baseline testing, general bilingual Persian-English assessment considerations, and case-based learning points.
This study of the management of dysphagia, or swallowing disorders, involved 72 contactable Speech-Language Pathologists (SLPs) in Malaysia. A survey was undertaken to identify the patterns of dysphagia management by SLPs in Malaysia: the percentage of SLPs who have managed swallowing disorders, the approximate number of patients, the assessment and therapy techniques used, the involvement of other professionals, and the factors that influenced the SLPs' confidence in managing swallowing disorders. Fifty percent (50%) of the forty-four SLPs (61.6%) who responded to the survey had previously managed swallowing disorders. It was estimated that 5% (430 of 8268) of patients referred to SLPs in Malaysia presented with dysphagia and were subsequently managed for their swallowing problems. The oromotor examination was the most frequently used evaluation for dysphagia (100%), while the compensatory technique was the most frequently used management technique (77.3%). Most referrals to the SLPs came from neurosurgeons (59.1%); otorhinolaryngologists were the professionals to whom the SLPs most often referred patients (50%). Chi-squared analysis showed that clinical training in dysphagia at the undergraduate or postgraduate level influenced the SLPs' confidence in managing dysphagia cases (χ2 = 10.063, p = 0.007).
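As a worked check of the statistic reported above: for a chi-squared variable with 2 degrees of freedom the upper-tail probability has the closed form exp(−x/2). The abstract does not report the degrees of freedom, so df = 2 is an assumption here; under that assumption the value reproduces the reported p.

```python
import math

def chi2_sf_df2(x):
    """Upper-tail probability of a chi-squared variable with 2 degrees of
    freedom; for df = 2 the survival function is exactly exp(-x / 2)."""
    return math.exp(-x / 2.0)

# df = 2 is an assumption; the abstract does not state degrees of freedom.
p = chi2_sf_df2(10.063)
print(round(p, 3))
```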
The purpose of this study is to investigate how reference materials (i.e. dictionaries) commonly prescribed to Malaysian school learners address and describe a very common and important linguistic feature: phrasal verbs. Two bilingual learner dictionaries frequently recommended for secondary school learners in Malaysia were examined. Common phrasal verbs such as pick up, come out, and go out were analysed by examining the dictionary entries that discuss this linguistic feature. A descriptive analysis examined how this language form is described, looking at the selection of phrasal verbs as well as the information provided about them. The analysis revealed some interesting findings regarding the selection and description of phrasal verbs in these dictionaries, which may have contributed to learners' difficulties in understanding and learning the language form. The paper concludes by discussing some recommendations on the inclusion and selection of phrasal verbs in language reference materials, particularly dictionaries, in Malaysian schools.
Maximum k Satisfiability (MAX-kSAT) is a logical rule that serves as a language bridging real-life applications and neural network optimization. MAX-kSAT is an interesting paradigm because the outcome of this logical rule is always negative (false). The Hopfield Neural Network (HNN) is a type of neural network that finds solutions by energy minimization. Interesting intelligent behavior has been observed when the logical rule is embedded in an HNN. Increasing the storage capacity during the learning phase of the HNN has been a challenging problem for most neural network researchers. The development of metaheuristic algorithms has been crucial in optimizing the learning phase of neural networks. The most celebrated metaheuristic is the Genetic Algorithm (GA), which consists of several important operators that emphasize solution improvement. Although GA has been reported to optimize logic programming in HNN, the learning complexity increases with the number of clauses, and GA becomes more likely to be trapped at suboptimal fitness. In this paper, a metaheuristic algorithm, namely Artificial Bee Colony (ABC), is proposed for learning MAX-kSAT programming. ABC is a swarm-based metaheuristic that capitalizes on the capabilities of the Employed Bee, Onlooker Bee, and Scout Bee. To this end, all the learning models were tested in a new restricted learning environment. Experimental results obtained from computer simulation demonstrate the effectiveness of ABC in modelling MAX-kSAT.
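The MAX-kSAT objective described above, maximizing the number of satisfied clauses in a formula that cannot be fully satisfied, can be illustrated with a small exhaustive search. This sketch shows only the objective; brute force stands in for the HNN, GA, and ABC learning procedures, and the example formula and literal encoding are hypothetical.

```python
from itertools import product

def satisfied(clauses, assignment):
    """Count clauses with at least one true literal.
    A literal is a signed 1-based variable index: 3 means x3, -3 means NOT x3."""
    return sum(
        any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def max_sat(clauses, n_vars):
    """Exhaustively find an assignment maximizing the satisfied-clause count."""
    return max(
        product([False, True], repeat=n_vars),
        key=lambda a: satisfied(clauses, a),
    )

# Hypothetical unsatisfiable 2-SAT instance: all four clauses over x1, x2.
clauses = [(1, 2), (1, -2), (-1, 2), (-1, -2)]
best = max_sat(clauses, n_vars=2)
print(best, satisfied(clauses, best), "of", len(clauses))
```

Every assignment falsifies exactly one of the four clauses in this instance, so the formula as a whole is always false and the best achievable count is three. Exhaustive search grows as 2^n in the number of variables, which is why metaheuristics are used for larger instances.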
This study seeks to validate the agreement of the Self-Assessment Instrument of Outdoor Competency (OCL-oMR) among co-curriculum center coaches in Malaysia. The instrument was newly developed by the researcher. The Inventory Responses-oMR (IR-oMR) is intended to evaluate and determine the goodness of the OCL-oMR self-assessment instrument among co-curriculum center coaches in Malaysia. The analysis used correlation and percentage of agreement. Ten head coaches (N = 10) of co-curriculum centers were selected as the sample. These are secondary data that the researcher used in the main research; nevertheless, they are important for the researcher to identify and justify the newly developed OCL-oMR instrument. The findings showed a content validity of r = .82 and a language validity of r = .83. As further supporting data, the percentage of agreement of the IR-oMR with the OCL-oMR among the co-curriculum center coaches was used. Overall, the researcher found that the IR-oMR shows the OCL-oMR to be a valid instrument for measuring the competency level of outdoor education coaches in co-curriculum centers in Malaysia, and that the IR-oMR is significant with respect to the OCL-oMR.
A reading chart that resembles real reading conditions is important for evaluating quality of life in terms of reading performance. The purpose of this study was to compare the reading speed obtained with the UiTM Malay related words (UiTM-Mrw) reading chart with that obtained with the MNread Acuity Chart and the Colenbrander Reading Chart.