Prediction is one characteristic of the human mind. But what does it mean to say, as is frequently claimed, that the mind is a "prediction machine" and inherently forward looking? In natural languages, many contexts are not easily predictable in a forward fashion. In English, for example, many frequent verbs do not carry a unique meaning on their own but instead rely on a following word or words to become meaningful. Upon reading "take a", the processor often cannot easily predict "walk" as the next word. But the system can "look back" and integrate "walk" more easily when it follows "take a" (e.g., as opposed to *"make|get|have a walk"). In the present paper, we provide further evidence for the importance of both forward and backward looking in language processing. In two self-paced reading tasks and an eye-tracking reading task, we found evidence that adult native English speakers' sensitivity to the forward and backward conditional probability of words significantly predicted reading times over and above psycholinguistic predictors of reading latencies. We conclude that both forward and backward looking (prediction and integration) appear to be important characteristics of language processing. Our results thus suggest that it makes just as much sense to call the mind an "integration machine" that is inherently backward looking.
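The two measures named in this abstract can be illustrated with a minimal sketch. It assumes that forward conditional probability is estimated as P(word | context) and backward conditional probability as P(context | word), both from simple co-occurrence counts; the function name `conditional_probabilities` and the toy phrase list are hypothetical and are not taken from the paper.

```python
from collections import Counter

def conditional_probabilities(pairs):
    """Build forward and backward estimators from (context, word) pairs.

    forward(context, word)  estimates P(word | context): how predictable
    the word is given what came before it.
    backward(context, word) estimates P(context | word): how strongly the
    word points back to the context it follows.
    """
    pair_counts = Counter(pairs)
    context_counts = Counter(context for context, _ in pairs)
    word_counts = Counter(word for _, word in pairs)

    def forward(context, word):
        n = context_counts[context]
        return pair_counts[(context, word)] / n if n else 0.0

    def backward(context, word):
        n = word_counts[word]
        return pair_counts[(context, word)] / n if n else 0.0

    return forward, backward

# Toy data invented for illustration: "take a" is followed by many nouns,
# but "walk" only ever follows "take a".
pairs = [
    ("take a", "walk"), ("take a", "break"), ("take a", "seat"),
    ("take a", "look"), ("have a", "break"), ("make a", "cake"),
]
forward, backward = conditional_probabilities(pairs)
print(forward("take a", "walk"))   # low: many continuations compete
print(backward("take a", "walk"))  # high: "walk" points back to "take a"
```

In this toy data the forward estimate for "walk" after "take a" is low while the backward estimate is high, which is the asymmetry the abstract describes.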
Morphological processing in visual word recognition has been extensively studied in a few languages, but other languages with interesting morphological systems have received little attention. Here, we examined Malay, an Austronesian language that is agglutinative. Agglutinative languages typically have a large number of morphemes per word. Our primary aim was to facilitate research on morphological processing in Malay by augmenting the Malay Lexicon Project (a database containing lexical information for almost 10,000 words) to include a breakdown of the words into morphemes as well as morphological properties for those morphemes. A secondary goal was to determine which morphological variables influence Malay word recognition. We collected lexical decision data for Malay words that had one prefix and one suffix, and first examined the predictive power of 15 morphological and four lexical variables on response times (RT). Of these variables, two lexical and three morphological variables emerged as strong predictors of RT. In GAMM models, we found a facilitatory effect of root family size, and inhibitory effects of prefix length and prefix percentage of more frequent words (PFMF) on RT. Next, we explored the interactions between overall word frequency and several of these predictors. Of particular interest, there was a significant word frequency by root family size interaction in which the effect of root family size is stronger for low-frequency words. We hope that this initial work on morphological processing in Malay inspires further research in this and other understudied languages, with the goal of developing a universal theory of morphological processing.
Word Sense Disambiguation (WSD) is the task of determining which sense of an ambiguous word (a word with multiple meanings) is intended in a particular use of that word, by considering its context. A sentence is considered ambiguous if it contains one or more ambiguous words. In practice, a sentence classified as ambiguous usually has multiple interpretations, but only one of them is correct. We propose an unsupervised, knowledge-based method for word sense disambiguation that uses the Harmony Search Algorithm (HSA) together with a Stanford dependencies generator (HSDG). The role of the dependency generator is to parse sentences to obtain their dependency relations, whereas the goal of the HSA is to maximize the overall semantic similarity of the set of parsed words. The HSA fitness function combines semantic similarity and relatedness measures, i.e., Jiang and Conrath (jcn) and an adapted Lesk algorithm. The proposed method was evaluated on benchmark datasets and yielded results comparable to state-of-the-art WSD methods. To evaluate the effectiveness of the dependency generator, we applied the same methodology without the parser, using a window of words instead. The empirical results demonstrate that the proposed method is able to produce effective solutions for most instances of the datasets used.
Matched MeSH terms: Language*; Natural Language Processing*
A prominent methodological issue in cognitive research on bilingualism is the lack of consistency in measuring second language (L2) proficiency. To reduce the inconsistency in L2 proficiency measurements, brief and valid vocabulary tests have been developed as an objective measure of proficiency in a variety of languages (e.g., English, French, Spanish). Here, we present LexCHI, a valid lexical test to measure Chinese proficiency. This freely available short test consists of 60 two-character items presented in simplified Chinese. Although it only takes a few minutes to complete LexCHI, the LexCHI scores in two studies correlated significantly with L2 participants' performance in a translation task and a cloze test. We believe that LexCHI is a useful tool for researchers who need to objectively measure Chinese proficiency as part of their investigations.
A review of the questionnaire was carried out to assess the relevance of the questions to the objectives of the study and to identify weaknesses in the questions, particularly in their wording, so as to make them as clear as possible to the respondents, minimize ambiguity, and thus reduce the problems of getting the questions across. Based on the review, a new questionnaire would be proposed. The review focuses on two major aspects, namely the structure and the content of the questionnaire. On the structural side, each question was reviewed in terms of language, wording, sequencing, and continuity with the other questions. Few problems were identified, except for certain cases of ambiguity, largely due to the language and words used, and some cases of lack of continuity due to improper sequencing of the questions. In terms of content, the purpose of each question and what was expected of it were thoroughly examined, and its relevance was then assessed. Based on the analysis, three groups of questions were identified: irrelevant questions, partially relevant questions, and, most important, many relevant questions that were missing. It is recommended that the irrelevant questions be omitted, the partially relevant ones be modified, and new questions be added.
Unified Modeling Language is the most popular and widely used Object-Oriented modelling language in the IT industry. This study focuses on investigating the ability to expand UML to some extent to model crosscutting concerns (Aspects) to support AspectJ. Through a comprehensive literature review, we identify and extensively examine all the available Aspect-Oriented UML modelling approaches and find that the existing Aspect-Oriented Design Modelling approaches using UML cannot be considered to provide a framework for a comprehensive Aspectual UML modelling approach and also that there is a lack of adequate Aspect-Oriented tool support. This study also proposes a set of Aspectual UML semantic rules and attempts to generate AspectJ pseudocode from UML diagrams. The proposed Aspectual UML modelling approach is formally evaluated using a focus group to test six hypotheses regarding performance; a "good design" criteria-based evaluation to assess the quality of the design; and an AspectJ-based evaluation as a reference measurement-based evaluation. The results of the focus group evaluation confirm all the hypotheses put forward regarding the proposed approach. The proposed approach provides a comprehensive set of Aspectual UML structural and behavioral diagrams, which are designed and implemented based on a comprehensive and detailed set of AspectJ programming constructs.
Paraphrase identification is an important topic in natural language processing, and sequence alignment and matching underlie this task. Traditional alignment methods rely on the attention mechanism. Attention, i.e., a weighting technique, can pick out the most similar or dissimilar parts but is weak at modeling the aligned unmatched parts, which are crucial evidence for identifying paraphrases. In this paper, we equip a neural architecture with the Hungarian algorithm to extract the aligned unmatched parts. Specifically, our model first applies BiLSTM/BERT to encode the input sentences into hidden representations. Then, a Hungarian layer uses the hidden representations to extract the aligned unmatched parts. Finally, we apply cosine similarity to measure the aligned unmatched parts for the final discrimination. Extensive experiments show that our model outperforms the baselines substantially and significantly.
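The alignment step can be sketched as an assignment problem over cosine similarities. This is an illustration under stated assumptions rather than the authors' layer. It brute-forces the optimal one-to-one assignment over permutations, which yields the same optimum the Hungarian algorithm would find on small inputs but does not implement that algorithm. It also treats "aligned unmatched" as aligned pairs whose similarity falls below an arbitrary threshold of 0.5. The vectors, the threshold, and the names `cosine` and `best_alignment` are all hypothetical.

```python
from itertools import permutations
from math import sqrt

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def best_alignment(xs, ys):
    """One-to-one alignment of xs to ys that maximizes total cosine similarity.

    Brute force over permutations, so only suitable for tiny inputs.
    Requires len(xs) <= len(ys).
    """
    sim = [[cosine(x, y) for y in ys] for x in xs]
    best = max(
        permutations(range(len(ys)), len(xs)),
        key=lambda perm: sum(sim[i][j] for i, j in enumerate(perm)),
    )
    return [(i, j, sim[i][j]) for i, j in enumerate(best)]

# Toy "hidden representations" for two three-token sentences (invented).
sentence_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sentence_b = [[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]]

alignment = best_alignment(sentence_a, sentence_b)
# Aligned pairs with low similarity: aligned, but not matched.
unmatched = [(i, j) for i, j, s in alignment if s < 0.5]
print(alignment)
print(unmatched)
```

For realistic sentence lengths, a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` would be the usual replacement for the brute-force search.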
Matched MeSH terms: Language; Natural Language Processing
Phrase-level prosody serves two essential functions in many languages of the world: chunking information into units (demarcating) and marking important information (highlighting). Recent work suggests that prosody has a mainly demarcating function in the Trade Malay language family. That is, the use of pitch accents in these languages is limited or absent, as the main prosodic events occur on the final two syllables in a phrase. The current study investigates the extent to which Papuan Malay phrase prosody is used for demarcating and highlighting, taking into account the potential influence of word stress. This is done by means of acoustic analyses on a corpus of spontaneous speech. Both the form (F0 movement) and the possible functions (demarcating and highlighting) of the final two syllables in Papuan Malay phrases are investigated. Although most results favor the demarcating function of Papuan Malay phrase prosody, a highlighting function cannot be ruled out. The results suggest that Papuan Malay might hold an exceptional position in the typology of prosodic prominence.
Spoken Language Identification (LID) is the process of determining and classifying natural language from given content and a dataset. Typically, data must be processed to extract useful features to perform LID. According to the literature, feature extraction for LID is a mature process in which the standard features have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC), the Gaussian Mixture Model (GMM), and, finally, the i-vector based framework. However, the process of learning from the extracted features remains to be improved (i.e., optimised) to capture all the knowledge embedded in them. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful for training a single hidden layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e., optimised) due to the random selection of weights within the input hidden layer. In this study, the ELM is selected as a learning model for LID based on standard feature extraction. One of the optimisation approaches of ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM), is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process incorporates both the Split-Ratio and K-Tournament methods, and the improved SA-ELM is named the Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are based on LID with datasets created from eight different languages. The results showed that ESA-ELM LID outperformed SA-ELM LID, achieving an accuracy of 96.25% compared with 95.00% for SA-ELM LID.
The existence of word stress in Indonesian languages has been controversial. Recent acoustic analyses of Papuan Malay suggest that this language has word stress, counter to other studies and unlike closely related languages. The current study further investigates Papuan Malay by means of lexical (non-acoustic) analyses of two different aspects of word stress. In particular, this paper reports two distribution analyses of a word corpus, 1) investigating the extent to which stress patterns may help word recognition and 2) exploring the phonological factors that predict the distribution of stress patterns. The facilitating role of stress patterns in word recognition was investigated in a lexical analysis of word embeddings. The results show that Papuan Malay word stress (potentially) helps to disambiguate words. As for stress predictors, a random forest analysis investigated the effect of multiple morpho-phonological factors on stress placement. It was found that the mid vowels /ɛ/ and /ɔ/ play a central role in stress placement, refining the conclusions of previous work that mainly focused on /ɛ/. The current study confirms that non-acoustic research on stress can complement acoustic research in important ways. Crucially, the combined findings on stress in Papuan Malay so far give rise to an integrated perspective to word stress, in which phonetic, phonological and cognitive factors are considered.
Cultural differences, as well as similarities, have been found in explicit color-emotion associations between Chinese and Western populations. However, implicit associations in a cross-cultural context remain an understudied topic, despite their sensitivity to more implicit knowledge. Moreover, they can be used to study color systems, that is, emotional associations with one color in the context of an opposed one. Therefore, we tested the influence of two different color oppositions on affective stimulus categorization, red versus green and red versus white, in two experiments. In Experiment 1, stimuli comprised positive and negative words, and participants from the West (Austria/Germany) and the East (Mainland China, Macau) were tested in their native languages. The Western group showed a significantly stronger color-valence interaction effect than the Mainland Chinese (but not the Macanese) group for the red-green but not for the red-white opposition. To explore color-valence interaction effects independently of word stimulus differences between participant groups, we used affective silhouettes instead of words in Experiment 2. Again, the Western group showed a significantly stronger color-valence interaction than the Chinese group in the red-green opposition, while effects in the red-white opposition did not differ between cultural groups. Our findings complement those from explicit association research in an unexpected manner: where explicit measures showed similarities between cultures (associations for red and green), our results revealed differences, and where explicit measures showed differences (associations with white), our results showed similarities. This underlines the value of applying comprehensive measures in cross-cultural research on cross-modal associations.
The study of the ecological environment of language and culture introduces ecological theory into language and culture research, expanding the horizon of language research. The influence of the language and culture ecological environment on English writing covers many aspects. The cognitive process of English writing involves preparation before writing, self-monitoring during writing, and self-reflection after writing. Therefore, the use of metacognition and other strategies in the cognitive process of English writing is the key to improving the cognitive level of English writing. Under the guidance of the new curriculum standards for high school English, the cognitive process of English writing should pay attention to the guidance and shaping of students' emotional experience and thinking values. Education is inseparable from the development of language and culture, and analyzing the educational ecosystem from an ecological perspective is conducive to further understanding the ecological view of language and culture. This paper focuses on the composition of the language and culture ecological environment and its influence on the cognitive process of English writing, and reviews the history of cognitive psychology and ecology and the development of knowledge research.
A real-time Bangla Sign Language interpreter can bring more than 200,000 hearing and speech-impaired people into the mainstream workforce in Bangladesh. Bangla Sign Language (BdSL) recognition and detection is a challenging topic in computer vision and deep learning research because sign language recognition accuracy may vary with skin tone, hand orientation, and background. This research used deep machine learning models for accurate and reliable recognition of BdSL Alphabets and Numerals using two well-suited and robust datasets. The dataset prepared in this study comprises the largest image database for BdSL Alphabets and Numerals, built to reduce inter-class similarity while dealing with diverse image data covering various backgrounds and skin tones. The paper compared classification with and without background images to determine the best working model for BdSL Alphabets and Numerals interpretation. The CNN model trained with images that had a background was found to be more effective than the one trained without background. The hand detection step in the segmentation approach must be made more accurate to boost the overall accuracy of sign recognition. It was found that ResNet18 performed best, with 99.99% accuracy, precision, F1 score, and sensitivity, and 100% specificity, outperforming previous works in the literature on BdSL Alphabets and Numerals recognition. This dataset is made publicly available to support and encourage further research on Bangla Sign Language interpretation so that hearing and speech-impaired individuals can benefit from this research.
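For readers unfamiliar with the metrics reported in this abstract, the sketch below shows their standard binary definitions computed from confusion-matrix counts. It is an assumption that the paper uses these textbook definitions; for a multi-class task such as alphabet recognition they would typically be computed per class and then averaged, and the abstract does not say how. The function name `binary_metrics` and the counts are hypothetical and are not the paper's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive/negative counts.

    Assumes every denominator is non-zero; real code should guard
    against classes with no positive or no negative examples.
    """
    sensitivity = tp / (tp + fn)            # recall, true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)              # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

# Invented counts for one class treated as "positive" versus all others.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Sensitivity and specificity can diverge sharply when classes are imbalanced, which is one reason to report them alongside accuracy.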
Amidst the contemporary diasporic landscape in Sinophone literature, this research critically examines the nexus of language, culture, and identity. The study aims to analyze literary pieces composed in Sinophone languages across diverse diasporic communities and to uncover the impact of language and cultural elements on the articulation and comprehension of diasporic identity. This paper used the following methods: comparative and typological research, an in-depth analysis of three Sinophone texts, and contextual analysis. The study examined three texts: The Joy Luck Club (Amy Tan), Balzac and the Little Chinese Seamstress (Dai Sijie), and The Woman Warrior (Maxine Hong Kingston). The results showed that in The Joy Luck Club, language and cultural facets unveil the characters' dual identity struggles due to living abroad, exemplified through the psychological tension of code-switching. Balzac and the Little Chinese Seamstress uses language and cultural details to underscore the significance of preserving heritage within the diaspora, with literary allusions amplifying this endeavor. In The Woman Warrior, language and cultural elements reflect the heroine's inner conflict as she navigates her dual cultural allegiance. These results deepen the understanding of how language and culture influence identity formation in the diaspora. They also broaden the understanding of Sinophone diasporic literature, spotlighting shared trends in identity portrayal through language and culture. The research has theoretical value for literary, cultural, and anthropological studies and practical significance, potentially informing educational initiatives on diasporic literature and cultural diversity. This study's outcomes hold relevance for students, researchers, and cultural scholars exploring the role of language and culture in diasporic identity expression.
Early child multilingual acquisition is under-explored. Using a cross-sectional study approach, the present research investigates the rate of multilingual phonological acquisition of English, Mandarin, and Malay by 64 ethnic Chinese children aged 2;06-4;05 in Malaysia, a multiracial and multilingual country in Asia. The aims of the study are to provide clinical norms for speech development in multilingual children and to compare multilingual acquisition with monolingual and bilingual acquisition. An innovative multilingual phonological test, which adopts well-defined scoring criteria drawing upon local accents of English, Mandarin, and Malay, is proposed and described in this article. This procedure has been neglected in the few existing Chinese bilingual phonological acquisition studies, resulting in peculiar findings. The multilingual children show phonological acquisition milestones comparable to those of monolingual and bilingual peers acquiring the same languages. The implications of the present results are discussed. The present findings contribute to the development of models and theories of child multilingual acquisition.
Purpose: This study introduces a framework to produce very short versions of the MacArthur-Bates Communicative Development Inventories (CDIs) by combining the Bayesian-inspired approach introduced by Mayor and Mani (2019) with item response theory-based computerized adaptive testing that adapts to the ability of each child, in line with Makransky et al. (2016). Method: We evaluated the performance of our approach (dynamically selecting maximally informative words from the CDI and combining parental response with prior vocabulary data) by conducting real-data simulations using four CDI versions with varying sample sizes on Wordbank, the online repository of digitalized CDIs: American English (a very large data set), Danish (a large data set), Beijing Mandarin (a medium-sized data set), and Italian (a small data set). Results: Real-data simulations revealed that correlations exceeding .95 with full CDI administrations were reached with as few as 15 test items, with high levels of reliability, even when languages (e.g., Italian) possessed few digitalized administrations on Wordbank. Conclusions: The current approach establishes a generic framework that produces very short (fewer than 20 items) adaptive early vocabulary assessments, hence considerably reducing their administration time. This approach appears to be robust even when CDIs have smaller samples in online repositories, for example, with around 50 samples per month-age.
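The idea of "dynamically selecting maximally informative words" can be sketched with a common textbook formulation of adaptive testing. The sketch assumes a two-parameter logistic (2PL) item response model and selection by Fisher information at the current ability estimate; the abstract does not state which IRT model or selection rule the authors used, and the Bayesian combination with prior vocabulary data is not shown. The item parameters and function names are hypothetical.

```python
from math import exp

def p_known(theta, a, b):
    """2PL response function: probability that a word is reported as known.

    theta is the child's ability, a the item discrimination,
    b the item difficulty.
    """
    return 1.0 / (1.0 + exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_known(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta, items, asked):
    """Return the not-yet-asked item that is most informative at theta."""
    candidates = [name for name in items if name not in asked]
    return max(candidates, key=lambda name: information(theta, *items[name]))

# Invented (discrimination, difficulty) parameters for three words.
items = {
    "ball": (1.5, -2.0),
    "dog": (1.2, 0.0),
    "because": (1.0, 2.0),
}

print(next_item(0.0, items, asked=set()))     # word near average ability
print(next_item(2.0, items, asked=set()))     # harder word for high ability
print(next_item(-2.0, items, asked={"ball"})) # best remaining easy-side word
```

Because a 2PL item is most informative when its difficulty is close to the current ability estimate, the selected word shifts as the estimate is updated, which is how a short adaptive test can approximate a full administration.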
Matched MeSH terms: Child Language*; Language Development
The present study explores the viability of using tablets to assess early word comprehension by means of a two-alternative forced-choice task. Forty-nine 18-20-month-old Norwegian toddlers performed a touch-based word recognition task, in which they were prompted to identify the labeled target out of two items displayed on a touchscreen tablet. In each trial, the distractor item was either semantically related (e.g., dog-cat) or unrelated (e.g., dog-airplane) to the target. Our results show that toddlers as young as 18 months can engage meaningfully with a tablet-based assessment, with minimal verbal instruction and child-administrator interaction. Toddlers performed better in the semantically unrelated condition than in the related condition, suggesting that their word representations are still semantically coarse at this age. Furthermore, parental reports of comprehension, using the Norwegian version of the MacArthur-Bates Communicative Development Inventories, predicted toddlers' performance, with parent-child agreement stronger in the semantically unrelated condition, indicating that parents declare a word to be known by their child if it is understood at a coarse representational level. This study provides some of the earliest evidence that remote data collection in 18-20-month-old toddlers is viable, as comparable results were observed from both in-laboratory and online administration of the touchscreen recognition task.
Matched MeSH terms: Language Development*; Language Tests
There is a growing need to conduct neuropsychological assessments with bilingual Middle Eastern populations, particularly those who speak the Persian language (Farsi). Although validated neuropsychological and language tests have emerged in Iran, there remains a shortage of appropriate psychometric tests in the U.S. that have been validated for use with the Iranian-American population. This often leads to the use of an assortment of U.S. tests in English, U.S. tests translated into Farsi, and Iranian tests in Farsi, which can complicate the clinical assessment. To better understand common testing issues when working with bilingual Iranian-American patients, we review the first report of a 62-year-old, bilingual (English-Farsi) Iranian-American male with 18 years of education who was tested using U.S.-developed and Iranian-developed tests in both English and Farsi. Pre-surgical, 6-month post-surgical, and 1.5-year post-surgical assessment data are discussed. We highlight the strengths and limitations of naming tests, tests used in the native country versus U.S. language tests, the importance of baseline testing, general bilingual Persian-English assessment considerations, and case-based learning points.