Displaying publications 1 - 20 of 151 in total

  1. Sim JH
    Acad Med, 2017 06;92(6):726.
    PMID: 28557910 DOI: 10.1097/ACM.0000000000001680
    Matched MeSH terms: Educational Measurement*
  2. Puthiaparampil T, Rahman M
    BMC Med Educ, 2021 Jan 07;21(1):29.
    PMID: 33413332 DOI: 10.1186/s12909-020-02463-0
    BACKGROUND: Distractor efficiency and the optimum number of functional distractors per item in One Best Answer Questions have been debated. The prevalence of non-functional distractors has led to a reduction in the number of distractors per item with the advantage of adding more items in the test. The existing literature eludes a definite answer to the question of what distractor efficiency best matches excellent psychometric indices. We examined the relationship between distractor efficiency and the psychometric indices of One Best Answer Questions in search of an answer.

    METHODS: We analysed 350 items used in 7 professional examinations and determined their distractor efficiency and the number of functional distractors per item. The items were sorted into five groups - excellent, good, fair, remediable and discarded based on their discrimination index. We studied how the distractor efficiency and functional distractors per item correlated with these five groups.

    RESULTS: Correlation of distractor efficiency with psychometric indices was significant but far from perfect. The excellent group topped in distractor efficiency in 3 tests, the good group in one test, the remediable group equalled excellent group in one test, and the discarded group topped in 2 tests.

    CONCLUSIONS: The distractor efficiency did not correlate in a consistent pattern with the discrimination index. Fifty per cent or higher distractor efficiency, not hundred percent, was found to be the optimum.

    Matched MeSH terms: Educational Measurement*
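    The entry above turns on two quantities the abstract does not define: the number of functional distractors per item and the distractor efficiency. A minimal sketch of that arithmetic follows, assuming the common convention that a distractor is "functional" when at least 5% of examinees choose it; the helper name and the 5% threshold are illustrative assumptions, not taken from the paper.

    # Hypothetical helper: count functional distractors and compute distractor
    # efficiency for one One Best Answer item, assuming the usual >= 5% rule.
    def distractor_efficiency(option_counts, key, threshold=0.05):
        """option_counts: dict of option label -> number of examinees choosing it.
        key: label of the correct option.
        Returns (functional_distractor_count, efficiency_in_percent)."""
        total = sum(option_counts.values())
        distractors = {opt: n for opt, n in option_counts.items() if opt != key}
        functional = sum(1 for n in distractors.values() if total and n / total >= threshold)
        efficiency = 100.0 * functional / len(distractors) if distractors else 0.0
        return functional, efficiency

    # Example: a five-option item in which one distractor is rarely chosen.
    counts = {"A": 12, "B": 55, "C": 20, "D": 11, "E": 2}
    print(distractor_efficiency(counts, key="B"))  # -> (3, 75.0)

    On this convention, the 50% optimum reported in the conclusions simply means that about half of an item's distractors attract at least 5% of candidates.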
  3. Balasundaram R
    Family Practitioner, 1983;6(1):91-97.
    Matched MeSH terms: Educational Measurement*
  4. Shankar PR
    Can Med Educ J, 2023 Nov;14(5):152-153.
    PMID: 38045090 DOI: 10.36834/cmej.77411
    Matched MeSH terms: Educational Measurement*
  5. Taha MH, Mohammed HEEG, Abdalla ME, Yusoff MSB, Mohd Napiah MK, Wadi MM
    Med Educ Online, 2024 Dec 31;29(1):2412392.
    PMID: 39445670 DOI: 10.1080/10872981.2024.2412392
    Extended matching questions (EMQs), or R-type questions, are a selected-response format. Validity evidence for this format is crucial, but misunderstandings about validity have been reported. It is unclear what kinds of evidence should be presented, and how, to support their educational impact. This review explores the pattern and quality of reporting of the sources of validity evidence for EMQs in health professions education, encompassing content, response process, internal structure, relationship to other variables, and consequences. A systematic search of electronic databases including MEDLINE via PubMed, Scopus, Web of Science, CINAHL, and ERIC was conducted to extract studies that utilize EMQs. The framework for a unitary concept of validity was applied to extract data. A total of 218 titles were initially selected; the final number of included titles was 19. The most frequently reported piece of evidence was the reliability coefficient, followed by the relationship to another variable. Additionally, the definition of validity adopted was mostly the old tripartite concept. This review found that the reporting and presentation of validity evidence appeared deficient. The available evidence can hardly provide a strong validity argument that supports the educational impact of EMQs. This review calls for more work on developing a tool to measure the reporting and presentation of validity evidence.
    Matched MeSH terms: Educational Measurement/methods; Educational Measurement/standards
  6. Mitra NK, Nagaraja HS, Ponnudurai G, Judson JP
    MyJurnal
    Item analysis is the process of collecting, summarizing and using information from students' responses to assess the quality of test items. The difficulty index (P) and discrimination index (D) are two parameters that help evaluate the standard of MCQ items used in an examination, with abnormal values indicating poor quality. In this study, 120 test items from 12 Type A MCQ tests of the Foundation 1 multi-disciplinary summative assessment, from the M2/2003 to M2/2006 cohorts of the International Medical University, were selected, and their P-scores (in percent) and D-scores were estimated using Microsoft Office Excel. The relationship between the item difficulty index and discrimination index for each test item was determined by Pearson correlation analysis using SPSS 11.5. Mean difficulty index scores of the individual summative tests were in the range of 64% to 89%. One-third of the test items crossed a difficulty index of 80%, indicating that those items were easy for the students. Sixty-seven percent of the test items showed an acceptable (> 0.2) discrimination index. Forty-five of the 120 test items showed an excellent discrimination index. The discrimination index correlated poorly with the difficulty index (r = -0.325). In conclusion, a consistent level of test difficulty and discrimination indices was maintained from 2003 to 2006 in all twelve summative Type A MCQ tests.
    Matched MeSH terms: Educational Measurement
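    The item-analysis abstract above relies on the difficulty index P (the percentage of students answering an item correctly) and the discrimination index D (the difference in the proportion correct between high- and low-scoring groups). The sketch below shows one common way to compute them, assuming the widely used upper and lower 27% split; the paper does not state which split or spreadsheet formulas it used, so the details are illustrative.

    # Illustrative computation of the difficulty index (P) and discrimination
    # index (D) for a single item, assuming the common upper/lower 27% split.
    def item_indices(item_marks, group_frac=0.27):
        """item_marks: list of 0/1 marks for one item, ordered by each student's
        total test score from highest to lowest."""
        n = len(item_marks)
        k = max(1, round(n * group_frac))
        p = 100.0 * sum(item_marks) / n          # difficulty index, in percent
        upper = sum(item_marks[:k]) / k          # proportion correct in the top group
        lower = sum(item_marks[-k:]) / k         # proportion correct in the bottom group
        return p, upper - lower                  # (P, D)

    # Example: 10 students ranked by total score; 1 = item answered correctly.
    marks = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]
    print(item_indices(marks))                   # -> (60.0, ~0.67) with k = 3

    Under the thresholds quoted in the abstract, this example item would count as acceptable (D > 0.2) and moderately easy (P = 60%).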
  7. Puthiaparampil T, Rahman MM, Shazrina AR, Nariman S, Lukas S, Chai CS, et al.
    Med J Malaysia, 2022 Nov;77(6):724-729.
    PMID: 36448391
    INTRODUCTION: Our faculty used one long case (LC) and three short cases for the clinical component of the final professional examinations. During the COVID-19 pandemic, the LC had to be replaced with a scenario-based clinical examination (SBCE) because using recently hospitalised patients was impracticable; the short case component was kept as usual, and the replacement was made in 2020, for the first time, at short notice. This study aimed to evaluate the positive and negative aspects of SBCE and LC to determine the feasibility of replacing the LC with SBCE in future examinations.

    MATERIALS AND METHODS: We compared the LC scores of three previous years with those of the SBCE and studied the feedback of the three stakeholders: students, examiners, and simulated patients (SPs), regarding their experience with SBCE and the suitability of SBCE as an alternative for LC in future examinations.

    RESULTS: The SBCE scores were higher than those of the LC. Most of the examiners and students were not in favour of SBCE replacing LC, as such. The SPs were more positive about the proposition. The comments of the three stakeholders brought out the plus and minus points of LC and SBCE, which prompted our proposals to make SBCE more practical for future examinations.

    CONCLUSION: Analysis of the stakeholders' feedback and of the positive and negative aspects of LC and SBCE made it evident that SBCE needed improvement. We have proposed eight modifications to SBCE to make it a viable alternative to the LC.

    Matched MeSH terms: Educational Measurement*
  8. Grant IW
    Br Med J, 1978 Jun 10;1(6126):1549.
    PMID: 656792
    Matched MeSH terms: Educational Measurement*
  9. Finn GM, Tai J, Nadarajah VD
    Med Educ, 2025 Jan;59(1):88-96.
    PMID: 39255998 DOI: 10.1111/medu.15535
    CONTEXT: In this article, we draw upon diverse and contextually different experiences of working on inclusive assessment, with the aim of bridging and enhancing practices of inclusive assessments for health professions education (HPE) within universities. Instead of juxtaposing our views from three countries, we combine our perspectives to advocate for inclusive assessment.

    DISCUSSION: Creating an inclusive assessment culture is important for equitable education, even if priorities for inclusion might differ between contexts. We recognise challenges in the enactment of inclusive assessment, namely the notion of lowering standards, the perceived harm to the reliability and robustness of assessment design, and the use of inclusion as a poorly defined, catch-all term. Importantly, a lack of awareness that inclusion means recognising intersectionality is a barrier to well-designed inclusive assessments. This is why we offer considerations for HPE practitioners that can guide towards a unified direction of travel for inclusive assessments. This article highlights the importance of contextual prioritisation, with initiatives to be considered from the global level down to the national, institutional, programme and individual levels. Utilising experience and literature from undergraduate, higher education contexts, we offer considerations with applicability across the assessment continuum.

    CONTEXT: In this state of science paper, we were set the challenge of providing cross-cultural viewpoints on inclusive assessment. In this discursive article, we focus on inclusive assessment within undergraduate health professions education whilst looking to the wider higher education literature, since institutional policies and procedures frequently drive assessment decisions and influence the environment in which they occur. We explore our experiences of working in inclusive assessment, with the aim of bridging and enhancing practices of inclusive assessments for HPE. Unlike other articles that juxtapose views, we all come from the perspective of supporting inclusive assessment. We begin with a discussion on what inclusive assessment is and then describe our contexts as a basis for understanding differences and broadening conversations. We work in the United Kingdom, Australia and Malaysia, having undertaken research, facilitated workshops and seminars on inclusive assessment nationally and internationally. We recognise our perspectives will differ as a consequence of our global context, institutional culture, individual characteristics and educational experiences. (Note that individual characteristics are also known as protected characteristics in some countries). Then, we outline challenges and opportunities associated with inclusive assessment, drawing on evidence within our contexts, acknowledging that our understanding of inclusive assessment research is limited to publications in English and currently tilted to publications from the Global North. In the final section, we then offer recommendations for championing inclusion, focussing firstly on assessment designs, and then broader considerations to organise collective action. Our article is unapologetically practical; the deliberate divergence from a theoretical piece is with the intent that anyone who reads this paper might enact even one small change progressing towards more inclusive assessment practices within their context.

    Matched MeSH terms: Educational Measurement/methods
  10. Wan Salwina Wan Ismail, Ruzanna ZamZam, Marhani Midin, Azlin Baharudin, Hazli Zakaria, Hatta Sidi, et al.
    ASEAN Journal of Psychiatry, 2010;11(1):118-122.
    MyJurnal
    Objective: This study aims to report on the process of standard setting (SS) and to compare the passing rates between the norm-reference and SS methods for the OSCE in the psychiatry undergraduate examination at UKM for the 2009/2010 session. Methods: In the SS method, examiners were asked to imagine the performance of a minimally competent student and gave marks using a standardized checklist. The marks, in particular the outliers, were discussed. After the first round, the examiners went through the same process again to rate the minimally competent student independently. The median of the marks was taken as the passing mark for the particular question. The passing rate using a passing mark of 50% in the norm-reference method was compared with the passing rate obtained using the passing mark derived from the SS method. Results: For question 1, the passing rate with the norm-reference method (i.e. a passing mark of 50%) was 93% (106/114) and that with the SS method was 72.8% (83/114). For question 2, the passing rate with the norm-reference method was 92% (105/114) and that with the SS method was 67.5% (77/114). Conclusion: The passing rates of the two methods showed significant differences. Although the OSCE is an improvement to the undergraduate psychiatry examination at Universiti Kebangsaan Malaysia, there were a few limitations and challenges that needed to be tackled for further improvement.
    Matched MeSH terms: Educational Measurement
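    Computationally, the standard-setting procedure described above reduces to taking the median of the examiners' marks for an imagined minimally competent student and using it as the cut score. The sketch below contrasts the pass rate under a fixed 50% mark with the pass rate under such a derived cut score; all marks in it are invented for illustration and are not the study's data.

    import statistics

    def pass_rate(scores, cut_score):
        """Percentage of candidates scoring at or above the cut score."""
        return 100.0 * sum(s >= cut_score for s in scores) / len(scores)

    # Examiners' marks for the imagined minimally competent student (invented).
    examiner_marks = [52, 55, 58, 60, 62]
    ss_cut = statistics.median(examiner_marks)         # standard-setting pass mark

    student_scores = [48, 51, 55, 57, 59, 63, 66, 70]  # invented candidate marks
    print(pass_rate(student_scores, 50))               # fixed 50% pass mark
    print(pass_rate(student_scores, ss_cut))           # derived pass mark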
  11. Schwartz PL, Kyaw Tun Sein
    Med Educ, 1987 May;21(3):265-8.
    PMID: 3600444
    Matched MeSH terms: Educational Measurement/methods*
  12. Schwartz PL, Crooks TJ, Sein KT
    Med Educ, 1986 Sep;20(5):399-406.
    PMID: 3762442
    It has been suggested that the 'ideal' measure of reliability of an examination is obtained by test and retest using the one examination on the same group of students. However, because of practical and theoretical arguments, most reported reliabilities for multiple choice examinations in medicine are actually measures of internal consistency. While attempting to minimize the effects of potential interfering factors, we have undertaken a study of true test-retest reliability of multiple true-false type multiple choice questions in preclinical medical subjects. From three end-of-term examinations, 363 items (106 of 449 from term 1, 150 of 499 from term 2, and 107 of 492 from term 3) were repeated in the final examination (out of 999 total items). Between test and retest, there was little overall decrease in the percentage of items answered correctly and a decrease of only 3.4 in the percentage score after correction for guessing. However, there was an inverse relation between test-retest interval and decrease in performance. Between test and retest, performance decreased significantly on 33 items and increased significantly on 11 items. Test-retest correlation coefficients were 0.70 to 0.78 for items from the separate terms and 0.885 for all items that were retested. Thus, overall, these items had a very high degree of reliability, approximately the 0.9 which has been specified as the requirement for being able to distinguish between individuals.
    Matched MeSH terms: Educational Measurement/methods*
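    The reliability figures in the abstract above are test-retest correlations between performance on the repeated items at the two sittings. A small sketch of that calculation follows; the scores are made up, and the rights-minus-wrongs guessing correction shown is the usual rule for true/false items rather than a formula taken from the paper.

    import statistics

    def pearson_r(x, y):
        """Pearson correlation between paired score lists."""
        mx, my = statistics.fmean(x), statistics.fmean(y)
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
        return num / den

    def corrected_score(right, wrong):
        # Rights-minus-wrongs correction for guessing on true/false items
        # (two options per item, so the penalty per wrong answer is 1).
        return right - wrong

    # Invented percentage scores on the repeated items at test and retest.
    test   = [62, 70, 55, 81, 66, 74]
    retest = [60, 68, 57, 78, 63, 72]
    print(round(pearson_r(test, retest), 3))
    print(corrected_score(right=40, wrong=10))   # 30 marks after the correction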
  13. Shahabudin SH
    Med Educ, 1983 Sep;17(5):316-8.
    PMID: 6621433
    The belief that it is unwise to alter the initial response to a multiple choice question is questioned. Among 39 380 MCQ responses, there were 1818 changes (4.62%) of which 21.9% were correct to incorrect responses, 46.3% incorrect to correct responses and 31.8% incorrect to incorrect. This effect was very much more marked among the better students, incorrect to correct changes accounting for 61% of the responses in the upper group, 42% in the middle group and 34% in the lower group.
    Matched MeSH terms: Educational Measurement/methods*
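    The percentages in the entry above come from classifying every answer change as correct-to-incorrect, incorrect-to-correct, or incorrect-to-incorrect. The bookkeeping is trivial but is sketched below; the change log is invented purely to show the tally.

    from collections import Counter

    # Each record is (first response status, final response status); invented data.
    changes = [
        ("incorrect", "correct"),
        ("correct", "incorrect"),
        ("incorrect", "correct"),
        ("incorrect", "incorrect"),
    ]

    tally = Counter(f"{before} -> {after}" for before, after in changes)
    total = sum(tally.values())
    for label, n in tally.items():
        print(f"{label}: {100.0 * n / total:.1f}%")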
  14. Roslani AM, Sein KT, Nordin R
    Med J Malaysia, 1989 Mar;44(1):75-82.
    PMID: 2626116
    The Phase I and Phase II undergraduate teaching programmes of the School of Medical Sciences were reviewed at the end of the 1985/86 academic year. It was found that deviations from the School's philosophy had crept into the implementation process. Modifications were therefore made in the Phase I and Phase II programmes with a view to: (i) reducing content, (ii) promoting integration, (iii) improving the clinical examination skills of students, and (iv) providing students with more opportunities for self-learning, reinforcement and application of knowledge. The number of assessment items in Phase I and the frequency of assessment in Phase II were also found to be inappropriate, and modifications in assessment were made to rectify this situation.
    Matched MeSH terms: Educational Measurement/methods*
  15. Sandosham AA
    Med J Malaysia, 1973 Dec;28(2):63-4.
    PMID: 4276221
    Matched MeSH terms: Educational Measurement*
  16. Ngim CF, Fullerton PD, Ratnasingam V, Arasoo VJT, Dominic NA, Niap CPS, et al.
    BMC Med Educ, 2021 Mar 24;21(1):180.
    PMID: 33761946 DOI: 10.1186/s12909-021-02585-z
    BACKGROUND: The Objective Structured Clinical Exam (OSCE) is a useful means of generating meaningful feedback. OSCE feedback may be in various forms (written, face to face and audio or video recordings). Studies on OSCE feedback are uncommon, especially involving Asian medical students.

    METHODS: We compared two methods of OSCE feedback delivered to fourth year medical students in Malaysia: (i) Face to face (FTF) immediate feedback (semester one) (ii) Individualised enhanced written (EW) feedback containing detailed scores in each domain, examiners' free text comments and the marking rubric (semester two). Both methods were evaluated by students and staff examiners, and students' responses were compared against their OSCE performance.

    RESULTS: Of the 116 students who sat for both formative OSCEs, 82.8% (n=96) and 86.2% (n=100) responded to the first and second survey respectively. Most students were comfortable to receive feedback (91.3% in FTF, 96% in EW) with EW feedback associated with higher comfort levels (p=0.022). Distress affected a small number with no differences between either method (13.5% in FTF, 10% in EW, p=0.316). Most students perceived both types of feedback improved their performance (89.6% in FTF, 95% in EW); this perception was significantly stronger for EW feedback (p=0.008). Students who preferred EW feedback had lower OSCE scores compared to those preferring FTF feedback (mean scores ± SD: 43.8 ± 5.3 in EW, 47.2 ± 6.5 in FTF, p=0.049). Students ranked the "marking rubric" to be the most valuable aspect of the EW feedback. Tutors felt both methods of feedback were equally beneficial. Few examiners felt they needed training (21.4% in FTF, 15% in EW) but students perceived this need for tutors' training differently (53.1% in FTF, 46% in EW).

    CONCLUSION: Whilst both methods of OSCE feedback were highly valued, students preferred to receive EW feedback and felt it was more beneficial. Learning cultures of Malaysian students may have influenced this view. Information provided in EW feedback should be tailored accordingly to provide meaningful feedback in OSCE exams.

    Matched MeSH terms: Educational Measurement*
  17. Barman A
    Ann Acad Med Singap, 2005 Sep;34(8):478-82.
    PMID: 16205824
    INTRODUCTION: The main aim of medical education is to foster the development of clinical competence in students at all levels. Differences in experiences, methods of instruction and ambiguous forms of assessment are obstacles to attaining this goal. Dissatisfaction with the conventional methods of clinical assessment on the part of teachers and students led assessors to search for appropriate alternatives and in 1975, Harden and his colleagues introduced the objective structured clinical examination (OSCE). It is nearly impossible to have a test that satisfies all the criteria of a good test. Sometimes, a compromise has to be made between the available resources (in terms of man, money and time), and the method and quality of assessment (in terms of reliability, validity, objectivity and practicability).

    METHODS: This critique on the OSCE is based on the published findings of researchers from its inception in 1975 to 2004.

    RESULTS: The reliability, validity, objectivity and practicability or feasibility of this examination are based on the number of stations, construction of stations, method of scoring (checklists and/ or global scoring) and number of students assessed. For a comprehensive assessment of clinical competence, other methods should be used in conjunction with the OSCE.

    CONCLUSION: The OSCE can be a reasonably reliable, valid and objective method of assessment, but its main drawback is that it is resource-intensive.

    Matched MeSH terms: Educational Measurement*
  18. Ramanathan A, Zaini ZM, Ghani WMN, Wong GR, Zainuddin NI, Yang YH, et al.
    Oral Dis, 2024 Nov;30(8):5483-5489.
    PMID: 38488212 DOI: 10.1111/odi.14927
    OBJECTIVE: This study evaluated the effectiveness of face-to-face (F2F) and online OralDETECT training programme in enhancing early detection skills for oral cancer.

    METHODS: A total of 328 final-year dental students were trained across six cohorts. Three cohorts (175 students) received F2F training from the academic years 2016/2017 to 2018/2019, and the remaining three (153 students) underwent online training during the Covid-19 pandemic from 2019/2020 to 2021/2022. Participant scores were analysed using the Wilcoxon signed rank test, the Mann-Whitney test, Cohen's d effect size, and multiple linear regression.

    RESULTS: Both F2F and online training showed increases in mean scores from pre-test to post-test 3: from 67.66 ± 11.81 to 92.06 ± 5.27 and 75.89 ± 11.03 to 90.95 ± 5.22, respectively. Comparison between F2F and online methods revealed significant differences in mean scores with large effect sizes at the pre-test stage (p 

    Matched MeSH terms: Educational Measurement/methods
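    Among the statistics named in the methods above, Cohen's d is the simplest to recompute; a sketch using the standard pooled-standard-deviation form is given below. The study's exact variant is not stated in the abstract, and the scores here are invented, so this is illustrative only.

    import statistics

    def cohens_d(group_a, group_b):
        """Cohen's d with the pooled sample standard deviation."""
        na, nb = len(group_a), len(group_b)
        va, vb = statistics.variance(group_a), statistics.variance(group_b)
        pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
        return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_sd

    # Invented pre-test scores for the two training formats.
    f2f_scores    = [65, 70, 62, 68, 72, 66]
    online_scores = [74, 78, 73, 77, 80, 75]
    print(round(cohens_d(online_scores, f2f_scores), 2))  # large effect (~2.8) on these invented numbers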
  19. Yusoff MS
    Med Educ, 2012 Nov;46(11):1122.
    PMID: 23078712 DOI: 10.1111/medu.12057
    Matched MeSH terms: Educational Measurement/methods*; Educational Measurement/standards