MyMedR

Displaying publications 1 - 20 of 144 in total

Moving Towards a Mixed-Method Approach to Educational Assessments

Sim JH

Acad Med, 2017 06;92(6):726.
PMID: 28557910 DOI: 10.1097/ACM.0000000000001680
Matched MeSH terms: Educational Measurement*
How important is distractor efficiency for grading Best Answer Questions?

Puthiaparampil T, Rahman M

BMC Med Educ, 2021 Jan 07;21(1):29.
PMID: 33413332 DOI: 10.1186/s12909-020-02463-0

BACKGROUND: Distractor efficiency and the optimum number of functional distractors per item in One Best Answer Questions have been debated. The prevalence of non-functional distractors has led to a reduction in the number of distractors per item, with the advantage that more items can be added to the test. The existing literature offers no definite answer to the question of what distractor efficiency best matches excellent psychometric indices. We examined the relationship between distractor efficiency and the psychometric indices of One Best Answer Questions in search of an answer.
METHODS: We analysed 350 items used in 7 professional examinations and determined their distractor efficiency and the number of functional distractors per item. The items were sorted into five groups - excellent, good, fair, remediable and discarded based on their discrimination index. We studied how the distractor efficiency and functional distractors per item correlated with these five groups.
RESULTS: Correlation of distractor efficiency with psychometric indices was significant but far from perfect. The excellent group topped in distractor efficiency in 3 tests, the good group in one test, the remediable group equalled the excellent group in one test, and the discarded group topped in 2 tests.
CONCLUSIONS: Distractor efficiency did not correlate in a consistent pattern with the discrimination index. A distractor efficiency of fifty per cent or higher, not one hundred per cent, was found to be the optimum.
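
As a rough illustration of the quantities discussed in this abstract, the sketch below counts functional distractors for a single item and expresses them as a percentage of all distractors. The abstract does not state how a functional distractor was defined; the 5% selection threshold used here is a common convention and is an assumption, as are the function name, the five-option format, and the sample responses.

```python
from collections import Counter

def distractor_efficiency(responses, key, options=("A", "B", "C", "D", "E"), threshold=0.05):
    """Return (functional_count, efficiency_percent) for one item.

    Assumption: a distractor is 'functional' when at least `threshold`
    (5% by default) of examinees select it. The abstract does not give
    the definition used in the study.
    """
    counts = Counter(responses)
    n = len(responses)
    distractors = [o for o in options if o != key]
    functional = [o for o in distractors if counts[o] / n >= threshold]
    return len(functional), 100.0 * len(functional) / len(distractors)

# Hypothetical item: 100 examinees, correct answer "B".
responses = ["B"] * 60 + ["A"] * 20 + ["C"] * 15 + ["D"] * 4 + ["E"] * 1
functional, de = distractor_efficiency(responses, "B")
print(functional, de)
```

In this made-up item, options A and C attract enough examinees to count as functional while D and E do not, so the item has two functional distractors out of four.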

Matched MeSH terms: Educational Measurement*
The M.C.G.P. Examination

Balasundaram R

Family Practitioner, 1983;6(1):91-97.

Matched MeSH terms: Educational Measurement*
This is never asked in the USMLE-why are you teaching it?

Shankar PR

Can Med Educ J, 2023 Nov;14(5):152-153.
PMID: 38045090 DOI: 10.36834/cmej.77411
Matched MeSH terms: Educational Measurement*
The pattern of reporting and presenting validity evidence of extended matching questions (EMQs) in health professions education: a systematic review

Taha MH, Mohammed HEEG, Abdalla ME, Yusoff MSB, Mohd Napiah MK, Wadi MM

Med Educ Online, 2024 Dec 31;29(1):2412392.
PMID: 39445670 DOI: 10.1080/10872981.2024.2412392

Extended matching questions (EMQs), or R-type questions, are a selected-response format. The validity evidence for this format is crucial, but there have been reports of misunderstandings about validity. It is unclear what kinds of evidence should be presented and how to present them to support their educational impact. This review explores the pattern and quality of reporting the sources of validity evidence of EMQs in health professions education, encompassing content, response process, internal structure, relationship to other variables, and consequences. A systematic search in electronic databases including MEDLINE via PubMed, Scopus, Web of Science, CINAHL, and ERIC was conducted to extract studies that utilize EMQs. The framework for a unitary concept of validity was applied to extract data. A total of 218 titles were initially selected; the final number of titles was 19. The most reported pieces of evidence were the reliability coefficient, followed by the relationship to another variable. Additionally, the adopted definition of validity is mostly the old tripartite concept. This study found that reporting and presenting validity evidence appeared to be deficient. The available evidence can hardly provide a strong validity argument that supports the educational impact of EMQs. This review calls for more work on developing a tool to measure the reporting and presentation of validity evidence.

Matched MeSH terms: Educational Measurement/methods; Educational Measurement/standards
The levels of difficulty and discrimination indices in type a multiple choice questions of pre-clinical semester 1 multidisciplinary summative tests

Mitra, N.K., Nagaraja, H.S., Ponnudurai, G., Judson, J. P.

International e-Journal of Science, Medicine & Education, 2009;3(1):-.
MyJurnal

Item analysis is the process of collecting, summarizing and using information from students’ responses to assess the quality of test items. Difficulty index (P) and Discrimination index (D) are two parameters which help evaluate the standard of MCQs used in an examination, with abnormal values indicating poor quality. In this study, 120 test items of 12 Type A MCQ tests of Foundation 1 multi-disciplinary summative assessment from M2 / 2003 to M2 / 2006 cohorts of International Medical University were selected and their P-scores in percent and D-scores were estimated using Microsoft Office Excel. The relationship between the item difficulty index and discrimination index for each test item was determined by Pearson correlation analysis using SPSS 11.5. Mean difficulty index scores of the individual summative tests were in the range of 64% to 89%. One-third of total test items crossed the difficulty index of 80%, indicating that those items were easy for the students. Sixty-seven percent of the test items showed acceptable (> 0.2) discrimination index. Forty-five out of 120 test items showed excellent discrimination index. Discrimination index correlated poorly with difficulty index (r = -0.325). In conclusion, a consistent level of test difficulty and discrimination indices was maintained from 2003 to 2006 in all the twelve summative type A MCQ tests.
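
The two indices named in this abstract can be sketched as follows. The abstract does not give the formulas used, so this is only one conventional formulation: P is the percentage of examinees answering the item correctly, and D is the difference in the proportion correct between the top and bottom scoring groups. The 27% group size, the function name, and the sample data are assumptions for illustration.

```python
def item_indices(scores, totals, fraction=0.27):
    """Return (P, D) for one item.

    scores: 1 (correct) or 0 (incorrect) on this item, one per examinee.
    totals: total test score per examinee, same order as `scores`.
    Assumption: upper and lower groups are the top and bottom 27% of
    examinees by total score, a common convention that the abstract
    does not confirm.
    """
    n = len(scores)
    p = 100.0 * sum(scores) / n                     # difficulty index, percent correct
    k = max(1, round(n * fraction))                 # size of each comparison group
    order = sorted(range(n), key=lambda i: totals[i], reverse=True)
    upper = sum(scores[i] for i in order[:k])       # correct answers in the top group
    lower = sum(scores[i] for i in order[-k:])      # correct answers in the bottom group
    d = (upper - lower) / k                         # discrimination index, -1 to 1
    return p, d

# Hypothetical data for ten examinees.
totals = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50]
scores = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
p, d = item_indices(scores, totals)
print(f"P = {p:.0f}%, D = {d:.2f}")
```

With these made-up responses the item falls below the 80% "easy" mark mentioned in the abstract and clears the 0.2 threshold for acceptable discrimination.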

Matched MeSH terms: Educational Measurement
Moving from long case to scenario-based clinical examination: Proposals for making it feasible

Puthiaparampil T, Rahman MM, Shazrina AR, Nariman S, Lukas S, Chai CS, et al.

Med J Malaysia, 2022 Nov;77(6):724-729.
PMID: 36448391

INTRODUCTION: Our faculty used one long case (LC) and three short cases for the clinical component of the final professional examinations. During the COVID-19 pandemic, the LC had to be replaced with scenario-based clinical examination (SBCE) due to the impracticability of using recently hospitalised patients. While keeping the short case component as usual, the LC had to be replaced with SBCE in 2020 for the first time at short notice. This study aimed to evaluate the positive and negative aspects of SBCE and LC and to determine the feasibility of replacing LC with SBCE in future examinations.
MATERIALS AND METHODS: We compared the LC scores of three previous years with those of the SBCE and studied the feedback of the three stakeholders: students, examiners, and simulated patients (SPs), regarding their experience with SBCE and the suitability of SBCE as an alternative for LC in future examinations.
RESULTS: The SBCE scores were higher than those of the LC. Most of the examiners and students were not in favour of SBCE replacing LC, as such. The SPs were more positive about the proposition. The comments of the three stakeholders brought out the plus and minus points of LC and SBCE, which prompted our proposals to make SBCE more practical for future examinations.
CONCLUSION: Having analysed the feedback of the stakeholders, and the positive and negative aspects of LC and SBCE, it was evident that SBCE needed improvements. We have proposed eight modifications to SBCE to make it a viable alternative for LC.

Matched MeSH terms: Educational Measurement*
The MRCP (UK) examination in Commonwealth countries

Grant IW

Br Med J, 1978 Jun 10;1(6126):1549.
PMID: 656792
Matched MeSH terms: Educational Measurement*
Objective structured clinical examination (OSCE) in psychiatry new curriculum undergraduate posting and its standard setting procedure: an experience in Universiti Kebangsaan Malaysia (UKM)

Wan Salwina Wan Ismail, Ruzanna ZamZam, Marhani Midin, Azlin Baharudin, Hazli Zakaria, Hatta Sidi, et al.

ASEAN Journal of Psychiatry, 2010;11(1):118-122.
MyJurnal

Objective: This study aims to report on the process of standard setting (SS) and to compare the passing rates between the norm-reference and SS methods, for OSCE in psychiatry undergraduate examination at UKM for the 2009/2010 session.
Methods: In the SS method, examiners were asked to imagine the performance of a minimally competent student and gave marks using a standardized check-list. The marks, in particular outliers, were discussed. After the first round, the examiners went through the same process again, to rate the minimally competent students independently. The median of the marks was taken as the passing mark for the particular question. The passing rate using the passing mark of 50% in the norm-reference method was compared to the passing rate from the passing mark obtained from the SS method.
Results: For question 1, the passing rate with the norm-reference method (i.e. passing mark of 50%) was 93% (106/114) and that by the SS method was 72.8% (83/114). For question 2, the pass rate with the norm-reference method was 92% (105/114) and that by the SS method was 67.5% (77/114).
Conclusion: The passing rates between the two methods showed significant differences. Although OSCE is an improvement to the undergraduate psychiatry examination in Universiti Kebangsaan Malaysia, there are a few limitations and challenges that need to be tackled for further improvement.
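
The procedure described in this abstract (taking the median of examiners' marks for an imagined minimally competent student as the passing mark, then comparing pass rates against a fixed 50% mark) can be sketched as below. The function names, the sample marks, and the choice to treat a mark equal to the cut-off as a pass are assumptions for illustration; only the final four ratios come from the abstract.

```python
from statistics import median

def passing_mark(examiner_marks):
    """Passing mark = median of examiners' marks for a minimally competent student."""
    return median(examiner_marks)

def pass_rate(student_marks, cut):
    """Percentage of students at or above the cut-off (assumed: equal counts as a pass)."""
    passed = sum(1 for m in student_marks if m >= cut)
    return 100.0 * passed / len(student_marks)

# Hypothetical second-round examiner judgements and student marks (percent).
examiner_marks = [55, 60, 62, 58, 65]
student_marks = [45, 59, 60, 72, 80, 51]

cut = passing_mark(examiner_marks)
print("standard-setting cut-off:", cut)
print("pass rate at fixed 50%:", round(pass_rate(student_marks, 50), 1))
print("pass rate at cut-off:  ", round(pass_rate(student_marks, cut), 1))

# Re-deriving the percentages reported in the abstract from its own counts.
reported = {"Q1 fixed 50%": (106, 114), "Q1 SS": (83, 114),
            "Q2 fixed 50%": (105, 114), "Q2 SS": (77, 114)}
rates = {name: round(100.0 * a / b, 1) for name, (a, b) in reported.items()}
print(rates)
```

The re-derived figures agree with the abstract once rounded to whole percentages where it does so.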

Matched MeSH terms: Educational Measurement
Misconceptions highlighted among medical students in the annual International Intermedical School Physiology Quiz

Cheng HM, Durairajanayagam D

Adv Physiol Educ, 2012 Sep;36(3):229-32.
PMID: 22952263 DOI: 10.1152/advan.00089.2011
Matched MeSH terms: Educational Measurement*
Critiques on the Objective Structured Clinical Examination

Barman A

Ann Acad Med Singap, 2005 Sep;34(8):478-82.
PMID: 16205824

INTRODUCTION: The main aim of medical education is to foster the development of clinical competence in students at all levels. Differences in experiences, methods of instruction and ambiguous forms of assessment are obstacles to attaining this goal. Dissatisfaction with the conventional methods of clinical assessment on the part of teachers and students led assessors to search for appropriate alternatives and in 1975, Harden and his colleagues introduced the objective structured clinical examination (OSCE). It is nearly impossible to have a test that satisfies all the criteria of a good test. Sometimes, a compromise has to be made between the available resources (in terms of man, money and time), and the method and quality of assessment (in terms of reliability, validity, objectivity and practicability).
METHODS: This critique on the OSCE is based on the published findings of researchers from its inception in 1975 to 2004.
RESULTS: The reliability, validity, objectivity and practicability or feasibility of this examination are based on the number of stations, construction of stations, method of scoring (checklists and/ or global scoring) and number of students assessed. For a comprehensive assessment of clinical competence, other methods should be used in conjunction with the OSCE.
CONCLUSION: The OSCE can be a reasonably reliable, valid and objective method of assessment, but its main drawback is that it is resource-intensive.

Matched MeSH terms: Educational Measurement*
A different twist to the 'staff:student ratio': administering medical oral examinations to students in groups

Schwartz PL, Kyaw Tun Sein

Med Educ, 1987 May;21(3):265-8.
PMID: 3600444
Matched MeSH terms: Educational Measurement/methods*
Test-retest reliability of multiple true-false questions in preclinical medical subjects

Schwartz PL, Crooks TJ, Sein KT

Med Educ, 1986 Sep;20(5):399-406.
PMID: 3762442

It has been suggested that the 'ideal' measure of reliability of an examination is obtained by test and retest using the one examination on the same group of students. However, because of practical and theoretical arguments, most reported reliabilities for multiple choice examinations in medicine are actually measures of internal consistency. While attempting to minimize the effects of potential interfering factors, we have undertaken a study of true test-retest reliability of multiple true-false type multiple choice questions in preclinical medical subjects. From three end-of-term examinations, 363 items (106 of 449 from term 1, 150 of 499 from term 2, and 107 of 492 from term 3) were repeated in the final examination (out of 999 total items). Between test and retest, there was little overall decrease in the percentage of items answered correctly and a decrease of only 3.4 in the percentage score after correction for guessing. However, there was an inverse relation between test-retest interval and decrease in performance. Between test and retest, performance decreased significantly on 33 items and increased significantly on 11 items. Test-retest correlation coefficients were 0.70 to 0.78 for items from the separate terms and 0.885 for all items that were retested. Thus, overall, these items had a very high degree of reliability, approximately the 0.9 which has been specified as the requirement for being able to distinguish between individuals.

Matched MeSH terms: Educational Measurement/methods*
Pattern of answer changes to multiple choice questions in physiology

Shahabudin SH

Med Educ, 1983 Sep;17(5):316-8.
PMID: 6621433

The belief that it is unwise to alter the initial response to a multiple choice question is questioned. Among 39 380 MCQ responses, there were 1818 changes (4.62%) of which 21.9% were correct to incorrect responses, 46.3% incorrect to correct responses and 31.8% incorrect to incorrect. This effect was very much more marked among the better students, incorrect to correct changes accounting for 61% of the responses in the upper group, 42% in the middle group and 34% in the lower group.
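
The arithmetic behind the figures in this abstract can be checked with a short calculation. The counts per category below are reconstructed from the rounded percentages in the abstract, so they are approximations rather than reported values; the variable names are illustrative.

```python
total_responses = 39380
changes = 1818

# Share of all responses that were changed, in percent.
change_rate = 100.0 * changes / total_responses

# Percentages of changes by direction, as given in the abstract.
shares = {
    "correct_to_incorrect": 21.9,
    "incorrect_to_correct": 46.3,
    "incorrect_to_incorrect": 31.8,
}

# Approximate counts, reconstructed from the rounded percentages (an assumption).
counts = {k: round(changes * v / 100) for k, v in shares.items()}

# Approximate net number of responses that ended up correct because of a change.
net_gain = counts["incorrect_to_correct"] - counts["correct_to_incorrect"]

print(f"change rate: {change_rate:.2f}%")
print(counts)
print("approximate net gain in correct responses:", net_gain)
```

On these reconstructed counts, changes that turned a wrong answer into a right one outnumber the reverse by roughly two to one, which is the basis for questioning the belief stated at the start of the abstract.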

Matched MeSH terms: Educational Measurement/methods*
Review of the phase I and phase II teaching programmes in the School of Medical Sciences, Universiti Sains Malaysia

Roslani AM, Sein KT, Nordin R

Med J Malaysia, 1989 Mar;44(1):75-82.
PMID: 2626116

The Phase I and Phase II undergraduate teaching programmes of the School of Medical Sciences were reviewed at the end of the 1985/86 academic year. It was found that deviations from the School's philosophy had crept into the implementation process. Modifications were therefore made in Phase I and Phase II programmes with a view to: (i) reducing content, (ii) promoting integration, (iii) improving clinical examination skills of students, and (iv) providing more opportunities to students for self learning, reinforcement and application of knowledge. The number of assessment items in Phase I and the frequency of assessment in Phase II were also found to be inappropriate and so modifications in assessment were made to rectify this situation.

Matched MeSH terms: Educational Measurement/methods*
Editorial: Post-graduate medical education and examinations

Sandosham AA

Med J Malaysia, 1973 Dec;28(2):63-4.
PMID: 4276221
Matched MeSH terms: Educational Measurement*
Feedback after OSCE: A comparison of face to face versus an enhanced written feedback

Ngim CF, Fullerton PD, Ratnasingam V, Arasoo VJT, Dominic NA, Niap CPS, et al.

BMC Med Educ, 2021 Mar 24;21(1):180.
PMID: 33761946 DOI: 10.1186/s12909-021-02585-z

BACKGROUND: The Objective Structured Clinical Exam (OSCE) is a useful means of generating meaningful feedback. OSCE feedback may be in various forms (written, face to face and audio or video recordings). Studies on OSCE feedback are uncommon, especially involving Asian medical students.
METHODS: We compared two methods of OSCE feedback delivered to fourth year medical students in Malaysia: (i) Face to face (FTF) immediate feedback (semester one) (ii) Individualised enhanced written (EW) feedback containing detailed scores in each domain, examiners' free text comments and the marking rubric (semester two). Both methods were evaluated by students and staff examiners, and students' responses were compared against their OSCE performance.
RESULTS: Of the 116 students who sat for both formative OSCEs, 82.8% (n=96) and 86.2% (n=100) responded to the first and second survey respectively. Most students were comfortable to receive feedback (91.3% in FTF, 96% in EW) with EW feedback associated with higher comfort levels (p=0.022). Distress affected a small number with no differences between either method (13.5% in FTF, 10% in EW, p=0.316). Most students perceived both types of feedback improved their performance (89.6% in FTF, 95% in EW); this perception was significantly stronger for EW feedback (p=0.008). Students who preferred EW feedback had lower OSCE scores compared to those preferring FTF feedback (mean scores ± SD: 43.8 ± 5.3 in EW, 47.2 ± 6.5 in FTF, p=0.049). Students ranked the "marking rubric" to be the most valuable aspect of the EW feedback. Tutors felt both methods of feedback were equally beneficial. Few examiners felt they needed training (21.4% in FTF, 15% in EW) but students perceived this need for tutors' training differently (53.1% in FTF, 46% in EW).
CONCLUSION: Whilst both methods of OSCE feedback were highly valued, students preferred to receive EW feedback and felt it was more beneficial. Learning cultures of Malaysian students may have influenced this view. Information provided in EW feedback should be tailored accordingly to provide meaningful feedback in OSCE exams.

Matched MeSH terms: Educational Measurement*
Discrepancy-agreement grading provides feedback on rater judgements

Yusoff MS

Med Educ, 2012 Nov;46(11):1122.
PMID: 23078712 DOI: 10.1111/medu.12057
Matched MeSH terms: Educational Measurement/methods*; Educational Measurement/standards
In preparing for MRCGP[International]

Ariffin F

Br J Gen Pract, 2012 Jun;62(599):316.
PMID: 22687219 DOI: 10.3399/bjgp12X649214
Matched MeSH terms: Educational Measurement*
Standard setting in student assessment: is a defensible method yet to come?

Barman A

Ann Acad Med Singap, 2008 Nov;37(11):957-63.
PMID: 19082204

INTRODUCTION: Setting, maintaining and periodically re-evaluating assessment standards are important issues in medical education. The cut-off scores are often "pulled from the air" or set to an arbitrary percentage. A large number of methods/procedures used to set a standard or cut score are described in the literature. There is a high degree of uncertainty in performance standards set by using these methods. Standards set using the existing methods reflect the subjective judgment of the standard setters. This review does not describe the existing standard setting methods/procedures but discusses the validity, reliability, feasibility and legal issues relating to standard setting.
MATERIALS AND METHODS: This review is on some of the issues in standard setting based on the published articles of educational assessment researchers.
RESULTS: A standard or cut-off score should determine whether the examinee has attained the requirement to be certified competent. There is no perfect method to determine the cut score on a test and none is agreed upon as the best method. Setting a standard is not an exact science. Legitimacy of the standard is supported when the performance standard is linked to the requirement of practice. Test-curriculum alignment and content validity are important for most educational test validity arguments.
CONCLUSION: A representative percentage of must-know learning objectives in the curriculum may be the basis of test items and pass/fail marks. Practice analysis may help in identifying the must-know areas of the curriculum. A cut score set by this procedure may give credibility, validity, defensibility and comparability to the standard. Having test items constructed by subject experts and vetted by multi-disciplinary faculty members may ensure the reliability of the test as well as the standard.

Matched MeSH terms: Educational Measurement/methods*
