Background: Summative assessment in postgraduate examinations globally employs multiple measures. A standard-setting method decides pass or fail based on an arbitrarily defined cut-off point on a test score, which often rests on content experts' subjective judgment. In contrast, a standard-setting strategy primarily follows two approaches: a compensatory approach, which judges overall performance as the sum of all test scores, and a conjunctive approach, which requires a passing performance on each instrument. The challenge in using multiple measures, however, lies not in the number of measurement tools but in the logic by which the measures are combined to draw pass-or-fail inferences in summative assessment. The Conjoint University Board of Examination for the Masters of Otolaryngology and Head-Neck Surgery (ORL-HNS) in Malaysia also uses multiple measures to reach a passing or failing decision in summative assessment; however, its standard-setting strategy is loosely and variably applied when making the ultimate pass-or-fail decision. To collect evidence, the summative assessment program of the Masters of ORL-HNS at the School of Medical Sciences, Universiti Sains Malaysia (USM), was analyzed for validity to evaluate the appropriateness of decisions in postgraduate medical education in Malaysia.

Methodology: A retrospective study was undertaken to evaluate the validity of the conjoint summative assessment results of the Part II examination of USM candidates from May 2000 to May 2011. Pearson correlation and multiple linear regression were used to determine the discriminant and convergent validity of the assessment tools: Pearson's correlation coefficient assessed the associations between assessment tools, and multiple linear regression compared the dominant roles of factor variables in predicting outcomes. Based on the outcome of the study, reforms to the standard-setting strategy are also recommended for programming assessment in a surgery-based discipline.

Results: The correlation coefficient between the MCQ and essay questions was not significant (0.16). Long and short cases showed good correlation (0.53). The oral test showed fair correlation with the written components (0.39-0.42) as well as the clinical components (0.50-0.66). The predictive values in the written tests suggested that MCQ performance was predicted by the oral test (B=0.34, P
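
The following is a minimal sketch, not the authors' actual analysis code, of the kind of Pearson correlation and multiple linear regression analysis described in the Methodology. It assumes a hypothetical table of candidate scores with one column per assessment component (column names such as 'mcq', 'essay', 'short_case', 'long_case', and 'oral' are placeholders, not the study's variable names).

```python
# Illustrative sketch only; assumes a pandas DataFrame `scores` with one row
# per candidate and one column per assessment component (hypothetical names).
import pandas as pd
from scipy.stats import pearsonr
import statsmodels.api as sm


def component_correlations(scores: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlations between assessment components,
    the kind of statistic used to gauge convergent/discriminant validity."""
    cols = list(scores.columns)
    rows = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r, p = pearsonr(scores[a], scores[b])
            rows.append({"pair": f"{a} vs {b}", "r": r, "p": p})
    return pd.DataFrame(rows)


def predict_component(scores: pd.DataFrame, outcome: str) -> pd.DataFrame:
    """Multiple linear regression of one component on the remaining ones,
    reporting unstandardised coefficients (B) and p-values."""
    y = scores[outcome]
    X = sm.add_constant(scores.drop(columns=[outcome]))
    fit = sm.OLS(y, X).fit()
    return pd.DataFrame({"B": fit.params, "p": fit.pvalues})


# Example usage with synthetic data (illustration only):
# scores = pd.DataFrame({"mcq": [...], "essay": [...], "short_case": [...],
#                        "long_case": [...], "oral": [...]})
# print(component_correlations(scores))
# print(predict_component(scores, outcome="mcq"))
```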