To evaluate the effectiveness of binary content checklists in measuring increasing levels of clinical competence.
Fourteen clinical clerks, 14 family practice residents, and 14 family physicians participated in two 15-minute standardized patient interviews. An examiner rated each participant's performance using a binary content checklist and a global process rating. The participants provided a diagnosis two minutes into and at the end of the interview.
On global scales, the experienced clinicians scored significantly better than did the residents and clerks, but on checklists, the experienced clinicians scored significantly worse than did the residents and clerks. Diagnostic accuracy increased for all groups between the two-minute and 15-minute marks without significant differences between the groups.
These findings are consistent with the hypothesis that binary checklists may not be valid measures of increasing clinical competence.
To determine who is the better rater of history taking in an objective structured clinical examination (OSCE): a physician or a standardized patient (SP).
During the 1991 pilot administration of an OSCE for the Medical Council of Canada's qualifying examination, five history-taking stations were videotaped. Candidates at these stations were scored by three raters: a physician (MD), an SP observer (SPO), and an SP rating from recall (SPR). To determine the validity of each rater's scores, these scores were compared with a "gold standard", which was the average of videotape ratings by three physicians, each scoring independently. Analysis included both correlations with the standard and a repeated-measures analysis of variance (ANOVA) comparing raters' mean scores on each station with mean scores of the gold standard.
Ninety-one videotapes were scored by the "gold-standard" physicians. Correlations with the standard showed no clear preference for MD, SPO, or SPR raters. ANOVAs revealed significant differences from the standard on three stations for the SPR, two stations for the SPO, and one station for the MD.
An MD rater is less likely to differ from a standard established by a consensus of MD ratings than are SP raters rating from recall. If an MD cannot be used, an SP observer is preferable to an SP rating from recall.
Two complementary examinations designed to comprehensively assess competence for surgical practice have been developed. The Objective Structured Assessment of Technical Skill (OSATS) evaluates a resident's operative skill, and the Patient Assessment and Management Examination (PAME) evaluates clinical management skills.
Twenty-four postgraduate year (PGY)-4 and PGY-5 general surgery residents from four training programs were examined. Each examination had eight stations, with a total of 6 hours of testing time.
Interstation reliability was 0.64 for the OSATS, 0.71 for the PAME, and 0.74 for the total test. Examination scores discriminated between PGY-4 and PGY-5 residents for the OSATS (t = 4.39, P
The management of multiply injured trauma patients is a skill requiring broad knowledge, sound judgment, and leadership capabilities. The purpose of this study was to evaluate the effectiveness of a computer-based trauma simulator as a teaching tool for senior medical students.
All year-4 clinical clerks at the University of Toronto were approached to participate in a focused, 2-hour trauma management course. The volunteer rate for the course was 79%. Students were randomized to either computer-based simulator or seminar-based teaching groups. Outcome measures in this study were students' trauma objective structured clinical examination (OSCE) scores.
Both the trauma simulator and seminar teaching groups performed significantly better than the comparison group (no additional teaching) on the trauma OSCE patient encounter component, but not on the written component of the examination. There was no significant difference in the performances of the trauma simulator and seminar teaching groups. Students overwhelmingly felt that the trauma simulator was effective for their trauma teaching and improved their overall confidence in clinical trauma scenarios.
There is a significant benefit associated with a focused, clinically based trauma management course for senior medical students. No additional improvement was noted with the use of a high-fidelity computer-based trauma simulator.
The purposes of this study were to develop and assess a rating form for selection of surgical residents, determine the criteria most important in selection, determine the reliability of the assessment form and process both within and across sites, and document differences in procedure and structure of resident selection processes across Canada.
Twelve of 13 English-speaking orthopedic surgery training programs in Canada participated during the 1999 selection year. The critical incident technique was used to determine the criteria most important in selection. From these criteria a 10-item rating form was developed, with each item rated on a 5-point scale. Sixty-six candidates were invited for interviews across the country. Each interviewer completed one assessment form for each candidate, and independently ranked all candidates at the conclusion of all interviews. Consensus final rank orders were then created for each residency program. Across all programs, pairwise program-by-program correlations for each assessment parameter were computed.
The internal consistency of assessment form ratings for each interviewer was moderately high (mean Cronbach's alpha = 0.71). A correlation between each item and the final rank order for each program revealed that the items work ethic, interpersonal qualities, orthopedic experience, and enthusiasm correlated most highly with final candidate rank orders (r = 0.5, 0.48, 0.48, 0.45, respectively). The interrater reliabilities (within panels) and interpanel reliabilities (within programs) for the rank orders were 0.67 and 0.63, respectively. Using the Spearman-Brown prophecy formula, it was found that two panels of two interviewers each are required to obtain a stable measure of a given candidate (reliability of 0.80). The average pairwise program-by-program correlations for the final candidate rank orders were low (0.14).
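The Spearman-Brown projection above can be sketched from the reported figures. This assumes the reported within-panel interrater reliability (0.67) is treated as the single-measure reliability being projected; the abstract's actual design involves both interviewer and panel facets, so this is an illustration, not the study's exact generalizability analysis:

```latex
% Spearman-Brown prophecy formula: reliability \rho_k of a score
% averaged over k parallel raters, given single-rater reliability \rho_1
\rho_k = \frac{k\,\rho_1}{1 + (k - 1)\,\rho_1}

% Illustration with the reported reliability \rho_1 = 0.67 and k = 2:
\rho_2 = \frac{2(0.67)}{1 + (2 - 1)(0.67)} = \frac{1.34}{1.67} \approx 0.80
```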
A method was introduced to develop a standard, reliable candidate assessment form to evaluate residency selection procedures. The assessment form ratings were found to be consistent within interviewers. Candidate assessments within programs (both between interviewers and between panels) were moderately reliable, suggesting agreement within programs regarding the relative quality of candidates, but there was very little agreement across programs.
To examine the validity of a psychiatry clerkship's objective structured clinical examination (OSCE).
In 1996, 33 clinical clerks and 17 psychiatry residents at the University of Toronto participated in an eight-station OSCE evaluated by psychiatrist-examiners using binary checklists and global ratings. Prior to the OSCE, communication course instructors were asked to rank the clerks on interviewing ability, and faculty supervisors were asked to identify the OSCE stations on which the clerks were likely to do well or poorly.
Mean OSCE scores were significantly higher for the residents than for the clerks on global ratings but not on checklists. The communication instructors accurately predicted the clerks' rankings on the global scores but not their scores on the checklists. The faculty supervisors predicted with moderate accuracy the clerks' success on the OSCE stations as measured by the checklists but not by the global ratings. The residents rated the OSCE scenarios as highly realistic.
The evidence of construct and concurrent validity, together with the high ratings of realism, suggests that a psychiatry OSCE can be a valid assessment of clerks' clinical competence.
This study examined whether an operative product and time to completion could serve as measures of technical skill.
Nine final-year (PGY5) and 11 penultimate-year (PGY4) general surgery residents participated in a 6-station bench model examination. Time to completion was recorded. Twelve faculty surgeons (2 per station) evaluated the quality of the final product using a 5-point scale.
The mean interrater reliability was 0.59 for product quality. Interstation reliability was 0.59 for analysis of the final product and 0.72 for time to completion. Agreement between attendings' ratings and the product quality and time scores was 63% and 78%, respectively. PGY5s' mean product quality score was 4.14 +/- 0.26, compared with 3.82 +/- 0.33 for PGY4s (P