Routine evaluation of mental health: reliable information or worthless "guesstimates"?

Author Affiliation: Department of Clinical Psychiatric Research, Ullevaal University Hospital, Oslo, Norway.

Keywords: Diagnosis, Differential; Humans; Mental Disorders - classification - diagnosis - psychology; Norway; Observer Variation; Patient Care Team; Personality Inventory - statistics & numerical data; Psychometrics; Reproducibility of Results

Abstract: Routine evaluation of mental health care systems necessitates a quick assessment of progress and outcome. This study was designed to determine the value of the GAF-scale in such applications. We allowed 104 raters from six therapeutic centres to rate five clinical case-vignettes. Interrater reliability was almost equal for raters within different professional categories. The highest and the lowest scores for each of the case-vignettes differed by between 39 and 45 points. The raters' biases ranged from -23 to +30 points, and random deviations were between 1 and 20 points. Systematic differences between centres were up to 6 points. Our main finding is that the reliability of GAF scores in routine settings proved unsatisfactory with untrained raters.
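
Illustration: the abstract reports three kinds of quantity: the spread between the highest and lowest score per vignette, each rater's bias, and each rater's random deviation. It does not state how these were computed. The sketch below shows one plausible reading, assuming that bias is a rater's mean deviation from the per-vignette mean across all raters and that random deviation is the standard deviation of those deviations. The ratings and all function names are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

# Hypothetical GAF ratings, NOT study data.
# Each list holds one rater's scores for five case vignettes.
ratings = {
    "rater_a": [55, 40, 62, 35, 70],
    "rater_b": [65, 52, 70, 45, 78],
    "rater_c": [50, 38, 66, 30, 61],
}


def vignette_means(ratings):
    """Mean score per vignette across all raters."""
    return [mean(column) for column in zip(*ratings.values())]


def score_range(ratings):
    """Difference between highest and lowest score per vignette."""
    return [max(column) - min(column) for column in zip(*ratings.values())]


def rater_bias_and_random_deviation(ratings):
    """Per rater: (bias, random deviation) under the assumed definitions.

    bias             = mean of (score - vignette mean)
    random deviation = sample standard deviation of those differences
    """
    means = vignette_means(ratings)
    result = {}
    for rater, scores in ratings.items():
        deviations = [score - m for score, m in zip(scores, means)]
        result[rater] = (mean(deviations), stdev(deviations))
    return result


if __name__ == "__main__":
    print("Range per vignette:", score_range(ratings))
    for rater, (bias, rand) in rater_bias_and_random_deviation(ratings).items():
        print(f"{rater}: bias={bias:+.1f}, random deviation={rand:.1f}")
```

Under these assumed definitions, a rater who consistently scores above the group has a positive bias, while a rater whose scores scatter around the group mean without a consistent direction has a small bias but a large random deviation.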