1. Faculty of Education, The Chinese University of Hong Kong, Hong Kong (affiliation listed for four authors); Yunnan Normal University, Kunming 650092
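Abstract (Chinese): This research applies generalizability theory from psychometrics in two studies to analyze the feasibility of scoring from tape recordings in the Putonghua test of the State Language Commission, and examines the reliability, cost efficiency, and psychometric properties of that approach. The research involved 25 examinees and 8 raters. The results show that recorded and live scoring give consistent results and can distinguish at least 90% of the differences in ability. The research also indicates that the number of raters and items in the current test is adequate, although some adjustments, for example according to examinees' ability characteristics, could improve the efficiency of the test.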

Keywords: generalizability theory, reliability, Putonghua test

Abstract: This article reports two generalizability studies of the State Putonghua test. In the first study, we examined the consistency between live and tape-recorded assessment of Putonghua. Twenty-five examinees participated in the first study. There were eight raters, divided into four panels of two each. Five examinees were assigned to one of the four panels. The live assessments in the four panels took place simultaneously, and examinees were tape-recorded during the live assessment. Each examinee's tape was later assessed by all eight raters. The standard assessment instrument (prepared by the State Language Commission) was used. For the purpose of this study, all examinees received the same items. The items were rated on a three-point scale, where 0 = no credit, 1 = partial credit, and 2 = full credit. The objects of measurement were examinees (e), who were nested in panels (p). The raters (r), who were also nested in panels, were crossed with examinees. Items (i) and mode (m) of assessment (i.e., recorded versus live assessment) were crossed with the rest of the conditions. The G study design was (e × r):p × i × m. Except for the mode facet, which was considered fixed, all other facets and the objects of measurement were assumed random. This special G study focused on determining the consistency between the live and recorded assessments. The results indicated a relatively high degree of consistency. The signal-to-noise ratio reached 0.80, meaning that 80% of the absolute domain status differences in Putonghua were exchangeable between these two modes of assessment. The purpose of the second study was to determine an efficient tape-recorded assessment procedure to be adopted in the future. The objective was to find a number of raters and items for measuring Putonghua that would maximize reliability and minimize costs. The second study adopted a fully crossed design so that each variance component could be estimated uniquely. Tapes of 25 examinees were each rated by the same six raters on 50 single-word items. The G study design was e × r × i. Among the seven variance components, the largest was associated with the item facet, indicating the importance of sampling more items. Using two raters and 100 items would achieve satisfactory reliabilities of 0.90 and 0.84 for norm-referenced and domain-referenced use of the test, respectively.
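The projection from variance components to reliability for a given number of raters and items can be illustrated with a minimal sketch. It assumes the standard decision-study formulas for a fully crossed e × r × i random-effects design: the generalizability coefficient for norm-referenced use and the dependability index for domain-referenced use. The class name, function name, and all variance component values are hypothetical and chosen only for illustration; they are not the estimates obtained in this study.

```python
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    """Seven variance components of a fully crossed e x r x i G study."""
    e: float    # examinees (object of measurement)
    r: float    # raters
    i: float    # items
    er: float   # examinee x rater interaction
    ei: float   # examinee x item interaction
    ri: float   # rater x item interaction
    eri: float  # three-way interaction confounded with residual error


def d_study(vc, n_raters, n_items):
    """Return (generalizability coefficient, dependability index)."""
    # Relative error: only the interactions that involve examinees.
    rel_error = (vc.er / n_raters
                 + vc.ei / n_items
                 + vc.eri / (n_raters * n_items))
    # Absolute error adds the rater, item, and rater x item effects.
    abs_error = (rel_error
                 + vc.r / n_raters
                 + vc.i / n_items
                 + vc.ri / (n_raters * n_items))
    g_coef = vc.e / (vc.e + rel_error)  # norm-referenced use
    phi = vc.e / (vc.e + abs_error)     # domain-referenced use
    return g_coef, phi


# Hypothetical values for illustration only, NOT the study's estimates.
vc = VarianceComponents(e=0.10, r=0.01, i=0.30, er=0.01,
                        ei=0.20, ri=0.02, eri=0.40)

for n_r, n_i in [(1, 50), (2, 50), (2, 100)]:
    g, phi = d_study(vc, n_r, n_i)
    print(f"raters={n_r} items={n_i} G={g:.2f} Phi={phi:.2f}")
```

Because the item effect enters only the absolute error term, a large item variance component lowers the dependability index more than the generalizability coefficient, which is one reason the two coefficients can differ for the same numbers of raters and items.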

