The objective of computerized adaptive testing (CAT) is to construct an optimal test for each examinee. The item selection strategy (ISS) is an important part of CAT research; its quality directly affects the reliability, efficiency, and security of the test.
Much CAT research and many CAT applications are based on dichotomously scored models. A polytomously scored model, however, can clearly extract more information from examinees than a dichotomous one, so it is worthwhile to explore CAT further on the basis of polytomous models.
Both the Generalized Partial Credit Model (GPCM) and the Graded Response Model (GRM) are polytomously scored models, but they differ from each other. In the GRM, the item grade difficulties increase monotonically as the grades increase. The GPCM instead models the process of performing the item, which is decomposed into a sequence of steps: a later step cannot be attempted until the earlier step has been completed, yet its step parameter may be lower than that of the previous step, and each item carries several step parameters with no specific ordering rules governing them. Considerable research has already been conducted on CAT using the GRM; in our country, however, there are few reports on CAT research using the GPCM.
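The GPCM category probabilities implied by the description above can be sketched in Python; the function name and parameterization are illustrative, with `b` holding the item's step parameters (which, as noted, need not be ordered):

```python
import math

def gpcm_probs(theta, a, b):
    """Category response probabilities under the GPCM.

    theta : examinee trait level
    a     : item discrimination parameter
    b     : step parameters [b_1, ..., b_K] (no ordering required)
    Returns a list of probabilities for categories 0..K.
    """
    # Cumulative sums of a*(theta - b_j); the empty sum for category 0 is 0.
    z = [0.0]
    for bj in b:
        z.append(z[-1] + a * (theta - bj))
    m = max(z)                        # subtract the max for numerical stability
    w = [math.exp(v - m) for v in z]
    total = sum(w)
    return [wk / total for wk in w]
```

For an item with steps [-1, 0, 1], the four category probabilities always sum to one, and a high-ability examinee is most likely to reach the highest category.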
This study compared four types of ISS for CAT under various conditions, using the GPCM in computer-simulated programs. The strategies were implemented in four item pools, each with a capacity of 3000 items. Each item has five step parameters, and the discrimination and step parameters of the four pools are distributed as follows: (1) b~N(0,1), ln a~N(0,1); (2) b~N(0,1), a~U(0.2,2.5); (3) b~U(-3,3), ln a~N(0,1); (4) b~U(-3,3), a~U(0.2,2.5). Item parameters were generated by the Monte Carlo simulation method. Responses to the items were generated according to the GPCM for a sample of 1000 simulatees whose trait levels were also generated by the Monte Carlo simulation method in some types of ISS. During the response process, each simulatee's ability is estimated from the responses obtained so far. In addition, after the four item pools were sorted by the discrimination parameter to complete the a-stratified design, the above process was repeated. Thirty-two simulated CATs were administered, and the output was evaluated on the following measures: precision, stability of the ISS, evenness of item usage, average number of items used per person, χ2, efficiency, and item overlap.
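A minimal sketch of this simulation setup, under stated assumptions: the function names, RNG handling, and stratum count are illustrative, and only the details given in the text (pool size, five steps per item, the four parameter distributions, GPCM response generation, and a-stratification by discrimination) are taken from the study.

```python
import math
import random

def make_pool(n_items=3000, n_steps=5, b_dist="normal", a_dist="lognormal", rng=None):
    """Generate one simulated item pool (one of the four b/a combinations)."""
    rng = rng or random.Random(0)
    pool = []
    for _ in range(n_items):
        if a_dist == "lognormal":                 # ln a ~ N(0,1)
            a = math.exp(rng.gauss(0, 1))
        else:                                     # a ~ U(0.2, 2.5)
            a = rng.uniform(0.2, 2.5)
        if b_dist == "normal":                    # b ~ N(0,1)
            b = [rng.gauss(0, 1) for _ in range(n_steps)]
        else:                                     # b ~ U(-3, 3)
            b = [rng.uniform(-3, 3) for _ in range(n_steps)]
        pool.append((a, b))
    return pool

def simulate_response(theta, a, b, rng):
    """Draw one graded response (0..K) from the GPCM category probabilities."""
    z, cum = [0.0], 0.0
    for bj in b:
        cum += a * (theta - bj)
        z.append(cum)
    m = max(z)
    w = [math.exp(v - m) for v in z]              # unnormalized, stabilized
    r = rng.random() * sum(w)
    for k, wk in enumerate(w):
        r -= wk
        if r <= 0:
            return k
    return len(w) - 1

def a_stratify(pool, n_strata=4):
    """Sort items by discrimination and split into equal strata (low a first)."""
    ranked = sorted(pool, key=lambda item: item[0])
    size = len(ranked) // n_strata
    return [ranked[i * size:(i + 1) * size] for i in range(n_strata)]
```

In an a-stratified CAT, early items would then be drawn from the low-discrimination strata and later items from the high-discrimination ones.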
Tables 1 and 2 report the evaluation index values obtained from the CAT process under the four types of ISS, both when the item pool did not adopt the stratified design and when it adopted the a-stratified design, together with the weighted sums of those index values. The data support the following conclusions: all the ability estimates are highly accurate and differ little from one another. Moreover, comparing the weighted sums of the means shows that the distribution of the item step parameters greatly influences the choice of ISS.
When the examinee's trait level follows a normal distribution, the performance of an ISS is closely related to the distribution of the item step parameters. (1) If the item step parameters follow a normal distribution, the ISS that matches a randomly chosen step parameter to the trait level is considerably more efficient than the others. (2) If the item step parameters follow a uniform distribution, the ISS that matches the item's average step parameter to the trait level is considerably more efficient than the others.
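The strategy in conclusion (2), selecting the unused item whose average step parameter lies closest to the current trait estimate, can be sketched as follows; the pool representation (a list of `(a, b)` tuples) and the function name are hypothetical, not taken from the study:

```python
def select_by_mean_step(theta_hat, pool, used):
    """Pick the unused item whose mean step parameter is closest to theta_hat.

    theta_hat : current trait-level estimate
    pool      : list of (a, b) tuples, b being the item's step parameters
    used      : set of indices of already-administered items
    Returns the index of the selected item, or None if the pool is exhausted.
    """
    best, best_d = None, float("inf")
    for idx, (a, b) in enumerate(pool):
        if idx in used:
            continue
        d = abs(sum(b) / len(b) - theta_hat)   # distance of mean step to estimate
        if d < best_d:
            best, best_d = idx, d
    return best
```

The random-step variant of conclusion (1) would differ only in comparing `theta_hat` against one randomly drawn element of `b` instead of the mean.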