ISSN 0439-755X
CN 11-1911/B

Acta Psychologica Sinica, 2016, Vol. 48, Issue 8: 1047-1056.

### Warm’s weighted maximum likelihood estimation of latent trait in the four-parameter logistic model

MENG Xiangbin (1, 2); TAO Jian (2, 3); CHEN Shali (2)

1. Faculty of Education, Northeast Normal University, Changchun 130024, China
2. KLAS, School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China
3. Northeast Normal University Branch, Collaborative Innovation Center of Assessment toward Basic Education Quality, Changchun 130024, China
• Received: 2015-10-31; Published: 2016-08-25; Online: 2016-08-25
• Contact: TAO Jian, E-mail: taoj@nenu.edu.cn

Abstract:

There are two types of aberrant responses: correct responses resulting from lucky guesses, and incorrect responses resulting from carelessness. Because neither type reflects the examinee’s actual knowledge, both may lead to erroneous estimates of the examinee’s latent trait. Compared with guesses, careless errors may cause more serious estimation bias, especially when they occur at the beginning of a test. To account for the effect of careless errors, Barton and Lord (1981) developed the four-parameter logistic (4PL) model by adding an upper asymptote parameter to the three-parameter logistic (3PL) model. Recently, the 4PL model has received more attention, and several studies have highlighted its potential and usefulness from both a methodological and a practical point of view. The 4PL model can therefore be expected to gain ground as a competing item response model in psychological and educational measurement. This paper focuses on one important aspect of the 4PL model: the estimation of latent trait levels. In general, unbiased parameter estimation is desirable, and reducing bias in the latent trait estimator is important for the application of item response theory (IRT) models. Warm (1989) proposed a weighted maximum likelihood (WML) method for estimating the latent trait parameter in the 3PL model, which was found to be less biased than the maximum likelihood (ML) and expected a posteriori (EAP) estimators. The WML estimator has also been extended to the generalized partial credit model (GPCM). In light of the superior performance of the WML method in previous studies, this study applies a WML latent trait estimator to the 4PL model. The main contributions of this article are to derive the WML estimator under the 4PL model and to conduct a simulation study comparing the properties of the WML estimator with those of the ML and EAP estimators.
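Written out, the estimating equations behind the three estimators take roughly the following form. This is a sketch based on Warm’s (1989) general result for dichotomous items, not a reproduction of the paper’s own derivation; it assumes the logistic scaling constant D is set to 1, writes $u_j \in \{0, 1\}$ for the response to item $j$, $L(\theta)$ for the likelihood, and $g(\theta)$ for the EAP prior density.

$$
\begin{aligned}
\Psi_j(\theta) &= \frac{1}{1 + \exp[-a_j(\theta - b_j)]}, \qquad
P_j(\theta) = c_j + (d_j - c_j)\,\Psi_j(\theta), \qquad Q_j(\theta) = 1 - P_j(\theta),\\
P_j'(\theta) &= a_j (d_j - c_j)\,\Psi_j(1 - \Psi_j), \qquad
P_j''(\theta) = a_j^2 (d_j - c_j)\,\Psi_j(1 - \Psi_j)(1 - 2\Psi_j),\\
\frac{\partial \ln L}{\partial \theta} &= \sum_{j=1}^{n} \frac{[u_j - P_j(\theta)]\,P_j'(\theta)}{P_j(\theta)\,Q_j(\theta)}, \qquad
I(\theta) = \sum_{j=1}^{n} \frac{[P_j'(\theta)]^2}{P_j(\theta)\,Q_j(\theta)}, \qquad
J(\theta) = \sum_{j=1}^{n} \frac{P_j'(\theta)\,P_j''(\theta)}{P_j(\theta)\,Q_j(\theta)},\\
\text{ML:}\quad & \frac{\partial \ln L}{\partial \theta} = 0, \qquad
\text{WML:}\quad \frac{\partial \ln L}{\partial \theta} + \frac{J(\theta)}{2\,I(\theta)} = 0, \qquad
\text{EAP:}\quad \hat{\theta} = \frac{\int \theta\, L(\theta)\, g(\theta)\, d\theta}{\int L(\theta)\, g(\theta)\, d\theta}.
\end{aligned}
$$

The extra term $J(\theta)/[2I(\theta)]$ is what distinguishes WML from ML; setting $c_j = 0$ and $d_j = 1$ reduces $P_j$ to the two-parameter logistic form.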
The results of the simulation study suggested that the bias of the WML estimator was consistently smaller than that of the ML and EAP estimators; in particular, the accuracy of the WML estimator was superior to that of the ML estimator and nearly equivalent to that of the EAP estimator. The differences in bias (and accuracy) among the three estimators were substantial when the latent trait was far from the location of the test, but negligible when the latent trait matched the location of the test. Furthermore, both test length and item discrimination had a greater impact on the performance of the ML and EAP estimators than on that of the WML estimator. In relatively short tests of low-discriminating items, the EAP estimator displayed grossly inflated bias and the ML estimator displayed the largest decrease in accuracy, whereas the WML estimator performed more robustly. In general, the WML estimator has better properties than both the ML and EAP estimators, especially under conditions in which the test information function is relatively small. Such conditions include, but are not limited to: (a) a mismatch between the latent trait and the location of the test; (b) short tests (e.g., n ≤ 12); and (c) items with low discrimination. Because the simulation was conducted under linear (fixed-form) testing, the findings are not extended to the framework of computer adaptive testing (CAT). Our research may therefore be of great value to test developers concerned with constructing fixed, non-adaptive tests.
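To make the kind of comparison described above concrete, here is a small, standard-library-only Python sketch that computes ML, WML, and EAP estimates under the 4PL and reports their empirical bias. Everything in it is an illustrative assumption rather than the paper’s design: the 12-item bank (`ITEMS`, with a = 0.8, c = 0.15, d = 0.95), the scaling constant D = 1, the estimation interval [-4, 4] with clamping when the estimating equation has no root there, the N(0, 1) prior with 61 equally spaced quadrature nodes, and the helper names (`irf_4pl`, `score_info_j`, `solve`, `simulate_bias`), which are hypothetical. The bisection step also assumes the estimating equation crosses zero at most once in the interval, which is common but not guaranteed under the 3PL and 4PL.

```python
import math
import random

# Hypothetical 12-item test (a, b, c, d): low discrimination, difficulties
# spread over [-1.5, 1.5], lower asymptote 0.15, upper asymptote 0.95.
ITEMS = [(0.8, -1.5 + 3.0 * k / 11, 0.15, 0.95) for k in range(12)]


def irf_4pl(theta, a, b, c, d):
    """4PL success probability and its first two theta-derivatives (D = 1 assumed)."""
    L = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    p = c + (d - c) * L
    dp = a * (d - c) * L * (1.0 - L)
    d2p = a * a * (d - c) * L * (1.0 - L) * (1.0 - 2.0 * L)
    return p, dp, d2p


def score_info_j(theta, u, items):
    """Log-likelihood score, test information I(theta) and Warm's J(theta)."""
    s = info = j = 0.0
    for uj, (a, b, c, d) in zip(u, items):
        p, dp, d2p = irf_4pl(theta, a, b, c, d)
        pq = p * (1.0 - p)
        s += (uj - p) * dp / pq
        info += dp * dp / pq
        j += dp * d2p / pq
    return s, info, j


def solve(f, lo=-4.0, hi=4.0, tol=1e-6):
    """Bisection for f(theta) = 0 on [lo, hi], assuming one + to - crossing.
    If f never crosses zero there, the estimate is clamped to a bound."""
    if f(lo) <= 0.0:
        return lo
    if f(hi) >= 0.0:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ml(u, items):
    """ML estimate: root of the log-likelihood score."""
    return solve(lambda t: score_info_j(t, u, items)[0])


def wml(u, items):
    """WML estimate: root of score + J / (2 I), Warm's (1989) weighted equation."""
    def f(t):
        s, info, j = score_info_j(t, u, items)
        return s + j / (2.0 * info)
    return solve(f)


def eap(u, items, n_nodes=61):
    """Posterior mean under an assumed N(0, 1) prior, equally spaced nodes on [-4, 4]."""
    num = den = 0.0
    for k in range(n_nodes):
        t = -4.0 + 8.0 * k / (n_nodes - 1)
        w = math.exp(-0.5 * t * t)  # unnormalized prior density
        for uj, (a, b, c, d) in zip(u, items):
            p = irf_4pl(t, a, b, c, d)[0]
            w *= p if uj else 1.0 - p
        num += t * w
        den += w
    return num / den


def simulate_bias(true_theta, items=ITEMS, reps=200, seed=1):
    """Mean (estimate - true_theta) for each estimator over simulated responses."""
    rng = random.Random(seed)
    total = {"ML": 0.0, "WML": 0.0, "EAP": 0.0}
    for _ in range(reps):
        u = [1 if rng.random() < irf_4pl(true_theta, *it)[0] else 0 for it in items]
        total["ML"] += ml(u, items) - true_theta
        total["WML"] += wml(u, items) - true_theta
        total["EAP"] += eap(u, items) - true_theta
    return {k: v / reps for k, v in total.items()}


if __name__ == "__main__":
    for theta in (-2.0, 0.0, 2.0):
        print(theta, simulate_bias(theta))
```

Running the script prints the mean bias of each estimator at θ = -2, 0, and 2. Editing `ITEMS` (more items, larger a) or the true θ values gives an informal way to explore how test length, discrimination, and the match between θ and the test location affect each estimator; the printed numbers depend on the seed and on these hypothetical settings and are not the paper’s results.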