     The cognitive processing of contrastive focus and its relationship with pitch accent
    LI Weijun, ZHANG Jingjing, YANG Yufang
    Acta Psychologica Sinica. 2017, 49 (9): 1137-1149.   DOI: 10.3724/SP.J.1041.2017.01137
     Information structure (IS) is an important pragmatic concept that has been studied broadly in linguistics, psychology, and neuroscience. IS is generally divided into focus/new information and background/given information, and it is appropriate for focused/new information to receive accent. Recently, researchers have shown increasing interest in the neural mechanism of focus processing and its relationship with pitch accent. Focus has generally been found to elicit a widely distributed positivity compared to background (non-focused) information in both the visual and auditory domains, although these positivities vary in time course, amplitude and scalp distribution. As for the relationship with pitch accent, the results are complicated by variability in task (prosodic, semantic), language (German, Dutch, Chinese, etc.), focus-marking device (context question, pitch accent, it-cleft structure, etc.), and information status (new or given). The present study aimed to investigate the processing of contrastive focus and its interaction with pitch accent at different positions using ERPs. We used a highly constraining question as context, which set up two single nouns (NP1 and NP2) at different positions (clause-medial and clause-final) in the answer sentence as contrastive focus (new information, narrow focus). Twenty healthy undergraduates (nine males) participated in the experiment. The participants were told to listen carefully to each dialogue and to complete a sentence comprehension task. The EEG was recorded from 64 scalp channels using electrodes mounted in an elastic cap. Focus- and accent-related ERPs were calculated for a 1500 ms epoch including a 200 ms baseline preceding the critical word. Focus evoked a larger positivity than non-focus at both positions. 
This was confirmed by the statistical analysis at both NP1 during 650-1300 ms, F(1, 19) = 8.29, p < 0.05, η2p = 0.29, and NP2 during 550-1050 ms, F(1, 19) = 14.45, p < 0.001, η2p = 0.38. In addition, accented words elicited a larger positivity than unaccented ones at both NP1 (950-1150 ms), F(1, 19) = 7.39, p < 0.05, η2p = 0.22, and NP2 (1050-1400 ms), F(1, 19) = 8.04, p < 0.05, η2p = 0.30. Furthermore, a missing accent on focus did not elicit any observable brain effect compared to accented focus at either position in the lateral area, F(1, 19) < 1, ps > 0.05. At the end of the clause, however, accent on background information elicited a larger negativity (200-350 ms) than consistently unaccented background, F(1, 19) = 10.84, p < 0.01, η2p = 0.38, while there was no significant difference between accented and unaccented focus, F(1, 19) < 1, p > 0.05. Overall, the positive effect elicited by focus at both positions may reflect that listeners devote more cognitive resources to integrating focus into the discourse than to integrating non-focus. The larger positivity for accented than for unaccented words at both positions indicates that prosodic prominence attracted more attention than unaccented information. Finally, accent on non-focus evoked a larger negativity than unaccented non-focus at the end of the clause. This result may reflect that listeners were sensitive to the information structure induced by pitch accent and that this processing was influenced by the position of focus. In sum, the current results suggest that listeners make on-line use of both focus and pitch accent in various ways at different positions to build coherent representations of dialogues.
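The baseline and time-window measures described in this abstract can be illustrated with a minimal sketch. It assumes a single-channel epoch stored as a plain list of microvolt samples that begins 200 ms before the critical word; the 500 Hz sampling rate, the function names, and the toy data are illustrative assumptions and do not come from the study.

```python
from statistics import mean


def baseline_correct(epoch_uv, srate_hz, baseline_ms=200):
    """Subtract the mean of the pre-critical-word baseline from every sample."""
    n_base = int(round(baseline_ms * srate_hz / 1000))
    base = mean(epoch_uv[:n_base])
    return [v - base for v in epoch_uv]


def window_mean(epoch_uv, srate_hz, start_ms, end_ms, baseline_ms=200):
    """Mean amplitude between start_ms and end_ms after critical-word onset.

    The epoch is assumed to start baseline_ms before onset, so indices are
    shifted by the baseline length.
    """
    first = int(round((baseline_ms + start_ms) * srate_hz / 1000))
    last = int(round((baseline_ms + end_ms) * srate_hz / 1000))
    return mean(epoch_uv[first:last])


# Toy epoch (hypothetical values): 200 ms of baseline at 2 µV, then 1300 ms at 5 µV.
SRATE_HZ = 500  # assumed sampling rate, not reported in the abstract
toy_epoch = [2.0] * 100 + [5.0] * 650
corrected = baseline_correct(toy_epoch, SRATE_HZ)
np1_window = window_mean(corrected, SRATE_HZ, 650, 1300)  # the NP1 focus window
```

Condition means computed this way per participant would then enter the repeated-measures comparisons reported above; the statistical test itself is not sketched here.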
     Automatic emotional access in emotional stroop of different proficient type of bilinguals
    JIAO Jiangli, LIU Yi, WEN Suxia
    Acta Psychologica Sinica. 2017, 49 (9): 1150-1157.   DOI: 10.3724/SP.J.1041.2017.01150
     In the domain of second language acquisition, one of the key questions concerns how emotions are represented in a bilingual’s different languages. Although it has been proposed that the first language (L1) contains more richly interconnected semantic associations than the second language (L2), the difference in emotional representation between L1 and L2 is still debated. Because L2 proficiency may influence a bilingual’s automatic emotional access, we investigated emotional access in bilinguals of different proficiency types. Three types of bilinguals were selected according to their L2 proficiency: comprehensive proficient bilinguals, reading-proficient bilinguals, and listening and spoken-proficient bilinguals. Comprehensive proficient bilinguals are proficient in both L2 input and output. Reading-proficient bilinguals are able to read and write in L2 without being able to understand or speak it, whereas listening and spoken-proficient bilinguals are the opposite. The present study investigated how bilingual type modulates emotional access in L1 and L2 using the Emotional Stroop paradigm. The experiment was a 3-factor mixed design with 2 (Languages: L1 vs. L2) × 3 (Type of bilinguals: Comprehensive proficient bilinguals vs. Reading-proficient bilinguals vs. Listening and Spoken-proficient bilinguals) × 3 (Emotional valence: Positive vs. Negative vs. Neutral). The stimuli were delivered using E-Prime software, which also automatically recorded reaction times and error rates. The reaction time results showed that: (1) For Comprehensive proficient bilinguals: under the condition of L1, there was a significant difference between positive and neutral words (F(1, 19) = 4.75, p < 0.05, ηp2 = 0.81), and between negative and neutral words (F(1, 19) = 4.80, p < 0.05, ηp2 = 0.81). 
Under the condition of L2, there was a significant difference between positive and neutral words (F(1, 19) = 6.98, p < 0.001, ηp2 = 0.69), and between negative and neutral words (F(1, 19) = 6.65, p < 0.05, ηp2 = 0.68). That is, Comprehensive proficient bilinguals showed an Emotional Stroop effect in both L1 and L2. (2) For Reading-proficient bilinguals: under the condition of L1, there was a significant difference between positive and neutral words (F(1, 19) = 5.96, p < 0.001, ηp2 = 0.73), and between negative and neutral words (F(1, 19) = 6.60, p < 0.001, ηp2 = 0.74). Under the condition of L2, there was a significant difference between positive and neutral words (F(1, 19) = 5.56, p < 0.05, ηp2 = 0.86), and between negative and neutral words (F(1, 19) = 3.86, p < 0.05, ηp2 = 0.85). That is, Reading-proficient bilinguals also showed an Emotional Stroop effect in both L1 and L2. (3) For Listening and Spoken-proficient bilinguals: under the condition of L1, there was a significant difference between positive and neutral words (F(1, 19) = 5.33, p < 0.001, ηp2 = 0.86), and between negative and neutral words (F(1, 19) = 4.92, p < 0.05, ηp2 = 0.85). Under the condition of L2, there were no significant differences among the three types of words. That is, Listening and Spoken-proficient bilinguals showed an Emotional Stroop effect only in L1 and not in L2. There were no significant differences in error rates. In summary, the results suggest that Comprehensive proficient bilinguals and Reading-proficient bilinguals had automatic emotional access in both L1 and L2, whereas Listening and Spoken-proficient bilinguals had automatic emotional access in L1 but not in L2.
     The effect of part-list cues on memory retrieval: The role of inhibition ability
    LIU Tuanli, BAI Xuejun
    Acta Psychologica Sinica. 2017, 49 (9): 1158-1171.   DOI: 10.3724/SP.J.1041.2017.01158
     When people are asked to recall items from a previously studied list and are given a subset of the items on that list as retrieval cues, they often do more poorly at recalling the remaining items on the list than do people asked to recall the items in the absence of such retrieval cues. Such part-list cueing effect has often been attributed to inhibitory executive-control processes that supposedly suppress the non-cue items’ memory representation. According to this account, part-list cueing effect arises as an ‘aftereffect’ of executive-control processes during the presentation of part-list cues. The presence of part-list cues at testing leads to an early covert retrieval of the cue items, and this covert retrieval is assumed to trigger inhibitory processes on the non-cue items, affecting the representation of the non-cues itself and thus lowering their recovery chances. The core functions of executive-control processes include inhibition, working memory, and cognitive flexibility. The aim of current study was to further investigate the relationship between individual’s inhibitory executive-control ability and the part-list cueing effect. In this study, undergraduate students with different cognitive inhibitory ability were asked to finish a part-list recall task, and participants’ age, learning experience, and living background etc. were well balanced. In Experiment 1, a color-word Stroop task was carried out to test participants’ inhibitory ability, which can be reflected by the accuracy difference between the incongruent and congruent conditions of the Stroop task. In Experiment 2, participants’ working memory capacity, which is typically reflected by the OSPAN and T-OSPAN scores, was tested by an operation span task. We found typical part-list cueing effect in both experiments, that participants’ memory performance, discrimination, and response bias for target items were worse in the part-list cue condition than in the non-cue condition. 
The regression analysis showed a negative relationship (b = -2.525) between the size of the part-list cueing effect and participants’ cognitive inhibitory ability: as the Stroop effect increased, the part-list cueing effect decreased. In contrast, a positive correlation was found between the size of the part-list cueing effect and individuals’ working memory capacity, as indexed by the OSPAN and T-OSPAN scores: the higher the OSPAN and T-OSPAN scores, the larger the part-list cueing effect. These results indicate that low-Stroop-effect individuals show a stronger part-list cueing effect than high-Stroop-effect individuals, and that high-WMC individuals show a larger part-list cueing effect than low-WMC individuals. Our findings are consistent with previous individual-differences studies, suggesting a close link between working memory capacity, cognitive inhibitory ability and inhibitory efficiency. In addition, the current results support the inhibitory executive-control account of the part-list cueing effect.
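The individual-difference measures in this abstract can be sketched as simple difference scores plus a least-squares slope. This is only an illustration: the sign conventions (congruent minus incongruent accuracy, no-cue minus part-list-cue recall), the function names, and the single-predictor regression are assumptions, and the abstract does not specify the actual regression model.

```python
from statistics import mean


def stroop_effect(acc_congruent, acc_incongruent):
    """Accuracy cost of incongruent trials; larger values mean more interference."""
    return acc_congruent - acc_incongruent


def part_list_cueing_effect(recall_no_cue, recall_part_list_cue):
    """Recall lost when part-list cues are present; larger values mean more impairment."""
    return recall_no_cue - recall_part_list_cue


def ols_slope(xs, ys):
    """Slope b of the least-squares line ys = a + b * xs (one predictor)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

With one Stroop effect and one part-list cueing effect per participant, a negative `ols_slope(stroop_effects, cueing_effects)` would correspond to the direction of the relationship reported above.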
     The effect of emotional scene and body expression on facial expression recognition
    BAI Lu, MAO Weibin, WANG Rui, Zhang Wenhai
    Acta Psychologica Sinica. 2017, 49 (9): 1172-1183.   DOI: 10.3724/SP.J.1041.2017.01172
     Traditionally, the recognition of facial expression has been studied using isolated faces, an approach influenced by the categorical theory of emotion. In real life, however, facial expressions always occur in context, and recognition can be ambiguous when an observer views a face in isolation without any other cues. Recently, many studies have investigated how context influences facial expression processing. Numerous studies have found context effects in the recognition of facial expression: visual scenes, body expressions, emotional concepts and surrounding faces all shape facial expression processing. Evidence from ERPs and eye movements shows that facial expression is combined with both the emotional visual scene and body expression during the early stage of processing and that this integration is, to some degree, automatic. However, none of these studies investigated the roles the two play simultaneously. Moreover, studies of context effects in facial expression recognition have seldom examined the recognition of the visual scene directly using a memory paradigm. Given that emotional concepts may provide a top-down constraint on facial expression recognition, the current study increased the number of emotion labels for a better understanding of the context effect. Two experiments were conducted, both adopting a 2 (facial expression: disgust, fear) × 2 (emotional congruence between face and context: congruent, incongruent) within-subject design. Stimuli were composed of images of disgust and fear facial expressions (which have a low level of perceptual similarity), visual scenes and body expressions. All participants were required to choose an emotion word for the face, which was shown against a background of a visual scene (Exp. 1) or of both a visual scene and a body expression (Exp. 2). Experiment 1 aimed to study the effect of emotional congruence between facial expression and visual scene on facial expression recognition. 
Building on Experiment 1, Experiment 2 added a congruent or incongruent body expression to investigate whether body expression disturbs the effect of the visual scene on facial expression recognition. The results of the two experiments indicate that: (1) the influence of the visual scene on facial expression recognition remains significant when the number of emotion labels is increased; (2) participants rely more on the visual scene and remember the scene more often when an emotional facial expression is shown against an incongruent visual scene; (3) the effect of the visual scene on facial expression recognition can be influenced by body expression, but the emotion of the visual scene still plays an important role in the recognition of negative facial expressions.
     Advancing the Effort-Reward Imbalance Model: Economic rewards influence on teachers’ mental health
    YANG Ruijuan, YOU Xuqun
    Acta Psychologica Sinica. 2017, 49 (9): 1184-1194.   DOI: 10.3724/SP.J.1041.2017.01184
     In many countries over the past two decades, workplace stress has increased remarkably. Teaching is an example of a highly stressful occupation due to the diverse requirements of the job: teachers have always been subject to high job-related stress and tend to suffer from stress-related psychosomatic problems at unusually high rates. The purpose of this study is to explore how economic rewards influence teachers’ mental health. This article uses the interdisciplinary perspective of psychology and economics to test the Effort-Reward Imbalance (ERI) model. This study consists of a cross-temporal meta-analysis that examines the changes in Chinese teachers’ scores on the Symptom Checklist-90 (SCL-90) from 1995 to 2013. Samples of Chinese teachers (N = 48712) from 113 previous studies were included in this study’s data. The means and SDs of the nine SCL-90 dimensions were calculated for each of the 19 years under examination and were compared using Excel 2010 and SPSS 19.0. Annual average teacher salaries were gathered from the China Statistical Yearbook. Results showed that: (1) Teachers’ mental health declined from 1995 to 2009 and improved from 2009 to 2013. Although some studies suggested that there were significant gender differences in mental health, our composite results showed no significant difference between male and female teachers (N = 19919, p = 0.596). Kindergarten teachers and college professors tended to have the best mental health, whereas primary school, middle school, and special education teachers tended to have the worst mental health. Vocational middle school teachers scored between these two groups (N = 32260). (2) Economic rewards play an important role in influencing teachers’ mental health over time. A one-way causal relationship was observed between teachers’ compensation and psychological factors. 
The results show that interpersonal sensitivity, anxiety, hostility, and psychoticism were all influenced by teachers’ compensation, with a lag period of one year. Obsessive neurosis, paranoid psychosis, and depression were also influenced by teachers’ compensation, with a lag period of three years. The present study represents the first time that economic methodology has been combined with psychological research to test the ERI model. Using Granger causality to investigate the link between teacher salary and changes in SCL-90 scores, the results clearly indicate that economic rewards influence teachers’ mental health. This study also represents the first recent use of a large sample of members from one profession to test the ERI model. Possible contributions to the field of Psychological Economics are discussed in the conclusion.
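The data preparation behind a cross-temporal analysis of this kind can be sketched as follows. The sketch assumes that study means are pooled within each year using sample-size weights and that salary is paired with scores observed a fixed number of years later; both choices, the function names, and the tuple layout are illustrative assumptions rather than the authors' documented procedure. The Granger causality test itself is not implemented here.

```python
from collections import defaultdict


def yearly_weighted_means(studies):
    """Pool study means within each year, weighting by sample size.

    studies: iterable of (year, n, mean_score) tuples (hypothetical layout).
    Returns {year: sample-size-weighted mean}, ordered by year.
    """
    total_n = defaultdict(int)
    weighted_sum = defaultdict(float)
    for year, n, mean_score in studies:
        total_n[year] += n
        weighted_sum[year] += n * mean_score
    return {year: weighted_sum[year] / total_n[year] for year in sorted(total_n)}


def lagged_pairs(salary_by_year, score_by_year, lag):
    """Pair the salary in year t - lag with the SCL-90 score in year t."""
    return [
        (salary_by_year[year - lag], score_by_year[year])
        for year in sorted(score_by_year)
        if (year - lag) in salary_by_year
    ]
```

Calling `lagged_pairs` with `lag=1` or `lag=3` would produce the salary and score pairs corresponding to the one-year and three-year lag periods mentioned above.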
     The influence of different sex ratios and resource-gaining capability on male’s mating selection
    WANG Yan, HOU Bowen, LI Xinyao, LI Xiaoxu, JIAO Lu
    Acta Psychologica Sinica. 2017, 49 (9): 1195-1205.   DOI: 10.3724/SP.J.1041.2017.01195
     In two experiments, the present study explored the factors that influence men’s mating standards under the priming of different sex ratios. To understand how mating strategies change, the study drew on life history theory, which focuses on organisms’ trade-offs in the allocation of limited resources. Research has shown that individuals’ life history strategies can differ under different environmental priming. The literature indicates that individuals’ mating standards can be influenced either by cues related to mating preferences, such as resources and good looks, or by cues unrelated to mating preferences, such as the death rate or the economic condition of the society. This study focused on the influence of sex-ratio priming, resource-gaining ability and childhood economic conditions on men’s mating preferences. It was hypothesized that when primed by different sex ratios, male participants with different childhood economic conditions or different resource-gaining abilities would show different patterns of change in mating standards. In Experiment 1, participants were 230 unmarried males with an average age of 20.02 years (SD = 1.94). Participants were primed by reading a news article about a high sex ratio (more men than women), about a low sex ratio (fewer men than women), or about a new kind of robot (control group). Mating preferences were then measured with self-report scales. Participants were also asked to rate their childhood economic condition and to appraise their own potential resource-gaining capability. The results revealed that males showed significantly lower mating standards for “good resources” when primed by high sex ratio cues (more men than women) than when primed by low sex ratio cues (fewer men than women). 
The interaction between sex ratio and resource-gaining capability on the mating standard for good resources was significant. Male participants with a higher level of resource-gaining capability showed a lower mating standard for good resources when primed by a high sex ratio than by a low sex ratio. In contrast, male participants with a lower level of resource-gaining capability showed quite similar mating preferences for good resources under the different primes. No main or interaction effects of sex ratio cues were found on the mating standards for good appearance and good parent. Finally, childhood economic condition showed no significant effect on men’s mating preferences under the different primes. In Experiment 2, participants were 82 male undergraduates with a mean age of 19.37 years (SD = 3.325). The participants were randomly divided into two groups. After providing demographic data, each group viewed a prime consisting of 8 pictures: one group saw 6 male pictures and 2 female pictures, and the other saw 6 female pictures and 2 male pictures. They then completed a form with 12 mate-selection items. The results of Experiment 2 were the same as those of Experiment 1. Both experiments indicated that the sex ratio, as an important environmental cue, might influence men’s mating standard for good resources, but not for good appearance or a good spouse (parent). Resource-gaining capability, but not childhood harshness, moderated the relationship between sex ratio and the mating standard for good resources.
     Should I sacrifice my profit before his eyes? Partner’s ability and social distance affecting the tendency of reputation-profit game
    WANG Pei, TAN Chenhao, CUI Yichen
    Acta Psychologica Sinica. 2017, 49 (9): 1206-1218.   DOI: 10.3724/SP.J.1041.2017.01206
     Previous studies have shown that when individuals must choose between reputation and profit (the reputation-profit game), they usually tend to gain reputation by sacrificing profit. According to competitive altruism theory, people cooperate to gain reputation at the cost of profit because the reputation helps them compete against others for valuable opportunities in the future. From this perspective, the tendency in the reputation-profit game should be affected by the ability and social distance of the game partner (who only receives information about reputation), which determine the upper limit of profit and the belief about whether the partner would offer such an opportunity. To test these two factors and reveal the nature of reputation-seeking behavior, we hypothesized that the partner’s ability and social distance would affect the preference between reputation and profit: when faced with a partner whose ability is high or whose social distance is close, individuals would prefer reputation to profit, and there would be an interaction between ability and social distance. Experiment 1 used a contribution scenario. Forty undergraduate students participated in this experiment. They were told that they would take part in an online activity. First, participants took part in a series of dummy prisoner’s dilemma games and won some money (100 tokens). Second, they were told that they would play a game (a trust game) with another student in the future, and the importance of reputation was introduced (all participants were trustees). After that, participants were told that there were several public accounts that needed their contributions; they could make a virtual contribution to each account, and only one would be chosen as the real one. Before contributing, they were told that the contribution would be seen by the student who would be the truster. 
Ability (truster’s principal: high/low) and social distance (schoolmate/students from other schools) were manipulated as independent variables, and contribution was the dependent variable. Experiment 2 used a bargaining scenario. Fifty-five undergraduate students participated in this experiment. The background was the same as in Experiment 1. First, participants were told that they would act as suppliers in an online task in which they priced the materials they sold. They were told that their profit would be calculated separately in each zone and that only one zone would be selected as the final result. Second, participants were told that if the price they set was higher than the “real value”, there would be a 50% chance that half of their earnings in that round would be confiscated. They then learned that there would be two rounds of bargaining in each zone and that buyers would chat about each supplier soon after the first round. Before the pricing tasks, participants were informed that the number of suppliers was greater than the number of players in each zone and that each player had the right to choose a supplier from whom to buy material in the second round. Ability (player’s demand: high/low/none) and social distance (schoolmate/students from other schools) were manipulated as independent variables, and participants’ pricing was the dependent variable. The results of Experiment 1 showed that participants donated more money when their donation was seen by a student whose ability was high. This finding demonstrates that the higher the partner’s ability, the more reputation individuals want. The results of Experiment 2 showed an interaction between ability and social distance. Only when the partner’s ability in the future task was low did participants propose a lower price in front of a partner whose social distance was close, in order to gain reputation by sacrificing profit. 
This finding shows that individuals take social distance into account only when the partner’s ability is low, in which case they tend to seek a good reputation in front of a partner whose social distance is close. All the findings supported our hypothesis that ability and social distance are core factors affecting the tendency in the reputation-profit game. These results confirm the strategic nature of individuals’ choices in the reputation-profit game.
     The mechanism and effect of leader humility: An interpersonal relationship perspective
    MAO Jianghua, LIAO Jianqiao, HAN Yi, LIU Wenxing
    Acta Psychologica Sinica. 2017, 49 (9): 1219-1233.   DOI: 10.3724/SP.J.1041.2017.01219
     Humility is a traditional virtue in China; however, its effectiveness has been questioned in modern society. For example, when people express humility, their strengths and contributions may not be recognized by others. Although the traditional view requires us to show humility, the modern view holds that someone who expresses humility may not receive the reputation he or she deserves. Moreover, followers may sometimes regard leaders’ humility as weakness because they need powerful leaders to guide them. This tension between the traditional and modern views of humility raises leaders’ concerns about whether and how to display humility in the workplace. Based on an interpersonal relationship perspective, this research aims to examine the effect of leader humility on employee organizational citizenship behavior, as well as the role of inferred leader humility motives in this process. Instead of focusing on the formal relationship between leader and follower, we proposed that humble leaders would be more attractive to followers and would find it easier to form a good interpersonal relationship with them. We also proposed that the positive relationship between leader humility and relational closeness would be weakened or strengthened depending on the motives that followers inferred for leader-expressed humility. To test our hypothesized model, we used a time-lagged, leader-member matched questionnaire design. Our sample came from 13 large companies located in Wuhan or Xiangyang and comprised 295 leader-follower dyads. To reduce common method bias, leader humility, inferred leader humility motives and demographic variables were measured at Time 1. Seven weeks later, we measured relational closeness and LMX from followers, and voice behavior and helping behavior from leaders. Because leader humility was embedded in teams, we used hierarchical linear regression to analyze the data. 
In addition, a bootstrapping method was adopted to test the indirect effect of relational closeness in the relationship between leader humility and employee organizational citizenship behavior. Results showed that: (1) After controlling for leader-member exchange, leader humility was positively related to relational closeness. (2) Leader humility enhanced followers’ voice and helping behavior. (3) Relational closeness mediated the relationship between leader humility and followers’ voice and helping behavior. (4) When the follower-inferred impression management motive for leader humility was high, the positive relationship between leader humility and relational closeness was weakened. However, the moderating effect of the follower-inferred performance enhancement motive on the relationship between leader humility and relational closeness was not significant. The present research makes several contributions to the leader humility literature. By examining the positive effect of leader humility on follower OCB, this research provides further evidence for the effectiveness of leader humility. Moreover, this study confirmed the mediating role of relational closeness as well as its boundary conditions. As to practical implications, this research suggests that leaders could show more humility in the workplace to encourage followers to engage in more organizational citizenship behavior. At the same time, leaders should realize that if followers perceive their humility as motivated by impression management, its positive effect will be weakened.
     Reporting overall scores and domain scores of bi-factor models
    LIU Yue, LIU Hongyun
    Acta Psychologica Sinica. 2017, 49 (9): 1234-1246.   DOI: 10.3724/SP.J.1041.2017.01234
     In large-scale assessments, most tests have a multidimensional structure, and there is increasing interest in reporting overall scores and domain scores simultaneously. The domain scores complement the overall scores by providing a finer-grained diagnosis of examinees’ strengths and weaknesses. However, because each dimension contains only a small number of items, insufficient reliability is the primary impediment to generating and reporting domain scores. A number of methods have been developed recently to improve the reliability and optimality of overall scores and domain scores. For overall scores, commonly used procedures include simple or weighted averaging of the scores from different content areas and using the maximum information method to compute the weights of composite scores under the MIRT framework. There are also subscoring methods in the CTT and IRT frameworks, such as Kelley’s (1927) regressed score method, the MIRT method, and the higher-order IRT method. The bi-factor model has become increasingly popular in educational measurement, and reporting overall scores and domain scores based on it has become an important topic. The purpose of this study was to investigate several methods of generating overall scores and domain scores based on the bi-factor model and to compare them with the MIRT method under different conditions. Study 1 was a mixed design crossing simulation conditions (between-factors) with methods (within-factor). There were three between-factors: (1) 3 sample sizes (500, 1000, 2000); (2) 3 test lengths (18 items, 36 items, 60 items); and (3) 5 correlations between dimensions (0.0, 0.3, 0.5, 0.7, 0.9). 
The methods for generating overall scores and domain scores were: (1) original scores from bi-factor model (Bifactor-M1); (2) summed original scores from the bi-factor model (Bifactor-M2); (3) weighted sum original scores from the bi-factor model based on all the items (Bifactor-M3); (4) weighted sum original scores from the bi-factor model based on items of each dimension (Bifactor-M4). The overall scores from Bifactor-M3 and Bifactor-M4 were the same. As many studies found that the MIRT-based methods provided the best estimates of overall and subscores, this method was also conducted and compared with the other methods based on the bi-factor model. Under each condition, 30 replications were generated using SimuMIRT (Yao, 2015). BMIRT (Yao, 2015) was applied to estimate domain ability parameters using an MCMC method, then the overall ability was generated by the maximum information method. Finally, the results were evaluated by four criteria: root mean square error (RMSE), reliability, correlation between the estimated scores and true values, and correlation between the estimated domain scores. Study 2 was a real data example. 4815 responses for science test of National College Entrance Examination were collected. The test contained 66 items covering three subjects: Physics (17 items), Chemistry (30 items), Biology (19 items). Four proposed methods and the MIRT method were applied to estimate overall scores and domain scores. For the real data, the overall ability and domain ability estimates from the MIRT model were used as “true” values to compare the relative performances between different methods. The evaluation criteria were similar to the simulation study. The results of the simulation showed that, for overall scores: (1) the Bifactor-M1 and the Bifactor-M2 had larger RMSE than other methods; when the correlation between dimensions was low, the RMSE of Bifactor-M1 was the largest; as the correlation became larger, the RMSE of Bifactor-M2 became the largest. 
(2) The Bifactor-M3 and the MIRT method had the smallest RMSE. (3) As the correlation between dimensions increased, the RMSE of the Bifactor-M3 and the MIRT method decreased. (4) When the test length and the correlation between dimensions increased, Bifactor-M3 tended to report more reliable overall scores (reliability higher than 0.8). For domain scores: (1) Bifactor-M1 had the largest RMSE. (2) When the test was short, the RMSE of Bifactor-M2 was smaller than that of the MIRT method; when the test was long, the RMSE of Bifactor-M2 increased as the correlation between dimensions increased and was larger than that of the MIRT method when the correlation was 0.9. (3) The RMSE of Bifactor-M3 and Bifactor-M4 decreased as the correlation between dimensions increased. (4) The RMSE of Bifactor-M4 was equal to or smaller than that of the MIRT method. (5) When the test length and the correlation between dimensions increased, the Bifactor-M3 and the Bifactor-M4 tended to report more reliable overall scores. Finally, domain scores from the Bifactor-M4 recovered the correlations among the true values better than the other methods. For the real data example, the results showed that: (1) the bi-factor model fitted the data best compared with the UIRT and MIRT models; (2) overall scores from the Bifactor-M3 and domain scores from the Bifactor-M4 were similar to those from the MIRT method. In conclusion, overall scores and domain scores from the Bifactor-M4 generally performed better than those from the other proposed methods. First, the scores from Bifactor-M4 had smaller RMSE and higher reliability. Second, the correlation between domain scores from the Bifactor-M4 was similar to the true value. Therefore, this method is highly recommended in practice, especially in the following situations: (1) When test designers have a specific definition of the core competencies, the bi-factor model can provide estimates of core competencies, overall scores, and domain scores simultaneously. 
(2) When tests have a multidimensional structure and the correlations between dimensions are high, the bi-factor model is suggested for calibrating the data. (3) When, in addition to reporting overall scores and domain scores, a study also focuses on the relationships among the general construct, the domain-specific constructs, and a criterion, the bi-factor model is recommended.
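The simulation design and the RMSE criterion described in this abstract can be made concrete with a short sketch. The factor levels and the 30 replications are taken from the abstract; the variable names and the `rmse` helper are illustrative assumptions, and the actual data generation and estimation (SimuMIRT and BMIRT in the study) are not reproduced here.

```python
from itertools import product
from math import sqrt

# Between-factor levels as listed in the abstract.
SAMPLE_SIZES = (500, 1000, 2000)
TEST_LENGTHS = (18, 36, 60)
CORRELATIONS = (0.0, 0.3, 0.5, 0.7, 0.9)
REPLICATIONS = 30

# Fully crossed design: 3 x 3 x 5 = 45 conditions, each replicated 30 times.
conditions = list(product(SAMPLE_SIZES, TEST_LENGTHS, CORRELATIONS))
total_datasets = len(conditions) * REPLICATIONS


def rmse(estimates, true_values):
    """Root mean square error between estimated and true scores."""
    if not estimates or len(estimates) != len(true_values):
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((e - t) ** 2 for e, t in zip(estimates, true_values))
    return sqrt(squared / len(estimates))
```

Under this reading of the design, each scoring method would be evaluated on 1350 simulated datasets, with `rmse` applied to the estimated and true overall or domain scores within each dataset.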
