ISSN 0439-755X
CN 11-1911/B

Institute of Psychology, Chinese Academy of Sciences

• Article •

### Modeling Self-reported Instrument Data with Gene Expression Programming

QIAN Jinxin; YU Jiayuan

1. (School of Psychology, Nanjing Normal University, Nanjing 210097, China)
• Received: 2012-11-19 Online: 2013-06-25 Published: 2013-06-25
• Contact: YU Jiayuan
• Funding:

This work was supported by the National Social Science Fund of China (Education Science) project (BBA080050), the National Natural Science Foundation of China (71071065, 71131004), and the Jiangsu Province first-level key discipline of Psychology.

Abstract: Complex relations among psychological variables are often difficult to represent with traditional analytical models such as regression. Neural networks and support vector regression can be used instead, but the models they produce are implicit: the relation between the inputs and the output is not expressed as an explicit formula. Gene expression programming (GEP), by contrast, can build explicit models of the relations among observed variables. At present, most data modeled with GEP are collected with objective methods, whereas much psychological measurement data come from self-report instruments and are affected by many subjective factors. Can such data be modeled with GEP? How large is the modeling error? Does GEP modeling have any advantage over multivariate linear regression or polynomial regression? Is it more accurate than neural network and support vector regression modeling? This paper explores these questions. Responses from 400 middle school students to the Williams Creativity Assessment Packet and the Need for Cognition Scale were collected. Seventeen students were excluded because of abnormal responses, leaving data from 383 students for modeling. Harman's single-factor test found no evidence of common method bias. Five GEP parameters were optimized with a uniform design: head length, number of genes, fitness function, number of chromosomes, and mutation probability. Each parameter took nine levels, which were combined into different test conditions. The condition yielding the maximum fitness was identified experimentally, and the GEP program was then run 10 times under this condition. The accuracy of each resulting model was computed, and the expression tree of the model with the smallest error was drawn.
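Harman's single-factor test checks whether a single factor accounts for most of the covariance among all questionnaire items, which would suggest common method bias. A minimal sketch, assuming the common variant that runs an unrotated principal component analysis on the item correlation matrix and the rule of thumb that a first factor explaining more than 50% of the variance signals bias; the function name `harman_single_factor` and the random data are hypothetical and are not the paper's.

```python
import numpy as np


def harman_single_factor(items: np.ndarray) -> float:
    """Share of total variance captured by the first unrotated component.

    `items` is an (n_respondents, n_items) matrix of item scores. Harman's test
    is approximated here by an unrotated PCA of the item correlation matrix.
    """
    corr = np.corrcoef(items, rowvar=False)    # item-by-item correlations
    eigvals = np.linalg.eigvalsh(corr)         # eigenvalues in ascending order
    return float(eigvals[-1] / eigvals.sum())  # largest / total (= n_items)


rng = np.random.default_rng(0)
# Hypothetical responses: 383 respondents x 20 items, no shared method factor.
scores = rng.normal(size=(383, 20))
share = harman_single_factor(scores)
# Rule of thumb (assumption): > 50% for one factor suggests common method bias.
print(f"first factor explains {share:.1%}; bias suspected: {share > 0.5}")
```

Because the eigenvalues of a correlation matrix sum to the number of items, the ratio is directly the proportion of total variance carried by the first factor.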
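A uniform design for five parameters at nine levels each can be sketched with the good-lattice-point construction: entry (i, j) of the table is i·h_j mod n, written in the range 1..n, where each generator h_j is coprime to n so that every column uses each level exactly once. This is a simplification under stated assumptions: the abstract does not give the actual table, published U_n(n^s) tables choose generators that minimize a discrepancy criterion rather than simply taking the first coprime integers, and the helper and parameter identifiers below are hypothetical.

```python
from math import gcd


def good_lattice_design(n_runs: int, n_factors: int) -> list[list[int]]:
    """Uniform-design-style table U_n(n^s) via the good-lattice-point method.

    Entry (i, j) = i * h_j mod n, written in 1..n, where each generator h_j is
    coprime to n so that every column uses each level exactly once.
    Simplification: generators are just the first coprime integers; real
    uniform design tables choose them to minimize a discrepancy measure.
    """
    gens = [h for h in range(1, n_runs) if gcd(h, n_runs) == 1][:n_factors]
    if len(gens) < n_factors:
        raise ValueError("not enough generators coprime to n_runs")
    return [[(i * h - 1) % n_runs + 1 for h in gens] for i in range(1, n_runs + 1)]


# The five GEP parameters named in the abstract (identifiers are hypothetical).
PARAMS = ["head_length", "n_genes", "fitness_fn", "n_chromosomes", "mutation_rate"]
design = good_lattice_design(9, len(PARAMS))
for run, levels in enumerate(design, start=1):
    # Each level index 1..9 would be mapped to a concrete parameter value.
    print(run, dict(zip(PARAMS, levels)))
```

Each of the nine rows is one test condition; running GEP under every row and keeping the row with the highest fitness needs nine runs instead of the 9^5 = 59,049 combinations of a full factorial design.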
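In GEP each gene is a fixed-length string with a head of length h (functions or terminals) and a tail of length t = h(n_max - 1) + 1 (terminals only), where n_max is the largest function arity; this rule guarantees that every gene decodes to a valid expression tree. The sketch below reads such a Karva expression breadth-first into a tree, links the genes of a chromosome by addition, and scores it with a fitness of the form 1000 / (1 + RMSE). It is a minimal illustration, not the authors' implementation: the function set, protected division, additive linking, this fitness function, and the example genes are all assumptions (the paper tuned the fitness function as a parameter, and the abstract does not say which one was chosen).

```python
import math
import operator
from collections import deque
from dataclasses import dataclass, field

# Function set: symbol -> (arity, implementation). Division is "protected"
# (returns 1.0 for a near-zero divisor), a common GP convention assumed here.
FUNCS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "/": (2, lambda a, b: a / b if abs(b) > 1e-9 else 1.0),
}


@dataclass
class Node:
    sym: str
    kids: list = field(default_factory=list)


def tail_length(head_len: int, max_arity: int = 2) -> int:
    """Ferreira's rule t = h * (n_max - 1) + 1, which guarantees a valid tree."""
    return head_len * (max_arity - 1) + 1


def decode(gene: str) -> Node:
    """Build the expression tree of a K-expression by reading it breadth-first."""
    nodes = [Node(s) for s in gene]
    queue, nxt = deque([nodes[0]]), 1
    while queue:
        node = queue.popleft()
        arity = FUNCS[node.sym][0] if node.sym in FUNCS else 0
        for _ in range(arity):           # a function takes the next `arity` symbols
            node.kids.append(nodes[nxt])
            queue.append(nodes[nxt])
            nxt += 1
    return nodes[0]                       # trailing unused tail symbols are ignored


def evaluate(node: Node, env: dict) -> float:
    if node.sym in FUNCS:
        return FUNCS[node.sym][1](*(evaluate(k, env) for k in node.kids))
    return env[node.sym]                  # terminal: a variable name


def infix(node: Node) -> str:
    if not node.kids:
        return node.sym
    left, right = node.kids
    return f"({infix(left)} {node.sym} {infix(right)})"


def predict(chromosome: list[str], env: dict) -> float:
    # Genes are linked by addition (assumed linking function).
    return sum(evaluate(decode(g), env) for g in chromosome)


def fitness(chromosome: list[str], rows: list[dict], targets: list[float]) -> float:
    # Assumed fitness: 1000 / (1 + RMSE); higher is better, 1000 is a perfect fit.
    sq = [(predict(chromosome, r) - t) ** 2 for r, t in zip(rows, targets)]
    return 1000.0 / (1.0 + math.sqrt(sum(sq) / len(sq)))


# Head length 3 -> tail length 4 -> gene length 7 (hypothetical genes).
chrom = ["*+aabab", "-abbaab"]
print(tail_length(3), infix(decode(chrom[0])), predict(chrom, {"a": 2, "b": 3}))
```

With head length 3 and binary functions the tail holds 4 terminals, so `*+aabab` decodes to ((a + b) * a) and leaves two tail symbols unused; that slack is what lets mutation change any head symbol without ever producing an invalid tree.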
Models of the relation between need for cognition and creative personality traits were also built with BP neural networks, support vector regression, multivariate linear regression, and polynomial regression, and compared with the GEP model. The results showed that: (a) model 10, with four independent variables, had the highest accuracy; (b) the expressions of the ten models differed, but their predictive errors were very close, which supports the robustness of GEP modeling; and (c) the predictive errors were 1.28 for GEP, 2.76 for the BP neural network, 2.31 for support vector regression, 3.21 for polynomial regression, and 3.86 for multivariate linear regression. It can be concluded that: (a) data from self-report instruments can be modeled with GEP even though they are affected by many subjective factors; (b) GEP modeling is more accurate than other intelligent computing methods (neural networks, support vector regression, etc.) and traditional statistical methods (multivariate linear regression, polynomial regression, etc.); and (c) models built with GEP are robust: their predictive accuracy is similar even though their mathematical formulas differ considerably.
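The abstract does not state which error measure produced the reported predictive errors. As a hedged illustration of how such a comparison works, the sketch below fits a multivariate linear regression and a second-degree polynomial regression (squares and pairwise interactions) by ordinary least squares on synthetic data with four predictors, then compares root-mean-square error on a hold-out set. The metric, the split, the helper names, and the data are all assumptions, and the resulting numbers are unrelated to the paper's results.

```python
import numpy as np


def linear_terms(X: np.ndarray) -> np.ndarray:
    """Design matrix for multivariate linear regression: intercept + X."""
    return np.column_stack([np.ones(len(X)), X])


def quadratic_terms(X: np.ndarray) -> np.ndarray:
    """Degree-2 polynomial design: intercept, linear, squares, pairwise products."""
    n, k = X.shape
    cols = [np.ones(n)] + [X[:, j] for j in range(k)]
    cols += [X[:, i] * X[:, j] for i in range(k) for j in range(i, k)]
    return np.column_stack(cols)


def holdout_rmse(build, X_tr, y_tr, X_te, y_te) -> float:
    """Fit by ordinary least squares on the training rows, score RMSE on the rest."""
    beta, *_ = np.linalg.lstsq(build(X_tr), y_tr, rcond=None)
    resid = build(X_te) @ beta - y_te
    return float(np.sqrt(np.mean(resid ** 2)))


rng = np.random.default_rng(1)
X = rng.normal(size=(383, 4))        # synthetic: 383 cases, four predictors
y = X[:, 0] * X[:, 1] + X[:, 2] ** 2 + 0.3 * rng.normal(size=383)  # nonlinear toy target
tr, te = slice(0, 300), slice(300, None)
errors = {
    "multivariate linear": holdout_rmse(linear_terms, X[tr], y[tr], X[te], y[te]),
    "polynomial (degree 2)": holdout_rmse(quadratic_terms, X[tr], y[tr], X[te], y[te]),
}
for name, err in sorted(errors.items(), key=lambda kv: kv[1]):
    print(f"{name}: {err:.2f}")   # smaller predictive error = more accurate model
```

Because the toy target contains an interaction and a squared term, the polynomial model's hold-out error is far lower than the linear model's. GEP differs from both in that it searches over the structure of the formula itself instead of fixing the set of terms in advance.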