ISSN 0439-755X
CN 11-1911/B

Acta Psychologica Sinica ›› 2021, Vol. 53 ›› Issue (2): 155-169. doi: 10.3724/SP.J.1041.2021.00155

• Research Report •


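骆方 (LUO Fang)1, 姜力铭 (JIANG Liming)1, 田雪涛 (TIAN Xuetao)2, 肖梦格 (XIAO Mengge)1, 马彦珍 (MA Yanzhen)3, 张生 (ZHANG Sheng)3

  1 School of Psychology, Beijing Normal University, Beijing 100875
  2 School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
  3 Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing 100875
  • Received: 2020-04-22 Published: 2021-02-25 Released online: 2020-12-29
  • Corresponding author: 张生 (ZHANG Sheng)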

Shyness prediction and language style model construction of elementary school students

LUO Fang1, JIANG Liming1, TIAN Xuetao2, XIAO Mengge1, MA Yanzhen3, ZHANG Sheng3

  1 School of Psychology, Beijing Normal University, Beijing 100875, China
  2 School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  3 Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, Beijing 100875, China
  • Received: 2020-04-22 Published: 2021-02-25 Released online: 2020-12-29
  • Contact: ZHANG Sheng


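Essays, diaries, and comments written by 1306 elementary school students were collected from an online teaching platform. The texts were analyzed with natural language processing techniques, and machine learning models were applied to predict the shyness trait automatically and to construct language style models of elementary school students' shy behavior, cognition, and emotion. The study found that: (1) the expanded psychological dictionary is suitable for analyzing texts written by elementary school students; (2) the everyday language of students with problems in shy behavior, shy cognition, or shy emotion shows both shared and distinctive features, and differs from that of ordinary students; (3) each dimension of shyness was predicted reasonably well by different classifiers, with the random forest model performing best overall.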

Key words: shyness, online writing, psychological dictionary, text mining, language style model


The present study explored a new method of measuring shyness based on the online writing of 1306 elementary school students. A supervised learning approach was used to map students' labels (derived from their scale scores) to their text features (extracted from their online writing with a psychological dictionary). Key feature sets were identified for the different dimensions of shyness, and machine learning models were built on the selected features to predict shyness automatically.

The labels were obtained from the "National School Children Shyness Scale", which the students completed online. The scale covers three dimensions of shyness: shy behavior, shy cognition, and shy emotion. On each dimension, students with a Z-score above 1 were labeled as shy and all others were labeled as normal. The students' online writing was collected from "TeachGrid", an online learning platform on which students write texts.
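The labeling rule described above can be illustrated with a minimal sketch. It assumes raw scores for one dimension are held in a plain list and that Z-scores are computed against the population standard deviation of the sample; the abstract does not state which standard deviation was used, and the function name and scores below are hypothetical.

```python
from statistics import mean, pstdev


def label_shy(scores, threshold=1.0):
    """Return 1 (shy) where a score's Z-score exceeds `threshold`, else 0 (normal)."""
    mu = mean(scores)
    # Assumption: population SD; the source does not say which SD was used.
    sigma = pstdev(scores)
    if sigma == 0:
        # No variation in scores, so nobody can be above the cut-off.
        return [0] * len(scores)
    return [1 if (s - mu) / sigma > threshold else 0 for s in scores]


# Hypothetical raw scores on the shy-behavior dimension for eight students.
behavior_scores = [10, 12, 11, 13, 12, 25, 11, 12]
labels = label_shy(behavior_scores)
```

Because the rule is applied per dimension, the same student can be labeled shy on one dimension and normal on another.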

The dictionary used in the present study was Textmind, a widely used Chinese psychological dictionary developed on the basis of Linguistic Inquiry and Word Count (LIWC). Because the dictionary was compiled mainly from adult corpora, we revised it to ensure the validity of the extracted features, expanding its categories and vocabulary using real texts written by elementary school students. The revised dictionary contained 118 categories.
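As a rough illustration of dictionary-based feature extraction in the LIWC style, the sketch below computes the share of tokens that fall into each category. The two-category dictionary and the English tokens are invented for readability; they are not taken from Textmind or from the revised 118-category dictionary, and the sketch assumes the text has already been segmented into tokens.

```python
from collections import Counter

# Hypothetical toy dictionary: category name -> set of member words.
TOY_DICTIONARY = {
    "social": {"friend", "classmate", "teacher", "play"},
    "perception": {"see", "look", "hear", "watch"},
}


def category_frequencies(tokens, dictionary):
    """Return, for each category, the proportion of tokens that belong to it."""
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    return {
        category: (counts[category] / total if total else 0.0)
        for category in dictionary
    }


# Assumption: tokens come from a word segmenter applied upstream.
tokens = ["i", "play", "with", "my", "friend", "and", "watch", "birds"]
features = category_frequencies(tokens, TOY_DICTIONARY)
```

Normalizing by text length keeps the features comparable across students who wrote different amounts.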

Features were extracted with the revised dictionary. A chi-square algorithm was applied to identify the features that best distinguished the shy and normal groups. The three resulting sets of key features confirmed significant lexical differences between shy and normal individuals. Some of the selected features were shared by multiple dimensions, reflecting a general pattern in the textual expression of shy individuals (e.g., shy individuals wrote fewer words per sentence on average and used social words less frequently than their normal counterparts). Other features reflected characteristics unique to one dimension (e.g., perception words predicted shy behavior, reflecting that individuals high in shy behavior frequently felt they were being watched).
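One way to picture chi-square feature selection is to cross-tabulate a binarized feature against the shy/normal label and rank features by the resulting statistic. The sketch below assumes each feature has already been reduced to a 0/1 indicator per student (for example, above or below the median); the abstract does not describe the exact procedure, so the binarization, the feature names, and the data are all hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the feature cannot separate the groups.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def rank_features(feature_table, labels):
    """Score each 0/1 feature against 0/1 labels; return (names by score, scores)."""
    scores = {}
    for name, values in feature_table.items():
        pairs = list(zip(values, labels))
        a = sum(1 for v, y in pairs if v == 1 and y == 1)  # feature present, shy
        b = sum(1 for v, y in pairs if v == 1 and y == 0)  # feature present, normal
        c = sum(1 for v, y in pairs if v == 0 and y == 1)  # feature absent, shy
        d = sum(1 for v, y in pairs if v == 0 and y == 0)  # feature absent, normal
        scores[name] = chi_square_2x2(a, b, c, d)
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Hypothetical data: 1 = shy, 0 = normal.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
feature_table = {
    "few_social_words": [1, 1, 1, 0, 0, 0, 0, 1],
    "unrelated_feature": [1, 0, 1, 0, 1, 0, 1, 0],
}
ranking, scores = rank_features(feature_table, labels)
```

A feature that tracks the label closely receives a large statistic, while one that is spread evenly across both groups scores near zero and would be dropped.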

Based on the selected features, six prediction models were built in Python 3.6.2: Decision Tree, Random Forest, Support Vector Machine, Logistic Regression, K-Nearest Neighbor, and Multilayer Perceptron. Overall, Random Forest achieved the best results in the present study, with F1 scores of 0.582, 0.552, and 0.545 for shy behavior, cognition, and emotion, respectively. These results show the feasibility of automatically predicting the shyness characteristics of elementary school students from their written language. Applying word embeddings and deep learning models could further improve the final prediction.
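For readers unfamiliar with the reported metric, the sketch below computes an F1 score from true and predicted labels. It assumes F1 is calculated for the shy class (label 1); the abstract does not say whether the reported values are per-class or averaged, and the labels and predictions here are made up.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: treat F1 as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (1 = shy) and classifier predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
score = f1_score(y_true, y_pred)
```

F1 is a reasonable choice when the shy group is a minority, as it is under a Z-score cut-off of 1, because plain accuracy would reward a model that labels everyone as normal.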

Key words: shyness, online writing, psychological dictionary, text mining, language style model