The distinct effects of reward prediction error on item and associative memory: The influence of metamemory

doi:10.3724/SP.J.1041.2023.00877

Abstract

Episodic memory consists of item memory and associative memory. Through metamemory, individuals typically allocate cognitive resources to more valuable information during encoding, which leads to competitive processing of item and associative information. Reward prediction error (RPE), defined as the difference between the reward result and the reward expectation, has two properties: valence (positive or negative) and salience (the magnitude of the difference). Three experiments were conducted to examine how the valence and salience of RPE affect item and associative memory, and how RPE influences memory through metamemory.

Experiment 1 investigated the effect of RPE on memory performance and on metamemory monitoring at retrieval. The reward results were 1, 4 and 7, and they occurred in proportions of 2:3:5 for high-value pictures and 5:3:2 for low-value pictures. RPE was the difference between the reward result and the value predicted by the participant, taking the values −6, −3, 0, 3 and 6; its salience was calculated as unsigned RPE (URPE). Data from 34 participants were analyzed, and the procedure is shown in Figure 1. In the learning stage, participants viewed pictures of indoor and outdoor scenes. They predicted the score of each picture and then received feedback on the actual score. Through this reinforcement learning process, participants had to find out which type of picture was more valuable, and 30% of the scores were accumulated into the total score. To induce an effect of reward motivation on memory, participants were told that they would later have the opportunity to choose between two pictures and receive the value of the selected picture, although the actual program did not include a decision-making stage. After the learning stage, participants were tested on item memory and reward associative memory.
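The reward schedule and the two RPE measures described for Experiment 1 can be sketched as follows. This is a minimal illustration, not the authors' experiment code: the function names are hypothetical, and it assumes that predictions take the same values as the reward results (1, 4 and 7), which the reported RPE levels imply but the abstract does not state.

```python
from fractions import Fraction

# Reward results used in Experiment 1 (from the abstract).
REWARDS = (1, 4, 7)

# Proportions of the reward results 1, 4 and 7 for each picture type (from the abstract).
PROPORTIONS = {"high": (2, 3, 5), "low": (5, 3, 2)}


def expected_reward(picture_type):
    """Mean reward result for a picture type under the stated proportions."""
    weights = PROPORTIONS[picture_type]
    total = sum(weights)
    return sum(Fraction(w, total) * r for w, r in zip(weights, REWARDS))


def rpe(reward, prediction):
    """Signed reward prediction error: reward result minus predicted value."""
    return reward - prediction


def urpe(reward, prediction):
    """Unsigned RPE, used as the salience measure."""
    return abs(rpe(reward, prediction))


print(float(expected_reward("high")))  # 4.9
print(float(expected_reward("low")))   # 3.1
print(rpe(1, 7), urpe(1, 7))           # -6 6
```

Under these proportions the two picture types differ in mean reward (4.9 versus 3.1), which is the regularity participants could learn from feedback.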

Generalized mixed linear models (GMLMs) were fitted to the hit rates of item and associative memory, with RPE or URPE as one independent variable and reward result as the other. The results are shown in Table 1. Associative memory performance was higher when RPE valence was positive and when RPE salience was low. Mixed linear models (MLMs) were fitted to the JOCs of item and associative memory, with RPE or URPE as one independent variable and the memory result on each trial as the other. The results are shown in Table 2. When RPE valence was positive, associative memory JOCs were higher on correct trials and lower on incorrect trials, indicating that positive RPE valence promotes metamemory monitoring at retrieval.
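The monitoring result can be made concrete with a descriptive stand-in for the RPE × memory result interaction: the gap between mean JOCs on correct and incorrect trials, computed separately for positive and negative RPE. This is a simplified sketch and not the MLM reported in the study; the trial data, the rating values and the function name `joc_gap` are invented purely for illustration.

```python
from statistics import mean

# Hypothetical trials: (RPE, answered correctly?, JOC rating).
# These values are invented for illustration and are not data from the study.
trials = [
    (3, True, 4), (3, False, 1), (6, True, 3), (6, False, 2),
    (-3, True, 3), (-3, False, 2), (-6, True, 2), (-6, False, 2),
]


def joc_gap(rows):
    """Mean JOC on correct trials minus mean JOC on incorrect trials."""
    correct = [joc for _, ok, joc in rows if ok]
    wrong = [joc for _, ok, joc in rows if not ok]
    return mean(correct) - mean(wrong)


positive = [t for t in trials if t[0] > 0]
negative = [t for t in trials if t[0] < 0]

# A larger gap means confidence tracks accuracy more closely.
print(joc_gap(positive), joc_gap(negative))
```

In this toy data set the gap is larger for positive RPE, which is the pattern that a positive RPE × memory result coefficient describes.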

Experiment 2 recorded participants' eye-movement data (the change in pupil diameter and the fixation time on pictures and scores while the reward feedback screen was displayed) to investigate the effect of RPE on metamemory control during memory encoding. Data from 23 participants were included in the behavioral analysis and from 20 in the eye-movement analysis. The procedure is shown in Figure 2. Because the item memory results in Experiment 1 were nonsignificant, the presentation time of each screen in the learning stage was adjusted.

The results of the GMLMs on the hit rates of item and associative memory are shown in Table 3. In contrast to associative memory performance, item memory performance was improved by positive valence and high salience of RPE. MLMs were fitted to picture fixation time, score fixation time, and the mean and maximum change in pupil diameter (Table 4). As RPE salience decreased, participants' fixation time on pictures shortened and their fixation time on scores lengthened, which was consistent with the effect of salience on memory performance and reveals the allocation of cognitive resources through metamemory control based on RPE salience. Positive valence and low salience of RPE increased the change in pupil diameter. Because associative memory performance was also higher in these conditions, this may indicate that individuals increased their encoding effort in order to promote associative memory.

In Experiment 3, the reward results were set to 1, 3, 5 and 7 to increase the number of RPE levels, in order to verify that the influence of RPE valence and salience on item and associative memory remains stable when the overlap between RPE valence and reward result size is reduced. Data from 27 participants were included in the analysis, and the procedure was similar to that of Experiment 2, except that the value feedback time was increased to 5 s. The GMLM results for memory performance are shown in Table 5. The effects of RPE on item and associative memory performance were the same as in Experiment 2, which shows that the effects of RPE valence and salience on item and associative memory are stable.
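The design change in Experiment 3 can be illustrated by enumerating the RPE levels that each set of reward results allows. The sketch assumes that predictions are drawn from the same set of values as the reward results, which the abstract does not state for Experiment 3; both function names are hypothetical.

```python
def rpe_levels(values):
    """All distinct RPEs when reward results and predictions share one value set."""
    return sorted({reward - prediction for reward in values for prediction in values})


def rewards_with_both_signs(values):
    """Reward results that can produce both a positive and a negative RPE."""
    mixed = []
    for reward in values:
        errors = [reward - prediction for prediction in values]
        if any(e > 0 for e in errors) and any(e < 0 for e in errors):
            mixed.append(reward)
    return mixed


exp1_values = (1, 4, 7)     # Experiment 1
exp3_values = (1, 3, 5, 7)  # Experiment 3

print(rpe_levels(exp1_values))               # [-6, -3, 0, 3, 6]
print(rpe_levels(exp3_values))               # [-6, -4, -2, 0, 2, 4, 6]
print(rewards_with_both_signs(exp1_values))  # [4]
print(rewards_with_both_signs(exp3_values))  # [3, 5]
```

Under this assumption the number of RPE levels rises from five to seven, and two reward results instead of one can occur with either valence, so valence is less tied to reward size.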

In summary, these results suggest that the effects of RPE on item and associative memory are distinct. During the encoding stage, individuals use the valence and salience of RPE as cues to allocate cognitive resources differently to item and associative memory encoding through metamemory control. In the retrieval stage, positive RPE valence enhances metamemory monitoring of associative memory retrieval.

Keywords: reward prediction error, associative memory, eye movements, episodic memory, metamemory

LONG Yiting, JIANG Yingjie, CUI Can, YUE Yang. (2023). The distinct effects of reward prediction error on item and associative memory: The influence of metamemory. Acta Psychologica Sinica, 55(6), 877-891.

variables	hit rate of item memory				hit rate of associative memory
variables	b	z	p	95% CI	b	z	p	95% CI
intercept	1.11	7.18	<0.001	[0.34, 0.68]	−0.17	−1.21	0.227	[−0.46, 0.11]
RPE	0.03	1.23	0.217	[−0.02, 0.07]	0.14	3.54	<0.001	[0.06, 0.22]
reward result	−0.02	−0.51	0.610	[−0.07, 0.04]	0.09	3.15	0.002	[0.04, 0.15]
interaction					−0.04	−5.03	0.001	[−0.06, −0.02]
intercept	0.89	0.64	<0.001	[0.60, 1.17]	0.12	0.96	0.335	[−0.12, 0.36]
URPE	0.05	1.81	0.070	[−0.00, 0.09]	−0.19	−7.13	<0.001	[−0.24, −0.13]
reward result	0.01	0.64	0.525	[−0.03, 0.05]	0.08	3.54	<0.001	[0.03, 0.12]
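The b values for hit rates can be read as changes in log-odds if the GMLM used a binomial family with a logit link. The abstract does not state the link function or how the predictors were coded, so the sketch below is an interpretation aid under those assumptions: it treats URPE and reward result as raw, uncentered values, ignores random effects, and uses the coefficients of the URPE model for associative memory from the table above. The function names are hypothetical.

```python
import math


def inverse_logit(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Fixed effects of the URPE model for associative memory (copied from the table).
INTERCEPT, B_URPE, B_REWARD = 0.12, -0.19, 0.08


def predicted_hit_probability(urpe, reward):
    """Fixed-effects-only prediction, assuming a logit link and raw predictors."""
    return inverse_logit(INTERCEPT + B_URPE * urpe + B_REWARD * reward)


# Compare the lowest and highest salience at a middle reward result of 4.
low_salience = predicted_hit_probability(urpe=0, reward=4)
high_salience = predicted_hit_probability(urpe=6, reward=4)
print(round(low_salience, 3), round(high_salience, 3))
```

Because the URPE coefficient is negative, the predicted hit probability falls as salience rises, matching the reported advantage of low salience for associative memory.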

variables	JOCs of item memory				JOCs of associative memory
variables	b	t	p	95% CI	b	t	p	95% CI
intercept	2.47	29.91	<0.001	[2.31, 2.63]	2.46	28.01	<0.001	[2.29, 2.64]
memory result	0.70	29.91	<0.001	[0.57, 0.83]	0.05	0.52	0.602	[−0.12, 0.22]
RPE × memory result	−0.01	−0.76	0.448	[−0.03, 0.01]	0.03	2.19	0.029	[0.00, 0.06]
reward result × memory result	0.02	1.20	0.230	[−0.01, 0.04]	0.05	2.79	0.005	[0.01, 0.08]
intercept	2.47	29.96	<0.001	[2.31, 2.63]	2.46	27.89	<0.001	[2.29, 2.64]
memory result	0.71	11.34	<0.001	[0.00, 0.09]	−0.05	−0.58	0.564	[−0.21, 0.11]
URPE × memory result	0.01	0.62	0.534	[−0.01, 0.03]	0.00	−0.23	0.817	[−0.04, 0.03]
reward result × memory result	0.01	0.96	0.335	[−0.01, 0.03]	0.07	5.36	<0.001	[0.05, 0.10]

variables	hit rate of item memory				hit rate of associative memory
variables	b	z	p	95% CI	b	z	p	95% CI
intercept	0.82	4.09	<0.001	[0.53, 1.06]	−0.14	−1.03	0.302	[−0.42, 0.13]
RPE	−0.15	−4.16	<0.001	[−0.22, −0.08]	0.10	3.07	0.002	[0.04, 0.17]
reward result	0.09	3.38	<0.001	[0.04, 0.15]	0.12	4.25	<0.001	[0.06, 0.17]
interaction	0.02	3.04	0.002	[0.01, 0.04]	−0.03	−4.06	<0.001	[−0.04, −0.01]
intercept	0.96	4.99	<0.001	[0.58, 1.35]	0.20	1.60	0.110	[−0.05, 0.44]
URPE	0.09	3.57	<0.001	[0.04, 0.13]	−0.18	−7.72	<0.001	[−0.22, −0.13]
reward result	0.04	2.11	0.035	[0.00, 0.08]	0.10	5.21	<0.001	[0.06, 0.14]

variables	picture fixation time (ms)				score fixation time (ms)
variables	b	t	p	95% CI	b	t	p	95% CI
intercept	2607.41	30.35	<0.001	[2438.91, 2775.92]	528.31	10.53	<0.001	[429.88, 626.74]
RPE	8.72	1.46	0.143	[−2.96, 20.41]	−5.72	−1.50	0.133	[−13.16, 1.74]
reward result	1.98	0.25	0.801	[−13.43, 17.39]	−2.09	−0.42	0.676	[−11.92, 7.73]
intercept	2531.79	29.81	<0.001	[2365.20, 2698.39]	851.26	11.77	<0.001	[484.36, 678.15]
URPE	15.66	2.41	0.016	[2.93, 28.39]	−11.51	−2.78	0.006	[−19.62, −3.39]
reward result	10.81	1.90	0.058	[−0.36, 21.97]	−7.95	−2.19	0.029	[−15.06, −0.83]
variables	mean change of pupil diameter (μm)				maximum change of pupil diameter (μm)
variables	b	t	p	95% CI	b	t	p	95% CI
intercept	−25.94	−1.84	0.067	[−53.66, 1.79]	192.32	14.50	<0.001	[166.31, 218.33]
RPE	7.06	3.05	0.002	[2.53, 11.60]	4.40	3.63	<0.001	[2.03, 6.78]
reward result	−2.08	−1.11	0.266	[−5.74, 1.58]	−3.33	−2.08	0.037	[−6.47, −0.20]
interaction	−1.09	−2.28	0.023	[−2.02, −0.15]
intercept	−21.52	−1.46	0.144	[−50.40, 7.37]	194.58	13.95	<0.001	[167.22, 221.95]
URPE	−7.84	−2.97	0.003	[−13.02, −2.66]	−6.67	−2.95	0.003	[−11.11, −2.23]
reward result	−2.80	−1.38	0.169	[−6.79, 1.19]	−4.28	−2.45	0.014	[−7.70, −0.86]
interaction	1.18	2.12	0.034	[0.09, 2.28]	1.83	3.82	<0.001	[0.89, 2.77]
