

Reward Prediction Errors in Reinforcement Learning: Psychosis, Personality, and Modeling Issues

Advisor: 徐永豐
Co-advisor: 賴文崧


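Both animals and humans making decisions in uncertain environments must learn the rules of the environment through trial and error, form expectations about it, and use those expectations as the basis for future decisions. According to reinforcement learning theory, expectations are updated when there is a discrepancy between what was expected and what was actually experienced; this discrepancy is called the reward prediction error (RPE). RPE research in neuroscience took off after dopamine neurons were found to encode RPE signals. This study addresses two issues: (i) the relationship between psychotic symptoms and RPE in patients with schizophrenia (SZ), and (ii) whether individual differences in personality affect RPE processing. First, some researchers have proposed that faulty RPE processing caused by an abnormal dopamine system underlies the psychotic symptoms (e.g., hallucinations and delusions) of SZ patients. To test this hypothesis, SZ patients performed the dynamic rewarding task, a feedback-based decision-making task built on two-option probabilistic learning, in which the option with the higher reward probability changes after a period of time and the subject is not informed of the change. The data were analyzed with a reinforcement learning model, with parameters estimated by Bayesian estimation. SZ patients updated their expectations faster and made more exploratory decisions. In addition, the more severe a patient's psychotic symptoms, the more exploratory decisions the patient made. These results are consistent with the hypothesis. The second part of the study analyzed the correlations between scores on each dimension of Cloninger's Tridimensional Personality Questionnaire and performance on the dynamic rewarding task. Among college students, task performance showed gender differences; students with higher novelty seeking updated their expectations more slowly, and students with higher reward dependence made more exploratory decisions. However, no similar results were found in SZ patients. Finally, the study briefly discusses properties of the parameters of the reinforcement learning model, including correlations between parameters and unit invariance.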


Making appropriate decisions requires updating information about the available alternatives on the basis of previous experience. In particular, the reward prediction error (RPE), the discrepancy between predicted and actual rewards, is regarded as being encoded by dopamine neurons. This thesis addresses two issues concerning RPE. First, dysfunctional RPE processing might link abnormal dopamine systems to the formation of psychotic symptoms (e.g., hallucinations and delusions) in schizophrenia (SZ). To examine this hypothesis, we tested SZ patients and healthy controls on a feedback-based "dynamic rewarding task," in which subjects chose between two reward options, with the more rewarding option alternating across blocks. We fit the experimental data with a standard reinforcement learning (RL) model using Bayesian estimation. The model fits indicated that SZ patients updated their values more rapidly and made more exploratory decisions. The degree of exploration also increased with the severity of psychotic symptoms. These findings support the hypothesis that abnormal RPE processing correlates with aberrant dopaminergic activity and subjective psychotic experiences. Second, because heritable traits might predispose an individual's decision-making behavior, we administered the Tridimensional Personality Questionnaire to investigate the correlation between personality traits and the estimated parameters of the RL model. College students with higher novelty seeking scores had lower value-updating rates, and those with higher reward dependence scores showed a higher degree of exploration. Gender differences in task performance were also found. However, no similar patterns were found in SZ patients. Finally, we briefly discuss two modeling issues that remain unresolved. The first is the negative correlation between the learning rate parameter and the perseveration parameter in the RL model. The second is scale invariance with regard to the perseveration parameter in the RL model.
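The abstract names the ingredients of the model (a value update driven by RPE, a degree of exploration, and a perseveration parameter) but does not give its equations. As a rough illustration only, the sketch below assumes one common form: a delta-rule update with learning rate `alpha`, a softmax choice rule with inverse temperature `beta` (lower `beta` means more exploration), and a perseveration bonus `kappa` for repeating the previous choice. The parameter names, the block length, and the reward probabilities are all hypothetical and are not taken from the thesis.

```python
import math
import random


def rpe_update(value, reward, alpha):
    """Delta-rule update: return (new_value, rpe) for one trial."""
    rpe = reward - value          # reward prediction error
    return value + alpha * rpe, rpe


def choice_prob(q, prev_choice, beta, kappa):
    """Probability of choosing option 0 under a two-option softmax.

    beta scales the value difference; kappa is an assumed bonus
    added to whichever option was chosen on the previous trial.
    """
    bonus = [0.0, 0.0]
    if prev_choice is not None:
        bonus[prev_choice] = kappa
    diff = beta * (q[0] - q[1]) + (bonus[0] - bonus[1])
    return 1.0 / (1.0 + math.exp(-diff))


def simulate(n_trials=200, block_len=40, p_high=0.8, p_low=0.2,
             alpha=0.3, beta=3.0, kappa=0.5, seed=0):
    """Simulate an agent on a two-option task with unannounced reversals.

    All task settings here are illustrative assumptions.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]                # current value estimate of each option
    prev = None                   # previous choice (None on first trial)
    better = 0                    # index of the currently better option
    choices, rewards = [], []
    for t in range(n_trials):
        if t > 0 and t % block_len == 0:
            better = 1 - better   # reversal, not signalled to the agent
        p0 = choice_prob(q, prev, beta, kappa)
        c = 0 if rng.random() < p0 else 1
        p_reward = p_high if c == better else p_low
        r = 1.0 if rng.random() < p_reward else 0.0
        q[c], _ = rpe_update(q[c], r, alpha)
        prev = c
        choices.append(c)
        rewards.append(r)
    return choices, rewards
```

Calling `simulate()` returns the trial-by-trial choices and rewards of one simulated agent. In this assumed form, a larger `alpha` corresponds to faster value updating and a smaller `beta` to more exploratory choices, which are the two quantities the abstract reports as differing in SZ patients.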


