
Dealing with Perceptual Aliasing by Using Pruning Suffix Tree Memory in Reinforcement Learning

在增強式學習中用修剪字尾樹處理感知混淆現象

Advisor: 蘇豐文

Abstract


In a partially observable Markov decision process, a reinforcement learning agent may be unable to distinguish two different states of the problem because of the limitations of its sensing system, a phenomenon commonly known as perceptual aliasing. To address this problem, some studies incorporate memory of preceding events to tell perceptually aliased states apart. McCallum proposed Utile Suffix Memory (USM), an instance-based method that stores instances in a tree structure to represent states. He used the concept of a fringe (an extended subtree of pre-specified depth below the real tree) to give the algorithm a bounded degree of lookahead. However, using a fringe causes the tree to contain too many nodes. We introduce a modified USM that solves this problem by replacing the fringe with a leaf-splitting criterion different from USM's. Our experiments show that our method always produces smaller trees than USM and that the agent still learns an acceptable policy.

Abstract (English)


In a POMDP (Partially Observable Markov Decision Process), a reinforcement learning agent may be unable to distinguish two different states of the world due to the limitations of its sensory system, a problem called perceptual aliasing. To solve this problem, some researchers have incorporated memory of preceding events to distinguish perceptually aliased states. McCallum proposed Utile Suffix Memory (USM) [7], an instance-based method that uses a tree to store instances and to represent states. His use of a fringe (an extension of the tree to a pre-specified depth below the real tree) gives the algorithm a limited degree of lookahead capability. However, the fringe makes the tree hold more nodes. We introduce a modification of USM that removes the need for a fringe by using a leaf-splitting criterion different from USM's. In our experiments, we show that our method always produces trees containing fewer nodes than USM and that the agent learns an applicable policy.
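The tree-based memory the abstract describes can be illustrated with a toy sketch. This is neither USM nor the thesis's pruned variant: the names (`SuffixMemory`, `record`) are invented, and the split test used here (difference of mean rewards across one-step suffix extensions) is a deliberately simplified stand-in for USM's utility-based test, which compares distributions of expected future discounted return.

```python
from collections import defaultdict
from statistics import mean

class SuffixMemory:
    """Toy suffix-tree instance memory (illustration only, not USM).

    Leaves are observation suffixes; each leaf stores the instances
    (preceding observation, reward) that matched it. A leaf is split
    into one-step-longer suffixes when the preceding observation
    separates the recorded rewards — i.e., when the current suffix
    appears to be perceptually aliased.
    """

    def __init__(self, split_threshold=1.0, max_depth=4):
        self.leaves = {}  # suffix tuple -> [(preceding_obs, reward), ...]
        self.split_threshold = split_threshold
        self.max_depth = max_depth

    def _leaf_for(self, history):
        # Walk from the longest candidate suffix down to depth 1.
        for k in range(min(len(history), self.max_depth), 0, -1):
            suffix = tuple(history[-k:])
            if suffix in self.leaves:
                return suffix
        # No leaf matches yet: open a depth-1 leaf for the current observation.
        suffix = (history[-1],)
        self.leaves[suffix] = []
        return suffix

    def record(self, history, reward):
        leaf = self._leaf_for(history)
        # Observation just before the matched suffix (None if history is too short).
        ext = history[-len(leaf) - 1] if len(history) > len(leaf) else None
        self.leaves[leaf].append((ext, reward))
        self._maybe_split(leaf)

    def _maybe_split(self, leaf):
        if len(leaf) >= self.max_depth:
            return
        groups = defaultdict(list)
        for ext, r in self.leaves[leaf]:
            if ext is not None:
                groups[ext].append(r)
        if len(groups) < 2:
            return
        means = [mean(rs) for rs in groups.values()]
        # Split when extending the suffix by one observation separates rewards.
        if max(means) - min(means) > self.split_threshold:
            del self.leaves[leaf]
            for ext in groups:
                self.leaves[(ext,) + leaf] = []
```

For example, if observation `X` yields reward 10 after `A` but 0 after `B`, the depth-1 leaf `('X',)` is split into `('A', 'X')` and `('B', 'X')`, while an unambiguous observation keeps its shallow leaf. For brevity the sketch discards a leaf's instances on a split; USM instead redistributes them into the new leaves.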

References


learning for partially observable Markov decision processes. In Proceedings of the Second International Conference on From Animals to Animats 2: Simulation of Adaptive Behavior, pages 271–280, Cambridge, MA, USA, 1993. MIT Press.
[6] Andrew Kachites McCallum. Reinforcement learning with selective perception
