In a partially observable Markov decision process, a reinforcement learning agent may be unable to distinguish two different states of the problem because of limitations in its sensory system, a phenomenon commonly known as perceptual aliasing. To address this problem, some studies incorporate memory of preceding events to tell perceptually aliased states apart. McCallum used Utile Suffix Memory (USM), an instance-based method that stores instances in a tree structure to represent states. He employed the notion of a fringe (an extended subtree grown to a pre-specified depth below the real tree) to give the algorithm a bounded degree of lookahead. However, using a fringe causes the whole tree to contain far more nodes. We introduce an improved version of USM that replaces the fringe with a leaf-splitting criterion different from USM's, thereby resolving this issue. Our experiments show that our method always produces a smaller tree than USM, and that the agent still learns an acceptable policy.
In a POMDP (Partially Observable Markov Decision Process) problem, a Reinforcement Learning agent may be unable to distinguish two different states of the world, a phenomenon called perceptual aliasing, due to the limitations of its sensory system. To solve this problem, some researchers have incorporated memory of preceding events to distinguish perceptually aliased states. McCallum proposed Utile Suffix Memory (USM) [7], an instance-based method that uses a tree to store instances and to represent states. His use of a fringe (an extension of the tree to a pre-specified depth below the real tree) provides the algorithm a limited degree of lookahead capability. However, the fringe makes the tree considerably larger in terms of node count. We introduce a modification of USM that eliminates the fringe by using a different criterion than USM's to split a leaf node. In our experiments, we show that our method always produces trees containing fewer nodes than USM's, and that the agent learns an acceptable policy.
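To make the suffix-tree idea concrete, the sketch below shows a minimal, hypothetical USM-style structure: experience instances are stored at the leaves of a tree indexed by recent history, and a leaf can be split into children keyed by the next-older history element. This is an illustration only, not McCallum's actual implementation; in real USM the decision to split is driven by a Kolmogorov-Smirnov test on the distributions of expected future discounted reward, which the `split_leaf` helper here abstracts away, and the names `Instance`, `SuffixNode`, and `history_of` are our own.

```python
class Instance:
    """One step of experience: (action, observation, reward)."""
    def __init__(self, action, obs, reward):
        self.action, self.obs, self.reward = action, obs, reward


class SuffixNode:
    """Node in a USM-style suffix tree; each leaf acts as a state."""
    def __init__(self, depth=0):
        self.depth = depth
        self.children = {}   # keyed by the next-older history element
        self.instances = []  # experience instances matching this suffix

    def is_leaf(self):
        return not self.children


def insert(root, history, instance):
    """Walk the tree backwards through the history (most recent element
    first) and store the instance at the deepest matching leaf."""
    node = root
    for elem in reversed(history):
        if node.is_leaf():
            break
        node = node.children.setdefault(elem, SuffixNode(node.depth + 1))
    node.instances.append(instance)
    return node


def split_leaf(leaf, history_of):
    """Turn a leaf into an internal node, redistributing its instances by
    the next-older history element. `history_of` is a hypothetical helper
    that returns the suffix history associated with an instance."""
    for inst in leaf.instances:
        hist = history_of(inst)
        key = hist[-(leaf.depth + 1)] if len(hist) > leaf.depth else None
        child = leaf.children.setdefault(key, SuffixNode(leaf.depth + 1))
        child.instances.append(inst)
    leaf.instances = []
```

With a fringe, USM pre-grows such children to a fixed extra depth and keeps their statistics on hand, which is what inflates the node count; the alternative explored here is to create children only when a splitting criterion actually fires.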