In a partially observable Markov decision process, a reinforcement learning agent may be unable to distinguish two different states of the problem because of limitations in its sensory system, a phenomenon commonly known as perceptual aliasing. To address this problem, some studies incorporate memory of preceding events to tell perceptually aliased states apart. McCallum used Utile Suffix Memory (USM), an instance-based method that stores instances in a tree structure to represent states. He employed the notion of a fringe (an extended subtree grown to a pre-specified depth below the real tree) to give the algorithm a bounded degree of lookahead. However, using a fringe causes the whole tree to contain far more nodes. We introduce an improved version of USM that replaces the fringe with a leaf-splitting criterion different from USM's, thereby resolving this issue. Our experiments show that our method always produces a smaller tree than USM, and that the agent still learns an acceptable policy.
In a POMDP (Partially Observable Markov Decision Process) problem, a Reinforcement Learning agent may be unable to distinguish two different states of the world, a phenomenon called perceptual aliasing, due to the limitations of its sensory system. To solve this problem, some researchers have incorporated memory of preceding events to distinguish perceptually aliased states. McCallum proposed Utile Suffix Memory (USM) [7], an instance-based method that uses a tree to store instances and to represent states. His use of a fringe (an extension of the tree to a pre-specified depth below the real tree) provides the algorithm a limited degree of lookahead capability. However, the fringe makes the tree considerably larger in terms of node count. We introduce a modification of USM that eliminates the fringe by using a different criterion than USM's to split a leaf node. In our experiments, we show that our method always produces trees containing fewer nodes than USM's, and that the agent learns an acceptable policy.
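To make the suffix-tree idea concrete, the sketch below shows a minimal, hypothetical USM-style structure: experience instances are stored at the leaves of a tree indexed by recent history, and a leaf can be split into children keyed by the next-older history element. This is an illustration only, not McCallum's actual implementation; in real USM the decision to split is driven by a Kolmogorov-Smirnov test on the distributions of expected future discounted reward, which the `split_leaf` helper here abstracts away, and the names `Instance`, `SuffixNode`, and `history_of` are our own.

```python
class Instance:
    """One step of experience: (action, observation, reward)."""
    def __init__(self, action, obs, reward):
        self.action, self.obs, self.reward = action, obs, reward


class SuffixNode:
    """Node in a USM-style suffix tree; each leaf acts as a state."""
    def __init__(self, depth=0):
        self.depth = depth
        self.children = {}   # keyed by the next-older history element
        self.instances = []  # experience instances matching this suffix

    def is_leaf(self):
        return not self.children


def insert(root, history, instance):
    """Walk the tree backwards through the history (most recent element
    first) and store the instance at the deepest matching leaf."""
    node = root
    for elem in reversed(history):
        if node.is_leaf():
            break
        node = node.children.setdefault(elem, SuffixNode(node.depth + 1))
    node.instances.append(instance)
    return node


def split_leaf(leaf, history_of):
    """Turn a leaf into an internal node, redistributing its instances by
    the next-older history element. `history_of` is a hypothetical helper
    that returns the suffix history associated with an instance."""
    for inst in leaf.instances:
        hist = history_of(inst)
        key = hist[-(leaf.depth + 1)] if len(hist) > leaf.depth else None
        child = leaf.children.setdefault(key, SuffixNode(leaf.depth + 1))
        child.instances.append(inst)
    leaf.instances = []
```

With a fringe, USM pre-grows such children to a fixed extra depth and keeps their statistics on hand, which is what inflates the node count; the alternative explored here is to create children only when a splitting criterion actually fires.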