
以自我組織特徵映射圖為基礎之模糊系統實作連續性Q-learning

A SOM-based Fuzzy Systems Q-learning in Continuous State and Action Space

Advisor: 蘇木春 (Mu-Chun Su)

Abstract


Reinforcement learning trains an agent through interaction with its environment: without a supervisor providing complete instructions, the agent discovers on its own which action to take in each state in order to obtain the greatest reward. Q-learning is a common reinforcement learning method. By building a look-up table of Q-values, one entry for each state-action pair, Q-learning readily handles problems with small numbers of discrete states and actions. When a problem involves a large number of states and actions, however, the required look-up table becomes enormous, so maintaining a table entry for every state-action pair is no longer feasible. This thesis proposes a fuzzy system based on the Self-Organizing Feature Map (SOM) network to implement Q-learning, and uses this method to design control systems. To accelerate training, the thesis combines task decomposition with an automatic task-decomposition mechanism to handle complex tasks. Simulation experiments with a robot demonstrate the effectiveness of the method.
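The look-up-table formulation described above can be sketched as a short program. The toy environment (a five-state corridor where moving right eventually reaches a rewarded goal state) and all constants below are illustrative assumptions, not taken from the thesis:

```python
import random

# Minimal tabular Q-learning sketch on a hypothetical toy problem.
# States and actions are small discrete sets, so a look-up table
# Q[state][action] is feasible -- exactly the setting the abstract
# says breaks down for large or continuous state/action spaces.

N_STATES, N_ACTIONS = 5, 2          # tiny corridor: states 0..4, actions {left, right}
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

# The look-up table: one Q-value per (state, action) pair.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(state, action):
    """Toy deterministic environment: action 1 moves right, action 0 moves left.
    Reaching the last state yields reward 1 and ends the episode."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def choose_action(state):
    """Epsilon-greedy exploration over the table row for this state."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

random.seed(0)
for _ in range(200):                # episodes
    state, done = 0, False
    while not done:
        action = choose_action(state)
        next_state, reward, done = step(state, action)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + GAMMA * max(Q[next_state])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state

# The greedy policy read off the table should be "move right" in every
# non-terminal state.
policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
```

Because the table stores one value per state-action pair, its size grows as |S| × |A|, which is precisely why this representation fails once the state and action spaces become large or continuous, as the abstract notes.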

English Abstract


In reinforcement learning, there is no supervisor to critically judge the chosen action at each step. Learning proceeds through a trial-and-error procedure of interacting with a dynamic environment. Q-learning is one popular approach to reinforcement learning. It is widely applied to problems with discrete states and actions and is usually implemented by a look-up table in which each entry corresponds to a combination of a state and an action. However, the look-up-table implementation of Q-learning fails in problems with continuous state and action spaces because an exhaustive enumeration of all state-action pairs is impossible. In this thesis, an implementation of Q-learning for solving problems with continuous state and action spaces using SOM-based fuzzy systems is proposed. Simulations of training a robot to complete two different tasks are used to demonstrate the effectiveness of the proposed approach. Reinforcement learning is usually a slow process. To accelerate the learning procedure, a hybrid approach is proposed that integrates the ideas of hierarchical learning and progressive learning to decompose a complex task into simple elementary tasks.
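The SOM component on which the proposed fuzzy system builds follows Kohonen's standard learning rule: find the best-matching unit (BMU) for an input, then pull the BMU and its grid neighbours toward that input. The sketch below shows only this generic SOM update on made-up 2-D data; the grid size, learning-rate schedule, and data are illustrative assumptions, and it does not reproduce the fuzzy-system or Q-learning integration described in the thesis:

```python
import math
import random

# Generic SOM (Kohonen) learning-rule sketch on synthetic 2-D inputs.

GRID = 4                      # 4x4 map of neurons
DIM = 2                       # input dimension
random.seed(1)
# one weight vector per neuron on the grid, initialised randomly in [0, 1]
weights = {(i, j): [random.random() for _ in range(DIM)]
           for i in range(GRID) for j in range(GRID)}

def best_matching_unit(x):
    """Return the grid position of the neuron whose weights are closest to x."""
    return min(weights,
               key=lambda n: sum((w - xi) ** 2 for w, xi in zip(weights[n], x)))

def som_update(x, lr, sigma):
    """Move the BMU and its grid neighbours toward x, weighted by a
    Gaussian neighbourhood function over grid distance."""
    bi, bj = best_matching_unit(x)
    for (i, j), w in weights.items():
        grid_dist2 = (i - bi) ** 2 + (j - bj) ** 2
        h = math.exp(-grid_dist2 / (2 * sigma ** 2))   # neighbourhood strength
        for d in range(DIM):
            w[d] += lr * h * (x[d] - w[d])

# Train on points drawn uniformly from the unit square; the learning rate
# and neighbourhood radius both shrink over time, as in standard SOM training.
for t in range(2000):
    x = [random.random(), random.random()]
    som_update(x, lr=0.5 * (1 - t / 2000), sigma=2.0 * (1 - t / 2000) + 0.1)

# After training, every weight vector lies inside the input range, and
# opposite corners of the input space map to different neurons.
in_range = all(0.0 <= w[d] <= 1.0 for w in weights.values() for d in range(DIM))
```

Each trained neuron summarises a region of the input space; in a SOM-based fuzzy system, such prototypes are a natural starting point for placing fuzzy membership functions over a continuous state space.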


Cited By


林敬斌 (2009). Improving a simple Taiwan stock index futures day-trading system using reinforcement learning [Master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2009.00477
莊智凱 (2007). A study on assisting web service management with an automatic clustering mechanism [Master's thesis, National Taipei University of Technology]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0006-2408200716575200
