
Investigation of Discrete Event Simulation and Reinforcement Learning

Advisor: 林則孟

Abstract


This study proposes a method for integrating reinforcement learning with discrete event simulation, and applies the combined model to the vehicle dispatching problem of an FMS/AGV system. The integration rests on learning events that link the discrete event simulation to the reinforcement learner: during a simulation run, the learning mechanism is invoked only when a learning event is triggered, at which point the MDP state transition takes place, a reward is fed back, and a training sample is generated.

The study also proposes using SysML to support conceptual modeling; from the conceptual model, the simulation model is then designed and implemented in Python with SimPy. Building the discrete event simulation proceeds in four phases: inception, analysis, design, and implementation. The inception phase defines the problem, including the simulation objectives, inputs/outputs, and scope. The analysis phase uses SysML to analyze the system's object structure and behavior. The design phase uses SysML to design the class architecture of the simulation program and the behavior of each class. Finally, the implementation phase maps the SysML class-architecture and class-behavior designs onto Python and SimPy constructs and implements them one by one, completing the simulation model.

Finally, the study proposes a deep reinforcement learning framework based on discrete event simulation and applies it to the vehicle dispatching problem of the FMS/AGV system, detailing the construction of the production-system "environment" and the design of the "agent". The experimental results verify that, given appropriate reward, state, and action designs, and in scenarios with uncertain order arrival rates, the DQN method outperforms the best single dispatching rule.
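The learning-event mechanism can be illustrated with a minimal SimPy sketch. This is an illustrative reconstruction under assumed names (Agent, on_learning_event, the toy queue state and reward), not the thesis implementation; the point is the control flow, in which the simulation clock advances freely and an MDP step (state transition, reward feedback, training sample) is produced only when a learning event fires.

import random
import simpy

class Agent:
    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.samples = []   # collected (state, action, reward, next_state) tuples
        self.prev = None    # pending (state, action) awaiting its outcome

    def act(self, state):
        # Placeholder policy; a DQN would choose greedily over Q-values here.
        return random.randrange(self.n_actions)

    def on_learning_event(self, state, reward):
        # Close out the previous MDP step, then open the next one.
        if self.prev is not None:
            s, a = self.prev
            self.samples.append((s, a, reward, state))
        action = self.act(state)
        self.prev = (state, action)
        return action

def dispatching_process(env, agent):
    queue = 0
    while True:
        yield env.timeout(random.expovariate(1.0))  # next job arrival (learning event)
        queue += 1
        state = (queue,)           # toy state: current queue length
        reward = -queue            # toy reward: penalize work in process
        agent.on_learning_event(state, reward)
        queue = max(0, queue - 1)  # pretend the chosen action serves one job

env = simpy.Environment()
agent = Agent(n_actions=3)
env.process(dispatching_process(env, agent))
env.run(until=50)
print(f"collected {len(agent.samples)} training samples")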

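The implementation-phase mapping from the SysML designs to Python/SimPy can likewise be sketched. The Machine class and its timing constants below are hypothetical, not the thesis model; the idea is that a SysML block becomes a Python class, its parts (such as an input queue) become attributes like a simpy.Store, and its behavior diagram becomes a SimPy process generator.

import simpy

class Machine:
    def __init__(self, env, name, process_time):
        self.env = env
        self.name = name
        self.process_time = process_time
        self.input_buffer = simpy.Store(env)  # SysML part: input queue
        env.process(self.run())               # SysML behavior -> SimPy process

    def run(self):
        while True:
            job = yield self.input_buffer.get()        # wait for a job
            yield self.env.timeout(self.process_time)  # process it
            print(f"{self.env.now:5.1f}  {self.name} finished {job}")

env = simpy.Environment()
m1 = Machine(env, "M1", process_time=2.0)
for i in range(3):
    m1.input_buffer.put(f"job{i}")
env.run(until=10)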

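For the agent side, a compact DQN of the kind the experiments describe might look like the following PyTorch sketch; the network size, hyperparameters, and the encoding of states (e.g., queue lengths and AGV status) and actions (e.g., one dispatching rule each) are placeholders, not the thesis settings.

import random
from collections import deque

import torch
import torch.nn as nn

class DQNAgent:
    def __init__(self, state_dim, n_actions, gamma=0.95, eps=0.1):
        # Q-network: maps a state vector to one Q-value per action.
        self.q = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        self.opt = torch.optim.Adam(self.q.parameters(), lr=1e-3)
        self.buffer = deque(maxlen=10_000)  # replay buffer of (s, a, r, s2)
        self.gamma, self.eps, self.n_actions = gamma, eps, n_actions

    def act(self, state):
        # Epsilon-greedy selection over the learned Q-values.
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            return int(self.q(torch.tensor(state, dtype=torch.float32)).argmax())

    def train_step(self, batch_size=32):
        # One gradient step on a random minibatch from the replay buffer.
        if len(self.buffer) < batch_size:
            return
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s2 = (torch.tensor(x, dtype=torch.float32) for x in zip(*batch))
        q_sa = self.q(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = r + self.gamma * self.q(s2).max(dim=1).values
        loss = nn.functional.mse_loss(q_sa, target)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

In use, each (state, action, reward, next_state) sample produced at a learning event would be appended to agent.buffer and interleaved with calls to agent.train_step().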
