
Job Shop Scheduling Problems with Different Arrival Times Using Deep Reinforcement Learning Based on Actor-Critic

Advisor: 劉建良

Abstract


In manufacturing, the job shop scheduling problem (JSSP) is an important research topic because it affects the completion time of the entire process as well as the costs incurred during production. Traditionally, the JSSP can be solved by mathematical programming, but for large-scale problems the number of machine assignments corresponding to each job's required operation order becomes enormous. This leads to an excessive computational burden, and such methods are also difficult to apply to the dynamic environments faced in practice. Although heuristic algorithms can obtain near-optimal solutions in less time, they still cannot handle rapidly changing environments. Deep reinforcement learning (DRL), on the other hand, combines deep learning and reinforcement learning: deep learning can learn features without human intervention, while reinforcement learning is trained by trial and error and can make decisions quickly. In artificial intelligence, DRL has become widely known thanks to the success of DeepMind's AlphaGo and AlphaStar on real problems. Practical JSSPs are often accompanied by different arrival times, and the information of the environment is not fixed. To bring the solved problem closer to the real-world JSSP, this thesis proposes an Actor-Critic-based deep reinforcement learning algorithm to solve JSSPs with different arrival times. Because DRL methods can make decisions quickly, they can be applied in dynamic environments. We evaluate the predictions of the proposed method in a single environment using JSSP benchmark instances from the OR-Library and compare them with other existing approaches. We also make predictions on environments with different arrival times that were not seen during training, to verify the generalization ability of the proposed method.
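To make the problem setting concrete, the kind of quick, rule-based decision making that the abstract contrasts with DRL can be sketched as a classic dispatching-rule baseline. The sketch below schedules a tiny JSSP instance with per-job arrival times using the shortest-processing-time (SPT) rule; the instance data and all names are illustrative assumptions, not the thesis's method or benchmark.

```python
# Illustrative SPT dispatching baseline for a tiny JSSP with arrival times.
# This is a hypothetical example, not the thesis's Actor-Critic method.

# Each job: (arrival_time, [(machine, processing_time), ...]);
# a job's operations must run in the listed order.
jobs = {
    "J1": (0, [(0, 3), (1, 2)]),
    "J2": (1, [(1, 4), (0, 1)]),
    "J3": (2, [(0, 2), (1, 3)]),
}

def spt_schedule(jobs):
    """Greedy SPT rule: repeatedly start the ready operation with the
    shortest processing time (ties broken by earliest possible start).
    Returns the makespan (completion time of the whole process)."""
    machine_free = {}                                     # machine -> time it frees up
    job_ready = {j: arr for j, (arr, _) in jobs.items()}  # job -> earliest next-op start
    next_op = {j: 0 for j in jobs}                        # job -> index of next operation
    remaining = sum(len(ops) for _, ops in jobs.values())
    makespan = 0
    while remaining:
        best = None
        for j, (_, ops) in jobs.items():
            k = next_op[j]
            if k >= len(ops):
                continue                                  # job finished
            m, p = ops[k]
            start = max(job_ready[j], machine_free.get(m, 0))
            key = (p, start)                              # SPT, then earliest start
            if best is None or key < best[0]:
                best = (key, j, m, p, start)
        _, j, m, p, start = best
        end = start + p                                   # commit the chosen operation
        machine_free[m] = end
        job_ready[j] = end
        next_op[j] += 1
        remaining -= 1
        makespan = max(makespan, end)
    return makespan

print(spt_schedule(jobs))
```

Such rules decide in constant time per step, which is why they suit dynamic environments, but as this toy instance shows they commit greedily and can delay other jobs; that gap is the motivation for learned policies.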

Abstract (English)


The Job Shop Scheduling Problem (JSSP) is an important research topic in the manufacturing industry, as the results affect the completion time and cost of the process. Traditional approaches cope with the JSSP using mathematical programming. However, the problem becomes intractable at large scales, and it is difficult to generalize mathematical programming to the dynamic environments that are always present in practical settings of the JSSP. Meta-heuristics can obtain approximate solutions with less computational effort than mathematical programming, but they still fail to handle dynamic situations. On the other hand, deep reinforcement learning (DRL), the combination of deep learning (DL) and reinforcement learning (RL), benefits from DL to learn feature representations without human intervention. Moreover, RL is trained by a trial-and-error mechanism, making it possible to respond quickly to a dynamic environment and make decisions rapidly. In artificial intelligence, DRL has become increasingly popular because of its success on realistic problems, such as AlphaGo and AlphaStar proposed by DeepMind. The practical JSSP is often accompanied by different arrival times, and the information of the environment is not always fixed, meaning that dealing with dynamic environments is an important research topic in the JSSP. To cope with these issues, this thesis develops a method based on an Actor-Critic DRL architecture to attack the JSSP with different arrival times. The proposed model can deal with the JSSP in dynamic environments by benefiting from a key characteristic of DRL, namely quick decision making. We use benchmark JSSP instances from the OR-Library to evaluate the proposed method and compare it with other approaches. We also make predictions for environments with different arrival times that were not seen during training, to verify the generalization capability of the proposed model.
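The Actor-Critic idea named in the abstract can be illustrated on a deliberately tiny problem: a single state with two actions, where the actor maintains softmax action preferences and the critic maintains a value baseline, updated from the advantage. The toy environment, hyperparameters, and update rule below are illustrative assumptions for exposition, not the thesis's model or its JSSP state/action design.

```python
import math
import random

# Minimal Actor-Critic sketch on a one-state, two-action toy problem.
# Hypothetical illustration only; the thesis applies Actor-Critic to JSSP.
random.seed(0)

prefs = [0.0, 0.0]   # actor: action preferences (softmax policy)
value = 0.0          # critic: value estimate of the single state
alpha_actor, alpha_critic = 0.1, 0.1

def softmax(p):
    m = max(p)                      # subtract max for numerical stability
    e = [math.exp(x - m) for x in p]
    s = sum(e)
    return [x / s for x in e]

for _ in range(2000):
    pi = softmax(prefs)
    a = 0 if random.random() < pi[0] else 1   # sample action from the policy
    reward = 1.0 if a == 1 else 0.0           # action 1 is the better action
    advantage = reward - value                # TD error (single state, no bootstrap)
    value += alpha_critic * advantage         # critic moves toward observed return
    for b in range(2):                        # actor: policy-gradient step on prefs
        grad = (1.0 if b == a else 0.0) - pi[b]
        prefs[b] += alpha_actor * advantage * grad

print(softmax(prefs)[1])  # learned probability of the better action
```

The same loop structure carries over to a scheduling setting in principle: the state would encode the shop status, actions would pick the next operation to dispatch, and the reward would reflect completion time, with neural networks replacing the tabular actor and critic.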

