
Job Shop Scheduling Problems with Different Arrival Times Using Deep Reinforcement Learning Based on Actor-Critic

Advisor: 劉建良

Abstract


In manufacturing, the job shop scheduling problem (JSSP) is an important research topic because it affects the completion time of the entire process as well as the costs incurred during production. Traditionally, the JSSP can be solved by mathematical programming, but for large-scale problems the number of machine assignments corresponding to each job's required operation order becomes enormous. This leads to an excessive computational burden, and such methods are also difficult to apply to the dynamic environments faced in practice. Although heuristic algorithms can obtain near-optimal solutions in less time, they still cannot handle rapidly changing environments. Deep reinforcement learning (DRL), on the other hand, combines deep learning and reinforcement learning: deep learning can learn features without human intervention, while reinforcement learning is trained by trial and error and can make decisions quickly. In artificial intelligence, DRL has become widely known thanks to the success of DeepMind's AlphaGo and AlphaStar on real problems. Practical JSSPs are often accompanied by different arrival times, and the information of the environment is not fixed. To bring the solved problem closer to the real-world JSSP, this thesis proposes an Actor-Critic-based deep reinforcement learning algorithm to solve JSSPs with different arrival times. Because DRL methods can make decisions quickly, they can be applied in dynamic environments. We evaluate the predictions of the proposed method in a single environment using JSSP benchmark instances from the OR-Library and compare them with other existing approaches. We also make predictions on environments with different arrival times that were not seen during training, to verify the generalization ability of the proposed method.
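To make the problem setting concrete, the kind of quick, rule-based decision making that the abstract contrasts with DRL can be sketched as a classic dispatching-rule baseline. The sketch below schedules a tiny JSSP instance with per-job arrival times using the shortest-processing-time (SPT) rule; the instance data and all names are illustrative assumptions, not the thesis's method or benchmark.

```python
# Illustrative SPT dispatching baseline for a tiny JSSP with arrival times.
# This is a hypothetical example, not the thesis's Actor-Critic method.

# Each job: (arrival_time, [(machine, processing_time), ...]);
# a job's operations must run in the listed order.
jobs = {
    "J1": (0, [(0, 3), (1, 2)]),
    "J2": (1, [(1, 4), (0, 1)]),
    "J3": (2, [(0, 2), (1, 3)]),
}

def spt_schedule(jobs):
    """Greedy SPT rule: repeatedly start the ready operation with the
    shortest processing time (ties broken by earliest possible start).
    Returns the makespan (completion time of the whole process)."""
    machine_free = {}                                     # machine -> time it frees up
    job_ready = {j: arr for j, (arr, _) in jobs.items()}  # job -> earliest next-op start
    next_op = {j: 0 for j in jobs}                        # job -> index of next operation
    remaining = sum(len(ops) for _, ops in jobs.values())
    makespan = 0
    while remaining:
        best = None
        for j, (_, ops) in jobs.items():
            k = next_op[j]
            if k >= len(ops):
                continue                                  # job finished
            m, p = ops[k]
            start = max(job_ready[j], machine_free.get(m, 0))
            key = (p, start)                              # SPT, then earliest start
            if best is None or key < best[0]:
                best = (key, j, m, p, start)
        _, j, m, p, start = best
        end = start + p                                   # commit the chosen operation
        machine_free[m] = end
        job_ready[j] = end
        next_op[j] += 1
        remaining -= 1
        makespan = max(makespan, end)
    return makespan

print(spt_schedule(jobs))
```

Such rules decide in constant time per step, which is why they suit dynamic environments, but as this toy instance shows they commit greedily and can delay other jobs; that gap is the motivation for learned policies.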

Abstract (English)


The Job Shop Scheduling Problem (JSSP) is an important research topic in the manufacturing industry, as the results affect the completion time and cost of the process. Traditional approaches cope with the JSSP using mathematical programming. However, the problem becomes intractable at large scales, and it is difficult to generalize mathematical programming to the dynamic environments that are always present in practical settings of the JSSP. Meta-heuristics can obtain approximate solutions with less computational effort than mathematical programming, but they still fail to handle dynamic situations. On the other hand, deep reinforcement learning (DRL), the combination of deep learning (DL) and reinforcement learning (RL), benefits from DL to learn feature representations without human intervention. Moreover, RL is trained by a trial-and-error mechanism, making it possible to respond quickly to a dynamic environment and make decisions rapidly. In artificial intelligence, DRL has become increasingly popular because of its success on realistic problems, such as AlphaGo and AlphaStar proposed by DeepMind. The practical JSSP is often accompanied by different arrival times, and the information of the environment is not always fixed, meaning that dealing with dynamic environments is an important research topic in the JSSP. To cope with these issues, this thesis develops a method based on an Actor-Critic DRL architecture to attack the JSSP with different arrival times. The proposed model can deal with the JSSP in dynamic environments by benefiting from a key characteristic of DRL, namely quick decision making. We use benchmark JSSP instances from the OR-Library to evaluate the proposed method and compare it with other approaches. We also make predictions for environments with different arrival times that were not seen during training, to verify the generalization capability of the proposed model.
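The Actor-Critic idea named in the abstract can be illustrated on a deliberately tiny problem: a single state with two actions, where the actor maintains softmax action preferences and the critic maintains a value baseline, updated from the advantage. The toy environment, hyperparameters, and update rule below are illustrative assumptions for exposition, not the thesis's model or its JSSP state/action design.

```python
import math
import random

# Minimal Actor-Critic sketch on a one-state, two-action toy problem.
# Hypothetical illustration only; the thesis applies Actor-Critic to JSSP.
random.seed(0)

prefs = [0.0, 0.0]   # actor: action preferences (softmax policy)
value = 0.0          # critic: value estimate of the single state
alpha_actor, alpha_critic = 0.1, 0.1

def softmax(p):
    m = max(p)                      # subtract max for numerical stability
    e = [math.exp(x - m) for x in p]
    s = sum(e)
    return [x / s for x in e]

for _ in range(2000):
    pi = softmax(prefs)
    a = 0 if random.random() < pi[0] else 1   # sample action from the policy
    reward = 1.0 if a == 1 else 0.0           # action 1 is the better action
    advantage = reward - value                # TD error (single state, no bootstrap)
    value += alpha_critic * advantage         # critic moves toward observed return
    for b in range(2):                        # actor: policy-gradient step on prefs
        grad = (1.0 if b == a else 0.0) - pi[b]
        prefs[b] += alpha_actor * advantage * grad

print(softmax(prefs)[1])  # learned probability of the better action
```

The same loop structure carries over to a scheduling setting in principle: the state would encode the shop status, actions would pick the next operation to dispatch, and the reward would reflect completion time, with neural networks replacing the tabular actor and critic.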

