Over the past decades, the job-shop scheduling problem (JSSP) has been one of the most prominent problems in production scheduling. It requires considering the processing order of operations, each of which must be executed on a specific machine; these constraints make JSSPs considerably harder to solve than other scheduling problems. Early work solved JSSPs with mathematical programming, but the solution time grows exponentially with problem size, making that approach difficult to apply in practice; heuristic algorithms were adopted instead to obtain near-optimal solutions within an acceptable time. With the development of computer science, DeepMind's AlphaGo, trained with reinforcement learning, defeated professional Go players, and this year AlphaStar won matches in a game with a high-dimensional action space. These real-world achievements of reinforcement learning offer us a different option for solving the problem. Reinforcement learning requires lengthy offline training, but online it can respond to environment states and make fast decisions based on past experience; the greatest difficulty in training is defining the rewards for interactions in a complex environment. This thesis solves JSSPs with generative adversarial imitation learning (GAIL), a model extended from reinforcement learning. By combining the advantages of generative adversarial networks and imitation learning, GAIL frees us from hand-defining rewards for job selection as in conventional reinforcement learning; instead, the discriminator of the generative adversarial network provides the rewards directly. The purpose of this thesis is to verify the performance of GAIL on JSSPs and to compare it with other models.
Job Shop Scheduling Problems (JSSPs) have been among the most notable problems in production scheduling for decades. The constraints that operations must follow a given order and that each operation must be processed on a specific machine make JSSPs difficult to solve. JSSPs can be formulated as constrained optimization problems, which explains why much previous research has proposed mathematical programming to obtain optimal solutions. However, mathematical programming easily becomes intractable on large-scale problems, so it is difficult to apply to most real-world problems. On the other hand, meta-heuristic algorithms make it possible to find sub-optimal solutions, or even optimal solutions for some problems, within acceptable time limits. Advances in deep learning have produced promising results in many domains. For example, DeepMind developed AlphaGo with deep reinforcement learning (DRL) and defeated professional players at Go. Furthermore, AlphaStar won eSports matches in games with high-dimensional action spaces this year. These achievements of DRL inspire us to apply DRL to JSSPs. Although training a DRL model requires a lot of time, the trained model can make quick decisions in different environment states based on past experience when applied to a problem. When applying DRL to JSSPs, one of the main challenges is to define rewards in such complicated environments so that the model can explore the solution space well. In this thesis, we solve JSSPs with Generative Adversarial Imitation Learning (GAIL), which is extended from RL and combines the characteristics of Generative Adversarial Networks and Imitation Learning. In GAIL, rewards are returned directly by the Discriminator. The purpose of this thesis is to verify the performance of GAIL on JSSPs and compare it with other models.
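The central mechanism above, replacing a hand-designed reward with a signal produced by a discriminator, can be sketched as follows. This is a minimal illustration, not the thesis implementation: the linear discriminator, the feature encoding of a (state, action) pair, and the reward form r = -log(1 - D(s, a)) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LinearDiscriminator:
    """Toy discriminator: D(s, a) estimates the probability that a
    (state, action) pair came from the expert rather than the policy."""

    def __init__(self, dim):
        # Hypothetical randomly initialized weights; in GAIL these are
        # trained adversarially against the policy.
        self.w = rng.normal(scale=0.1, size=dim)

    def prob_expert(self, features):
        return sigmoid(features @ self.w)

    def reward(self, features):
        # GAIL-style reward: high when the policy's job-selection
        # decision looks expert-like, so no hand-crafted scheduling
        # reward is needed. One common form is r = -log(1 - D(s, a)).
        d = self.prob_expert(features)
        return -np.log(1.0 - d + 1e-8)

# Usage: score one hypothetical job-selection decision.
disc = LinearDiscriminator(dim=4)
features = np.array([0.2, -1.0, 0.5, 0.3])  # assumed (state, action) encoding
print(disc.reward(features))
```

The policy would then be updated with any standard RL algorithm using this reward, while the discriminator is retrained to separate expert schedules from the policy's schedules.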