With the rapid development of machine learning (ML) and deep learning (DL), MLOps, the practice of automating the ML workflow, has gradually emerged. Across the development, deployment, and maintenance of ML/DL products, MLOps aims to achieve continuous integration during development and continuous deployment at release time. It also reduces communication costs between teams through a standardized ML Pipeline. As a result, many Kubernetes-based open-source MLOps frameworks have appeared. However, these frameworks focus on providing a general-purpose, easy-to-use ML Pipeline environment for running containerized ML/DL tasks, and they do not optimize Kubernetes' own container scheduling. Without ML/DL-aware scheduling, the default Kubernetes scheduler places each containerized task individually according to available resources. It does not consider the ML/DL task that those containers form as a whole. This study therefore designs an ML task optimization mechanism for a FaaS-based MLOps system. Using ML FaaS, a Kubernetes-based MLOps framework, as the MLOps platform, we design a scheduling optimization system for ML/DL tasks and integrate it into ML FaaS. The goal is to address the problems that arise when the framework is used to develop large numbers of ML projects. A customized ML/DL Task Scheduler replaces the Kubernetes default scheduler, addressing the inability of the default Kubernetes scheduling policy to meet the needs of ML development environments.
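To make the contrast between per-container and whole-task scheduling concrete, the following is a minimal Python sketch. It is not the scheduler designed in this study. It assumes an all-or-nothing ("gang-style") placement policy as one illustrative way to treat an ML/DL task as a unit. The classes `Node` and `Pod` and the functions `schedule_per_pod` and `schedule_task` are hypothetical, and resources are simplified to CPU cores and GPU counts. In a real cluster, a pod is handed to a custom scheduler by setting its `spec.schedulerName`.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster node with simplified free resources."""
    name: str
    free_cpu: float  # free CPU cores
    free_gpu: int    # free GPUs


@dataclass
class Pod:
    """Hypothetical container (pod) request belonging to an ML/DL task."""
    name: str
    cpu: float
    gpu: int


def fits(pod: Pod, node: Node) -> bool:
    return pod.cpu <= node.free_cpu and pod.gpu <= node.free_gpu


def schedule_per_pod(pods, nodes):
    """Place each pod independently (first fit), like a per-container scheduler.

    Pods that fit are bound immediately, so a task can end up partially placed,
    holding resources while its remaining pods wait.
    """
    placement = {}
    for pod in pods:
        for node in nodes:
            if fits(pod, node):
                node.free_cpu -= pod.cpu
                node.free_gpu -= pod.gpu
                placement[pod.name] = node.name
                break
    return placement


def schedule_task(pods, nodes):
    """All-or-nothing placement of a whole task (assumed gang-style policy).

    Tries the placement on copies of the nodes and commits only if every pod fits;
    otherwise nothing is placed and no resources are consumed.
    """
    trial = [Node(n.name, n.free_cpu, n.free_gpu) for n in nodes]
    placement = schedule_per_pod(pods, trial)
    if len(placement) < len(pods):
        return {}
    for real, t in zip(nodes, trial):
        real.free_cpu, real.free_gpu = t.free_cpu, t.free_gpu
    return placement


if __name__ == "__main__":
    # A two-worker training task; only one GPU is free in the cluster.
    task = [Pod("worker-0", 2, 1), Pod("worker-1", 2, 1)]
    print(schedule_per_pod(task, [Node("n1", 4, 1), Node("n2", 4, 0)]))
    nodes = [Node("n1", 4, 1), Node("n2", 4, 0)]
    print(schedule_task(task, nodes), nodes[0].free_gpu)
```

In this example, per-pod placement binds `worker-0` to the only free GPU even though `worker-1` can never start. The task-level version leaves the GPU free for other work until the whole task can run.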