  • Conference Paper
  • OpenAccess

Implementation of Containerized TensorFlow on a Heterogeneous CPU/GPU Cluster

Abstract


Container virtualization abstracts the operating system through the interfaces it provides and isolates the processes of different containers from one another, but it also makes it difficult to access non-traditional, specialized hardware resources such as GPUs, which has prompted NVIDIA, NERSC, and others to actively develop GPU-capable container virtualization technologies such as nvidia-docker and Shifter. However, how to deploy and build containerized applications at scale, and to flexibly schedule and adjust them across a cluster whose nodes carry different CPU/GPU models, is a key consideration when providing such services. This paper builds a GPU cluster with Kubernetes to execute and manage artificial-intelligence computations and applications, so that large numbers of jobs can run in parallel and dynamically across heterogeneous CPU/GPU nodes, and compares different container virtualization technologies (Shifter) as well as different TensorFlow execution architectures.
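As a concrete illustration of this deployment model, the following is a minimal sketch (not the configuration used in this paper) that submits a containerized TensorFlow job to a Kubernetes cluster through the official Python client and requests one GPU via the nvidia.com/gpu device resource; the image name, command, and namespace are hypothetical placeholders.

    # Minimal sketch: schedule a containerized TensorFlow job on a Kubernetes
    # GPU node by requesting the "nvidia.com/gpu" extended resource.
    # The image, command, and namespace below are illustrative placeholders.
    from kubernetes import client, config

    config.load_kube_config()      # use the local kubeconfig credentials
    api = client.CoreV1Api()

    container = client.V1Container(
        name="tf-worker",
        image="tensorflow/tensorflow:latest-gpu",   # hypothetical GPU-enabled image
        command=["python", "/workspace/train.py"],  # hypothetical training script
        resources=client.V1ResourceRequirements(
            limits={"nvidia.com/gpu": "1"}           # ask the scheduler for one GPU
        ),
    )

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="tf-gpu-job", labels={"app": "tf"}),
        spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
    )

    api.create_namespaced_pod(namespace="default", body=pod)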

Parallel Abstract


Container virtualization virtualizes the operating system inside each container through interfaces provided by the host and isolates the processes of individual containers from one another. However, this isolation also makes it inconvenient to access unconventional hardware resources such as the graphics processing unit (GPU), which has led NVIDIA and NERSC to develop nvidia-docker and Shifter, respectively. Moreover, when providing cloud computing services, it is essential to construct and deploy containerized applications and services dynamically across the nodes of a cluster with heterogeneous CPUs and GPUs. In this paper, we build a GPU cluster with Kubernetes to manage the nodes and run machine learning workloads, so that jobs execute in parallel across the nodes, and we compare different container virtualization techniques (Shifter) and TensorFlow architectures.
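To make the comparison of TensorFlow execution architectures concrete, the following minimal sketch shows the classic between-graph replicated setup of TensorFlow 1.x, with a parameter-server task on a CPU node and worker tasks on GPU nodes; the host names, ports, and job layout are assumed for illustration and are not the exact training code evaluated here.

    # Minimal sketch of a distributed TensorFlow 1.x cluster in which a
    # parameter server runs on a CPU node and workers run on GPU nodes.
    # Host names, ports, and the job/task layout are assumed for illustration.
    import tensorflow as tf

    cluster = tf.train.ClusterSpec({
        "ps":     ["cpu-node0:2222"],                    # parameter server on a CPU node
        "worker": ["gpu-node0:2222", "gpu-node1:2222"],  # workers on GPU nodes
    })

    # Each process starts one server; job_name/task_index select its role.
    server = tf.train.Server(cluster, job_name="worker", task_index=0)

    # Between-graph replication: variables are placed on the parameter server,
    # while compute-heavy ops stay on the local worker's GPU.
    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:0/gpu:0", cluster=cluster)):
        x = tf.placeholder(tf.float32, shape=[None, 784])
        w = tf.Variable(tf.zeros([784, 10]))
        b = tf.Variable(tf.zeros([10]))
        logits = tf.matmul(x, w) + b

    with tf.train.MonitoredTrainingSession(master=server.target) as sess:
        pass  # run training steps here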

Parallel Keywords

Linux container, Docker, Shifter, Kubernetes, TensorFlow, NVIDIA GPU
