
A Job Scheduling Method Based on GPU Sharing and the Reuse of Fragmented Resources

A Heuristic Approach with Fine-Grained Scheduling Based on GPU Sharing for Deep Learning Jobs

Advisor: 鍾武君
The full text of this thesis will be available for download on 2027/08/30.

Abstract


Current scheduling methods for multiple deep learning training jobs sharing a GPU cluster rarely address the scheduling design of GPU sharing, and algorithms that depend on performance prediction models suffer from system overhead. Moreover, state-of-the-art algorithms cannot schedule jobs at a fine-grained level, so idle GPU resources go underutilized; existing solutions therefore still leave room for improvement. Building on a suspend-and-resume mechanism that preserves model training state and supports migration, this thesis proposes a lightweight sampling-and-analysis method to predict the completion time of each job, and, under the premise of GPU sharing, resolves the starvation of large jobs caused by massive submissions of heterogeneous jobs, thereby reusing fragmented resources. Based on traces from real Microsoft Philly clusters, benchmark data obtained with the TF-Slim tool, and a simulated deep learning training environment, this thesis evaluates the average GPU utilization and job time of four image classification models. The experiments use three random seeds to generate 100 simulated jobs, arriving either at one-second intervals or according to a Poisson distribution, and compare two methods without GPU sharing against five methods based on GPU-sharing techniques. Simulation results show that, compared with sequential scheduling without GPU sharing, the proposed method improves resource utilization by about 4.1 times and reduces the makespan by about 3.6 times.
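To make the lightweight sampling idea concrete, here is a minimal Python sketch, not the thesis's actual implementation: a job runs a short warm-up, a few iterations are timed, and the measured per-iteration cost is extrapolated over the remaining iterations. The names estimate_completion_time, train_step, warmup, and sample_iters are hypothetical.

    import time

    def estimate_completion_time(train_step, total_iters, warmup=5, sample_iters=20):
        # Warm-up absorbs one-time costs (graph construction, memory
        # allocation) so they do not skew the per-iteration estimate.
        for _ in range(warmup):
            train_step()
        start = time.perf_counter()
        for _ in range(sample_iters):
            train_step()
        per_iter = (time.perf_counter() - start) / sample_iters
        # Extrapolate the measured cost over the remaining iterations.
        return per_iter * (total_iters - warmup - sample_iters)

    # Example with a dummy step standing in for a real training iteration.
    eta_seconds = estimate_completion_time(lambda: time.sleep(0.01), total_iters=10_000)

Because only a few dozen iterations are measured, rather than a full performance model being trained, such a predictor adds negligible overhead to each job.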

Parallel Abstract


Current scheduling methods for multiple Deep Learning Training (DLT) jobs on GPU clusters rarely discuss the scheduling design of GPU sharing, and algorithms that rely on performance prediction models incur system overhead. Additionally, current approaches cannot schedule resources for DLT jobs at a fine-grained level, which prevents the cluster from effectively utilizing idle GPU resources. As a result, existing solutions still have room for improvement. Based on suspend-and-resume mechanisms, this thesis proposes a lightweight sampling and analysis method to predict the completion time of DLT jobs. Under the premise of GPU sharing, it also solves the starvation problem that large jobs suffer when many heterogeneous jobs are submitted, thereby reusing resource fragments. Experiments are simulated using traces collected from real Microsoft Philly clusters and benchmark data obtained with the TF-Slim tool. Three random seeds are used to randomly generate 100 jobs for the simulation. Performance is compared against two methods without GPU sharing and five methods based on GPU sharing, under both one-second arrival intervals and Poisson-distributed arrivals. Results show that our approach improves resource utilization by around 4.1 times and reduces completion time by around 3.6 times compared to sequential scheduling without GPU sharing.
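As a rough illustration of the two arrival patterns used in the simulation, the sketch below generates 100 job arrival times at a fixed one-second interval and under a Poisson process with three seeds; the rate value of 0.5 jobs per second and the function names are assumptions for illustration only.

    import numpy as np

    def poisson_arrivals(n_jobs=100, rate=0.5, seed=0):
        # A Poisson arrival process has exponentially distributed
        # inter-arrival gaps; a cumulative sum yields arrival times.
        rng = np.random.default_rng(seed)
        return np.cumsum(rng.exponential(scale=1.0 / rate, size=n_jobs))

    def fixed_arrivals(n_jobs=100, interval=1.0):
        # Jobs arriving back to back at one-second intervals.
        return np.arange(n_jobs) * interval

    # Three seeds mirror the three randomized trials in the experiments.
    traces = {s: poisson_arrivals(seed=s) for s in (0, 1, 2)}

Feeding the same seeded traces to every scheduler under test keeps the comparison between the GPU-sharing and non-sharing methods fair.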

