
A Deep Reinforcement Learning Method for Solving Task Mapping Problems with Dynamic Traffic on Parallel Systems

Advisor: 周志遠

Abstract


In parallel computing systems, effectively mapping an application's communication pattern onto the underlying network topology is an important optimization problem, especially as system scale grows or communication costs increase; a good placement can substantially improve application performance. This problem has been studied extensively, and most prior approaches model the application's communication pattern and the network topology as two static graphs, then propose mathematical methods or algorithms to find a correspondence between them. In practice, however, parallel applications often have dynamic communication patterns, so static graphs are difficult to obtain and cannot accurately describe the application's actual communication behavior. We therefore propose a deep reinforcement learning (DRL) approach that uses dynamic communication information and a better evaluation model to find good mappings between parallel applications and their network topologies. We compare our approach in depth against existing methods and libraries, and it achieves better or comparable application performance. In particular, on a real application, our approach achieves average performance improvements of 11% and 16% on torus and dragonfly topologies, respectively, whereas the other approaches all improve average performance by less than 6%.

Abstract (English)


Efficient mapping of application communication patterns to the network topology is a critical problem for optimizing the performance of communication-bound applications on parallel computing systems. The problem has been studied extensively, but prior work mostly formulates it as finding an isomorphic mapping between two static graphs whose edges are annotated with traffic volume and network bandwidth. In practice, however, network performance is difficult to estimate accurately, and communication patterns often change over time and are not easily obtained. This work therefore proposes a deep reinforcement learning (DRL) approach that explores better task mappings by using the performance predictions and runtime communication behaviors provided by a simulator to learn an efficient task-mapping algorithm. We extensively evaluated our approach with both synthetic and real applications exhibiting varied communication patterns on Torus and Dragonfly networks. Compared with several existing approaches from the literature and software libraries, our approach found task mappings that consistently achieved comparable or better application performance. For a real application in particular, the average improvements of our approach on Torus and Dragonfly networks are 11% and 16%, respectively, while the average improvements of all other approaches are below 6%.
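As a concrete illustration of the cost model behind such task mappings, the sketch below sets up a toy mapping environment on a 2D torus: the state is the partial placement, each action places the next task on a free node, and the reward is the negative hop-bytes cost (traffic volume times hop distance) that the placement adds. All names here (`TaskMappingEnv`, `greedy_mapping`) and the greedy baseline are illustrative assumptions for exposition, not the thesis's actual DRL method or simulator.

```python
import itertools

def hop_distance(a, b, dims=(4, 4)):
    """Manhattan distance with wraparound on a torus of the given dims."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

class TaskMappingEnv:
    """Toy episodic environment: tasks 0..n-1 are placed one at a time
    onto free torus nodes; the reward of a placement is the negative
    hop-bytes cost it adds (traffic volume x torus hop distance)."""

    def __init__(self, traffic, dims=(4, 4)):
        self.traffic = traffic  # traffic[i][j] = volume between tasks i and j
        self.dims = dims
        self.nodes = list(itertools.product(*(range(d) for d in dims)))
        self.reset()

    def reset(self):
        self.placement = {}  # task id -> node coordinate
        self.next_task = 0

    def free_nodes(self):
        used = set(self.placement.values())
        return [n for n in self.nodes if n not in used]

    def added_cost(self, node):
        """Hop-bytes cost of placing the next task on `node`."""
        t = self.next_task
        return sum(self.traffic[t][u] * hop_distance(node, pos, self.dims)
                   for u, pos in self.placement.items())

    def step(self, node):
        reward = -self.added_cost(node)
        self.placement[self.next_task] = node
        self.next_task += 1
        done = self.next_task == len(self.traffic)
        return reward, done

def greedy_mapping(env):
    """Baseline policy: greedily place each task on the free node with
    the cheapest immediate cost (a DRL agent would instead learn a
    policy that looks beyond this one-step reward)."""
    env.reset()
    total, done = 0, False
    while not done:
        reward, done = env.step(min(env.free_nodes(), key=env.added_cost))
        total += reward
    return env.placement, total
```

For a three-task chain with traffic matrix `[[0, 5, 0], [5, 0, 3], [0, 3, 0]]` on a 4x4 torus, the greedy policy places the tasks on adjacent nodes, and the episode return equals the negative hop-bytes cost of the final mapping.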

