Automatically Migrating Sequential Programs to Heterogeneous System Architecture

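Heterogeneous System Architecture (HSA) is a hardware architecture for heterogeneous computing proposed by the HSA Foundation. Its unified memory architecture (HSA Unified Memory Architecture, hUMA) lets heterogeneous devices share data, and its user-level queuing model (HSA Queuing Model, hQ) dispatches work to different heterogeneous devices at low cost. Together these features make heterogeneous computing more efficient for applications. However, most of today's heterogeneous computing does not benefit from hUMA and hQ, and most applications on the market are still implemented in the traditional sequential execution model. This thesis aims to build a fully automatic framework that migrates sequential applications to HSA platforms. The framework comprises polyhedral memory dependence analysis, a staged dispatch predictor, and memory access coalescing optimization. It also exploits hUMA and hQ to achieve low-overhead job dispatching on HSA-compliant machines. On an AMD Carrizo machine (HSA-compliant), our framework speeds up a sequential application by up to 8.66x on the same machine. In many cases where the workload is traditionally considered too small to benefit from non-HSA heterogeneous computing, our framework still delivers a measurable speedup. Moreover, on the same Carrizo machine, its speedup sometimes exceeds that of manual migration, whether to HSA or non-HSA platforms. The framework allows many existing applications implemented in the sequential model to gain performance from HSA-based heterogeneous computing.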

Keywords

automatic migration; Heterogeneous System Architecture; shared virtual memory; fine-grained system shared virtual memory

Parallel Abstract

Heterogeneous System Architecture (HSA) is a hardware architecture for heterogeneous computing proposed by the HSA Foundation. Its Unified Memory Architecture (hUMA) enables data sharing between heterogeneous devices, and its user-level Queuing Model (hQ) enables low-overhead kernel launching. With these features, applications can use heterogeneous computing more efficiently and effectively. However, most of today's heterogeneous-computing applications do not leverage hUMA and hQ. Moreover, the majority of applications on the market are implemented in traditional sequential models. This thesis aims to build a fully automatic framework that migrates sequential applications to HSA. The framework includes polyhedral-guided memory aliasing analysis, a staged dispatching predictor, and memory coalescing optimization. It also takes advantage of hUMA and hQ to achieve low-overhead job dispatching on HSA-compliant systems. On an AMD Carrizo machine (HSA-compliant), a sequential application run through our framework can be up to 8.66x faster than the original on the same machine. In several cases where workloads are considered insufficient to benefit from conventional, non-HSA heterogeneous computing, our framework still delivers significant speedups. In addition, the performance obtained through our framework can sometimes exceed the gain from manual tuning for both HSA and non-HSA platforms, running on the same Carrizo machine. With this framework, many existing applications coded in traditional sequential models can gain a performance boost from HSA-based heterogeneous computing.

Parallel Keywords

automatic migration; Heterogeneous System Architecture; shared virtual memory; fine-grained system SVM

