Thesis

Memory Access Optimization for Data-parallel GPU Architectures

Advisor: 王勝德 (Sheng-De Wang)

Abstract


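Accesses to global memory often incur latencies of hundreds of cycles, so the performance of applications running on heterogeneous multicore systems can drop significantly as global memory accesses become more frequent. This thesis proposes a mathematical model of memory access that captures the global memory accesses of a group of threads, together with a metric that measures the degree of inefficient serial access in the GPU memory system. Based on a series of analyses of global memory access, we propose a method that addresses the memory access problem on GPUs. Evaluation results on a variety of kernels show that, without modifying the source code, kernels run with our suggested work-group size achieve better performance than with the work-group size provided by the vendor.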
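The abstract does not spell out the model itself. As a rough illustration of what "inefficient serial access" within a group of threads can mean, the sketch below uses a common coalescing approximation, not the thesis's model: work-items execute in lockstep groups of `warp` items (for example 32), global memory is served in aligned segments of `segment_bytes` bytes, and each distinct segment a warp touches costs one transaction. The function names `transactions_per_warp` and `serialization_degree`, and every parameter value, are hypothetical.

```python
import math
from typing import Callable


def transactions_per_warp(addr_of: Callable[[int], int], first: int,
                          size: int, segment_bytes: int) -> int:
    """Count the distinct aligned segments touched when work-items
    first .. first+size-1 each issue one load in lockstep."""
    return len({addr_of(t) // segment_bytes for t in range(first, first + size)})


def serialization_degree(addr_of: Callable[[int], int], n_items: int,
                         warp: int, elem_bytes: int,
                         segment_bytes: int) -> float:
    """Mean ratio of actual to ideal transactions over all warps.

    1.0 means every warp reads a contiguous, aligned block in the minimum
    number of transactions; a value of k means the memory system needs
    roughly k times as many (serialized) transactions as that ideal.
    """
    ratios = []
    for first in range(0, n_items, warp):
        size = min(warp, n_items - first)  # the last warp may be partial
        ideal = math.ceil(size * elem_bytes / segment_bytes)
        actual = transactions_per_warp(addr_of, first, size, segment_bytes)
        ratios.append(actual / ideal)
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical setting: 4-byte floats, 32-wide warps, 128-byte segments.
    for stride in (1, 2, 32):
        deg = serialization_degree(lambda t, s=stride: 4 * s * t, 1024, 32, 4, 128)
        print(f"stride {stride:2d}: serialization degree {deg:.1f}")
```

Under these assumptions, strides of 1, 2 and 32 elements give degrees of 1.0, 2.0 and 32.0: with a stride of 32 floats every work-item lands in its own 128-byte segment, so the single ideal transaction per warp becomes 32 serialized ones.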

English Abstract


Global memory accesses typically incur latencies of hundreds of cycles, so the performance of heterogeneous applications can degrade significantly as the number of global memory accesses grows. In this thesis, we present a mathematical model that captures the global memory accesses issued by a group of threads, together with a metric that quantifies the degree of inefficient serial access in the GPU memory system. Based on an analysis of the serial accesses that global memory accesses cause in the memory system, both within a work-group and among work-groups, we propose an approach to the memory access problem on GPUs. Evaluation on various kernel functions shows that kernels running with the work-group size suggested by our methodology outperform those running with the work-group size provided by hardware vendors. Heterogeneous applications executing on GPUs can therefore gain better performance without any code modification, simply by adopting the work-group size our methodology suggests.
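The work-group sizing procedure is not described in the abstract either. The following toy sketch shows, under loudly made-up assumptions, how a cost model that covers both coalescing within a work-group and conflicts among work-groups could rank candidate work-group sizes. It assumes memory channels interleaved every `channel_bytes` bytes, a fixed number of concurrently resident work-items, and resident work-groups that advance in lockstep so that requests landing on the same channel are served one after another. The names `estimated_cost` and `suggest_work_group_size` and all device parameters are hypothetical, not the thesis's method or any vendor's values.

```python
from collections import Counter
from typing import Callable, Sequence


def estimated_cost(addr_of: Callable[[int], int], wg_size: int, *,
                   resident_items: int, warp: int, segment_bytes: int,
                   channel_bytes: int, channels: int) -> float:
    """Relative memory cost per work-item for one work-group size.

    Hypothetical model: resident_items // wg_size work-groups are resident
    at once and step through their warps in lockstep. In each step every
    resident work-group issues one warp's coalesced segment requests, and
    the step costs as much as the busiest memory channel.
    """
    if wg_size % warp or resident_items % wg_size:
        raise ValueError("wg_size must be a multiple of warp and divide resident_items")
    groups = resident_items // wg_size
    total = 0
    for step in range(wg_size // warp):
        load = Counter()  # requests per channel in this step
        for g in range(groups):
            first = g * wg_size + step * warp
            segments = {addr_of(t) // segment_bytes for t in range(first, first + warp)}
            for seg in segments:
                load[seg * segment_bytes // channel_bytes % channels] += 1
        total += max(load.values())  # same-channel requests serialize
    return total / resident_items


def suggest_work_group_size(addr_of: Callable[[int], int],
                            candidates: Sequence[int], **params: int) -> int:
    """Pick the candidate with the lowest estimated cost (first one on ties)."""
    return min(candidates, key=lambda wg: estimated_cost(addr_of, wg, **params))


def unit_stride(t: int) -> int:
    """Work-item t loads 4-byte float element t."""
    return 4 * t


if __name__ == "__main__":
    # Hypothetical device: 8 channels interleaved every 256 bytes,
    # 2048 work-items resident at once, 32-wide warps, 128-byte segments.
    params = dict(resident_items=2048, warp=32, segment_bytes=128,
                  channel_bytes=256, channels=8)
    for wg in (64, 128, 256, 512):
        print(wg, estimated_cost(unit_stride, wg, **params))
    print("suggested:", suggest_work_group_size(unit_stride, (64, 128, 256, 512), **params))
```

With these parameters the model prefers 64. For a work-group size of 512, consecutive groups start 2048 bytes apart, which equals the full 8 × 256-byte interleave span, so every resident group hits the same channel at every step (a pattern often called partition camping). Groups of 64 items start 256 bytes apart and spread evenly over all eight channels. A practical tool would replace these assumptions with measured device parameters.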

Keywords

Memory Access Optimization

