
A Parallel Performance Analysis Framework for Cache Coherence Protocols

Advisor: 洪士灝

Abstract


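Multi-core processors can greatly improve the performance of parallel programs, but writing software that fully exploits their potential involves many difficult problems. The benefit of parallelization depends heavily on program behavior. For example, threads that share resources affect the hardware performance of the cache coherence protocol, and this can cancel out the gains of parallelization. Accurate simulators help developers analyze this hardware performance and understand how software behavior affects it. However, simulators run slowly on complex systems. Because they simulate with a single thread, their speed can be improved only to a limited extent, and they cannot benefit from multi-core processors.

In this thesis, we present a novel parallel performance analysis method for cache coherence protocols. It combines simulation techniques with numerical analysis to achieve fast performance estimation. Experimental results show that our parallel analysis method is up to 13 times faster than the widely used memory-access-driven simulation approach. We further integrate this method with a parallel full-system emulator, which provides complete system performance analysis that is not limited to user-space applications.

Finally, we show how to use our performance analysis tool to tune multi-threaded programs, taking an OpenMP program as an example. After tuning, the program running with 16 threads on our multi-core machine achieves a performance improvement of up to about 100%.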

English Abstract


Multi-core platforms offer large performance potential for parallel software, but developing such software is very challenging. The performance of the cache coherence protocol, which is affected by data sharing in multi-threaded applications, plays an important role in scalability. Detailed simulation gives accurate results for cache performance in multi-core systems, but it is too slow for complex systems. It serializes the simulation of many cores, so its speed is bounded by the computing power of a single core. In this thesis, we propose a novel multi-core cache performance analysis approach that combines simulation and analytic methods for fast performance estimation in parallel. The experimental results show that our approach runs about 13 times faster than the memory-access-based approach. We further integrate this parallel scheme into a parallel full-system emulator for system-wide performance analysis, rather than analysis of user-space applications only. To demonstrate the framework, we present a case study that optimizes an OpenMP program. The maximum performance improvement of the application is about 100% when using 16 OpenMP threads on our 48-core host machine.
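The abstracts contrast the proposed approach with memory-access-driven simulation. In that baseline, every load and store is replayed in one global order through a model of the coherence protocol. The sketch below is a minimal, hypothetical illustration of that baseline style, not the thesis's framework. It assumes a simple MSI protocol, private caches of unlimited capacity (no evictions), and 64-byte lines. The names `MSISimulator`, `run`, and `LINE_SIZE` are invented for illustration. The sketch also shows how data sharing between threads turns into coherence misses and invalidations.

```python
"""Hypothetical sketch of a memory-access-driven MSI coherence simulator.

Assumptions (not from the thesis): MSI protocol, unlimited-capacity private
caches, 64-byte lines, and a single global interleaving of accesses.
"""
from typing import Dict, List, Tuple

LINE_SIZE = 64  # bytes per cache line (assumed)

# A trace entry is (core id, byte address, is_write).
Trace = List[Tuple[int, int, bool]]


class MSISimulator:
    """Serially replays memory accesses through per-core MSI line states."""

    def __init__(self, num_cores: int) -> None:
        # One dict per core mapping line index -> 'M' or 'S'; a missing key means Invalid.
        self.caches: List[Dict[int, str]] = [{} for _ in range(num_cores)]
        self.misses = 0
        self.invalidations = 0

    def access(self, core: int, addr: int, is_write: bool) -> None:
        line = addr // LINE_SIZE
        state = self.caches[core].get(line)  # None means Invalid
        if is_write:
            if state == "M":
                return  # write hit in Modified: no coherence traffic
            if state is None:
                self.misses += 1  # write miss
            # Write miss or S->M upgrade: invalidate every other core's copy.
            for other, cache in enumerate(self.caches):
                if other != core and line in cache:
                    del cache[line]
                    self.invalidations += 1
            self.caches[core][line] = "M"
        else:
            if state is not None:
                return  # read hit in M or S
            self.misses += 1  # read miss
            # A Modified copy elsewhere is written back and downgraded to Shared.
            for other, cache in enumerate(self.caches):
                if other != core and cache.get(line) == "M":
                    cache[line] = "S"
            self.caches[core][line] = "S"


def run(trace: Trace, num_cores: int = 2) -> Tuple[int, int]:
    """Replay a trace in order and return (misses, invalidations)."""
    sim = MSISimulator(num_cores)
    for core, addr, is_write in trace:
        sim.access(core, addr, is_write)
    return sim.misses, sim.invalidations


if __name__ == "__main__":
    # Two cores alternately update their own counters 100 times each.
    false_sharing = [(c, 8 * c, True) for _ in range(100) for c in (0, 1)]   # same line
    padded = [(c, LINE_SIZE * c, True) for _ in range(100) for c in (0, 1)]  # separate lines
    print(run(false_sharing))  # (200, 199)
    print(run(padded))         # (2, 0)
```

Each access must observe the state left by all earlier accesses from every core, so the replay loop is inherently serial. This is the bottleneck the abstracts attribute to single-threaded simulators. The padded trace mimics the common false-sharing fix of placing per-thread data on separate cache lines. The abstracts do not say which optimization was applied in the OpenMP case study.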

