
平行程式於記憶體共享架構之自動相位檢測

Automatic Phase Detection for Parallel Applications on Shared Memory Architectures

Advisor: 楊佳玲

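Abstract


Cycle-accurate software simulators are essential to computer architecture design because they let designers try different architectures early in the design process. However, cycle-accurate software simulators run very slowly. Multi-core processor architectures contain more CPUs and other system components than single-core systems, so simulating them is slower still, which makes improving the performance of multi-core simulators extremely important.

In this thesis, we examine sampled simulation techniques and propose a mechanism that uses them to accelerate multi-core system simulation. The mechanism shortens simulation time by identifying behavior that recurs in a program and considering only a representative point for each recurring behavior. To detect recurring program behavior, single-threaded programs have traditionally been analyzed with code signatures. For parallel programs, however, behavior is no longer determined solely by the instructions executed; interaction among threads also becomes an important factor affecting performance.

Therefore, in addition to the traditional code-signature approach, we record the patterns of data sharing among threads and the contention for shared resources to help detect recurring behavioral characteristics. When the behavior sampled at the representative points is compared against the results of full program execution, our multi-core simulation acceleration method keeps the error rate below 2%.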

Parallel Abstract


Cycle-accurate software-based simulation is critical for architecture design because it allows an architect to explore various architectural design points early in the design cycle. However, simulation speed has always been an issue for cycle-accurate simulation. With the popularity of multi-core processors, improving multi-core simulation performance is critical to enabling rapid advances in multi-core architecture research. In this work, we look into simulation sampling techniques to speed up multi-core architecture simulation. Techniques have been proposed that automatically group similar portions of a program's execution into phases, where samples classified as the same phase have homogeneous behavior. Conventionally, a program's code signatures are examined to extract information about its phases, and only the representative intervals are executed to evaluate architectural choices. However, such methodologies are becoming inadequate for multi-core systems, because an application's behavior is determined not only by its instructions but also by the communication structure between threads. Hence, in this work we propose to utilize the interaction between threads for parallel program phase detection. Our results show that including such information significantly increases the accuracy of phase detection (the IPC error rate is below 2%).
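The idea of phase detection with and without thread-interaction information can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the greedy threshold clustering, the Manhattan distance, the two sharing features, the threshold value, and all interval data below are assumptions chosen only for illustration. Intervals are assumed to contain equal instruction counts, so the weighted mean of per-interval CPI gives the estimated CPI and the estimated IPC is its reciprocal.

```python
def normalize(counts):
    """Scale execution counts so that they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def build_signature(block_counts, sharing_features, use_sharing=True):
    """Code signature, optionally extended with thread-interaction features."""
    signature = normalize(block_counts)
    if use_sharing:
        signature = signature + list(sharing_features)
    return signature

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def detect_phases(signatures, threshold):
    """Greedy clustering: an interval joins the first phase whose
    representative is within `threshold`, otherwise it starts a new phase."""
    representatives = []  # interval index that represents each phase
    labels = []
    for i, sig in enumerate(signatures):
        for phase, rep in enumerate(representatives):
            if manhattan(sig, signatures[rep]) <= threshold:
                labels.append(phase)
                break
        else:
            representatives.append(i)
            labels.append(len(representatives) - 1)
    return labels, representatives

def estimate_ipc(labels, representatives, cpi):
    """Extrapolate whole-program IPC from the representative intervals only."""
    weighted_cpi = sum(cpi[rep] * labels.count(phase)
                       for phase, rep in enumerate(representatives))
    return len(labels) / weighted_cpi

def true_ipc(cpi):
    """IPC of the full run, using every interval."""
    return len(cpi) / sum(cpi)

# Hypothetical data: six intervals, three basic blocks, two sharing features
# (assumed here to be shared-data access ratio and lock-contention ratio).
block_counts = [[50, 50, 0]] * 4 + [[0, 20, 80]] * 2
sharing = [[0.1, 0.0], [0.1, 0.0], [0.6, 0.4], [0.6, 0.4], [0.2, 0.1], [0.2, 0.1]]
cpi = [1.0, 1.0, 2.0, 2.0, 0.5, 0.5]
THRESHOLD = 0.3  # assumed similarity threshold

sigs_code = [build_signature(b, s, use_sharing=False) for b, s in zip(block_counts, sharing)]
sigs_full = [build_signature(b, s) for b, s in zip(block_counts, sharing)]

labels_code, reps_code = detect_phases(sigs_code, THRESHOLD)
labels_full, reps_full = detect_phases(sigs_full, THRESHOLD)

ipc_code = estimate_ipc(labels_code, reps_code, cpi)
ipc_full = estimate_ipc(labels_full, reps_full, cpi)
reference = true_ipc(cpi)

print("code-only relative error:", abs(ipc_code - reference) / reference)
print("with sharing relative error:", abs(ipc_full - reference) / reference)
```

In this made-up data set, intervals 2 and 3 execute the same code as intervals 0 and 1 but share far more data, so a code-only signature merges them into one phase while the extended signature separates them. Real phase classifiers typically use richer signatures and clustering methods than this threshold scheme.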

