A Synchronization-Function-Based TLM Approach for Parallel Multi-Core Instruction-Set Simulations

Advisor: Ren-Song Tsay (蔡仁松)

Abstract


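Multi-core computing platforms are increasingly common, which makes a multi-core instruction-set simulator essential. Parallel computing techniques can speed up such simulators, but they often suffer from poor accuracy: because the individual cores are simulated at different speeds, the interactions among cores can produce incorrect results. To solve this problem effectively, we devised a high-performance transaction-level approach, based on synchronization functions, for instruction-set simulation of parallel multi-core platforms. Synchronization functions are the routines that multithreaded programs use to coordinate their order of execution. The approach builds on transaction-level modeling: we set a transaction boundary at every synchronization function call, which is also the point where different cores interact, so the many instructions between two consecutive synchronization function calls can be treated as a single transaction. With a blocking/non-blocking send/receive model of synchronization functions and a suitable timing synchronization method, the time and order of every transaction can be maintained correctly and efficiently. Furthermore, we call a transaction that involves communication among cores a "public transaction" and one that does not a "private transaction". The timing and order of public transactions must be maintained, whereas the order of private transactions does not affect simulation accuracy; exploiting this property improves performance further. Our experimental results show that the approach reaches a simulation speed of 549 million instructions per second (MIPS), three times faster than the latest shared-variable approach, while producing timing and functional results as accurate as those of cycle-accurate approaches.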
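The distinction between public and private transactions can be illustrated with a small sketch. It is not the thesis's implementation: the `Transaction` record, the `(cycles, communicates)` trace format, and both helper functions are hypothetical names introduced only for illustration. The sketch assumes each core's trace has already been cut at synchronization function calls, and that each segment is marked by whether its closing call communicates with another core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    core: str
    start: int     # local time (cycles) at which the transaction begins
    end: int       # local time of the synchronization call that closes it
    public: bool   # True if the closing call communicates with another core


def split_into_transactions(core, trace):
    """Turn a hypothetical trace of (cycles, communicates) segments,
    each closed by one synchronization call, into transactions."""
    clock = 0
    transactions = []
    for cycles, communicates in trace:
        transactions.append(Transaction(core, clock, clock + cycles, communicates))
        clock += cycles
    return transactions


def public_order(transactions):
    """Only public transactions need a global order; private ones are skipped
    because their relative order does not affect the simulated result."""
    public = [t for t in transactions if t.public]
    return sorted(public, key=lambda t: (t.end, t.core))


txs = (split_into_transactions("core0", [(100, True), (40, False), (60, True)])
       + split_into_transactions("core1", [(70, False), (80, True)]))
order = [(t.core, t.end) for t in public_order(txs)]
# order == [("core0", 100), ("core1", 150), ("core0", 200)]
```

Of the five transactions, only three have to be ordered across cores. In this simplified picture the two private ones can be simulated without waiting for any other core.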

Parallel Abstract


We describe a highly efficient transaction-level modeling (TLM) technique for parallel multi-core instruction-set simulation (MCISS). We treat every call to a synchronization function, which dictates interactions among applications on different CPU cores, as a transaction boundary. Using a generic blocking/non-blocking send/receive model of synchronization functions together with proper timing synchronization, we can precisely determine the temporal order of each transaction and hence efficiently compute accurate simulation results. Our experiments show that the proposed approach attains a simulation speed of up to 549 MIPS, which is three times faster than the state-of-the-art shared-variable-access approach, while producing timing and functional results as accurate as those from cycle-accurate approaches.
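The following minimal sketch illustrates how a blocking/non-blocking send/receive model with timing synchronization might keep per-core clocks consistent. It rests on several assumptions that do not come from the thesis: a non-blocking `send` stamps the message with the sender's local time, a blocking `recv` advances the receiver's clock to at least that timestamp, and each channel has exactly one sender and one receiver. The `Core` class, the `(cycles, op, channel)` program format, and `simulate` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    program: list   # [(cycles, op, channel)], op is "send", "recv" or None
    clock: int = 0  # local simulated time in cycles
    pc: int = 0     # index of the next transaction to execute


def simulate(cores):
    """Run every core as far as it can go; a core stalls only at a recv
    whose matching send has not been simulated yet."""
    channels = {}   # channel name -> deque of send timestamps
    events = []     # (time, core, op, channel) for each completed sync call
    progressed = True
    while progressed:
        progressed = False
        for core in cores:
            while core.pc < len(core.program):
                cycles, op, chan = core.program[core.pc]
                arrival = core.clock + cycles        # time the sync call is reached
                if op == "recv":
                    pending = channels.get(chan)
                    if not pending:
                        break                        # blocked: try again next pass
                    # Timing synchronization: never receive before the send.
                    arrival = max(arrival, pending.popleft())
                elif op == "send":
                    channels.setdefault(chan, deque()).append(arrival)
                core.clock = arrival
                core.pc += 1
                progressed = True
                if op is not None:
                    events.append((arrival, core.name, op, chan))
    return sorted(events)


a = Core("A", [(100, "send", "c"), (50, None, None)])
b = Core("B", [(30, "recv", "c"), (20, None, None)])
events = simulate([b, a])
# events == [(100, "A", "send", "c"), (100, "B", "recv", "c")]
# a.clock == 150, b.clock == 120
```

Core B reaches its `recv` at local time 30 but completes it at time 100, when A's message exists. The instructions inside each transaction run without any cross-core checks, so the cores only coordinate at the transaction boundaries.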
