Cache-Aware Task Scheduling Policy on Multi-Core Platforms

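With advances in process technology, multi-core processors have become a major direction for building high-performance processors. In a multi-core architecture, each processor core may have its own private cache, and multiple cores may also share a large cache. Because overall system performance is closely tied to cache efficiency, optimizing data access patterns can improve performance, and well-designed task scheduling is an effective way to achieve this. However, the complexity of cache organizations in multi-core systems makes it difficult to optimize task scheduling by hand, so a good automated tool for optimizing task scheduling is needed. In this thesis, we propose a new task scheduling policy that improves cache affinity and reduces memory footprint and coherence traffic in order to reduce capacity misses and coherence misses in the cache, thereby improving cache efficiency. We implement this policy in the task scheduler of a parallel programming model, Threading Building Blocks. Programmers can specify each task's data footprint and sharing relationships through an application programming interface. Experimental results show that, compared with other task scheduling policies, the proposed policy effectively reduces program execution time and achieves higher system performance.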

Keywords

Multi-core; Task scheduling; Cache memory

Parallel Abstract

As process technology shrinks and the number of transistors on a single chip increases, multi-core processors have become a major way to build high-performance processors. In multi-core processors, the processing cores may have separate private caches and/or share a large common cache. Since system performance depends heavily on cache utilization, the data access pattern should be optimized to improve performance. Good task scheduling is an effective way to optimize the data access pattern. However, the cache organizations of multi-core systems are quite complex, and it is hard to optimize the scheduling manually. Therefore, a good automated tool is required. In this paper, we try to minimize capacity and coherence misses through affinity improvement, footprint reduction, and coherence traffic minimization. We propose a scheduling policy that integrates these techniques to reduce cache misses effectively. We also implement the policy in the scheduler of a parallel programming model, Threading Building Blocks (TBB). Programmers can easily specify the footprint and sharing group of each task through an API provided by TBB, and the scheduler then optimizes cache utilization accordingly. We believe that this tool can reduce programming complexity by hiding the details of cache utilization optimization while providing high performance.
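To make the idea of per-task footprint and sharing-group hints concrete, the following is a minimal sketch of one way a scheduler could use them. It is not the policy proposed in this work and it does not use any TBB API: `TaskInfo`, `assign_tasks`, and the greedy heuristic are all hypothetical names and choices made for illustration. The sketch assumes that placing every task of a sharing group on the same core improves affinity and reduces coherence traffic, and that balancing total footprint across cores reduces capacity pressure on each private cache.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical task descriptor (illustrative only, not part of TBB).
struct TaskInfo {
    int id;                 // task identifier
    std::size_t footprint;  // bytes of data the task is expected to touch
    int sharing_group;      // tasks with the same value share data
};

// Greedy placement sketch: returns assignment[i] = core index for tasks[i].
// Each sharing group is kept on one core, and groups are placed largest
// first on the core with the smallest accumulated footprint.
std::vector<int> assign_tasks(const std::vector<TaskInfo>& tasks, int num_cores) {
    assert(num_cores > 0);

    // Sum the footprint of every sharing group.
    std::map<int, std::size_t> group_bytes;
    for (const TaskInfo& t : tasks) {
        group_bytes[t.sharing_group] += t.footprint;
    }

    // Order groups by descending footprint (ties keep group-id order).
    std::vector<std::pair<int, std::size_t>> groups(group_bytes.begin(),
                                                    group_bytes.end());
    std::stable_sort(groups.begin(), groups.end(),
                     [](const std::pair<int, std::size_t>& a,
                        const std::pair<int, std::size_t>& b) {
                         return a.second > b.second;
                     });

    // Place each group on the currently least-loaded core.
    std::vector<std::size_t> load(static_cast<std::size_t>(num_cores), 0);
    std::map<int, int> group_core;
    for (const std::pair<int, std::size_t>& g : groups) {
        int core = static_cast<int>(
            std::min_element(load.begin(), load.end()) - load.begin());
        group_core[g.first] = core;
        load[static_cast<std::size_t>(core)] += g.second;
    }

    // Every task inherits the core of its sharing group.
    std::vector<int> assignment;
    assignment.reserve(tasks.size());
    for (const TaskInfo& t : tasks) {
        assignment.push_back(group_core[t.sharing_group]);
    }
    return assignment;
}
```

Calling `assign_tasks` with four tasks in three sharing groups and two cores places both tasks of the first group on the same core and the second group on the other core. A real scheduler would also have to handle dynamic task creation, work stealing, and shared cache levels, which this sketch ignores.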

Parallel Keywords

Multi-core; Task scheduling; Cache

