Design and Analysis of Memory Architecture on Multi-Core Platforms

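Over the past few decades, system-on-chip (SoC) technology has allowed developers to integrate more and more functions on a single chip. However, Moore's Law states that the number of transistors on a chip doubles roughly every two years, so chip design complexity faces severe challenges. Undoubtedly, raising the abstraction level of design modeling is urgently needed. Today, single-core development has run into limits on clock frequency and problems with power consumption, so integrated multi-core architectures have been designed to replace the traditional single-core architecture. The advantages of multi-core architectures are computing performance, low power consumption, and suitability for multi-threaded applications. Nevertheless, the demand for memory bandwidth on multi-core architectures continues to grow. In 1994, Wulf and McKee argued that improvements in computer performance would stall; indeed, between 1986 and 2000, CPU performance grew at an annual rate of 55%, far outpacing the 10% annual growth of memory performance, which makes memory performance the bottleneck for further gains in computer performance. Many engineers have therefore worked to improve the efficiency between the memory controller and memory. In addition, in multi-core architectures that use a mesh or torus, the distance between a core and memory can be very long. To address this, we propose an architecture that shortens the time memory accesses spend on the network-on-chip (NoC). The approach groups the cores and gives each group a dedicated memory channel; the other end of the channel connects to a multi-port Crossbar Switch that re-schedules memory accesses to the correct memory controller. We call this method the CS-based approach. We also adopt the Standard Co-Emulation Modeling Interface (SCE-MI) to connect software and hardware and realize the complete platform. Compared with the conventional approach, the CS-based approach achieves a significant speedup of 1.18 to 1.74 times on SPLASH-2 programs, while the Crossbar Switch requires an additional gate count of about 7k.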

Keywords

Multi-core; multi-channel memory controller; dynamic random-access memory (DRAM)

Parallel Abstract

In past decades, system-on-chip (SoC) technology has let designers add more functions to a single chip. Moore's Law, however, indicates that transistor counts double approximately every two years, so design complexity faces a sharp challenge. Undoubtedly, raising the abstraction level of modeling and simulation is urgently needed. Today, single-processor development has hit a bottleneck in raising clock frequency and in energy efficiency, so emerging many-core architectures have been designed to replace the traditional centralized single-core design. The advantages of multi-core processors are high-performance computing, low power, and suitability for multi-threaded applications. However, the demand for memory bandwidth keeps increasing. In 1994, Wulf and McKee argued that the improvement of computer performance would stall. Indeed, from 1986 to 2000, CPU speed improved at an annual rate of 55% while memory speed improved at only 10%. In other words, memory speed would become the bottleneck in computer performance. Many engineers have therefore dedicated themselves to improving the efficiency between the memory controller and DRAM.

In addition, in many-core architectures that connect cores with a mesh or torus, the distance from a core to DRAM may be very long. Motivated by this, we present an architecture with more efficient memory access and a mechanism that reduces the routing time of memory accesses on the NoC. The mechanism clusters processors and assigns an exclusive memory channel to each cluster. The architecture uses a multi-port Crossbar Switch to re-schedule DRAM requests from the memory channels to DRAM. We call the architecture in which memory requests are routed by the Crossbar Switch the CS-based approach, in contrast with the Original approach, in which memory requests are routed by the NoC. To implement the architecture, we adopt SCE-MI to bridge the ESL many-core platform with the RTL memory sub-system. Experiments with SPLASH-2 applications demonstrate a remarkable speedup ranging from 1.18 to 1.74 times, and the extra Crossbar Switch costs about 7k gates.
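The clustering and crossbar idea described above can be sketched in a few lines of Python. This is only an illustrative model, not the thesis implementation: the 4x4 mesh, the cluster size, the number of memory controllers, the 64-byte address interleaving, and every function name are assumptions introduced here. It shows how a request might be mapped to a crossbar input port (the cluster's memory channel) and output port (the target memory controller), and how many mesh hops the same request could need if it travelled over the NoC instead.

```python
from typing import Tuple

# All constants below are assumptions for illustration only.
MESH_WIDTH = 4          # assumed 4x4 mesh of cores
CLUSTER_SIZE = 4        # assumed number of cores sharing one memory channel
NUM_CONTROLLERS = 4     # assumed number of memory controllers
INTERLEAVE_BYTES = 64   # assumed address-interleaving granularity


def cluster_of(core_id: int) -> int:
    """Map a core to its cluster; each cluster owns one memory channel."""
    return core_id // CLUSTER_SIZE


def controller_of(address: int) -> int:
    """Pick the target memory controller by interleaving on the address."""
    return (address // INTERLEAVE_BYTES) % NUM_CONTROLLERS


def crossbar_route(core_id: int, address: int) -> Tuple[int, int]:
    """Return (input port, output port) of the crossbar for one request."""
    return cluster_of(core_id), controller_of(address)


def mesh_hops(src_core: int, dst_core: int) -> int:
    """Manhattan hop count between two mesh nodes (dimension-ordered routing)."""
    sx, sy = src_core % MESH_WIDTH, src_core // MESH_WIDTH
    dx, dy = dst_core % MESH_WIDTH, dst_core // MESH_WIDTH
    return abs(sx - dx) + abs(sy - dy)


if __name__ == "__main__":
    # Core 5 issues a request to address 0x80.
    print(crossbar_route(5, 0x80))
    # Hops from the far corner (core 15) to a controller assumed at core 0.
    print(mesh_hops(15, 0))
```

In this model the crossbar path is independent of where the core sits in the mesh, while the NoC path grows with the Manhattan distance to the controller, which is the effect the CS-based approach is meant to avoid.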

Parallel Keywords

Many-core; Multi-channel memory controller; DRAM

