Over the past decades, system-on-chip (SoC) technology has allowed developers to integrate more and more functions on a single chip. Moore's Law, however, states that the number of transistors on a chip doubles approximately every two years, so design complexity faces a sharp challenge, and raising the abstraction level of modeling and simulation has become an urgent need. Single-core processor development has meanwhile hit bottlenecks in clock frequency and energy efficiency, so multi-core and many-core architectures have emerged to replace the traditional centralized single-core design. Their advantages are high computing performance, low power consumption, and suitability for multi-threaded applications. The demand for memory bandwidth, however, keeps increasing. In 1994, Wulf and McKee predicted that the improvement of computer performance would stall. Indeed, from 1986 to 2000 CPU speed improved at an annual rate of 55% while memory speed improved at only 10%, so memory speed becomes the bottleneck of computer performance. Many engineers have therefore worked on improving the efficiency between the memory controller and DRAM. In addition, in many-core architectures that connect cores with a mesh or torus network-on-chip (NoC), the distance from a core to DRAM can be very long. Motivated by these observations, we present an architecture with more efficient memory access and a mechanism that reduces the routing time of memory accesses on the NoC. The mechanism groups processors into clusters and assigns each cluster an exclusive memory channel. The other end of each channel connects to a multi-port Crossbar Switch, which re-schedules the DRAM requests arriving from the memory channels and forwards them to the correct memory controller. We call this architecture, in which memory requests are routed by the Crossbar Switch, the CS-based approach, in contrast with the Original approach, in which memory requests are routed over the NoC. To implement the architecture, we adopt the Standard Co-Emulation Modeling Interface (SCE-MI) to bridge the ESL many-core platform with the RTL memory sub-system.
Experiments with SPLASH-2 applications show that the CS-based approach achieves a speedup of 1.18 to 1.74 times over the Original approach, while the extra Crossbar Switch costs about 7k gates.
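The routing difference between the two approaches can be illustrated with a minimal Python sketch. It is not the platform described in this work. The 4x4 mesh, the 2x2 clusters, the corner placement of the memory controllers, the 64-byte address interleaving, and every function name are assumptions chosen for illustration. The sketch counts only NoC hops and does not model crossbar latency or DRAM scheduling.

```python
# Hypothetical parameters, not taken from the original work.
MESH = 4          # 4x4 mesh of cores
CLUSTER = 2       # each cluster is a 2x2 block of cores
NUM_MC = 4        # number of memory controllers
INTERLEAVE = 64   # address interleaving granularity in bytes

# Assumption: in the baseline, memory controllers sit at the mesh corners.
MC_POS = [(0, 0), (0, MESH - 1), (MESH - 1, 0), (MESH - 1, MESH - 1)]


def target_mc(addr):
    """Pick the memory controller by simple address interleaving."""
    return (addr // INTERLEAVE) % NUM_MC


def manhattan(a, b):
    """Hop count between two mesh nodes under dimension-ordered routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def noc_hops_original(core, addr):
    """Original approach: the request crosses the mesh to the controller."""
    return manhattan(core, MC_POS[target_mc(addr)])


def cluster_of(core):
    """Index of the cluster that contains the given core."""
    per_row = MESH // CLUSTER
    return (core[0] // CLUSTER) * per_row + (core[1] // CLUSTER)


def channel_port(cluster_id):
    """Assumption: each cluster's memory channel attaches at its top-left core."""
    per_row = MESH // CLUSTER
    return ((cluster_id // per_row) * CLUSTER, (cluster_id % per_row) * CLUSTER)


def route_cs(core, addr):
    """CS-based approach: reach the cluster's channel, then let the crossbar
    forward the request. Returns (crossbar input, crossbar output, NoC hops)."""
    cid = cluster_of(core)
    return cid, target_mc(addr), manhattan(core, channel_port(cid))


if __name__ == "__main__":
    core, addr = (1, 2), 0x80
    print("original NoC hops:", noc_hops_original(core, addr))
    print("CS-based (in, out, hops):", route_cs(core, addr))
```

Under these assumptions the NoC distance in the CS-based model is bounded by the cluster size, while in the baseline it grows with the mesh dimensions. A request can still take more hops in the CS-based model when its core happens to sit next to the target controller.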