Nymph:以可合成Verilog HDL設計之新型32核心多處理器

Nymph: A Novel 32-Cores Multicore Processor Designed by Synthesizable Verilog HDL

Abstract


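Modern high-end computer systems all need a high-performance processor to complete the tasks issued by users quickly. In the past, processor performance was improved mainly by raising the clock frequency through process technology and deep pipelining. High clock frequencies, however, bring heat-dissipation problems that are hard to solve. In recent years, therefore, the design focus of high-performance processors has shifted from improving the execution efficiency of a single program to improving total system throughput. Multicore processors are one feasible approach: more processors carry out more tasks, yielding higher throughput. This thesis designs a multicore processor architecture named Nymph, comprising the implementation of a single-core processor and an interconnection network linking 32 processors. Its functional correctness is verified with the DSPstone benchmark, and its performance gains and bottlenecks are examined further. The Nymph architecture contains 32 processors based on the MIPS instruction set architecture, integrated with 8 memory modules to form a shared-memory architecture. To balance area cost against transfer efficiency, the interconnection network combines an 8x8 crossbar with buses: the crossbar connects eight clusters, communication inside each cluster goes over a bus, and each cluster contains four cores and one memory. All architectures described in this thesis are implemented in RTL Verilog. So that subsequent chip development can proceed, the work goes beyond completing the model and emphasizes conformance to the guidelines for synthesizable Verilog design. After the design was completed, the architecture was simulated in Verilog; according to the simulation results, the multicore architecture achieves up to 18 times the performance of a single-core processor.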
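To put the reported figure in context, the short sketch below computes parallel efficiency (speedup divided by core count) for the best case stated in the abstract. The function name `parallel_efficiency` is a hypothetical helper introduced here for illustration; the thesis abstract itself reports only the 18x peak speedup and the 32-core count.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup per core, i.e. the fraction of ideal linear scaling."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Figures taken from the abstract: up to 18x speedup on 32 cores.
efficiency = parallel_efficiency(18.0, 32)
print(f"{efficiency:.4f}")  # 18 / 32 = 0.5625
```

On these numbers the best case reaches a little over half of ideal linear scaling, which is consistent with the abstract's statement that the work also examines performance bottlenecks.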

Parallel Abstract


A high-performance processor is necessary in modern computer systems to accomplish the complex tasks of users. In the past, the major techniques for raising a processor's clock frequency, and thus its performance, were advances in semiconductor technology and deeper pipelining. However, a high clock frequency brings cooling problems that are difficult to solve. For this reason, the design of high-performance processors now focuses on high throughput rather than high clock frequency. The multicore processor is one workable solution because it can do more work by using multiple cores. Accordingly, a novel multicore processor named Nymph is proposed, covering the implementation of a single-core processor and the interconnection network that connects 32 processors. It has been exercised with the DSPstone benchmark to verify functional correctness and to analyze its performance benefits and limits. Internally, Nymph comprises 32 processors based on the MIPS ISA together with eight memory modules. To balance cost against transmission efficiency, the interconnection network is composed of an 8x8 crossbar and buses. The crossbar connects eight clusters, and communication inside each cluster goes over a bus. Each cluster includes four cores and one memory. The architecture described in this thesis is implemented in synthesizable RTL Verilog HDL so that it can be turned into a chip through a typical ASIC flow. According to the simulation results, the Nymph architecture achieves up to an eighteen-fold speedup over a single-core processor.
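The cluster organization described above can be sketched as a small software model. This is an illustrative model only, not the thesis's Verilog: the counts (8 clusters, 4 cores and 1 memory per cluster) come from the abstract, while the consecutive core numbering, the contiguous per-module address mapping, the `MODULE_SIZE` value, and the rule that a core reaching its own cluster's memory stays on the local bus are all assumptions made for the example.

```python
from dataclasses import dataclass

NUM_CLUSTERS = 8        # clusters attached to the 8x8 crossbar (from the abstract)
CORES_PER_CLUSTER = 4   # cores sharing one bus (from the abstract)
NUM_CORES = NUM_CLUSTERS * CORES_PER_CLUSTER  # 32 cores in total

# ASSUMPTION: each memory module owns one contiguous block of the address
# space. The abstract does not state the real address mapping or module size.
MODULE_SIZE = 0x1000_0000  # hypothetical bytes per memory module


@dataclass(frozen=True)
class Route:
    source_cluster: int
    target_module: int
    uses_crossbar: bool


def cluster_of(core_id: int) -> int:
    """Map a core to its cluster, assuming cores are numbered consecutively."""
    if not 0 <= core_id < NUM_CORES:
        raise ValueError(f"core_id must be in [0, {NUM_CORES})")
    return core_id // CORES_PER_CLUSTER


def module_of(address: int) -> int:
    """Map an address to a memory module under the assumed block mapping."""
    module = address // MODULE_SIZE
    if not 0 <= module < NUM_CLUSTERS:
        raise ValueError("address outside the modelled shared memory")
    return module


def route(core_id: int, address: int) -> Route:
    """Decide whether an access stays on the cluster bus or crosses the crossbar.

    ASSUMPTION: an access to the memory in the core's own cluster needs only
    the local bus; any other access also traverses the crossbar.
    """
    src = cluster_of(core_id)
    dst = module_of(address)
    return Route(source_cluster=src, target_module=dst, uses_crossbar=(src != dst))
```

For example, under these assumptions `route(5, 0x1000_0000)` stays inside cluster 1, whereas `route(5, 0)` has to cross the crossbar to reach module 0. Grouping four cores behind each crossbar port is what keeps the crossbar at 8x8 rather than 32x32, which is the area-versus-efficiency trade-off the abstract refers to.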

Cited By


許詔傑 (2013). Grid-Tree: A Novel Network-on-Chip for Multicore Systems [Master's thesis, Chung Yuan Christian University]. Airiti Library. https://doi.org/10.6840/CYCU.2013.00111
