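As System-on-Chip technology improves, many-core processors are becoming increasingly important. Their communication infrastructure is implemented as a Network-on-Chip (NoC), which contains a large number of switches and interconnects forming a structure that spans the chip. As the number of on-chip components grows, modern many-core systems become especially prone to faults. In an NoC, even a single broken channel can stop part of the communication or deadlock the whole network, rendering the entire chip unusable. Improving yield has therefore become an important issue. We propose a method for analyzing and improving the fault tolerance of NoC architectures, a necessary step for future technologies. We propose a self-repair method: we add channels between adjacent processing elements to implement a fault-tolerant NoC, which significantly improves system yield. We use a simple RC formula to calculate the wire delay of the new channel; the wire delay is under 1 ns, so it is affordable. In addition, the new channel replaces the original transfer path between adjacent processing elements, which reduces the transfer time. We design and analyze the fault-tolerant NoC on a SystemC platform. This many-core platform has 16 processing elements, and we add two to four OCP interfaces to each of them to connect to adjacent processing elements. Experimental results show that the fault-tolerant NoC achieves good yield, and the SPLASH2 benchmarks show that when channels are broken, the additional latency is only about 1% to 2%.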
As System-on-Chip (SoC) technology improves, many-core processors are becoming more and more important. Their communication infrastructure will be implemented as a Network-on-Chip (NoC), which contains a large number of switches and interconnects that form a structure spanning the chip. Unfortunately, with an increasing number of on-chip components expected to be defective in near-future chips, modern parallel systems such as many-core systems become especially vulnerable to these faults. A single broken channel in the NoC may stop part of the communication or even cause deadlock, rendering the chip useless. Fault tolerance in the NoC is therefore also needed to improve chip yield. In this thesis, we present an approach for analyzing and improving the fault-tolerance aspects of NoC architectures. This is a necessary step toward implementing reliable systems in future technologies. We propose a self-repair method: adding a local channel between adjacent Processing Elements (PEs) to implement a fault-tolerant NoC, which significantly improves the yield of the system. We use a simple RC formulation to calculate the wire delay of the local channel. The wire delay is under 1 ns, so the local channel is affordable. Because the local channel connects only adjacent PEs, which are close to each other on the mesh, we do not need to worry about routing complexity on the chip. In addition, the local channel between adjacent PEs can reduce the transaction time. We design and analyze the fault-tolerant NoC on a many-core ESL simulation platform in SystemC. This ESL platform has sixteen NoC-based PEs. We add two to four OCP interfaces to each PE for the local channels between adjacent PEs. Details of this architecture are given in the thesis. The experimental results show the yield of the fault-tolerant NoC and the latency overhead when channels are broken. The SPLASH2 applications show that the latency overhead is about 1% to 2% when channels are broken.
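
The abstract does not state the exact RC formulation used for the local channel. As a rough illustration only, the sketch below uses the common textbook approximation for the 50% delay of a driven, distributed RC wire (0.69 for lumped terms, 0.38 for the distributed wire term). The function name and all numeric values (wire length, per-millimetre resistance and capacitance, driver resistance, load capacitance) are hypothetical placeholders, not figures from the thesis.

```python
def wire_delay_seconds(length_mm, r_per_mm, c_per_mm,
                       driver_ohms=0.0, load_farads=0.0):
    """Approximate 50% delay of a driver + distributed RC wire + load.

    Textbook approximation (assumed here, not taken from the thesis):
      t = 0.69 * (R_drv * (C_wire + C_load) + R_wire * C_load)
          + 0.38 * R_wire * C_wire
    """
    r_wire = r_per_mm * length_mm   # total wire resistance (ohm)
    c_wire = c_per_mm * length_mm   # total wire capacitance (F)
    lumped = 0.69 * (driver_ohms * (c_wire + load_farads)
                     + r_wire * load_farads)
    distributed = 0.38 * r_wire * c_wire  # grows with length squared
    return lumped + distributed


# Hypothetical parameters for a short link between neighbouring PEs.
delay = wire_delay_seconds(
    length_mm=2.0,        # assumed distance between adjacent PEs
    r_per_mm=100.0,       # assumed wire resistance, ohm/mm
    c_per_mm=0.2e-12,     # assumed wire capacitance, F/mm
    driver_ohms=1000.0,   # assumed driver output resistance
    load_farads=10e-15,   # assumed receiver input capacitance
)
print(f"estimated local-channel wire delay: {delay * 1e9:.3f} ns")
```

With these assumed values the estimate stays well below 1 ns, which matches the order of magnitude claimed above. Because the distributed term scales with the square of the wire length, the same estimate degrades quickly for links longer than one hop, which is one reason to restrict local channels to adjacent PEs.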