
低延遲、傾斜與串音時脈樹合成策略之研究

A Study of Clock Tree Synthesis Strategies for Delay, Skew and Crosstalk Reduction

Advisors: 蔡加春, 李宗演

Abstract


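In nanometer semiconductor integrated-circuit process technology, interconnect delay affects a circuit's critical path and chip performance more than logic-gate delay does, and the clock period is likewise determined by interconnect delay. Moreover, once a chip's operating frequency exceeds 1 GHz, the inductance of clock wires can no longer be neglected; this inductive effect has been shown to affect circuit performance. Interconnect delay should therefore be calculated with an RLC interconnect delay model.

Clock delay and clock skew are two major factors in system-on-chip (SoC) design. An SoC is composed of multiple IPs or modules; its clock network can be partitioned into clocks of multiple time domains, each sub-network containing several IPs and modules, and the clocks of different domains can be switched as needed to reduce SoC power consumption. This clock network structure makes clock routing in SoC design complex, and achieving minimal clock delay and clock skew with the least crosstalk during clock synthesis plays a key role in SoC design. In this dissertation we propose several clock routing strategies for SoC physical design: interconnect analysis with RLC delay models, a numerical method for computing the clock tapping point, a clock routing method that combines Grey theory with DME, buffer insertion in RLC clock tree construction, and reduction of crosstalk in clock trees.

First, we analyze interconnects with current RLC delay models. Using the second-order transfer function of an RLC tree and a numerical delay model, together with moments and LU matrix decomposition, we propose another delay model analysis and apply least-squares curve fitting to obtain two empirical delay formulas for different damping factors. In experiments on RLC segments, our delay is 15.91% more accurate in overall average absolute error than the Elmore, CPC, IFN and LW delay models; on RLC clock trees, it is 3.24% more accurate in overall average absolute error than the LW and IFN delay models.

Next, we propose a numerical method for computing the clock tapping point. It converts a uniformly distributed RLC equivalent segment into an RC equivalent-segment delay model, so that the tapping point of two subtrees can be found numerically and a zero-skew merged tree built bottom-up; applying this procedure recursively constructs a zero-skew multi-level RLC clock tree. We use DME routing with standard benchmarks and compare delay accuracy against HSPICE; the absolute errors in clock skew and clock delay are only 0.016% and 0.51%, respectively.

Further, we completed GDME, which uses the Grey relational grade together with DME clock routing to construct an RLC clock tree. Grey relational analysis is first applied to cluster the clock sinks in the SoC: sink pairings are computed from factors such as the coordinates of each IP's clock input, its load capacitance, and its internal delay and skew. DME is then applied, bottom-up and top-down, to construct the RLC clock tree. On standard benchmarks, GDME differs from HSPICE by only 0.017% in clock skew and 0.2% in delay, and improves total wire length by 3.58% on average over other DME methods.

In addition, we propose a buffer-driven zero-skew RLC clock tree that combines the RLC delay model, the zero-skew tapping point method and buffer insertion. Unit buffers are used to cut off the accumulation of non-zero skew that arises as the clock routing grows, yielding a zero-skew RLC clock routing tree. On standard benchmarks, buffer insertion improves clock delay by up to 97%; total wire length improves by 10% and 2% compared with LTM-MMM-AWA/DME and LTM-GMA-AWA/DME, respectively, and maximum clock delay improves by 23.04% compared with IDME.

Finally, we propose a coupling management algorithm to reduce crosstalk in clock routing. Two RLC clock routings are first analyzed to obtain the clock skew and clock delay parameters with and without crosstalk; a decision threshold on the crosstalk parameter is then combined with rerouting to complete a minimum-crosstalk clock routing. Experiments show that, with crosstalk taken into account, the method improves clock delay and skew by 4.4% and 20%, respectively.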
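
The abstract does not spell out how GDME scores sink pairings, so the following is only a minimal sketch of the textbook Grey relational grade, which may help in reading the clustering step. The function name `grey_relational_grades`, the min-max normalization, the feature order (x, y, load capacitance, intrinsic delay) and the distinguishing coefficient `rho = 0.5` are assumptions made for illustration, not details taken from the dissertation.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate sink with respect to a reference sink.

    Each sink is a list of features, e.g. [x, y, load_cap, intrinsic_delay]
    (a hypothetical feature order). rho is the distinguishing coefficient.
    """
    rows = [reference] + candidates
    n = len(reference)
    # Min-max normalize every feature column so that units do not dominate.
    lo = [min(r[k] for r in rows) for k in range(n)]
    hi = [max(r[k] for r in rows) for k in range(n)]

    def norm(row):
        return [(row[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
                for k in range(n)]

    ref = norm(reference)
    # Absolute differences between the reference and each candidate, per feature.
    deltas = [[abs(ref[k] - v) for k, v in enumerate(norm(c))] for c in candidates]
    flat = [d for row in deltas for d in row]
    dmin, dmax = min(flat), max(flat)
    if dmax == 0.0:
        # Every candidate is identical to the reference.
        return [1.0] * len(candidates)
    # Grey relational coefficient per feature, averaged into a grade per candidate.
    return [sum((dmin + rho * dmax) / (d + rho * dmax) for d in row) / n
            for row in deltas]


# Hypothetical sinks: [x, y, load capacitance, intrinsic delay]
ref_sink = [0.0, 0.0, 10.0, 1.0]
others = [[0.0, 0.0, 10.0, 1.0], [100.0, 200.0, 50.0, 5.0], [10.0, 20.0, 12.0, 1.2]]
grades = grey_relational_grades(ref_sink, others)
best = max(range(len(others)), key=lambda i: grades[i])  # index of the closest sink
```

A higher grade means the candidate's normalized features track the reference more closely, so in a pairing step the candidate with the highest grade would be the natural partner.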

Parallel Abstract


In nanometer ICs (integrated circuits), interconnect delay rather than gate delay dominates critical path delay and chip performance, and the clock period is largely consumed by interconnect delay. Moreover, current chips operate above 1 GHz, so the inductance of clock routing wires can no longer be ignored; it has been proven to affect circuit performance. The calculation of propagation delay therefore has to employ an RLC delay model.

Clock delay and clock skew are critical for SoC (system-on-a-chip) design. Since an SoC consists of a number of IPs (intellectual properties) and modules, its clock system may be partitioned into multiple domains, each covering several IPs and modules, and certain domains can be turned off to reduce power consumption in idle modules. Consequently, clock routing in an SoC is complicated, and synthesizing a clock routing with minimal clock delay, clock skew, and crosstalk is critical. In this dissertation, we propose several clock routing strategies for clock synthesis in SoC physical design: interconnect analysis with RLC delay models, numerical tapping point search, clock routing that combines Grey theory with DME, buffered RLC clock tree construction, and coupling-aware crosstalk reduction.

First, we analyze interconnects with several existing RLC delay models and propose a numerical delay model based on the second-order transfer function of an RLC tree. Combining LU matrix decomposition with moment matching, we derive two empirical delay formulas for different damping factors, using least-squares curve fitting to obtain the time-domain responses of all branches of an RLC tree. Compared against SPICE simulation, our delay model is 15.91% more accurate in total average absolute error for a single RLC section than the Elmore, CPC, IFN and LW delay models, and 3.24% more accurate in total average error over all sinks of an RLC interconnect tree than the LW and IFN models.
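
To make the role of the damping factor concrete, here is a minimal sketch for a single lumped RLC section with transfer function H(s) = 1 / (1 + sRC + s²LC). It evaluates the exact unit-step response and finds the 50% delay by bisection. This is not the dissertation's fitted empirical formula, which the abstract does not give; the function names `step_response` and `delay_50` and the component values are assumptions for illustration.

```python
import math


def step_response(t, R, L, C):
    """Unit-step response of H(s) = 1 / (1 + s*R*C + s^2*L*C) at time t."""
    wn = 1.0 / math.sqrt(L * C)          # natural frequency
    zeta = 0.5 * R * math.sqrt(C / L)    # damping factor
    if math.isclose(zeta, 1.0):          # critically damped
        return 1.0 - math.exp(-wn * t) * (1.0 + wn * t)
    if zeta < 1.0:                       # underdamped: decaying oscillation
        wd = wn * math.sqrt(1.0 - zeta * zeta)
        decay = math.exp(-zeta * wn * t)
        return 1.0 - decay * (math.cos(wd * t) + zeta * wn / wd * math.sin(wd * t))
    root = math.sqrt(zeta * zeta - 1.0)  # overdamped: two real poles
    s1 = -wn * (zeta - root)
    s2 = -wn * (zeta + root)
    return 1.0 + (s2 * math.exp(s1 * t) - s1 * math.exp(s2 * t)) / (s1 - s2)


def delay_50(R, L, C):
    """Time of the first 50% crossing of the step response, by bisection."""
    wn = 1.0 / math.sqrt(L * C)
    zeta = 0.5 * R * math.sqrt(C / L)
    if zeta < 1.0 and not math.isclose(zeta, 1.0):
        # The underdamped response rises monotonically up to its first peak.
        hi = math.pi / (wn * math.sqrt(1.0 - zeta * zeta))
    else:
        # Monotonic response: grow the bracket until it contains the crossing.
        hi = 1.0 / wn
        while step_response(hi, R, L, C) < 0.5:
            hi *= 2.0
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if step_response(mid, R, L, C) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical values: a resistance-dominated and an inductance-dominated section.
d_rc_like = delay_50(R=100.0, L=1e-12, C=1e-12)
d_inductive = delay_50(R=10.0, L=5e-9, C=1e-12)
elmore_50 = math.log(2.0) * 100.0 * 1e-12   # ln(2)*R*C estimate for the first case
```

When the damping factor is large, the 50% delay approaches the RC estimate ln(2)·RC; as inductance grows and the damping factor falls below 1, the response overshoots and the RC estimate no longer tracks it, which is the motivation the abstract gives for using an RLC model.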

Second, we present a new numerical method for tapping point search based on a uniformly distributed RLC model. The model simplifies an RLC wire to an equivalent RC-based delay model, so that the tapping point of two RLC-based subtrees can be formulated accurately and numerically. Recursing bottom-up over pairs of subtrees, the numerical formulas determine successive tapping points, each forming a new zero-skew merged tree; propagating this procedure to higher levels yields a zero-skew multi-level RLC clock tree. Benchmarks are tested with our approach combined with the DME algorithm in linear running time, and comparison with HSPICE shows absolute average errors of only 0.016% in skew ratio and 0.51% in critical delay.

Third, we combine Grey relational analysis with DME, in a method called GDME, to construct the RLC clock tree. Grey relational analysis is first used to predefine the clustering of clock sinks in the SoC: each IP's clock-sink parameters (location, capacitive load, intrinsic delay, and intrinsic skew) are taken into account in determining each sink pairing. The DME algorithm, with its bottom-up and top-down phases, is then applied to construct the clock tree. On benchmarks, comparison of GDME with HSPICE shows absolute average errors of 0.017% in skew and 0.2% in delay, and GDME improves total wire length by 3.58% on average over other DME methods.

Fourth, we propose a zero-skew-driven buffered RLC clock tree construction that combines the RLC delay model, exact zero skew, and buffer insertion. We insert unit-size buffers into each level of the clock tree to interrupt the upward propagation of non-zero skew, which enables reliable construction of a buffered RLC clock tree with zero skew. Experimental results on benchmarks show an improvement of up to 97% in path delay over trees without buffer insertion. Compared with the LTM-MMM-AWA/DME and LTM-GMA-AWA/DME approaches, total wire length is reduced by 10% and 2% on average, respectively; compared with the IDME method, clock delay improves by up to 23.04%.

Finally, we investigate a coupling-aware algorithm to reduce crosstalk between clock routings. We construct two RLC-based clock routings and run experiments with and without crosstalk interaction, showing that crosstalk between routing interconnects degrades clock delay and clock skew. The proposed coupling-aware algorithm then reroutes the two clock routings to minimize crosstalk. Experimental results show that clock delay and clock skew improve by up to 4.4% and 20%, respectively, compared with routing that does not consider crosstalk reduction.
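
The tapping point search described in the abstract reduces each RLC wire to an equivalent RC model before merging two subtrees. The abstract does not give that conversion, so the sketch below shows only the classical zero-skew merge under the plain Elmore RC model, including wire elongation when no balance point exists on the connecting wire. The function names, the per-unit-length parameters `r` and `c`, and the numeric values are assumptions for illustration; `length` is assumed to be positive.

```python
import math


def wire_delay(r, c, length, load_cap):
    """Elmore delay of a uniform RC wire of the given length driving load_cap."""
    return r * length * (c * length / 2.0 + load_cap)


def zero_skew_merge(t1, c1, t2, c2, length, r, c):
    """Merge two subtrees (delay t, load cap c) joined by a wire of `length`.

    Returns (l1, l2, delay, cap): the wire lengths from the tapping point to
    subtrees 1 and 2, and the delay and capacitance seen at the tapping point.
    r and c are wire resistance and capacitance per unit length.
    """
    rl = r * length
    # Fraction of the wire on the subtree-1 side that equalizes both delays.
    x = (t2 - t1 + rl * (c2 + c * length / 2.0)) / (rl * (c * length + c1 + c2))
    if 0.0 <= x <= 1.0:
        l1 = x * length
        l2 = length - l1
    elif x < 0.0:
        # Subtree 1 is too slow: tap at its root and elongate the wire to subtree 2.
        l1 = 0.0
        l2 = (math.sqrt((r * c2) ** 2 + 2.0 * r * c * (t1 - t2)) - r * c2) / (r * c)
    else:
        # Subtree 2 is too slow: tap at its root and elongate the wire to subtree 1.
        l2 = 0.0
        l1 = (math.sqrt((r * c1) ** 2 + 2.0 * r * c * (t2 - t1)) - r * c1) / (r * c)
    delay = t1 + wire_delay(r, c, l1, c1)
    cap = c1 + c2 + c * (l1 + l2)
    return l1, l2, delay, cap


# Hypothetical technology values: 0.1 ohm and 0.2 fF per unit length.
R_UNIT, C_UNIT = 0.1, 2e-16
balanced = zero_skew_merge(0.0, 1e-13, 0.0, 1e-13, 1000.0, R_UNIT, C_UNIT)
elongated = zero_skew_merge(1e-9, 1e-13, 0.0, 1e-13, 1000.0, R_UNIT, C_UNIT)
```

Calling the function bottom-up on each pair of subtrees, and feeding the returned delay and capacitance into the next level, mirrors the recursive construction the abstract describes, here with an RC model standing in for the equivalent RLC one.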

