在本篇論文中,我們提出列共用的策略,結合過往共用因子(Factor Share)與分散式算數(Distributed Arithmetic)來建立多標準下低成本 DCT ( Discrete Cosine Transform )與 IDCT ( Inverse Discrete Cosine Transform ) 運算,此電路可支援多影像標準轉換,MPEG-4、H.264 與VC-1,包括 8 x 8 、8 x 4 、4 x 8以及 4 x 4 轉換。除此之外,我們利用 DCT 與 IDCT 係數矩陣的相似,在電路中以時間交錯的方式重複使用同一塊係數矩陣電路。不僅降低正向與逆向餘弦運算所需之面積成本,還可以連續運行一維與二維的 DCT 或 IDCT 轉換,維持高輸出率( Throughput Rate ),滿足即時 (Real-Time) 影像編碼的需求。並且我們提出了一個全新的平行一核心的電路,結合兩種架構的優點,低成本面積且高輸出率,延遲只需要68時脈,完成2個block 128筆的資料運算,此架構以 TSMC 0.18-um 的電路合成,在 Slow Model 下可顯示到125 MHz之操作頻率,且面積 39.5K 的邏輯閘可以達到500 M pixel/sec 之輸出率,我們的架構可支援 HDTV (1920 x 1080P@60Hz) 即時影像編碼。
In this thesis, we propose a row-sharing strategy that combines factor sharing and distributed arithmetic to build a low-cost, multi-standard DCT (Discrete Cosine Transform) and IDCT (Inverse Discrete Cosine Transform) circuit. The proposed architecture supports the transforms of several video standards, namely MPEG-4, H.264, and VC-1, including the 8 x 8, 8 x 4, 4 x 8, and 4 x 4 transforms. In addition, by exploiting the similarity between the DCT and IDCT coefficient matrices, we reuse the same coefficient-matrix circuit for both transforms in a time-interleaved manner. This not only reduces the area cost of the forward and inverse transforms, but also allows the 1-D and 2-D DCT or IDCT to run continuously, which sustains a high throughput rate and meets the demands of real-time video coding. We further propose a new parallel single-core circuit that combines the advantages of both architectures, low area cost and high throughput. The proposed core has a latency of only 68 clock cycles for 128 data samples, that is, two 8 x 8 blocks. The design is synthesized in a TSMC 0.18-um 1P6M CMOS process. In simulation under the slow model, it operates at 125 MHz and achieves a throughput of 500 Mpixel/s with 39.5K logic gates. The proposed architecture can therefore support real-time HDTV (1920 x 1080P@60Hz) video encoding.
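The real-time claim can be checked with simple arithmetic on the figures quoted above. The following is a minimal sketch, not part of the proposed design: the variable and function names are illustrative, and it assumes that one transform sample corresponds to one pixel. The 1.5x factor for 4:2:0 chroma subsampling is also an assumption that the abstract does not state.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 125e6           # operating frequency under the slow model
THROUGHPUT_PIX_S = 500e6   # reported throughput in pixels per second

# HDTV format named in the abstract: 1920 x 1080 progressive at 60 Hz.
WIDTH, HEIGHT, FPS = 1920, 1080, 60


def required_pixel_rate(width, height, fps):
    """Pixels per second that must be transformed for real-time video."""
    return width * height * fps


# Samples the core must produce per clock cycle to reach the quoted rate.
pixels_per_cycle = THROUGHPUT_PIX_S / CLOCK_HZ

# Luma-only requirement (assumption: one sample per pixel).
hdtv_rate = required_pixel_rate(WIDTH, HEIGHT, FPS)

# Assumption: 4:2:0 chroma adds half as many samples again.
hdtv_rate_420 = hdtv_rate * 1.5

# Ratio of available throughput to required throughput.
headroom = THROUGHPUT_PIX_S / hdtv_rate
headroom_420 = THROUGHPUT_PIX_S / hdtv_rate_420

print(f"pixels per cycle: {pixels_per_cycle}")
print(f"1080p60 luma rate: {hdtv_rate} pixel/s, headroom {headroom:.2f}x")
print(f"1080p60 4:2:0 rate: {hdtv_rate_420} sample/s, headroom {headroom_420:.2f}x")
```

Under these assumptions, the quoted throughput corresponds to four pixels per clock cycle and exceeds the 1080p60 requirement in both cases, which is consistent with the real-time claim.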