
適用於HEVC之270MHz 4Kx2K@60fps整數像素移動估測設計

A 270MHz 4Kx2K@60fps Integer Pel Motion Estimation Design for High Efficiency Video Coding

Advisor: 張添烜

Abstract


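In video encoding, integer motion estimation (ME) is the most complex stage and the bottleneck of real-time encoding. This is especially true for the latest video coding standard, HEVC, where the recursive coding structure, larger prediction unit (PU) sizes, and advanced motion vector prediction (AMVP) give ME very high complexity and a large memory bandwidth requirement. To meet real-time encoding requirements, this thesis presents an efficient integrated circuit design for integer motion estimation. The design first skips AMVP for all non-square PUs larger than 16×16 and adopts a five-search-step predictive Enhanced Predictive Zonal Search (EPZS) for PU sizes 16×16, 16×8, 8×16, and 8×8. Together, these two methods reduce the number of search points by 78.1% while maintaining coding performance. On the hardware side, the architecture interleaves the AMVP and predictive EPZS (PEPZS) schedules of different PU sizes, and the AMVP results for PUs larger than 16×16 are composed from computations done in 16×16 units. These methods raise hardware utilization and resolve the data dependency problem. To increase data reuse for the fast algorithm while keeping the hardware simple, we use two sets of registers organized like 8-way set-associative caches, one for AMVP and one for PEPZS, with reduced tag addressing. The results show that, compared with the HEVC reference software HM 6.0, the proposed algorithm loses 1.3%, 1.4%, and 1.6% in BD-rate performance for the Y, U, and V components, respectively. The hardware, synthesized in TSMC 90 nm technology, requires 279K logic gates and 8K bytes of on-chip memory and, running at 270 MHz, can process 4Kx2K video at 60 frames per second.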
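The general idea behind predictor-based fast search can be illustrated with a minimal sketch: test a few candidate motion vectors first, then refine the best one with a small diamond pattern for a bounded number of steps. This is not the thesis's 5-step predictive EPZS; the function names, the SAD cost, the diamond pattern, and the `max_steps` parameter are all assumptions made for illustration.

```python
def sad(cur, ref, bx, by, mvx, mvy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    `cur` and the block displaced by (mvx, mvy) in `ref`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + mvy][bx + x + mvx])
    return total


def in_bounds(ref, bx, by, mvx, mvy, n):
    """True if the displaced n x n block lies entirely inside `ref`."""
    height, width = len(ref), len(ref[0])
    return (0 <= bx + mvx and bx + mvx + n <= width
            and 0 <= by + mvy and by + mvy + n <= height)


def predictive_search(cur, ref, bx, by, n, predictors, max_steps=5):
    """Hypothetical predictor-plus-refinement search (illustrative only).

    Stage 1 evaluates the zero vector and the given predictor vectors.
    Stage 2 refines the best candidate with a small diamond pattern,
    stopping after `max_steps` steps or when no neighbour improves.
    Assumes the block at (bx, by) itself lies inside the frame.
    """
    best_mv, best_cost = None, None
    for mv in [(0, 0)] + list(predictors):
        if not in_bounds(ref, bx, by, mv[0], mv[1], n):
            continue
        cost = sad(cur, ref, bx, by, mv[0], mv[1], n)
        if best_cost is None or cost < best_cost:
            best_mv, best_cost = mv, cost

    for _ in range(max_steps):
        cx, cy = best_mv
        improved = False
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            mv = (cx + dx, cy + dy)
            if not in_bounds(ref, bx, by, mv[0], mv[1], n):
                continue
            cost = sad(cur, ref, bx, by, mv[0], mv[1], n)
            if cost < best_cost:
                best_mv, best_cost = mv, cost
                improved = True
        if not improved:
            break
    return best_mv, best_cost
```

With a good predictor, the search evaluates only a handful of positions instead of every point in the search window, which is the kind of saving the reported reduction in search points refers to.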

Parallel Abstract


Motion estimation (ME) is the most complex part and the bottleneck of a real-time video encoder because of its heavy computation and large memory bandwidth. This is especially true for the latest video coding standard, High Efficiency Video Coding (HEVC), with its recursive coding structure, larger prediction unit (PU) sizes, and advanced motion vector prediction (AMVP). To meet real-time demands, this thesis presents an efficient VLSI implementation of ME. The design first skips AMVP for non-square PUs larger than 16×16 and then adopts a 5-step predictive Enhanced Predictive Zonal Search (EPZS) algorithm only for PU sizes 16×16, 16×8, 8×16, and 8×8, which reduces the number of search points by 78.1% while maintaining coding performance. The architecture interleaves AMVP and predictive EPZS scheduling across PU sizes and computes AMVP for PUs larger than 16×16 from partial results based on 16×16 PUs, which maximizes hardware utilization and overcomes the data dependency problem. To maximize data reuse while keeping the design simple for such a fast algorithm, the proposed design uses separate 8-way set-associative cache-based search buffers for AMVP and predictive EPZS, with reduced tag address indexing. Simulation results show a BD-rate performance drop of 1.3%, 1.4%, and 1.6% for the Y, U, and V components, respectively, compared with the HEVC reference software HM 6.0. Implemented in a 90 nm CMOS process, the design costs 279K logic gates and 8K bytes of on-chip memory and can process 4Kx2K video at 60 fps when running at 270 MHz.
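To put the throughput figure in perspective, the short calculation below estimates the average number of clock cycles available per 16×16 block at 270 MHz and 60 fps. The abstract does not state the exact "4Kx2K" resolution, so both 3840×2160 and 4096×2160 are tried as assumptions; the function name `cycles_per_block` is hypothetical.

```python
import math


def cycles_per_block(clock_hz, width, height, fps, block=16):
    """Average clock cycles available per block x block unit.

    Partial blocks at the frame edge are counted as full blocks.
    """
    blocks_per_frame = math.ceil(width / block) * math.ceil(height / block)
    return clock_hz / (blocks_per_frame * fps)


# Assumed frame sizes; the source only says "4Kx2K".
for width in (3840, 4096):
    budget = cycles_per_block(270e6, width, 2160, 60)
    print(f"{width}x2160: {budget:.1f} cycles per 16x16 block")
```

Under these assumptions the budget is roughly 139 cycles per 16×16 block at 3840×2160 and roughly 130 cycles at 4096×2160. All PU sizes covering a block must be handled within that budget, which is why reducing search points and keeping the hardware highly utilized matter.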

