
實現H.264可變區塊移動估測單元之高效能超大型積體電路架構

An Efficient VLSI Architecture for H.264 Variable Block Size Motion Estimation

Advisor: 黃文吉

Abstract


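This thesis proposes an efficient and flexible VLSI architecture for the H.264 variable block size motion estimation (VBSME) unit. The architecture performs the full-search block-matching algorithm on 4×4 blocks and on blocks whose dimensions are integer multiples of 4×4. Each 16×16 macroblock of the source frame is partitioned into sixteen non-overlapping 4×4 subblocks, called primitive subblocks. The architecture consists of sixteen modules and one VBSME processor. Within each module, cascaded 1D systolic arrays perform block matching for a different primitive subblock; this arrangement gives the architecture high computational throughput, high flexibility, and 100% processing element (PE) utilization. Motion estimation is performed on all primitive subblocks simultaneously, and the sixteen primitive subblocks are combined to form 41 subblocks of different sizes. The VBSME processor computes the sums of absolute differences (SADs) of all 41 subblocks concurrently from the SADs of the primitive subblocks. Compared with previously published H.264 VBSME architectures, the proposed architecture has lower computational latency and high computational throughput.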
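
As a rough software illustration of the first stage described above, the sketch below computes the SADs of the sixteen 4×4 primitive subblocks for one candidate displacement and then runs a full search over every candidate in a search window. It is a sequential model only, not the systolic-array hardware of the thesis; the function names `primitive_sads` and `full_search_primitive`, the list-of-rows frame layout, and the top-left offset convention for `(dx, dy)` are assumptions made for this example.

```python
def primitive_sads(cur, ref, dx, dy):
    """Return a 4x4 grid of SADs, one per 4x4 primitive subblock.

    cur: 16x16 macroblock as a list of 16 rows (assumed layout).
    ref: search window as a list of rows, at least 16x16.
    (dx, dy): top-left offset of the candidate block inside ref (assumed convention).
    """
    sads = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            # Pixel (x, y) belongs to the primitive subblock at row y // 4, column x // 4.
            sads[y // 4][x // 4] += abs(cur[y][x] - ref[y + dy][x + dx])
    return sads


def full_search_primitive(cur, ref):
    """Full search: for every primitive subblock, keep the candidate with the smallest SAD.

    Returns a 4x4 grid of (sad, dx, dy) tuples. Ties keep the first candidate visited.
    """
    span_y = len(ref) - 16 + 1
    span_x = len(ref[0]) - 16 + 1
    best = [[None] * 4 for _ in range(4)]
    for dy in range(span_y):
        for dx in range(span_x):
            s = primitive_sads(cur, ref, dx, dy)
            for r in range(4):
                for c in range(4):
                    if best[r][c] is None or s[r][c] < best[r][c][0]:
                        best[r][c] = (s[r][c], dx, dy)
    return best
```

In the architecture described in the abstract, the sixteen primitive-subblock SADs are produced in parallel by sixteen modules; the nested loops here only model the arithmetic, not the timing.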

Parallel Abstract


This paper proposes a novel, flexible VLSI architecture for the implementation of variable block size motion estimation (VBSME). The architecture is able to perform a full motion search on block sizes that are integral multiples of 4×4. Each 16×16 macroblock of the source frame is partitioned into sixteen non-overlapping 4×4 subblocks, called primitive subblocks. The architecture consists of sixteen modules and one VBSME processor. Each module, realized by cascading 1D systolic arrays, is responsible for the block-matching operations of a different primitive subblock. This realization has the advantages of high throughput, high flexibility, and 100% processing element (PE) utilization. The motion estimation of all the primitive subblocks is performed in parallel. These primitive subblocks are used to form 41 subblocks of different sizes. The VBSME processor concurrently computes the sums of absolute differences (SADs) of all 41 subblocks from the SADs of the primitive subblocks. The new architecture has lower latency and higher throughput than existing VBSME architectures for the hardware implementation of H.264 encoders.
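
The count of 41 follows from the H.264 partition sizes: one 16×16, two 16×8, two 8×16, four 8×8, eight 8×4, eight 4×8, and sixteen 4×4 blocks. A minimal sketch of how the 41 SADs can be derived from the sixteen primitive SADs by addition alone is given below. The function name `combine_sads`, the width×height key naming, and the ordering of entries within each list are assumptions for illustration; the thesis's VBSME processor performs this merging in hardware, and its internal organization is not described here.

```python
def combine_sads(p):
    """Merge sixteen primitive SADs into the SADs of all 41 H.264 subblocks.

    p[r][c] is the SAD of the 4x4 primitive subblock at row r, column c (0..3).
    Keys are written width x height (assumed naming for this sketch).
    """
    sads = {}
    # 16 primitive 4x4 subblocks, in raster order.
    sads["4x4"] = [p[r][c] for r in range(4) for c in range(4)]
    # 8 subblocks of width 8, height 4: two horizontally adjacent primitives.
    sads["8x4"] = [p[r][c] + p[r][c + 1] for r in range(4) for c in (0, 2)]
    # 8 subblocks of width 4, height 8: two vertically adjacent primitives.
    sads["4x8"] = [p[r][c] + p[r + 1][c] for r in (0, 2) for c in range(4)]
    # 4 subblocks of 8x8, ordered top-left, top-right, bottom-left, bottom-right.
    sads["8x8"] = [
        p[r][c] + p[r][c + 1] + p[r + 1][c] + p[r + 1][c + 1]
        for r in (0, 2)
        for c in (0, 2)
    ]
    q = sads["8x8"]
    sads["16x8"] = [q[0] + q[1], q[2] + q[3]]  # top half, bottom half
    sads["8x16"] = [q[0] + q[2], q[1] + q[3]]  # left half, right half
    sads["16x16"] = [sum(q)]                   # whole macroblock
    return sads
```

Because every larger SAD is a sum of primitive SADs taken at the same candidate displacement, one pass of block matching over the sixteen primitive subblocks is enough to evaluate all 41 partitions for that displacement.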

