
一個針對全文自適應二進位算術編碼解碼器的高效能全硬體化設計研究

A High-Performance Fully Hardwired Architecture Design for Context-Based Adaptive Binary Arithmetic Codec

Advisor: Youn-Long Lin (林永隆)

Abstract


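The H.264/AVC main profile uses Context-based Adaptive Binary Arithmetic Coding (CABAC) to obtain a higher compression ratio. Compared with traditional variable-length coding, however, CABAC requires more computation, and its throughput is limited by strong dependencies between bit-level operations. For ultra-high-resolution video such as QFHD (3840x2160), accelerating only part of the computation in hardware cannot meet the real-time requirement, so CABAC must be implemented fully in hardware.

After analyzing the distribution of syntax elements and of the bins within each syntax element, we found that coefficient-block syntax elements and motion vector difference (MVD) syntax elements account for most of the bins. Our CABAC architecture must therefore process these syntax elements efficiently. We then analyzed the data dependencies in the CABAC algorithm and concluded that a pipelined architecture suits the binary arithmetic encoder (BAE) but not the binary arithmetic decoder (BAD).

For the CABAC encoder, we designed a six-stage pipelined BAE that can process up to eight bins per cycle. To keep up with the BAE, we propose several methods that accelerate the generation of bins and their corresponding context indices. We also propose a novel architecture that shortens the critical path through renormalization and bit-stream output. Experiments show that our design encodes 1.33 bins per cycle; running at 222 MHz, it processes 295 million bins per second, which is enough to encode QFHD (3840x2160) video at 30 frames per second in real time. The design therefore supports H.264/AVC main profile, level 5.1. We have also successfully integrated the CABAC encoder into our H.264/AVC encoder system.

For the CABAC decoder, we propose a high-performance BAD that can process two bins per cycle for the most frequent syntax elements. To raise the utilization of the BAD, we propose an effective method for predicting the type of the second bin. We also rearrange the data layout of the context memory to shorten the critical path of the BAD circuit. Experiments show that our design decodes 1.25 bins per cycle; running at 238 MHz, it is fast enough to decode QFHD (3840x2160) video at 30 frames per second in real time. The design therefore supports H.264/AVC main profile, level 5.1. We have also successfully integrated the CABAC decoder into our H.264/AVC decoder system.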
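The renormalization and bit-stream output step mentioned above is bit-serial in the reference encoding flow of the H.264/AVC specification, which helps explain why it tends to sit on the critical path. The following is a minimal Python sketch of that reference flow as it is commonly described, not of the architecture proposed in the thesis. The class name `RenormState` and its field names are hypothetical, chosen only for illustration; the register widths (a 9-bit range, a 10-bit low value) are assumptions taken from the usual description of the reference encoder.

```python
class RenormState:
    """Bit-serial renormalization of a CABAC-style binary arithmetic encoder.

    Hypothetical illustration only: names and structure are assumptions,
    not the thesis's hardware design.
    """

    def __init__(self, low=0, rng=510, first_bit=True):
        self.low = low              # lower bound of the coding interval
        self.rng = rng              # interval width; kept >= 256 after renorm
        self.outstanding = 0        # bits whose value is not yet decided
        self.first_bit = first_bit  # the very first emitted bit is dropped
        self.bits = []              # output bit-stream, one int per bit

    def put_bit(self, b):
        # Emit b (unless it is the very first bit), then resolve every
        # outstanding bit as the complement of b.
        if self.first_bit:
            self.first_bit = False
        else:
            self.bits.append(b)
        while self.outstanding > 0:
            self.bits.append(1 - b)
            self.outstanding -= 1

    def renorm(self):
        # Shift one bit per iteration until the range is back in [256, 510].
        # Each iteration depends on the previous one, so a naive circuit
        # needs a long serial chain when the range is small.
        shifts = 0
        while self.rng < 256:
            if self.low < 256:
                self.put_bit(0)
            elif self.low >= 512:
                self.low -= 512
                self.put_bit(1)
            else:
                # Straddling case: defer the decision to a later bit.
                self.low -= 256
                self.outstanding += 1
            self.rng <<= 1
            self.low <<= 1
            shifts += 1
        return shifts
```

For example, starting from `RenormState(low=300, rng=64, first_bit=False)`, a call to `renorm()` takes two shift iterations: the first defers one outstanding bit, and the second emits a 0 followed by the deferred 1.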

Parallel Abstract


Context-based Adaptive Binary Arithmetic Coding (CABAC), adopted by the H.264/AVC main profile, achieves a higher compression ratio than traditional variable-length coding. However, it incurs high computational complexity, and its throughput is limited by bit-level data dependency. For ultra-high-resolution applications such as QFHD (3840×2160), a partially hardwired architecture cannot meet the real-time requirement, so the CABAC function must be implemented in a fully hardwired architecture.

After analyzing the distribution of syntax elements (SEs) and the bin distribution across SE types, we found that coefficient-block SEs and motion vector difference (MVD) SEs account for most of the bins. Our design must therefore process these SEs efficiently. Furthermore, by analyzing the data dependency of the CABAC algorithm, we concluded that a pipelined architecture is suitable for the binary arithmetic encoder (BAE) but not for the binary arithmetic decoder (BAD).

For the CABAC encoder, we designed a six-stage pipelined BAE that can encode up to eight bins per cycle. To keep up with the BAE throughput, we propose several acceleration methods that speed up the generation of bins and context indices. We further propose a novel architecture that shortens the critical path of renormalization and bit-stream generation. Simulation results show that our design encodes 1.33 bins per cycle; running at 222 MHz, it achieves a throughput of 295 Mbin/s and can encode QFHD (3840×2160) video at 30 fps in real time for H.264/AVC main profile, level 5.1. We have successfully integrated the proposed CABAC encoder into an H.264/AVC encoder system.

For the CABAC decoder, we propose a Two-Bin BAD engine that generates two bins per cycle for the most frequent SEs. To boost BAD utilization, we propose a method that improves the prediction accuracy for the second bin. Furthermore, we reorganize the context memory to shorten the critical-path delay of the Two-Bin BAD circuit. Experimental results show that our CABAC decoder generates 1.25 bins per cycle. Running at 238 MHz, it can decode QFHD video at 30 fps in real time for H.264/AVC main profile, level 5.1. We have successfully integrated the proposed CABAC decoder into an H.264/AVC decoder system.
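The throughput figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below multiplies bins per cycle by clock frequency and compares the QFHD macroblock rate with the level 5.1 limit. The function names are hypothetical, and the constant `LEVEL_5_1_MAX_MBPS` (983,040 macroblocks per second) is taken from the H.264/AVC level table rather than from the abstract itself.

```python
MB_SIZE = 16                  # H.264/AVC macroblocks are 16x16 luma samples
LEVEL_5_1_MAX_MBPS = 983_040  # assumed from the H.264/AVC level limits table


def macroblocks_per_second(width, height, fps):
    """Macroblock rate for a given frame size and frame rate."""
    mbs_wide = -(-width // MB_SIZE)   # ceiling division
    mbs_high = -(-height // MB_SIZE)
    return mbs_wide * mbs_high * fps


def bin_rate(bins_per_cycle, clock_hz):
    """Sustained bin throughput in bins per second."""
    return bins_per_cycle * clock_hz


if __name__ == "__main__":
    mbps = macroblocks_per_second(3840, 2160, 30)
    enc = bin_rate(1.33, 222e6)   # encoder figures from the abstract
    dec = bin_rate(1.25, 238e6)   # decoder figures from the abstract
    print(f"QFHD@30fps macroblock rate: {mbps} MB/s")
    print(f"within level 5.1 limit: {mbps <= LEVEL_5_1_MAX_MBPS}")
    print(f"encoder: {enc / 1e6:.1f} Mbin/s, {enc / mbps:.0f} bins per MB")
    print(f"decoder: {dec / 1e6:.1f} Mbin/s, {dec / mbps:.0f} bins per MB")
```

Under these assumptions, QFHD at 30 fps requires 972,000 macroblocks per second, which fits within the level 5.1 limit, and the encoder's 295 Mbin/s corresponds to a budget of roughly 300 bins per macroblock. Whether that budget suffices at a given bit rate depends on the content, which the abstract does not specify.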

