Design and Implementation of an H.264/AVC Baseline Profile Decoder Supporting 2048x1024 High-Definition Digital Video

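H.264/MPEG-4 AVC is the latest video compression standard developed by the Joint Video Team (JVT). Compared with the earlier standards MPEG-4, H.263, and MPEG-2, H.264 reduces the data volume by 39%, 49%, and 64%, respectively. Owing to its excellent compression performance, it has been widely adopted in commercial applications, including digital television (European DVB-T and Japanese HDTV), next-generation DVD disc standards (Blu-ray DVD and HD-DVD), and network streaming (Apple QuickTime).

H.264 greatly improves compression efficiency, but its computational load and algorithmic complexity are both very high. For our target video specification, real-time decoding requires more than 83 GOPS of computation and more than 70 GBytes/sec of bandwidth. In addition, the algorithm introduces more complex processing units (including advanced prediction methods and a deblocking filter), which further raises system complexity. Realizing a high-definition decoder for these applications therefore requires an efficient system design.

Traditional video decoding systems are mostly built on a macroblock-based pipeline. If this traditional approach is still used for an H.264 decoder, however, on-chip memory is wasted. Each decoding module is also more challenging to design because of its more demanding algorithmic features. For high-end systems, the entropy decoding module becomes the system bottleneck, yet its data dependencies prevent designers from accelerating it with conventional parallel processing. For the motion compensation module, variable block sizes and quarter-pixel-precision motion vectors raise the bandwidth requirement to more than twice that of the earlier MPEG-4 SP. Frame-based deblocking lowers hardware utilization, and filtering in two directions leads to complex data flow.

We propose a hybrid scheduling system to solve these problems. Balanced scheduling and an appropriate degree of parallelism handle the large and complex computation; decoding tasks are scheduled at the sub-macroblock, macroblock, and frame levels, so that on-chip memory and external memory bandwidth are greatly reduced. Efficient module designs are also proposed. The entropy decoding module decodes each coded symbol smoothly without wasting extra clock cycles, achieving high decoding throughput, and the CAVLD module can be upgraded to higher decoding performance at very low hardware cost. The motion compensation module benefits from our proposed techniques of interpolation window data sharing and dynamic interpolation window sizing, which save about 60% of the bandwidth, and our proposed motion compensation order gives the data-reuse buffer a high utilization. The deblocking filter breaks away from the frame-based algorithm and works on a macroblock basis, which raises hardware utilization; in addition, our proposed architecture, a transpose register array combined with a one-dimensional filter, resolves the complex data flow.

This thesis implements the H.264 decoder chip in TSMC 0.18 μm 1P6M process technology. According to synthesis and place-and-route results, the prototype chip has a total of about 220K logic gates, a core size of 2.19 x 2.19 mm², and a maximum operating frequency of 120 MHz. It supports real-time decoding of the H.264/MPEG-4 AVC baseline profile at Level 4.1, processes nearly 250K macroblocks per second, and decodes 2048x1024 video at 30 frames per second in real time. It needs only about 10 KBytes of on-chip memory and 16 MBytes of external memory in total. Operating at 120 MHz and 1.8 V, it consumes 186.4 mW. Compared with other H.264 decoders, the proposed architecture needs fewer logic gates and less memory. When the picture format drops to 176x144 at 15 frames per second, the architecture consumes only 1.18 mW. The architecture is therefore a suitable choice for both large-picture, high-specification applications and mobile wireless communication.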
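The idea of pairing a transpose array with a one-dimensional filter can be illustrated in software. The following is a minimal sketch only: the function names are hypothetical, and the smoothing kernel is a simple placeholder rather than the deblocking filter defined by the H.264 standard or the circuit described in the thesis. It shows that a filter which only works along rows can also handle the other edge direction if the block is transposed before and after filtering.

```python
def filter_rows(block, edge):
    """Smooth across the vertical edge between columns edge-1 and edge.

    Placeholder 1-D kernel (assumption): each of the two samples next to
    the edge is pulled one quarter of the way toward its neighbour.
    """
    out = [row[:] for row in block]
    for row in out:
        p0, q0 = row[edge - 1], row[edge]
        row[edge - 1] = (3 * p0 + q0 + 2) // 4
        row[edge] = (p0 + 3 * q0 + 2) // 4
    return out


def transpose(block):
    """Swap rows and columns of a rectangular block."""
    return [list(col) for col in zip(*block)]


def filter_cols_via_transpose(block, edge):
    """Filter a horizontal edge by reusing the row-only filter."""
    return transpose(filter_rows(transpose(block), edge))


def filter_cols_direct(block, edge):
    """Reference version that walks down each column explicitly."""
    out = [row[:] for row in block]
    for x in range(len(block[0])):
        p0, q0 = block[edge - 1][x], block[edge][x]
        out[edge - 1][x] = (3 * p0 + q0 + 2) // 4
        out[edge][x] = (p0 + 3 * q0 + 2) // 4
    return out


# An 8x8 block with a horizontal edge between rows 3 and 4.
block = [[10] * 8 for _ in range(4)] + [[50] * 8 for _ in range(4)]
via_transpose = filter_cols_via_transpose(block, 4)
direct = filter_cols_direct(block, 4)
print(via_transpose == direct)
```

Both paths give the same result, which is why a single row-oriented filter datapath is enough once a transposition step is available.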

Keywords

Multimedia; decoder; video

Parallel Abstract

H.264 is the newest video coding standard developed by the Joint Video Team (JVT). Compared with MPEG-4, H.263, and MPEG-2, H.264 reduces the bit rate by 39%, 49%, and 64%, respectively. Because of its superior performance, H.264 has been widely adopted by commercial applications including digital TV broadcasting (European DVB-T and Japanese HDTV), next-generation DVD (Blu-ray DVD and HD-DVD), and network streaming (Apple QuickTime).

The coding efficiency improvement of H.264 comes at the price of heavy computation and high complexity. For our targeted specification (baseline profile, level 4.1), more than 83 giga-instructions per second of computation and more than 70 gigabytes per second of bandwidth are required. Moreover, new functions such as advanced prediction schemes and the deblocking filter increase the complexity of the system. To fulfill the requirements of H.264 high-definition applications, an efficient system design is essential.

Traditional video decoding hardware designs are mostly based on a macroblock pipeline. However, if this traditional design methodology is directly adopted in an H.264 decoder design, much on-chip memory is wasted. New features of the coding tools also make the design of individual modules very challenging. For ultra high-end applications, the entropy decoder becomes the throughput bottleneck, while intuitive parallel processing techniques cannot be applied to speed it up because of its context-based adaptive nature. Because of variable block sizes and quarter-pixel-precision motion vectors, the motion-compensated inter prediction module consumes more than three times the bandwidth of the previous standard, MPEG-4 SP. The frame-based deblocking operation seriously degrades system hardware utilization, and the deblocking filter has to be supported in two directions (horizontal and vertical), which leads to complex data flow and control.

We propose a hybrid task pipelining system to address these crucial issues. Balanced pipeline schedules and appropriate degrees of parallelism deliver the large and complex computation capability required. Block-level, macroblock-level, and macroblock/frame-level pipelining schedules are arranged for CAVLD/IQ/IT/INTRA_PRED, INTER_PRED, and DEBLOCK, respectively. As a result, the internal pipeline memory as well as the bandwidth consumption can be significantly reduced. Moreover, efficient modules are provided. The entropy decoder unit smoothly decodes the bitstream into symbols without bubble cycles, so high decoding throughput can be achieved, and the proposed CAVLD unit can be extended to higher parallelism with low area overhead because only the Level table and the Run table are modified. The proposed memory access schemes of the motion-compensated inter prediction unit, Interpolation Window Reuse (IWR) and Interpolation Window Classification (IWC), save 60% of external memory bandwidth, and the proposed processing order of 4x4 blocks for inter prediction enables high utilization of the reuse buffer. The DEBLOCK unit breaks the frame-level deblocking operation into macroblock-level operations so that the hardware utilization can be greatly increased. Our proposed transpose array combined with a 1-D filter solves the complex data flow and control problem.

A prototype chip is implemented using the Artisan standard CMOS cell library with TSMC 0.18 μm 1P6M technology. The total gate count is about 217K, synthesized at 120 MHz. It supports H.264/MPEG-4 AVC decoding in baseline profile level 4.1 with five reference frames. The maximum processing capability is 246K macroblocks per second, or 2048x1024 4:2:0 30 Hz video. In total, about 10 Kbytes of on-chip memory and 16 Mbytes of off-chip memory are required. The core size is 2.19 x 2.19 mm². The average power dissipation is 186.4 mW when operating at 120 MHz with a 1.8 V power supply. Compared with other H.264 decoder works, the proposed design requires a lower gate count and less on-chip memory. It is therefore a good choice for integration into high-definition video decoding applications. When the specification is reduced to QCIF (176x144) 15 Hz video, our chip delivers real-time decoding at 725 kHz with a 1.8 V power supply and consumes only 1.18 mW. This low-power feature makes our design suitable for mobile applications as well.
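The throughput figures quoted above can be cross-checked with a little arithmetic. The sketch below assumes only that a macroblock covers 16x16 luma samples, as in H.264; the function names are illustrative and do not come from the thesis. It derives the macroblock rate for each video format and the clock-cycle budget per macroblock at the two quoted clock frequencies.

```python
MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def macroblocks_per_second(width, height, fps):
    """Macroblock rate for a frame whose sides are multiples of 16."""
    mbs_per_frame = (width // MB_SIZE) * (height // MB_SIZE)
    return mbs_per_frame * fps


def cycles_per_macroblock(clock_hz, width, height, fps):
    """Average number of clock cycles available to decode one macroblock."""
    return clock_hz / macroblocks_per_second(width, height, fps)


# High-definition operating point: 2048x1024 at 30 Hz, 120 MHz clock.
hd_rate = macroblocks_per_second(2048, 1024, 30)
hd_budget = cycles_per_macroblock(120e6, 2048, 1024, 30)

# Low-power operating point: QCIF 176x144 at 15 Hz, 725 kHz clock.
qcif_rate = macroblocks_per_second(176, 144, 15)
qcif_budget = cycles_per_macroblock(725e3, 176, 144, 15)

print(hd_rate, round(hd_budget, 1))      # 245760 488.3
print(qcif_rate, round(qcif_budget, 1))  # 1485 488.2
```

The 2048x1024 30 Hz format works out to 245,760 macroblocks per second, matching the quoted figure of about 246K, and both operating points leave roughly 488 clock cycles per macroblock.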

Parallel Keywords

H.264; MPEG-4; decoder
