Optimization of Memory Access for H.264/AVC Decoder on Embedded DSP Core

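As technology advances, demand for multimedia keeps growing. As video resolution increases, we must either use more memory to store video or use more efficient video coding techniques to compress it. The latest video coding standard, H.264/AVC, offers better compression efficiency and video quality than earlier standards such as MPEG-2 and H.263. In exchange, its computational complexity is much higher than that of earlier standards, so a real-time software H.264/AVC decoder requires a more powerful processor and more efficient algorithms. The main performance bottleneck of an H.264/AVC decoder, however, is memory bandwidth: memory access speed has not kept pace with the rapid growth of processor clock rates, and this limits decoder performance. There are several ways to relieve the memory bandwidth problem. The first is to increase memory access speed; the second is to use a hierarchical memory architecture in which a cache speeds up data access; the last is to improve the algorithm so that less data is transferred over the bus. This thesis proposes a new H.264/AVC decoding flow that greatly reduces the number of external bus transfers on embedded systems with caches. By integrating the deblocking filter (DF) with the inverse discrete cosine transform (IDCT) and inverse quantization (IQ), data transfers on the bus are greatly reduced. Because of this integration, however, extra memory is needed to store intra predictors; compared with previous methods, the proposed architecture saves about 44% of the intra predictor memory. Finally, we implemented the proposed architecture on the Starfish SoC platform, a low-power, high-performance digital signal processor jointly designed by National Tsing Hua University and National Chiao Tung University. Experimental results show that with our method the amount of data transferred over the external bus during H.264/AVC decoding is reduced by 35.5%.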

Keywords

H.264/AVC; Decoder Optimization; Memory Access; Embedded Digital Signal Processor

English Abstract

The H.264/AVC standard provides enhanced coding efficiency for a wide range of applications and achieves better compression efficiency than other existing video coding standards. However, the computational complexity of an H.264/AVC decoder is also higher, so a software-based real-time decoder requires a powerful processor and more efficient algorithms. The major performance bottleneck of a software-based H.264/AVC decoder is memory bus bandwidth: the H.264/AVC reference software spends much of its time on memory access and data transfer, so memory bandwidth must be addressed. There are three ways to deal with this bottleneck. The first is to increase the memory bandwidth; the second is to use a memory hierarchy to speed up memory access; the third is to reduce the number of data transfers on the external memory bus. This thesis proposes a method that reduces the number of memory accesses of a software-based H.264/AVC decoder, especially on cache-based DSPs. Our method merges the deblocking filter with the IDCT and IQ stages, which removes unnecessary loads and stores to external memory. This decoding flow requires extra predictor memory for intra prediction, but our scheme saves nearly 44% of the predictor memory compared with the previous scheme. Furthermore, we implemented an H.264/AVC baseline profile decoder on the Starfish DSP platform, a low-power, high-performance embedded DSP core developed by National Tsing Hua University (NTHU) and National Chiao Tung University (NCTU). Experimental results show that the memory-access cycles spent on data transfer are reduced by 35.5%.

English Keywords

H.264/AVC; Decoder Optimization; Memory Access; Embedded DSP
