Analysis of Multiple Programming Models for an H.264 Decoder on a Dual-Core SoC Platform

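H.264/AVC [1] is an international standard that has become very popular in recent years. It is a high-compression video coding standard developed jointly by the ITU-T VCEG (Video Coding Experts Group) and the ISO/IEC MPEG (Moving Picture Experts Group). Compared with earlier standards, H.264/AVC has higher computational complexity, so we implement an H.264/AVC decoder on an asymmetric dual-core platform, aiming to obtain a fast decoder from the performance of the two cores. Implementing an efficient H.264/AVC decoder on a dual-core platform requires attention to the following points:

1. Software partition: which routines should run on the MPU and which on the DSP. An even split requires more synchronization control.
2. Data movement: the processors frequently exchange data. For example, data decoded on the DSP must be moved to the ARM for display, so data transfers are required. We move data with either DMA or the MPU.
3. Synchronization: the ARM and the DSP each have fixed functions that must run at specific times, so the two sides need synchronization control to notify each other to perform specific work. We use two mechanisms, interrupts and polling.

With these points in mind, we implement three H.264 decoder programming models on a dual-core SoC platform, use software pipelining to optimize the program flow, and present a performance analysis with a comparison of strengths and weaknesses. The three programming models we analyze are:

1. MPU decode full Entropy - polling programming model
2. MPU decode full Entropy - interrupt programming model
3. MPU decode partial Entropy - interrupt programming model

In our experimental environment, an OS runs on the ARM. The polling programming model is poorly suited to DMA, because DMA must be configured in kernel mode while the polling program and its buffers reside in user space. Using DMA would therefore cost time in copying data between user space and kernel space and in system calls. In this thesis, polling therefore implies that DMA is not used, whereas the interrupt models use DMA for data movement. The experimental results show that the DSP spends only 1.8 s actually executing the algorithm, and that long waits for data slow decoding down. The MPU decode full Entropy - interrupt programming model can therefore effectively improve decoding performance, but the large number of interrupts adds overhead and requires copying data between user space and kernel space. In addition, implementing part of the entropy decoding on the DSP makes the code size too large and increases the decoding time considerably, although improving the program flow can effectively reduce the number of cache misses.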
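The difference between the polling and interrupt styles of synchronization can be illustrated with a minimal Python sketch. It is not the thesis implementation: a thread stands in for the DSP, a shared dictionary stands in for the inter-processor mailbox, and `threading.Event` stands in for the interrupt. All names (`dsp_job`, `wait_by_polling`, `run`) are hypothetical.

```python
import threading
import time

def dsp_job(mailbox, done_event):
    # Stand-in for the DSP's reconstruction work (hypothetical workload).
    mailbox["frame"] = [x * 2 for x in mailbox["input"]]
    mailbox["done"] = True   # flag that a polling loop can inspect
    done_event.set()         # notification that plays the role of an interrupt

def wait_by_polling(mailbox, interval=0.001):
    # Polling: the MPU side repeatedly checks a shared flag.
    polls = 0
    while not mailbox["done"]:
        polls += 1
        time.sleep(interval)
    return polls

def run(mode, data):
    mailbox = {"input": data, "frame": None, "done": False}
    done_event = threading.Event()
    worker = threading.Thread(target=dsp_job, args=(mailbox, done_event))
    worker.start()
    if mode == "polling":
        wait_by_polling(mailbox)
    else:
        # Interrupt-like: block until notified instead of checking a flag.
        done_event.wait()
    worker.join()
    return mailbox["frame"]

if __name__ == "__main__":
    print(run("polling", [1, 2, 3]))
    print(run("interrupt", [1, 2, 3]))
```

Both modes produce the same result. They differ only in how the waiting side learns that the work is finished, which is the trade-off the three programming models explore.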

Keywords

H.264; dual-core; programming model; dual-core communication

Parallel Abstract

H.264/AVC [1] is an international digital video compression standard that has become extremely popular in recent years. It was developed jointly by the ITU-T VCEG (Video Coding Experts Group) and the ISO/IEC MPEG (Moving Picture Experts Group). H.264/AVC has higher computational complexity than previous standards, so we use an asymmetric dual-core SoC platform to implement an H.264/AVC decoder. On a dual-core platform, three points must be considered in order to implement an efficient H.264/AVC decoder:

1. Software partition: which procedures should execute on the MPU, and which on the DSP? An even allocation needs more synchronization control.
2. Data movement: the two cores exchange a lot of data. For example, the reconstructed data decoded by the DSP must be moved to external memory for later display. We use either DMA or the MPU to move the data.
3. Synchronization: the MPU and the DSP each have procedures that must execute at specific times, so synchronization control is needed. We use either polling or interrupts.

With these three points in mind, we implement three H.264 decoder programming models on a dual-core SoC platform and use software pipelining to increase parallelism. The three programming models we propose are:

1. MPU decode full Entropy - polling programming model
2. MPU decode full Entropy - interrupt programming model
3. MPU decode partial Entropy - interrupt programming model

In our experimental environment, embedded Linux runs on the MPU. The MPU decode full Entropy - polling programming model is relatively unsuitable for DMA, because DMA setup must execute in kernel mode, whereas the program and buffers of this model all reside in user space. Using DMA would therefore cost a lot of time in copying data and making system calls. Consequently, the first programming model does not use DMA to move data.

The experimental results show that the DSP spends only 1.8 s on decoding; the rest of its time is spent waiting for data. The MPU decode full Entropy - interrupt programming model can therefore improve decoding efficiency, but it incurs the additional overhead of interrupt processing and of copying data between user space and kernel space. In the MPU decode partial Entropy - interrupt programming model, the DSP code size is too large, which leads to many cache misses and increases decoding time. The number of cache misses can be reduced through the design of the decoding flow.
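The effect of software pipelining on a two-processor decoder can be sketched with a simple timing model. This is an illustrative model, not the thesis's measurement method: it assumes the MPU performs entropy decoding of unit i+1 while the DSP reconstructs unit i, that buffering between the two stages is unlimited, and that transfer and synchronization costs are zero. The function names and the per-unit times are hypothetical.

```python
def sequential_time(mpu_ms, dsp_ms):
    # No overlap: each unit is fully processed by the MPU, then by the DSP.
    return sum(mpu_ms) + sum(dsp_ms)

def pipelined_time(mpu_ms, dsp_ms):
    # Two-stage pipeline: the DSP starts unit i as soon as the MPU has
    # finished unit i and the DSP has finished unit i-1.
    mpu_done = 0
    dsp_done = 0
    for m, d in zip(mpu_ms, dsp_ms):
        mpu_done += m
        dsp_done = max(mpu_done, dsp_done) + d
    return dsp_done

if __name__ == "__main__":
    mpu = [4, 4, 4, 4]   # hypothetical entropy-decode time per unit (ms)
    dsp = [3, 3, 3, 3]   # hypothetical reconstruction time per unit (ms)
    print(sequential_time(mpu, dsp))             # 28
    print(pipelined_time(mpu, dsp))              # 19
    print(pipelined_time(mpu, dsp) - sum(dsp))   # 7 ms of DSP idle time
```

In this model the DSP is busy for only 12 ms of the 19 ms pipelined run and idle for the rest. This mirrors, with invented numbers, the observation that the DSP spends much of its time waiting for data.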

Parallel Keywords

H.264; dual-core; programming model; dual-core communication

References

[2] To-Wei Chen, Yu-Wen Huang, Tung-Chien Chen, Yu-Han Chen, Chuan-Yung Tsai, and Liang-Gee Chen, "Architecture Design of H.264/AVC Decoder with Hybrid Task Pipelining for High Definition Videos," in Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2005), Kobe, Japan, 2005.

[4] Xing Qin and Xiaolang Yan, "A Memory and Speed Efficient CAVLC Decoder," in

[5] Yong Ho Moon, Gyu Yeong Kim, and Jae Ho Kim, "An Efficient Decoding of CAVLC in H.264/AVC Video Coding Standard," in IEEE Transactions on Consumer Electronics, Vol. 51, No. 3, August 2005.

[7] Jian-Liang Luo, "Implementation and Optimization of H.264 baseline profile decoder on PACDSP dual core platform."

[8] Cheng-Nan Chiu, Chien-Tang Tseng, and Chun-Jen Tsai, "Tightly-Coupled MPEG-4 Video Encoder Framework on Asymmetric Dual-Core Platforms."
