  • Thesis

Accelerating H.264 Decoder on a Multi-core Platform

在多核心平台上加速H.264解碼器

Advisor: 李政崑

Abstract


The growing demand for high performance in embedded applications poses challenges for embedded systems. A natural way to meet this demand is to use multi-core systems. In this work, we use an H.264 decoder as a case study to show how the problem can be tackled with an embedded multi-core system. The target platform is a domestically developed System-on-Chip (SoC) called PACDUO, which combines an ARM MPU with two PACDSPs. The Android platform has also been ported to it.
We previously developed a set of intrinsics for the PACDSPs that lets users write more efficient code. In other words, these intrinsics let us exploit SIMD to accelerate programs. SIMD alone is not enough, however: the functions we want to speed up must still be dispatched to the PACDSPs. These functions should be independent with respect to their input and output data, and the data must be laid out appropriately in external memory because the local memory of the PACDSPs is insufficient.
To improve the performance of the H.264 decoder, three factors must be taken into consideration. First, Remote Procedure Call (RPC) overhead rises as the number of times the PACDSPs are woken up increases. Second, data movement has a large effect on performance because the PACDSPs have little local memory. Last but not least, the complicated data dependencies within the H.264 decoding process hinder parallelization.
To accelerate the H.264 decoder, we propose a method that combines thread-level, data-level, and function-level parallelism. Creating two threads, one for the decoding procedure and one for the rendering procedure, exploits thread-level parallelism. Within the decoding procedure, we dispatch independent data to the PACDSPs to exploit data-level parallelism. Lastly, we partition the functions of the rendering procedure across the PACDSPs to take advantage of function-level parallelism.
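The data-level step can be illustrated with a minimal sketch of how independent work items might be divided between processors. The abstract does not give the actual partitioning unit or dispatch code, so `range_t` and `split_range` are hypothetical names, and the even split is an assumption. The sketch also assumes the items have no dependencies on one another, which the abstract states as a requirement.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of work items assigned to one worker. */
typedef struct {
    size_t begin;
    size_t end;
} range_t;

/*
 * Hypothetical helper (not from the thesis): split `total` independent
 * work items as evenly as possible among `workers` processors and return
 * the share of worker `idx`. The first (total % workers) workers each
 * receive one extra item.
 */
range_t split_range(size_t total, size_t workers, size_t idx)
{
    assert(workers > 0 && idx < workers);
    size_t base = total / workers;            /* items every worker gets */
    size_t extra = total % workers;           /* leftover items          */
    size_t bonus = idx < extra ? idx : extra; /* extras handed out so far */
    range_t r;
    r.begin = idx * base + bonus;
    r.end = r.begin + base + (idx < extra ? 1 : 0);
    return r;
}
```

With two workers standing in for the two PACDSPs and 45 items, `split_range(45, 2, 0)` covers items 0 to 22 and `split_range(45, 2, 1)` covers items 23 to 44. Contiguous ranges are chosen here because they would let each processor fetch its input from external memory in one block, which matters when local memory is small.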
In the experiments, we report the frame rate of each combination of techniques on the target platform and discuss their performance. The one-ARM version reaches 10.93 fps, while our final combination reaches 14.26 fps. Furthermore, looking only at the gain from the C compiler with intrinsic functions, the speedup is about 3.56x relative to the one-ARM version, which is compiled with arm-gcc. In addition, this work delivers a high-performance application written in C rather than in assembly language. Previously, only assembly versions of the H.264 decoder kernels existed on PACDUO. Programs written in assembly language generally perform best, and performance degrades when they are written in C. Moreover, some applications written in C perform worse on this multi-core platform. We use the H.264 decoder to show that an application written in C can still achieve good performance on PACDUO.
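The overall gain implied by the two reported frame rates can be checked with a short calculation. The function name `fps_speedup` is a hypothetical choice for illustration. Only the figures 10.93 fps and 14.26 fps come from the abstract.

```c
#include <assert.h>

/* Ratio of an improved frame rate to a baseline frame rate. */
double fps_speedup(double baseline_fps, double improved_fps)
{
    assert(baseline_fps > 0.0);
    return improved_fps / baseline_fps;
}
```

Calling `fps_speedup(10.93, 14.26)` gives roughly 1.30, that is, an end-to-end frame-rate improvement of about 30% over the one-ARM version. This is a different quantity from the 3.56x figure, which the abstract attributes to the compiler with intrinsic functions alone.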

Parallel Abstract


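In the field of embedded systems, many applications now carry heavy computational loads. Because a single processor in an embedded system does not deliver strong performance, most current approaches use multiple processors to handle these computations concurrently.
In this thesis, we examine how an H.264 decoder performs on an embedded system with a multi-core architecture. Our platform is PAC DUO, developed by the Industrial Technology Research Institute (工研院), which has one ARM MPU and two PACDSP digital signal processors. In addition, we run the H.264 decoder on the Android platform to stay closer to current trends.
In the past, our laboratory developed a compiler for the PACDSP. This compiler provides a set of intrinsics that lets users write highly efficient code. In other words, with these intrinsics users can easily achieve SIMD execution and speed up the whole program. To improve H.264 performance on this platform, we take all of the following factors into account. First, remote procedure calls (RPC): each time the program calls a PACDSP from the MPU, an RPC is issued, so the more frequently the MPU and the PACDSPs communicate, the more time RPC consumes and the slower the whole program becomes. Second, because the PACDSP's own memory is quite limited, data transfer between the MPU and the PACDSPs is also critical. Third, data processing in the H.264 decoder involves many dependencies, which hinders parallelization of the whole program.
We parallelize at the thread level, the data level, and the function level. At the thread level, we create two threads that run the decoding part and the rendering part, respectively. At the data level, we distribute the data evenly between the two PACDSPs to execute the rendering part. Finally, at the function level, we offload some of the functions in the rendering part to the PACDSPs to accelerate them.
In the experiments, we measure the frame rate obtained under different combinations of acceleration techniques and discuss the results in depth. The version using only one MPU achieves 10.93 frames per second, while the final version with our acceleration techniques achieves 14.26 frames per second. In addition, considering only the speedup produced by the compiler's intrinsics, the improvement is 3.56x.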

Keywords

H.264; Embedded System; SIMD; DSP; Multi-core

Cited By


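江俊賢 (2006). University students' time use, time attitudes, and the possibility of their using mobile devices [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-0712200716122893
余曉菁 (2016). Action research on integrating mobile learning into senior high school Chinese language teaching: The case of first-year senior high school students [Master's thesis, National Chung Cheng University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0033-2110201614050309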
