
Parallelized H.264 Decoder without Pre-Scanning

Coarse Grain Parallelization of the H.264 Decoder without a Start-Code Scanner

Advisor: 陳中平
Co-advisor: 洪士灝 (Shih-Hao Hung)

Abstract


Fine-grain methods for parallelizing the H.264 decoder offer good latency and lower memory usage. However, they cannot match the scalability of coarse-grain approaches, even when a well-designed entropy decoder is assumed to feed the growing number of parallel cores. We introduce a GOP (Group of Pictures) level approach because of its high scalability, and we discuss ways to address its well-known memory and latency issues. Our design eliminates the GOP start-code scanner used in earlier methods, so all cores work on the decoding task. Our experiments showed that memory initialization operations can substantially degrade the scalability of parallel applications, and that the multicore cache architecture is critical to obtaining the desired speedup. For FHD video, we observed a speedup of 7.51 with 8 processors having separate caches, and a speedup of 14.46 with 15 processors when each cache is shared by 2 processors.
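The abstract does not describe how the start-code scanner is avoided. The sketch below shows one common way to let every core find its own work without a central pre-pass; it is a hypothetical illustration, not the thesis's actual design. It assumes an Annex B byte stream of closed GOPs, each beginning with an IDR NAL unit (nal_unit_type 5). The stream is split into equal byte ranges, and each worker locates the IDR start codes inside its own range and reads past the range end to finish its last GOP. The names `next_idr` and `worker_gops`, the equal-byte split, and the assumption that SPS/PPS parameter sets are handled separately are all choices made for this sketch.

```python
# Hypothetical sketch: scanner-free assignment of GOPs to workers.
# Assumptions (not from the thesis): H.264 Annex B byte stream, every GOP is
# closed and begins with an IDR NAL unit (nal_unit_type == 5), and SPS/PPS
# parameter sets are handled elsewhere.

START_CODE = b"\x00\x00\x01"
IDR_NAL_TYPE = 5


def next_idr(stream: bytes, pos: int, limit: int) -> int:
    """Offset of the first start code in [pos, limit) that begins an IDR NAL unit, or -1.

    Emulation prevention bytes guarantee that 0x000001 never occurs inside a
    NAL unit payload, so every match is a real start code.
    """
    i = stream.find(START_CODE, pos)
    while i != -1 and i < limit:
        header = i + len(START_CODE)
        # The low 5 bits of the NAL header byte hold nal_unit_type.
        if header < len(stream) and stream[header] & 0x1F == IDR_NAL_TYPE:
            return i
        i = stream.find(START_CODE, i + len(START_CODE))
    return -1


def worker_gops(stream: bytes, worker: int, num_workers: int) -> list[tuple[int, int]]:
    """(start, end) byte ranges of the GOPs one worker would decode.

    The stream is split into equal byte ranges. A worker owns every GOP whose
    IDR start code lies in its range and reads past the range end to finish
    its last GOP, so no separate scan of the whole stream is needed.
    """
    n = len(stream)
    chunk = -(-n // num_workers)  # ceiling division
    lo, hi = worker * chunk, min((worker + 1) * chunk, n)
    gops = []
    start = next_idr(stream, lo, hi)
    while start != -1:
        # The GOP ends where the next IDR begins, wherever that lies.
        end = next_idr(stream, start + len(START_CODE), n)
        if end == -1:
            end = n  # last GOP runs to the end of the stream
        gops.append((start, end))
        # Continue only while the next GOP still starts inside this range.
        start = end if end < hi else -1
    return gops
```

Because every start-code offset falls in exactly one byte range, each GOP is claimed by exactly one worker without any central scanning step. The trade-off is that a worker whose range contains no IDR start code receives no GOPs, so load balance depends on the GOP size relative to the range size.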

Keywords

Parallelization, H.264 decoder


