GPU Parallel Computing and Its Application to Video Coding

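In this thesis, we study the parallel computing architecture of the graphics processing unit (GPU). We convert sequential algorithms designed for the central processing unit (CPU) into parallel algorithms suited to the GPU, and use the GPU's many-core parallelism and memory caches to accelerate video encoding. The encoding workflow of both the latest video coding standard, HEVC (High Efficiency Video Coding), and its predecessor, H.264/AVC, contains time-consuming stages that require large amounts of repetitive computation, such as motion estimation, and these stages are well suited to parallel processing on a GPU. We design the parallel algorithms with CUDA (Compute Unified Device Architecture), the parallel programming model provided by NVIDIA, and embed them in HEVC and H.264/AVC software encoders. Because the computational complexity of the HEVC encoder is far higher than that of the H.264/AVC encoder, we also design a mechanism by which the GPU tells the HEVC encoder which coding blocks can be split directly, further reducing the overall encoding time. The two standards are evaluated in different application scenarios: H.264/AVC on aerial video captured by an unmanned aerial vehicle, and HEVC on ultra-high-definition video. In the aerial video experiment, the original H.264/AVC encoder needs 13 hours to process 1300 frames of 1080p aerial footage, whereas the GPU-accelerated encoder needs only 3 hours in total and reduces the motion estimation time from 8 hours to 5 minutes. In the ultra-high-definition experiment, the original HEVC encoder needs nearly 25 hours to encode 300 frames of 4K video, of which motion estimation accounts for 14.5 hours, whereas the GPU-accelerated encoder needs only 11 hours in total. Besides reducing the motion estimation time from 14.5 hours to 1.7 minutes, the proposed fast coding-block splitting mechanism saves the HEVC encoder a further 3 hours of encoding time, with a PSNR drop of only 0.01 dB.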

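Keywords

Graphics processing unit; Parallel algorithm; High Efficiency Video Coding; Advanced Video Coding; Motion estimation; UAV aerial video; Ultra high definition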

Parallel Abstract

In this thesis, we study parallel computing on the GPU (Graphics Processing Unit) and transform sequential CPU-based (Central Processing Unit) algorithms into parallel GPU-based algorithms. By exploiting the GPU's many cores and memory caches, we speed up video encoding. The encoding flow of both the latest video coding standard, HEVC (High Efficiency Video Coding), and the preceding standard, H.264/AVC, contains numerous time-consuming tasks, such as motion estimation, that can be computed in parallel on a GPU. We propose a parallel motion estimation algorithm based on CUDA (Compute Unified Device Architecture), a parallel programming model created by NVIDIA, and embed it in HEVC and H.264/AVC encoders. Because the computational complexity of the HEVC encoder is far higher than that of the H.264/AVC encoder, we also design a mechanism by which the GPU signals to the HEVC encoder which coding blocks can be split immediately. The two standards are evaluated in different application scenarios: the H.264/AVC encoder on UAV (Unmanned Aerial Vehicle) video and the HEVC encoder on UHD (Ultra High Definition) video. In the 1080p aerial video experiment, the original H.264/AVC encoder takes 13 hours to encode 1300 frames, whereas the proposed GPU-based encoder takes only 3 hours, and the execution time of the motion estimation module is reduced from 8 hours to 5 minutes. In the UHD experiment, the original HEVC encoder takes about 25 hours to encode 300 frames at 4K resolution, of which motion estimation accounts for 14.5 hours, whereas the GPU-based HEVC encoder takes only 11 hours. Apart from reducing the motion estimation time from 14.5 hours to 1.7 minutes, the proposed encoder saves a further 3 hours of encoding time through the fast coding-block splitting mechanism, while the PSNR declines by about 0.01 dB.

Parallel Keywords

GPU; Parallel algorithm; HEVC; H.264/AVC; Motion estimation; CUDA; UAV; UHD
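
To illustrate why motion estimation lends itself to parallel execution, the following is a minimal sketch of full-search block matching with a sum-of-absolute-differences (SAD) cost. It is not the thesis's CUDA implementation: the function names, the 4x4 block size, and the search range of 2 are assumptions chosen for illustration. The point it shows is that each block's search depends only on the current and reference frames, so the outer loop over blocks is the kind of work a GPU could distribute across threads.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between a block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, bs, search):
    """Full search over displacements within +/-search that stay inside `ref`.
    Returns ((dx, dy), cost) for the lowest-cost candidate."""
    h, w = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, bs)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + bs <= h
                      and 0 <= bx + dx and bx + dx + bs <= w)
            if not inside:
                continue  # skip candidates that fall outside the frame
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best, best_cost


def motion_estimation(cur, ref, bs=4, search=2):
    """Search every block independently. No block reads another block's
    result, which is what makes this loop a candidate for parallel execution."""
    h, w = len(cur), len(cur[0])
    return {
        (bx, by): best_motion_vector(cur, ref, bx, by, bs, search)
        for by in range(0, h - bs + 1, bs)
        for bx in range(0, w - bs + 1, bs)
    }
```

In a GPU version, one plausible mapping (again an assumption, not taken from the thesis) would assign one block, or one candidate displacement, to each thread and reduce the costs to find the minimum.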
