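High Efficiency Video Coding (HEVC) is the latest video coding standard, and its motion estimation (ME) module accounts for about 70% of the total HEVC encoding time. HEVC can also use multiple reference frame ME (MRF-ME) to obtain more accurate prediction, but this increases the computational load and the encoding time even further. To reduce the computational complexity of the ME module, this dissertation first proposes a GPU-based, flexible coding tree unit (CTU) fast parallel ME algorithm and then applies it directly to the MRF-ME module to further shorten the overall HEVC encoding time. The GPU work is organized into three kernel functions for parallel processing: Kernel 1 computes the sum of absolute differences (SAD) for the smallest blocks of the CTU (8×8), Kernel 2 merges these SADs for the various block sizes (8×8 to 64×64), and Kernel 3 finds the best matching block for each block, which further reduces the search time of the HEVC ME module. Experimental results show that, across different quantization parameters (QP) and compared with the original HEVC test model (HM16.7), the proposed fast parallel MRF-ME algorithm achieves an overall time improving ratio (TIR) of about 96.68% at MRF=4 and about 97.78% at MRF=8.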
In the latest High Efficiency Video Coding (HEVC) standard, motion estimation (ME) takes around 70% of the encoding time in the HM encoder. To reduce the complexity of the ME module in HEVC, this dissertation proposes a flexible coding tree unit (CTU)-level parallel ME method that runs on a graphics processing unit (GPU). The proposed method can be combined with fast CTU-level multiple reference frame motion estimation (MRF-ME) to further reduce the encoding time. We decompose the ME algorithm into three GPU kernels to achieve highly parallel computation with low external memory usage. Kernel 1 calculates the sum of absolute differences (SAD) for each smallest coding unit (SCU, 8×8). Kernel 2 merges these SADs to cover the variable block sizes from the SCU (8×8) up to the largest coding unit (LCU, 64×64). Kernel 3 compares the SADs and selects the minimum to find the best matching block. Simulation results show that, compared with HM16.7, the proposed method achieves an average time improving ratio for the MRF-ME module of about 96.68% at MRF=4 and about 97.78% at MRF=8.
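
The three-stage decomposition described above (8×8 SAD computation, SAD merging, minimum-SAD selection) can be illustrated with a small sequential sketch. This is not the dissertation's GPU implementation: it is plain Python, it assumes a full search over displacements within ±r pixels, it merges only square blocks (16×16, 32×32, 64×64), and the function names `stage1_sad_8x8`, `stage2_merge`, and `stage3_best` are hypothetical. On a GPU, each loop iteration in stage 1 would typically map to an independent thread or thread block.

```python
def stage1_sad_8x8(cur, ref, n, r):
    """Stage 1: for every candidate displacement, SAD of each 8x8 block.

    cur is an n x n block; ref is an (n + 2r) x (n + 2r) search window
    whose co-located block starts at offset (r, r). Both are assumptions.
    """
    g = n // 8
    table = {}
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            grid = [[0] * g for _ in range(g)]
            for by in range(g):
                for bx in range(g):
                    total = 0
                    for y in range(by * 8, by * 8 + 8):
                        for x in range(bx * 8, bx * 8 + 8):
                            total += abs(cur[y][x] - ref[y + r + dy][x + r + dx])
                    grid[by][bx] = total
            table[(dx, dy)] = grid
    return table


def stage2_merge(sad8, n):
    """Stage 2: build SADs of 16x16 ... n x n blocks by summing four children."""
    levels = {8: sad8}
    size = 8
    while size < n:
        child = levels[size]
        size *= 2
        g = n // size
        levels[size] = {
            mv: [[grid[2 * by][2 * bx] + grid[2 * by][2 * bx + 1]
                  + grid[2 * by + 1][2 * bx] + grid[2 * by + 1][2 * bx + 1]
                  for bx in range(g)] for by in range(g)]
            for mv, grid in child.items()
        }
    return levels


def stage3_best(levels, n):
    """Stage 3: per block size and position, pick the minimum-SAD displacement."""
    best = {}
    for size, table in levels.items():
        g = n // size
        for by in range(g):
            for bx in range(g):
                mv = min(table, key=lambda m: table[m][by][bx])
                best[(size, bx, by)] = (mv, table[mv][by][bx])
    return best


# Toy usage: a 16x16 block that is an exact copy of the reference
# window shifted by (dx, dy) = (-1, 1), searched within +/-2 pixels.
n, r = 16, 2
ref = [[(7 * x + 13 * y + x * y) % 256 for x in range(n + 2 * r)]
       for y in range(n + 2 * r)]
cur = [[ref[y + r + 1][x + r - 1] for x in range(n)] for y in range(n)]
levels = stage2_merge(stage1_sad_8x8(cur, ref, n, r), n)
best = stage3_best(levels, n)
```

Because the larger-block SADs are obtained by adding already computed 8×8 SADs, pixel data is read only in stage 1, which is one plausible reason such a split keeps external memory traffic low. In the toy usage, every block size selects the displacement (-1, 1) with a SAD of 0.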