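The H.265 video standard adopts more flexible coding modules, namely the coding unit (CU), the prediction unit (PU) and the transform unit (TU). The CU and TU are partitioned and coded using a coding quad-tree (CQT) of depth 4 and a residual quad-tree (RQT) of depth 3, respectively, while the PU varies with the CU size and the prediction mode. Each module can therefore be adjusted to the complexity of the picture during encoding to obtain the best coding efficiency. Although H.265 effectively improves video coding performance, its computational complexity also increases substantially, which makes real-time video applications difficult to achieve.

To reduce the computational complexity of H.265 encoding, G. Correa et al. recently proposed a fast encoding method based on data mining [7]. They first classify H.265 video encoding into three structures, PU, RQT and CQT. They then use data mining to find suitable attributes for each structure, extract the corresponding data from those attributes, and store the data in the attribute-relation file format (ARFF). Next, the ARFF files are fed into the machine learning tool WEKA, and the C4.5 algorithm [17] provided by WEKA is used to train a separate decision tree for each of PU, RQT and CQT. Finally, the decision trees are applied to the H.265 encoding process to substantially reduce the overall computational complexity. However, G. Correa's method does not consider the importance of the SkipFlag attribute in the PU structure, nor the correlation among neighboring blocks in the CQT structure, which lowers the accuracy of the PU and CQT decision trees.

To improve on G. Correa's method, this thesis adds the SkipFlag attribute to the PU structure and adds attributes that capture the differently weighted correlation of neighboring blocks to the CQT structure, further improving the accuracy of the decision trees. In addition, the three decision trees proposed by G. Correa do not address motion estimation (ME), the most time-consuming module in the PU structure. To further accelerate H.265 encoding, this thesis proposes a fast motion vector decision algorithm (FMVDA). First, from experimental observation of H.265 encoding, we find that motion vectors (MVs) exhibit high temporal-spatial correlation. Next, we select suitable attributes from neighboring blocks, namely the best rate-distortion optimization (RDO) result, the MV magnitude, MergeFlag and SkipFlag of the neighboring blocks, and build an MV decision tree through data mining. Finally, the decision tree is applied to the ME module, where its decisions yield the MV quickly and greatly reduce the number of ME operations, completing a fast H.265 encoder.

Furthermore, this thesis implements the proposed fast H.265 encoder on the ADSP-BF609 development board from ADI. First, we optimize the memory allocation of the DSP by moving the computationally intensive PU module from L3 to L2, which improves the execution efficiency of the PU module. Next, we modify and optimize the original code using instructions specific to the ADSP-BF609, and carry out experimental simulation and analysis. The experimental results show that, compared with the H.265 test model (HM16.7), the proposed method achieves an average time improving ratio (TIR) of about 75% to 91%, and about 63% to 87% in the DSP implementation. Compared with G. Correa's method, the average TIR increases by about 22% to 27%. Besides accelerating the H.265 encoding process, the proposed method delivers image quality that differs little from that of HM16.7.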
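The ARFF export step described above can be illustrated with a minimal sketch. It is not the thesis's tooling: the relation name, the attribute names (`NeighborRdCost`, `NeighborMvMagnitude`, `ReuseNeighborMv`) and the sample rows are hypothetical placeholders chosen to mirror the neighboring-block attributes listed in the abstract. Only the ARFF layout itself (`@relation`, `@attribute`, `@data`) follows the format that WEKA reads.

```python
def to_arff(relation, attributes, rows):
    """Serialize training samples into ARFF text.

    attributes: list of (name, kind) pairs; kind is the string "numeric"
    or a list of nominal values such as ["0", "1"].
    rows: list of value sequences, one value per attribute.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            # Numeric (or other built-in) attribute type.
            lines.append("@attribute " + name + " " + kind)
        else:
            # Nominal attribute: list the allowed values in braces.
            lines.append("@attribute " + name + " {" + ",".join(kind) + "}")
    lines += ["", "@data"]
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("row length does not match attribute count")
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical attribute set for an MV decision tree (names are assumptions).
attributes = [
    ("NeighborRdCost", "numeric"),
    ("NeighborMvMagnitude", "numeric"),
    ("MergeFlag", ["0", "1"]),
    ("SkipFlag", ["0", "1"]),
    ("ReuseNeighborMv", ["0", "1"]),  # class label, also an assumption
]

# Made-up sample rows, for illustration only.
rows = [
    (1520.5, 0, 1, 1, 1),
    (9871.0, 14, 0, 0, 0),
]

arff_text = to_arff("mv_decision", attributes, rows)
print(arff_text)
```

A file written this way can be opened in WEKA, where the last attribute is typically selected as the class before training a tree.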
H.265 achieves significantly better coding efficiency than earlier video coding standards because it adopts more flexible coding structures: the coding unit (CU), the prediction unit (PU) and the transform unit (TU). The CU is split using a coding quad-tree (CQT) of depth 4, and the TU is split using a residual quad-tree (RQT) of depth 3. The PU partition depends on the CU size and the prediction mode. Although H.265 achieves high coding efficiency, its computational complexity is so high that its real-time application is limited. To reduce the computational complexity of the H.265 encoder, G. Correa et al. recently proposed fast HEVC encoding decisions using data mining [7]. They first classified the coding structure into PU, RQT and CQT, and used data mining to find appropriate attributes for each. The data corresponding to each attribute are then extracted and saved in the attribute-relation file format (ARFF). Next, the ARFF files are loaded into the machine learning tool WEKA to train the PU, RQT and CQT decision trees using the C4.5 algorithm [17]. Finally, they applied the trained decision trees to reduce the computational complexity of the H.265 encoder. However, G. Correa et al. ignored the important SkipFlag attribute in the PU structure and the correlation among neighboring blocks in the CQT structure, which reduces the accuracy of these decision trees. To improve the accuracy of the PU and CQT decision trees, we add the SkipFlag attribute to the PU structure and add attributes describing the correlation among neighboring blocks to the CQT structure. Even with the three decision trees for the PU, RQT and CQT structures, the motion estimation (ME) module of H.265 still has high computational complexity.
Therefore, we propose a fast motion vector decision algorithm (FMVDA) to further speed up the H.265 encoder. First, we observe that motion vectors (MVs) exhibit high temporal-spatial correlation across successive frames. Next, we select appropriate attributes from neighboring blocks, including the rate-distortion optimization (RDO) result, MV, MergeFlag and SkipFlag. We then train the MV decision tree using these attributes. Finally, FMVDA applies the trained MV decision tree to the ME module, yielding a fast H.265 video encoder. In addition, to realize the proposed fast H.265 encoder on a DSP, we implement the codec on the ADSP-BF609. We re-allocate the functions of the time-consuming module from L3 DDR-RAM to L1 and L2 SRAM to reduce the encoding time of H.265. Simulation results show that the proposed method achieves an average time improving ratio (TIR) of 75%~91% compared with the H.265 test model (HM16.7). Compared with G. Correa's method, the proposed algorithm achieves an average TIR of about 22%~27%. The proposed method thus substantially speeds up the H.265 encoder with insignificant loss of image quality.
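The abstract reports TIR figures without stating the formula. The sketch below assumes the commonly used definition, the encoding time saved relative to the reference encoder, expressed as a percentage. The function name and the timing values are hypothetical and are not measurements from this work.

```python
def time_improving_ratio(reference_seconds, proposed_seconds):
    """Return the time improving ratio (TIR) in percent.

    Assumed definition (not stated in the abstract):
        TIR = (T_reference - T_proposed) / T_reference * 100
    """
    if reference_seconds <= 0:
        raise ValueError("reference encoding time must be positive")
    return (reference_seconds - proposed_seconds) / reference_seconds * 100.0


# Made-up timings: the reference encoder takes 400 s, the fast encoder 100 s.
example_tir = time_improving_ratio(400.0, 100.0)
print(f"TIR = {example_tir:.1f}%")  # prints "TIR = 75.0%"
```

Under this definition, a TIR of 75% means the fast encoder needs one quarter of the reference encoding time.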