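The new-generation High Efficiency Video Coding (HEVC) standard adopts more flexible coding structures, namely the coding unit (CU), prediction unit (PU), and transform unit (TU), and delivers a higher compression ratio and better quality than existing video standards. To accommodate heterogeneous networks and different interfaces, the JCT-VC finalized the scalable extension of HEVC (SHVC) in July 2014. SHVC provides spatial, temporal, and quality scalability; its defining feature is a single bitstream that supports different resolutions, frame rates, and picture qualities. SHVC is built from multiple HEVC layers, comprising one base layer (BL) and several enhancement layers (EL). Although SHVC effectively improves coding performance, its computational complexity increases sharply, so it cannot be applied in real time across varying network bandwidths.
SHVC encoding uses a multi-loop architecture: every layer is coded with HEVC, and inter-layer parameter prediction is performed. HEVC uses the coding tree unit (CTU) to perform CU quadtree partitioning, splitting CUs at different depths from 64x64 down to 8x8, and then passes each CU to the PU stage for intra/inter predictive coding. When pruning to the best quadtree, HEVC evaluates, for the PU at every depth, seven prediction modes: SKIP, intra 2Nx2N, intra NxN, inter 2Nx2N, inter NxN, inter 2NxN, and inter Nx2N. Because inter 2Nx2N, inter NxN, inter 2NxN, and inter Nx2N all require motion estimation (ME), the computational complexity of an SHVC system becomes very high.
To reduce the computational complexity of SHVC encoding, Lin et al. recently proposed a high-performance SHVC video encoder [22] that uses the best quadtree shapes of neighboring, already-coded CTUs as candidate quadtree shapes for the current CTU, and designed a temporal-spatial searching order algorithm (TSSOA) to find the best quadtree quickly. Jhuang et al. proposed a fast CU depth range decision algorithm (FCUDRD) [23], which uses the depth of the deepest quadtree among neighboring coded CTUs to set a suitable dynamic depth range, excluding depths rarely used by the previous frame and neighboring CTUs. TSSOA and FCUDRD exploit the best quadtree shape and the maximum depth, respectively, to reduce the number of CU evaluations and thus shorten SHVC encoding time. However, while pruning to the best quadtree, every CU must still perform ME to find the best PU prediction mode, which limits the speedup that TSSOA and FCUDRD can achieve. To further improve the performance of TSSOA and FCUDRD, this thesis proposes a fast PU prediction algorithm (FPUPA) and a fast motion vector (MV) prediction algorithm (FMVPA) to reduce the computation of the PU module. First, using the best candidate quadtree shape found by TSSOA, the PU modes within the candidate quadtree are taken as candidate PU modes for the current CU. A suitable rate-distortion cost (RDcost) threshold is then derived experimentally; if the RDcost is below the threshold, the PU mode is decided immediately, which greatly shortens the processing in the PU module. Once the PU prediction mode is decided, the ME module must still obtain the MV. To reduce the number of ME operations, we exploit the high temporal-spatial correlation of MVs and determine the search order from the probability distribution of identical MVs at each depth. As in FPUPA, if the RDcost of an MV is below the threshold, that MV is selected immediately, which greatly reduces the number of ME operations and further shortens SHVC encoding time. Finally, FPUPA and FMVPA are combined to form the fast SHVC encoder.
In addition, this thesis implements the proposed fast SHVC encoder on ADI's ADSP-BF548 emulation board. We first optimize the DSP internal memory allocation, moving the computationally intensive PU module from L3 to L2 to improve its execution efficiency. We then modify and optimize the original code with instructions specific to the ADSP-BF548 board, and carry out experiments and analysis. The experimental results show that, under different quantization parameters (QP), the proposed method achieves a time improving ratio (TIR) of 66%~88% compared with the original SHVC (SHM 4.0); compared with TSSOA the TIR increases by about 7%~15%, and compared with FCUDRD by about 14%~25%. Besides further shortening SHVC encoding time, the proposed method yields picture quality that differs little from that of SHM 4.0.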
The newest video coding standard, High Efficiency Video Coding (HEVC), achieves significantly better coding efficiency than existing video coding standards. HEVC adopts new coding structures, including the coding unit (CU), prediction unit (PU), and transform unit (TU). To extend HEVC to heterogeneous access networks, the JCT-VC finalized a scalable extension of HEVC (SHVC) in July 2014. SHVC achieves very high coding efficiency, but its computational complexity is so high that real-time application is limited. Built on HEVC, the SHVC scheme supports both single-loop and multi-loop solutions by enabling different inter-layer prediction mechanisms. HEVC adopts the coding tree unit (CTU), and each CTU can be recursively split into four equal CUs (from 64×64 down to 8×8). The PU then performs the intra/inter prediction processes. When pruning the best CTU coding quadtree, the encoder evaluates seven prediction modes, namely SKIP, intra2N×2N, intraN×N, inter2N×2N, inter2N×N, interN×2N, and interN×N, to find the best mode. In particular, the inter2N×2N, inter2N×N, interN×2N, and interN×N modes require motion estimation (ME) and motion compensation (MC). Because ME is performed for all possible depth levels and prediction modes, the SHVC encoder has a very high computational complexity. To reduce this complexity, Lin et al. recently proposed the temporal-spatial searching order algorithm (TSSOA) [22] to find a good candidate quadtree for the current CTU. Jhuang et al. proposed the fast CU depth range decision (FCUDRD) algorithm [23], which uses the maximal and minimal depth levels to determine the depth of the current CU. However, with both TSSOA and FCUDRD, every CU still needs the ME module to find the best PU mode.
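To make the cost of the exhaustive search concrete, the following is a minimal Python sketch of a full recursive quadtree decision over one 64×64 CTU. It is an illustration under simplifying assumptions, not the SHM reference implementation: the function `best_quadtree`, the callback `rd_cost`, and the `stats` counter are hypothetical names, every CU is assumed to test all seven modes as described above, and split-signalling cost is ignored.

```python
from typing import Callable, Dict, List, Tuple, Union

# The seven prediction modes named in the abstract.
MODES = ["SKIP", "intra2Nx2N", "intraNxN",
         "inter2Nx2N", "inter2NxN", "interNx2N", "interNxN"]

# Hypothetical cost callback: (x, y, size, mode) -> RD cost.
CostFn = Callable[[int, int, int, str], float]
Tree = Union[str, List["Tree"]]  # a leaf mode, or four sub-trees


def best_quadtree(x: int, y: int, size: int, rd_cost: CostFn,
                  stats: Dict[str, int], min_size: int = 8) -> Tuple[float, Tree]:
    """Exhaustively choose between coding this CU whole or splitting it in four."""
    stats["cus"] += 1
    costs = {}
    for mode in MODES:
        costs[mode] = rd_cost(x, y, size, mode)
        stats["mode_evals"] += 1
    best_mode = min(costs, key=costs.get)
    unsplit_cost = costs[best_mode]
    if size <= min_size:
        return unsplit_cost, best_mode  # 8x8 CUs cannot be split further

    half = size // 2
    children = [best_quadtree(x + dx, y + dy, half, rd_cost, stats, min_size)
                for dy in (0, half) for dx in (0, half)]
    split_cost = sum(cost for cost, _ in children)
    if split_cost < unsplit_cost:
        return split_cost, [tree for _, tree in children]
    return unsplit_cost, best_mode


def toy_cost(x: int, y: int, size: int, mode: str) -> float:
    """Toy stand-in for a real RD cost; chosen only to make the demo deterministic."""
    return float(size * size + len(mode))


stats = {"cus": 0, "mode_evals": 0}
cost, tree = best_quadtree(0, 0, 64, toy_cost, stats)
print(stats)  # CUs visited and mode evaluations for a single CTU
```

Under these assumptions a single CTU visits 1 + 4 + 16 + 64 = 85 CUs, and four of the seven modes tested at each CU involve ME, which is the workload that the fast algorithms aim to cut.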
To further improve the performance of TSSOA and FCUDRD, we propose a fast PU prediction algorithm (FPUPA) and a fast MV prediction algorithm (FMVPA). Firstly, we use TSSOA to find the PU mode of the best candidate quadtree, and this PU mode is then taken as the best PU mode of the current CTU. Because of temporal and spatial correlation, the MV of a CU tends to be similar to the MVs of the co-located CU and of its four spatial neighbor CUs. Secondly, these five causal neighboring MVs are therefore taken as good candidate MVs for the current CU. Finally, we combine FPUPA and FMVPA in the SHVC system to further speed up the encoding process. In addition, to realize the proposed fast SHVC encoder on a DSP, we embed the codec on the ADSP-BF548. We re-allocate the time-consuming modules from L3 DDR-RAM to L1 and L2 SRAM to shorten the SHVC encoding time. Simulation results show that the proposed methods achieve an average time improving ratio (TIR) of about 66%~88% compared with SHVC (SHM4.0). Compared with the TSSOA algorithm, the proposed method achieves a further average TIR of about 7%~15%; compared with the FCUDRD algorithm, a further average TIR of about 14%~25%. The proposed algorithm thus efficiently speeds up the SHVC encoder with an insignificant loss of image quality.
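The candidate-and-threshold idea shared by FPUPA and FMVPA can be sketched as an ordered search that stops as soon as a candidate's RD cost falls below a threshold. This Python sketch is only an illustration under stated assumptions: `early_terminated_search`, `rd_cost`, and the toy cost model are hypothetical, the threshold value is arbitrary (the thesis derives its thresholds experimentally), and what the encoder does when no candidate passes the threshold is left to the caller because the abstract does not specify it.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def early_terminated_search(
    candidates: Iterable[T],
    rd_cost: Callable[[T], float],
    threshold: float,
) -> Tuple[Optional[T], float, int, bool]:
    """Evaluate candidates in priority order and stop at the first one below threshold.

    Returns (best candidate seen, its cost, number of evaluations, stopped early?).
    """
    best: Optional[T] = None
    best_cost = float("inf")
    evals = 0
    for cand in candidates:
        cost = rd_cost(cand)
        evals += 1
        if cost < best_cost:
            best, best_cost = cand, cost
        if cost < threshold:
            return best, best_cost, evals, True  # good enough: skip the rest
    return best, best_cost, evals, False  # caller may fall back to a wider search


# Toy example: five candidate MVs (co-located CU plus four spatial neighbors),
# listed in an assumed priority order. The "true" motion is (1, 0).
candidate_mvs = [(0, 0), (1, 0), (3, -1), (1, 1), (-2, 0)]


def toy_mv_cost(mv: Tuple[int, int]) -> float:
    """Toy cost: L1 distance from the assumed true motion, standing in for RD cost."""
    return float(abs(mv[0] - 1) + abs(mv[1] - 0))


result = early_terminated_search(candidate_mvs, toy_mv_cost, threshold=1.0)
print(result)
```

The same helper could be applied to candidate PU modes instead of MVs by passing a list of mode names and a mode-level cost function; the benefit depends on how often a high-priority candidate passes the threshold.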