A Study of Coding Mode Analysis and Fast Mode Decision for Multi-view Video Coding

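Thesis Abstract

Video technology continues to advance, and three-dimensional (3D) video will play an indispensable role in the next generation. Compared with conventional two-dimensional (2D) video, however, 3D video involves far more data and higher computational complexity, so increasing encoding speed and compression efficiency are important issues. The 3D video compression technology currently included in the H.264 extension project of the Joint Video Team (JVT) is centered on Multi-view Video Coding (MVC). MVC builds on the H.264 coding modes: it uses seven different block sizes in motion estimation to reduce temporal redundancy, and adds intra-frame prediction modes to reduce spatial redundancy. In addition, because frames from different views of a multi-view sequence are highly correlated, MVC adds frames from neighboring views as reference pictures to reduce coding redundancy.

For mode decision, MVC uses rate-distortion optimization (RDO) to choose among the inter and intra coding modes, which requires an exhaustive search over all inter and intra modes to find the most suitable one. This improves coding efficiency but also increases computational complexity. Experimental observation shows, however, that in most sequences the background is almost always coded in Skip mode. This thesis exploits that property to identify blocks that do not need the inter or intra mode search, thereby reducing encoding time.

To increase MVC encoding speed and reduce computational complexity while maintaining picture quality, this thesis proposes a fast block mode decision scheme that covers all views: the base view (view 0) and the stereo views (views 1-7). The proposed base-view algorithm divides pictures by coding order into two classes, B1 and B2/B3. B1 pictures differ more from their reference pictures, so intra modes account for a higher share of blocks; the algorithm therefore uses the RD cost of Skip mode to decide in advance whether a block is an intra-mode block, which reduces computation. B2 and B3 pictures are closer to their reference pictures and differ less from them, so intra modes account for a lower share; the algorithm uses the correlation between the coding modes of neighboring blocks to judge picture complexity, and blocks with higher complexity are classified as intra mode.

For the stereo views, neighboring cameras capture the same scene, so frames from different views contain similar regions. This thesis proposes that, in regions of high similarity, a block can refer to the mode of the block at the corresponding position in the neighboring view, whereas regions of low similarity refer to picture characteristics along the temporal axis. Fast block mode decision is thus performed according to the region of the picture, reducing encoding time.

Simulation results show that the proposed algorithms save up to 78% of encoding time for the base view and up to 65% for the stereo views. In a multi-view video coding system, the proposed algorithms address the problem of long encoding time and high computational complexity and achieve fast block mode decision.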
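The B1 rule for the base view can be illustrated with a minimal sketch. It assumes the usual Lagrangian RD cost J = D + λR and a single hypothetical threshold on the Skip-mode cost; the abstract does not give the actual threshold or the exact test, so `intra_threshold`, the mode names, and the `measure` callback are illustrative assumptions, not the thesis's implementation.

```python
# Hypothetical mode lists; names are illustrative, not taken from the thesis.
INTER_MODES = ["SKIP", "INTER_16x16", "INTER_16x8", "INTER_8x16", "INTER_8x8"]
INTRA_MODES = ["INTRA_16x16", "INTRA_4x4"]


def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def b1_candidates(skip_cost, intra_threshold):
    """Assumed rule: a low Skip cost means the intra search is not needed."""
    if skip_cost < intra_threshold:
        return list(INTER_MODES)
    return INTER_MODES + INTRA_MODES


def choose_mode(measure, lam, intra_threshold):
    """Pick the cheapest mode among the candidates.

    measure(mode) must return (distortion, rate_bits) for that mode.
    Returns the best mode and the number of modes actually evaluated.
    """
    skip_cost = rd_cost(*measure("SKIP"), lam)
    costs = {"SKIP": skip_cost}
    for mode in b1_candidates(skip_cost, intra_threshold):
        if mode not in costs:
            costs[mode] = rd_cost(*measure(mode), lam)
    best = min(costs, key=costs.get)
    return best, len(costs)


# Made-up (distortion, rate) pairs for one macroblock.
TABLE = {
    "SKIP": (100, 0), "INTER_16x16": (60, 20), "INTER_16x8": (55, 40),
    "INTER_8x16": (55, 42), "INTER_8x8": (50, 70),
    "INTRA_16x16": (40, 30), "INTRA_4x4": (30, 90),
}
fast = choose_mode(TABLE.__getitem__, lam=1.0, intra_threshold=150)
full = choose_mode(TABLE.__getitem__, lam=1.0, intra_threshold=50)
```

With these made-up numbers, the fast path evaluates five modes instead of seven and misses the slightly cheaper intra mode. That is the kind of speed versus quality trade-off a threshold like this has to balance.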

Keywords

Multi-view video coding; fast mode decision

Parallel Abstract

3D video has become a major topic in television systems because it gives viewers a realistic experience. However, its high transmission bandwidth, large storage requirements, and long computation time make it difficult to deliver to home TV users, so an efficient compression algorithm for 3D video is essential. The Joint Video Team (JVT) developed multi-view video coding (MVC), which is based on the H.264 codec and uses rate-distortion optimization (RDO) to select the best coding mode from the inter and intra modes. Although RDO-based selection improves compression performance, it also increases computational complexity. The exhaustive search over all inter and intra modes in inter-frame coding makes the encoder spend a large amount of computation time, even though fewer than 4% of blocks are coded in intra mode in real sequences. This thesis therefore proposes a fast mode decision algorithm that improves the encoding speed of MVC and achieves low computational complexity while maintaining good quality of the reconstructed frames. The proposed algorithm uses the hierarchical B picture structure. Coding mode selection is divided into two main operations: one for the base view (view 0) and one for the non-base views (views 1-7). For the base view, mode selection falls into two types: (1) B1 and (2) B2/B3. In B1, the RD cost of Skip mode is used to determine the best coding mode. In B2/B3, the correlation among the coding modes of neighboring macroblocks is used to determine the most suitable coding mode. For the non-base views, the similar areas between the reference frame and the current frame are analyzed. If a macroblock (MB) lies in a region that is similar between the reference and current frames, it is coded in inter mode; otherwise, the temporal direction is used to determine the coding mode of the current macroblock.
Experimental results show that encoding time is reduced by up to 78% for the base view and up to 80% for the non-base views, while the quality of the multi-view video is almost unchanged.
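The decision for views 1-7 can be sketched as follows, under one possible reading of the rule: a macroblock in a region similar to the neighboring view takes its mode hint from the co-located macroblock in that view, and otherwise from the temporal reference. The similarity measure (sum of absolute differences), the threshold, and the use of a co-located block without disparity compensation are all assumptions made for illustration; the abstract does not specify them.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def predict_mode(cur_block, view_ref_block, view_ref_mode,
                 temporal_ref_mode, sad_threshold):
    """Return (mode hint, source of the hint) for the current macroblock.

    Assumed rule: if the current block is similar to the co-located block
    in the neighboring view, reuse that block's mode; otherwise fall back
    to the mode of the temporal reference block.
    """
    if sad(cur_block, view_ref_block) <= sad_threshold:
        return view_ref_mode, "inter-view"
    return temporal_ref_mode, "temporal"


# Tiny 2x2 "blocks" of luma samples, made up for illustration.
current = [[10, 10], [10, 10]]
similar_ref = [[10, 11], [10, 10]]
different_ref = [[50, 50], [50, 50]]

hint_similar = predict_mode(current, similar_ref, "SKIP", "INTER_16x16", 4)
hint_different = predict_mode(current, different_ref, "SKIP", "INTER_16x16", 4)
```

A real encoder would work on 16x16 macroblocks and would probably treat the hint as a way to shrink the candidate list before RDO, not as the final mode.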