
多視角視訊編碼技術之演算法和硬體架構及系統設計之研究

Multiview Video Coding: Algorithms, VLSI Architectures, and System Design

Advisor: Liang-Gee Chen
Co-advisor: Shao-Yi Chien

Abstract


Three-dimensional (3D) video provides a complete, realistic scene by simultaneously delivering images from different viewpoints to the left and right eyes. In a 3D video system, multiview video coding (MVC) plays a very important role because the enormous amount of data must be compressed efficiently. In this dissertation, we study MVC technology at three different levels: algorithms, VLSI architectures, and system design.

In the algorithm-level research, we propose fast algorithms for the prediction core and the color correction of a 3D video coding system. The prediction core consists of motion estimation and disparity estimation, which remove temporal and inter-view redundancy, respectively. However, the prediction core demands an enormous amount of computation, which complicates both software and hardware design, so we propose a content-aware prediction algorithm to reduce the computational complexity. By first using disparity estimation to find the corresponding blocks that contain the same objects in different views, the coding information of an already-coded view can be retrieved and reused effectively; this saves 98.4–99.1% of the computation in most view channels at a cost of only 0.03–0.06 dB in PSNR. In addition, to address the luminance and chrominance mismatch between views, we propose a luminance and chrominance correction algorithm based on motion-compensated linear regression, which reuses the motion vector information and improves the coding efficiency by 0.4 dB (a small sketch of this idea follows this abstract).

In MVC, the prediction core has always been the part with the highest computational complexity; such a workload cannot be encoded in real time on today's processors, so hardware acceleration is necessary. In the second part of the dissertation, we propose two prediction algorithms and their corresponding hardware architectures for stereo and multiview video coding, respectively. First, we propose a joint prediction algorithm for stereo video coding to increase coding efficiency and reduce computation. The joint prediction algorithm exploits the characteristics of stereo video and reduces the computation by 80% while improving video quality. The chip we designed and implemented for this algorithm measures about 2.13×2.13 mm², with 137K logic gates and 20.75 Kbits of on-chip memory; compared with conventional architectures, it meets the same specification with only 11.5% of the on-chip memory and 3.3% of the processing elements. Next, we extend the scope from stereo to multiview systems and improve the predictor-centered algorithm in motion estimation to raise motion-vector accuracy. The algorithm saves 96% of the computational complexity while degrading video quality by only 0.045 dB. Its search range supports up to [-256, +255]/[-256, +255] in the horizontal/vertical directions, so it satisfies the needs of quad full high-definition (QFHD) video. We design the corresponding hardware architecture around the concept of cache processing, and the cache mechanism effectively saves 39% of the external memory bandwidth. The implemented chip requires only 230K logic gates and 8 KB of memory and processes QFHD video in real time at an operating frequency of 300 MHz.

The third part of the dissertation presents a bandwidth analysis method and a single-chip encoder architecture for MVC systems. First, we propose a new system bandwidth analysis method for the various complicated MVC prediction structures, using the precedence constraint from graph theory to handle the prediction dependencies among frames. Based on this analysis, the most suitable hardware resource allocation can be reached for each coding structure. Next, we present the system architecture of an MVC encoder for future 3D television and quad full high-definition television. It contains a new eight-stage macroblock pipeline and system scheduling, together with a cache-based prediction core, and encodes in real time from one-view 4096×2160p and three-view 1920×1080p up to seven-view 1280×720p video. In addition, to preserve high-definition picture quality, the supported search range is 4 to 64 times larger than that of conventional architectures. Compared with conventional architectures, this design saves 79% of the system bandwidth and 94% of the on-chip memory. The implemented chip, fabricated in a 90 nm CMOS process, occupies 11.46 mm² and, through the new system scheduling, a high degree of parallelism, algorithmic optimization, and module-wise clock gating, achieves a throughput of 212 megapixels per second. It is the world's first single-chip multiview video encoder.

In brief, our contributions to MVC technology lie in three directions. The proposed algorithms can be applied to the core of MVC systems; we are the first to present hardware architectures and chips for the prediction core of stereo and multiview video coding; and, integrating the first two parts, the proposed MVC encoder system is the most advanced design in the world. With the MVC technologies we propose, 3D video can be realized in many practical applications, and we sincerely hope that our research results will bring an epoch-making improvement to the convenience of everyday life.
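To make the motion-compensated linear regression step above concrete, here is a minimal sketch of the underlying per-block gain/offset fit. It is my own illustration under simplified assumptions (8-bit luma, a 16×16 block, synthetic data), not the dissertation's exact formulation.

```python
import numpy as np

def fit_color_correction(ref_block: np.ndarray, cur_block: np.ndarray):
    """Least-squares fit of a gain/offset model: cur ~= a * ref + b.

    ref_block: motion- or disparity-compensated reference samples
    cur_block: co-located samples of the block being encoded
    """
    x = ref_block.astype(np.float64).ravel()
    y = cur_block.astype(np.float64).ravel()
    a, b = np.polyfit(x, y, 1)   # first-order (linear) regression
    return a, b

def correct_block(ref_block: np.ndarray, a: float, b: float) -> np.ndarray:
    """Apply the fitted model to compensate the inter-view color mismatch."""
    corrected = a * ref_block.astype(np.float64) + b
    return np.clip(np.rint(corrected), 0, 255).astype(np.uint8)

# Toy example: the reference view is systematically darker than the current
# view, a simple stand-in for an inter-view illumination/color mismatch.
rng = np.random.default_rng(0)
cur = rng.integers(60, 200, size=(16, 16)).astype(np.uint8)
ref = np.clip(0.9 * cur - 12 + rng.normal(0, 2, cur.shape), 0, 255).astype(np.uint8)

a, b = fit_color_correction(ref, cur)
before = np.abs(cur.astype(int) - ref.astype(int)).mean()
after = np.abs(cur.astype(int) - correct_block(ref, a, b).astype(int)).mean()
print(f"gain={a:.3f}, offset={b:.2f}, mean abs residual {before:.1f} -> {after:.1f}")
```

Because the block pairing comes from motion or disparity vectors the encoder has already computed, fitting the two coefficients per block adds very little to the prediction cost, which is the point of reusing the motion information.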

Parallel Abstract (English)


Three-dimensional (3D) video can provide a complete scene structure to human eyes by transmitting several views simultaneously. In 3D video systems, multiview video coding (MVC) plays a critical role because the huge amount of data makes efficient compression necessary. In this dissertation, MVC is studied at three different levels: the algorithm level, the VLSI architecture level, and the system design level. Because hardware-oriented algorithms and their VLSI architectures are closely related, several proposed algorithms and their corresponding architectures are discussed together. The dissertation is therefore divided into three parts: efficient MVC algorithms, algorithm and architecture co-design of the MVC prediction core, and system analysis and architecture design of the MVC encoder.

In the first part of this dissertation, two new fast algorithms are developed for the prediction core and for color correction. The prediction core mainly consists of motion estimation (ME) and disparity estimation (DE), which remove temporal and inter-view redundancy, respectively. We develop a content-aware prediction algorithm (CAPA) to reduce the computational complexity. By utilizing DE to find corresponding blocks between different views, coding information can be effectively shared and reused from the already-coded view channel. CAPA saves 98.4–99.1% of the ME computation in most view channels with a negligible quality loss of only 0.03–0.06 dB in PSNR. In addition, we develop a luminance and chrominance correction algorithm to deal with the color mismatch between views. The proposed algorithm is based on motion-compensated linear regression, which reuses the motion information already available in the encoder and improves the coding gain by up to 0.4 dB.

The prediction core is always the most computation-intensive part of an MVC system, so hardware acceleration is necessary to meet real-time processing requirements. In the second part of this dissertation, two prediction algorithms and their corresponding VLSI architectures are proposed for stereo video coding and MVC, respectively. First, we propose a joint prediction algorithm (JPA) for high coding efficiency and low computational complexity. JPA exploits the characteristics of stereo video and reduces the computational complexity by about 80% while enhancing video quality. A corresponding architecture is designed and implemented on a 2.13×2.13 mm² die with only 137K logic gates and 20.75 Kbits of on-chip SRAM; it is area-efficient because it uses only 11.5% of the on-chip SRAM and 3.3% of the processing elements of conventional architectures for the same specification. Second, we extend the design space from stereo video to MVC systems. A predictor-centered prediction algorithm is modified to enhance motion vector (MV) accuracy; it saves 96% of the computational complexity with a quality loss of only 0.045 dB. The algorithm supports a search range of up to [-256, +255]/[-256, +255] in the horizontal/vertical directions and therefore fulfills the requirement of quad full high-definition (QFHD) video. A corresponding architecture is then presented based on the cache-processing concept. The proposed cache architecture saves 39% of the off-chip memory bandwidth with a rapid prefetching algorithm. Implementation results show that the cache-based prediction architecture has a low hardware cost: only 230K logic gates and 8 KB of on-chip SRAM (2.1×2.1 mm² in a 90 nm process) are required to process QFHD video in real time at a working frequency of 300 MHz.
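As a rough illustration of the cache-processing idea behind the architecture above, the toy model below counts how many reference pixels must be fetched from off-chip memory when the overlapping search windows of neighbouring macroblocks are cached as 16×16 tiles. The tile size, search range, and the no-reuse baseline are my own assumptions, so the printed saving is not comparable to the 39% figure above, which is measured against a different baseline and relies on the dissertation's prefetching scheme.

```python
MB = 16                       # macroblock size
SR = 64                       # search range of +/-SR pixels around each macroblock
TILE = 16                     # cache granularity: 16x16 reference tiles
FRAME_W, FRAME_H = 1920, 1080

def tiles_for_search_window(mb_x: int, mb_y: int):
    """Reference tiles covered by the search window of macroblock (mb_x, mb_y)."""
    x0 = max(0, mb_x * MB - SR)
    y0 = max(0, mb_y * MB - SR)
    x1 = min(FRAME_W, mb_x * MB + MB + SR)   # exclusive right edge
    y1 = min(FRAME_H, mb_y * MB + MB + SR)   # exclusive bottom edge
    for ty in range(y0 // TILE, (y1 + TILE - 1) // TILE):
        for tx in range(x0 // TILE, (x1 + TILE - 1) // TILE):
            yield (tx, ty)

cached = set()
fetched = needed = 0
for mb_y in range(2):                        # two macroblock rows already show the reuse
    for mb_x in range(FRAME_W // MB):
        for tile in tiles_for_search_window(mb_x, mb_y):
            needed += 1
            if tile not in cached:           # miss: this tile must come from off-chip DRAM
                cached.add(tile)
                fetched += 1

print(f"pixels fetched with tile cache : {fetched * TILE * TILE}")
print(f"pixels fetched with no reuse   : {needed * TILE * TILE}")
print(f"reduction                      : {1 - fetched / needed:.0%}")
```

Counting at tile granularity mirrors how an on-chip cache would be organized; a real design additionally has to prefetch tiles early enough that misses never stall the macroblock pipeline.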
The third part of this dissertation describes the system analysis and architecture design of the MVC encoder, integrating the know-how and design experience accumulated in the previous parts. First, a new system bandwidth analysis scheme is proposed for various complicated MVC prediction structures. The precedence constraint from graph theory is adopted to derive the processing order of frames in an MVC system (a small sketch of this idea follows at the end of this abstract), so that a suitable hardware resource allocation can be easily determined for each coding structure. Second, the system architecture of an MVC encoder for 3D/QFHD applications is presented. An eight-stage macroblock-pipelined architecture with the proposed system scheduling and a cache-based prediction core supports real-time processing from one-view 4096×2160p to seven-view 1280×720p video. In addition, the search range is four to sixty-four times larger than in previous work to maintain HD video quality. The proposed architecture saves 79% of the system memory bandwidth and 94% of the on-chip SRAM. A prototype chip is implemented on an 11.46 mm² die in 90 nm CMOS technology. A throughput of 212 Mpixels/s and a power efficiency of 407 Mpixels/W are achieved through the proposed system scheduling, a high degree of parallelism that reduces memory access, algorithmic optimization, and module-wise clock gating. The proposed design is the first single-chip MVC encoder reported worldwide.

In brief, we believe that with the MVC technologies proposed in this dissertation, 3D video processing can be realized in many real applications. We sincerely hope that our research contributions can create a new era for digital multimedia life.
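To show how a precedence constraint turns an MVC prediction structure into a frame-processing order that bandwidth analysis can start from, here is a minimal sketch using Python's standard topological sorter. The tiny two-view GOP, the one-reference-frame-per-edge traffic model, and the frame size are my own simplifications, not the analysis scheme or the numbers reported above.

```python
from graphlib import TopologicalSorter

# deps[frame] = frames that must be encoded first (its reference frames).
# "V0_T0" means view 0 at time 0; view 1 uses inter-view (disparity) prediction
# from view 0, and non-anchor frames use temporal prediction within each view.
deps = {
    "V0_T0": set(),
    "V0_T1": {"V0_T0"},
    "V1_T0": {"V0_T0"},              # disparity prediction from the base view
    "V1_T1": {"V1_T0", "V0_T1"},     # temporal + inter-view references
}

order = list(TopologicalSorter(deps).static_order())
print(order)                          # e.g. ['V0_T0', 'V0_T1', 'V1_T0', 'V1_T1']

# With a legal processing order fixed, a first-cut traffic estimate is simply
# the reference data each frame pulls in: here one full 4:2:0 frame per edge.
FRAME_BYTES = 4096 * 2160 * 3 // 2    # 8-bit 4:2:0 luma + chroma
traffic = sum(len(refs) for refs in deps.values()) * FRAME_BYTES
print(f"reference traffic for this toy GOP: {traffic / 2**20:.1f} MiB")
```

The dissertation's analysis is built on this kind of dependency information; the toy per-edge accounting above only hints at how a fixed processing order makes the memory traffic of a given coding structure computable.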

