  • Thesis

高畫質多視角立體視訊編碼系統預測核心之演算法與硬體架構設計

Algorithm and Architecture Design of Prediction Core in High Definition Stereo and Multiview Video Coding System

Advisor: Liang-Gee Chen (陳良基)

Abstract


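As display technology advances, raising image display from the conventional two-dimensional plane to a highly realistic three-dimensional experience is becoming more than an idea on paper. Multiview and stereoscopic display applications, such as 3D television and free-viewpoint television, are springing up rapidly. To support these applications, the transmission format must change from a single view to several video streams of different views sent simultaneously, so the amount of data to transmit grows in proportion to the number of views. In addition, high-definition formats with 1080 or 720 scan lines have become specifications that video applications can no longer ignore, and multiview stereo video applications are no exception. The computation and bandwidth required for high-definition multiview video far exceed those of existing single-view applications. Realizing a multiview stereo video system therefore requires a highly efficient multiview video compression technique. The most widely studied standard, and the one with the highest compression ratio, is the multiview video coding developed by the MPEG 3D Audio/Video Group on top of H.264. Because its prediction core uses both the motion-compensation-based algorithm of conventional video coding and the disparity compensation algorithm of multiview video coding, it compresses better than the ordinary H.264 standard. However, the search range required by the prediction core grows with the picture format, so the hybrid motion-disparity compensation technique also makes real-time processing and compression of high-definition multiview video harder to realize.

This thesis first introduces current research progress and background knowledge on multiview and 3D stereo video, and discusses why excessive bandwidth and prediction-core computational complexity become the main design difficulties in multiview stereo video coding. The next part focuses on the prediction-core algorithm. Starting at the algorithm level, by analyzing the distribution of prediction modes used in multiview video coding and the correlation between views, this thesis proposes an algorithm that predicts the prediction mode of the current macroblock in advance. Building on this algorithm, a further analysis of the motion vectors computed in different views makes it possible to predict the motion vector of the current macroblock in advance. The proposed algorithm eliminates 98.4% to 99.1% of the prediction-core computational complexity of multiview video coding, with a PSNR loss of only 0.03-0.06 dB compared with the highest-complexity hybrid motion-disparity compensated full-search algorithm. Compared with ordinary single-view video coding, it provides a PSNR gain of 0.09-1.44 dB while requiring only 51.4-64.1% of the original computational complexity.

The following chapter discusses how to implement the multiview video coding prediction core in hardware. It begins with a bandwidth analysis of multiview video coding: by introducing the precedence-constraint algorithm from graph theory, each coding structure can select, according to its precedence constraints, the scheme that yields the minimum bandwidth. Next, building on the multiview prediction-core algorithm above and combining it with the predictor-based motion estimation algorithm used for a single view, this thesis proposes a complete hardware-oriented algorithm flow that handles both single-view and multiview hybrid motion and disparity estimation and supports a cache memory architecture. The original motion vector prediction algorithm is also improved so that the hardware area and bandwidth needed for high-definition video processing drop further. With the proposed hardware-oriented algorithm, the memory size required to implement an HDTV-format prediction-core hardware architecture is reduced to 18.3-20.4%, and the required system bandwidth is reduced to 53.2-95.8%.

The end of this chapter presents the hardware architecture design implemented from the hardware-oriented algorithm, together with the resulting chip: a "Multiview Stereo Video and Super-HD H.264/AVC Integer Motion Estimation Accelerator" that supports resolutions up to super-HD 4096x2160 pixels for a single view, high-definition 1920x1080 pixels for stereo view, and 1280x720 pixels for multiview.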
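To make the cost of the full-search baseline concrete, the following is a minimal sketch of full-search block matching with a sum-of-absolute-differences (SAD) cost. It is not the algorithm proposed in the thesis. The function names, the 8x8 block size, the search range of 4, and the synthetic frames are all assumptions chosen for illustration. The same matching loop would apply to disparity estimation if the reference frame came from a neighboring view instead of an earlier time instant.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, cx, cy, n, search_range):
    """Return (best_vector, best_cost, candidates_tested) for one block."""
    height, width = len(ref), len(ref[0])
    best_vec, best_cost, tested = None, None, 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            tested += 1
            if best_cost is None or cost < best_cost:
                best_vec, best_cost = (dx, dy), cost
    return best_vec, best_cost, tested


def make_frame(width, height, shift_x=0, shift_y=0):
    """Synthetic frame (hypothetical test data) shifted by (shift_x, shift_y)."""
    return [
        [((x - shift_x) * 7 + (y - shift_y) * 13) % 251 for x in range(width)]
        for y in range(height)
    ]


ref_frame = make_frame(32, 32)
cur_frame = make_frame(32, 32, shift_x=2, shift_y=1)
vector, cost, tested = full_search(cur_frame, ref_frame, 8, 8, 8, 4)
print(vector, cost, tested)
```

The number of candidates tested per block is (2 x search_range + 1) squared when the window lies inside the frame, so widening the search range for larger picture formats increases the work quadratically. This is the cost that a fast prediction algorithm aims to avoid.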

Parallel Abstract


Multiview and stereo video give viewers a realistic 3D perceptual experience by presenting several video sequences simultaneously on the display. With special multiview displays, different views are projected to each of the viewer's eyes. As display technology advances, related applications such as 3D-TV and free-viewpoint TV (FTV) are moving closer to realization. In addition, demand for high-quality video has emerged in recent years. High-definition (HD) video specifications, such as 1920x1080 pixels and 1280x720 pixels, are strongly recommended for advanced video applications, including multiview video applications. To make multiview applications practical, an efficient multiview video coding (MVC) scheme is needed. The joint multiview video model (JMVM) has been released by the MPEG 3DAV Group as the reference software and research platform. In the JMVM, H.264/AVC is adopted as the base layer, and hybrid motion- and disparity-compensated prediction is used to further enhance rate-distortion performance. The HD resolutions and the hybrid prediction together make real-time MVC algorithm and architecture implementation more difficult than for typical H.264.

In this thesis, a content-aware prediction algorithm with inter-view mode decision for MVC is proposed first. By analyzing and reusing the motion information from neighboring views, the computational complexity of motion estimation (ME) in an MVC prediction core can be reduced by 98.4-99.1% in most view channels, with a negligible quality loss of 0.03-0.06 dB in PSNR. Compared with simulcast coding, the proposed algorithm provides a coding gain of 0.09-1.44 dB with only 51.4-64.1% of the computational complexity, which indicates that the computational redundancy is effectively removed.

Second, hardware-oriented algorithm and architecture analysis and implementation are introduced. A system bandwidth analysis scheme for MVC with precedence constraints is proposed first. By adopting the precedence-constraint concept from graph theory, the bandwidth problem in MVC can be addressed by selecting the most suitable data-reuse scheme. Then, the proposed MVC motion estimation algorithm is modified and combined with the hardware-oriented predictor-based motion estimation algorithm for the general H.264 encoder. The result is a complete hardware-oriented solution for the prediction core in both single-view and multiview video encoders. Further, by improving the motion vector prediction scheme, the hardware resource requirement of the multiview video coding prediction engine is reduced even in high-definition or super-HD cases. With the proposed hardware-oriented algorithm, the on-chip memory requirement is reduced to 18.3-20.4% and the system bandwidth is reduced to 53.2-95.8% of those required by the level-C and level-C+ data-reuse schemes.

Based on the proposed algorithm and architecture, a single-chip design, the "High Definition Multiview Video and Super-HD H.264 Video Integer Motion Estimation Accelerator", is introduced at the end of this thesis. It supports resolutions up to super-HD 4096x2160 pixels for a single view, 1920x1080 pixels for stereo view, and 1280x720 pixels for multiview.
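As a rough illustration of what a precedence constraint between frames looks like, the sketch below models a hypothetical prediction structure as a dependency graph and derives a valid coding order with a topological sort. It does not reproduce the bandwidth analysis or the data-reuse selection of the thesis. The two-view, three-frame structure, the reference pattern, and all names are assumptions made for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_dependencies(num_views, num_frames):
    """Map each (view, time) frame to the set of frames it references.

    Assumed structure: every frame references the previous frame of its own
    view (motion) and the same-time frame of the previous view (disparity).
    """
    deps = {}
    for v in range(num_views):
        for t in range(num_frames):
            refs = set()
            if t > 0:
                refs.add((v, t - 1))  # temporal reference
            if v > 0:
                refs.add((v - 1, t))  # inter-view reference
            deps[(v, t)] = refs
    return deps


def is_valid_order(order, deps):
    """Check that every frame appears after all frames it references."""
    position = {frame: i for i, frame in enumerate(order)}
    return all(
        position[ref] < position[frame]
        for frame, refs in deps.items()
        for ref in refs
    )


deps = build_dependencies(num_views=2, num_frames=3)
order = list(TopologicalSorter(deps).static_order())
print(order)
print(is_valid_order(order, deps))
```

Several coding orders usually satisfy the same constraints. A bandwidth analysis of the kind described in the abstract would compare such admissible orders, since the order determines which reference data can stay on chip between frames.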
