
立體視訊編碼系統預測核心之演算法與硬體架構設計

Algorithm and Architecture Design of Prediction Core for Stereo Video Coding Systems

Advisor: 陳良基
Co-advisor: 簡韶逸 (Shao-Yi Chien)

Abstract


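A stereoscopic vision system displays two different images at the same time, one for each eye, giving the user a more realistic viewing experience. As the manufacturing technology for stereoscopic displays matures, two-view and multi-view digital video technologies are attracting growing attention. Because the amount of data to be processed multiplies, compression techniques for stereo video coding become all the more important. Stereo video has characteristics that conventional single-view coding does not exploit. Processing each channel separately with conventional coding techniques greatly reduces coding efficiency, the very high computational complexity becomes a bottleneck in system implementation, and meeting real-time requirements poses further design challenges for the hardware architecture.
At the algorithm level, this thesis first designs an efficient prediction core for stereo video coding systems. Exploiting the characteristics of binocular vision, we develop a joint prediction algorithm and apply it in the prediction core. The technique improves coding efficiency, outperforming MPEG-4 Temporal Scalability and MPEG-4 Simple Profile by 2 to 3 dB in objective evaluation, and reduces computation by about 80% on average compared with coding the two channels separately.
At the architecture level, we propose an efficient hardware architecture for the prediction core. Compared with a hardware architecture based on full-search motion estimation, it needs only 11.5% of the memory and 1/30 of the processing elements. In addition, the proposed scheduling saves 23% of the on-chip memory, and the proposed algorithm saves 35% of the system bandwidth as well as unnecessary computation. At an operating frequency of 81 MHz, the prototype chip performs the prediction for 30 SDTV (720x480) frames in each of the left and right channels in real time. The search range is [-64, +63] horizontally and [-32, +31] vertically for motion estimation, and [-64, +63] horizontally and [-16, +15] vertically for disparity estimation. The chip was fabricated through CIC in the TSMC 0.18 µm 1P6M process; the die measures about 2.77895 × 2.77895 mm², the logic gate count is 137K, and only 22.75 Kbits of memory are required. It is the first prediction core prototype chip for stereo video coding systems reported in the research literature.
At the system integration level, we built a hardware/software platform for the complete stereo video system, integrating a stereo video capture device, the digital video codec, network transmission, and a stereoscopic display. The prediction core is accelerated on an FPGA, and the remaining parts run in software. Besides displaying stereo video in real time and giving users a stereoscopic viewing experience, the system verifies the feasibility of the algorithm and hardware architecture proposed in this thesis.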

Parallel Abstract


Stereo video gives users a sense of depth by showing a different frame to each eye simultaneously, conveying vivid information about the scene structure. As 3D-TV technology matures, stereo and multi-view video coding are drawing increasing attention. However, building a stereo video coding system means overcoming several design challenges, notably poor coding efficiency and high computational complexity. This thesis proposes the algorithm and the architecture design of a prediction core for stereo video systems. At the algorithm level, the proposed joint prediction scheme comprises three coding tools: joint block compensation, MV (motion vector)-DV (disparity vector) prediction, and mode pre-decision. Joint block compensation uses the weighted sum of motion- and disparity-compensated blocks, which lets the system outperform MPEG-4 Temporal Scalability and Simple Profile by 2–3 dB in rate-distortion performance. MV-DV prediction, which exploits the correlation between MVs and DVs, and mode pre-decision, which exploits the characteristics of stereo video, together reduce computational complexity by about 80%. At the architecture level, a hardwired prediction core architecture is proposed as a cost-effective solution for prediction core implementation in stereo video systems. The proposed scheduling saves 23% of on-chip SRAM, and a new bandwidth-reduction algorithm cuts data access bandwidth by 35%. Compared with a prediction core based on the full search block matching algorithm (FSBMA), the design needs only 11.5% of the on-chip SRAM and 1/30 of the processing elements (PEs), which demonstrates the efficiency of the architecture.
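The weighted sum behind joint block compensation can be illustrated with a minimal sketch. The abstract does not give the weights, the block size, or the rounding rule, so the fixed-point weight `w_motion` out of `WEIGHT_DENOM`, the 2×2 blocks, and the function names below are all assumptions chosen for illustration, not the thesis's actual design.

```python
# Hypothetical fixed-point precision for the blending weight (assumption).
WEIGHT_DENOM = 4


def joint_block(mc_block, dc_block, w_motion=2):
    """Blend a motion-compensated block with a disparity-compensated block.

    w_motion / WEIGHT_DENOM is the share given to the motion-compensated
    block; the remainder goes to the disparity-compensated block.
    The result is rounded to the nearest integer (halves round up).
    """
    if not 0 <= w_motion <= WEIGHT_DENOM:
        raise ValueError("w_motion must lie in [0, WEIGHT_DENOM]")
    w_disp = WEIGHT_DENOM - w_motion
    half = WEIGHT_DENOM // 2
    return [
        [(w_motion * m + w_disp * d + half) // WEIGHT_DENOM
         for m, d in zip(mc_row, dc_row)]
        for mc_row, dc_row in zip(mc_block, dc_block)
    ]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


if __name__ == "__main__":
    # Toy 2x2 blocks: one prediction is too dark, the other too bright.
    current = [[100, 104], [96, 100]]
    mc = [[96, 100], [92, 96]]        # motion-compensated (previous frame)
    dc = [[104, 108], [100, 104]]     # disparity-compensated (other view)
    joint = joint_block(mc, dc)
    print(sad(current, mc), sad(current, dc), sad(current, joint))
```

In this toy case the errors of the two predictions have opposite signs, so the blended block matches the current block more closely than either prediction alone, which is the intuition behind combining temporal and inter-view references.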
The prototype chip meets the real-time requirement of 30 D1 frames per second (fps) in the left and right channels simultaneously at an operating frequency of 81 MHz, with an ME (motion estimation)/DE (disparity estimation) search range of [-64, +63] in the horizontal direction and [-32, +31]/[-16, +15] in the vertical direction. The chip is fabricated in the TSMC 0.18 µm 1P6M CMOS process via CIC. The chip size is 2.77895 × 2.77895 mm². The two-input NAND gate count is 137K, and only 22.75 Kbits of SRAM are used to meet these specifications. It is the first prediction core chip in the world designed for stereo video coding systems. Moreover, at the system integration level, a prototype real-time stereo video system has been built. It integrates a stereo video camera, an FPGA accelerator, software optimization, network transmission, and a 3D LCD display. The demonstration results show that the proposed algorithm and architecture of the prediction core improve the performance of stereo video coding systems.
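The real-time figures quoted above imply a tight cycle budget, which a short calculation makes concrete. This is a back-of-the-envelope sketch, assuming 16×16 macroblocks and a D1 frame of 720×480 pixels; the abstract does not state how cycles are distributed across blocks or channels, and the function names are hypothetical.

```python
# Assumed macroblock size; MPEG-4 style coding uses 16x16 macroblocks.
MB_SIZE = 16


def macroblocks_per_frame(width, height, mb_size=MB_SIZE):
    """Number of whole macroblocks in one frame."""
    return (width // mb_size) * (height // mb_size)


def cycles_per_macroblock(clock_hz, width, height, fps, channels):
    """Average clock cycles available per macroblock across all channels."""
    mbs_per_second = macroblocks_per_frame(width, height) * fps * channels
    return clock_hz // mbs_per_second


def search_candidates(h_range, v_range):
    """Candidate positions a full search would visit for one block."""
    (h_lo, h_hi), (v_lo, v_hi) = h_range, v_range
    return (h_hi - h_lo + 1) * (v_hi - v_lo + 1)


if __name__ == "__main__":
    budget = cycles_per_macroblock(81_000_000, 720, 480, fps=30, channels=2)
    me = search_candidates((-64, 63), (-32, 31))   # motion estimation
    de = search_candidates((-64, 63), (-16, 15))   # disparity estimation
    print(budget, me, de)
```

Under these assumptions the budget works out to 1000 cycles per macroblock on average, while a full search would visit 8192 candidate positions for ME and 4096 for DE. That gap suggests why reducing the number of evaluated candidates matters for a low-cost real-time design.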

