Algorithm and Architecture Design for 3D Video Signal Processing

Advisor: Liang-Gee Chen (陳良基)

Abstract


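With the rapid development of digital video technology, people's expectations of image quality keep rising. Digital TV has moved beyond large-screen, high-resolution sets toward high-quality sets capable of 3D stereoscopic playback, and as the world's major manufacturers keep improving display quality, 3D TV will play an important role in our daily lives. Under this trend, supplying 3D stereoscopic content will become the main challenge that next-generation displays must overcome. Capturing 3D stereoscopic video, however, has traditionally required special equipment such as active depth sensors and multi-view cameras, and this approach applies only to producing new content. Meanwhile, a huge amount of existing material has been stored in 2D form since the beginning of 2D imaging, and it cannot show a stereoscopic effect directly on new 3D displays. How to generate 3D video from 2D video has therefore become an important problem.

To recover 3D video from 2D video, we look to the human visual system. Human depth perception relies on multiple depth cues, such as binocular disparity, motion parallax, lighting and shading, and pictorial cues, all of which help people perceive the depth of a scene. This dissertation addresses two parts, generating depth from the disparity of stereo images and generating depth from the depth cues of single-view video, and proposes new algorithms and architecture designs for each.

In the first part, we analyze the computational complexity and the bandwidth and memory usage of computing binocular disparity with belief propagation (BP). In recent years BP has been one of the best-performing algorithms for depth generation from stereo disparity, but it requires large amounts of bandwidth, computation, and memory; although it is highly parallelizable, it still leaves much room for improvement before it can compute depth in real time. This dissertation proposes two algorithms: tile-based belief propagation, which uses local optimization, and a fast message computation method that exploits properties of the smoothness cost. The former reduces bandwidth and memory usage, and the latter reduces computational complexity. For high-definition applications, we also propose a new three-stage pipelined architecture, resulting in a high-performance chip design that generates depth from HDTV 720p stereo video in real time.

In the second part, we investigate how to extract the different depth cues used by the human eye and turn them into depth generation algorithms whose output can be played on 3D TVs. We propose novel algorithms that attack the problem from different angles: depth generation based on multiple depth cues, object-based depth assignment based on prior hypotheses (empirical rules), and depth generation based on human depth perception of lighting and color. We also present a demonstration platform that, combined with a 3D playback kit, achieves real-time HDTV 1080p video conversion by dividing multithreaded computation between a multi-core CPU and a CUDA graphics accelerator.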
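Belief propagation for stereo matching, as discussed in the first part, is usually written with the energy and min-sum message equations below. This is the standard textbook formulation, given as a hedged reference rather than the exact costs used in this dissertation: $D_p(d_p)$ is the data (matching) cost of assigning disparity $d_p$ to pixel $p$, $V(d_p, d_q)$ is the smoothness cost between neighbors, $\mathcal{E}$ is the set of 4-connected neighbor pairs, $\mathcal{N}(p)$ is the neighborhood of $p$, $m^t_{p \to q}$ is the message from $p$ to $q$ at iteration $t$, and $T$ is the last iteration.

```latex
E(\mathbf{d}) = \sum_{p} D_p(d_p) + \sum_{(p,q) \in \mathcal{E}} V(d_p, d_q)

m^{t}_{p \to q}(d_q) = \min_{d_p} \Big( V(d_p, d_q) + D_p(d_p) + \sum_{s \in \mathcal{N}(p) \setminus \{q\}} m^{t-1}_{s \to p}(d_p) \Big)

b_q(d_q) = D_q(d_q) + \sum_{p \in \mathcal{N}(q)} m^{T}_{p \to q}(d_q), \qquad d_q^{*} = \arg\min_{d_q} b_q(d_q)
```

With $L$ disparity labels, evaluating the minimum directly costs $O(L^2)$ operations per message, and every pixel keeps four incoming message vectors of length $L$ across iterations, which is why computation, memory, and bandwidth all grow quickly at HD resolutions.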

English Abstract


Digital video technology plays an important role in our daily lives. As display technologies evolve, display systems provide ever higher visual quality. Emerging 3D displays offer a better visual experience than conventional 2D displays, and 3D technology enriches the content of many applications, such as broadcasting, movies, gaming, photography, camcorders, and education. This dissertation discusses video signal conversion for 3D images and video in two parts: depth from stereo vision and single-view 2D-to-3D video conversion. Depth from stereo vision estimates depth from the correspondences between stereo views. 2D-to-3D conversion generates a depth map for 2D video and then uses the depth map to render the 2D video as 3D video.

Stereo matching can be formulated as an energy minimization problem on a 2D Markov random field (MRF). Among the many global MRF optimization methods, belief propagation (BP) gives high-quality results and has high potential for real-time processing. However, because of its costly iterative operations and high memory and bandwidth demands, BP as conventionally used for stereo matching is too computationally expensive for real-time system implementation. In Part I, the background of stereo matching using belief propagation is first described. Second, two algorithms, tile-based belief propagation and a fast message computation method, are proposed; they reduce the bandwidth, memory, and computational cost of general BP, making real-time processing possible. Third, an efficient VLSI architecture for real-time, high-performance stereo matching is presented. The design combines the fast message computation method with tile-based BP to create a parallel and flexible architecture, and it benefits from several proposed hardware design techniques that reduce bandwidth consumption and improve the efficiency of stereo matching.
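The fast message computation in Part I exploits properties of the smoothness cost, but its exact form is not given in this abstract. The sketch below therefore swaps in a well-known technique with the same goal, the distance-transform trick of Felzenszwalb and Huttenlocher: for an assumed truncated linear smoothness cost $V(d_p, d_q) = \lambda \min(|d_p - d_q|, \tau)$, a min-sum message can be computed in $O(L)$ with one forward and one backward pass instead of $O(L^2)$. The function names and the parameters `lam` and `tau` are hypothetical, chosen for illustration only.

```python
def message_bruteforce(h, lam, tau):
    """Reference O(L^2) min-sum message: m(dq) = min_dp h(dp) + lam * min(|dp - dq|, tau)."""
    n = len(h)
    return [min(h[dp] + lam * min(abs(dp - dq), tau) for dp in range(n)) for dq in range(n)]


def message_fast(h, lam, tau):
    """O(L) min-sum message for a truncated linear smoothness cost (assumed cost model)."""
    m = list(h)
    # Forward pass: propagate costs toward higher labels with slope lam.
    for i in range(1, len(m)):
        m[i] = min(m[i], m[i - 1] + lam)
    # Backward pass: propagate costs toward lower labels with slope lam.
    for i in range(len(m) - 2, -1, -1):
        m[i] = min(m[i], m[i + 1] + lam)
    # Truncation: no label costs more than the best label plus the cap lam * tau.
    cap = min(h) + lam * tau
    return [min(v, cap) for v in m]


def outgoing_message(data_cost, incoming, lam, tau):
    """Message p -> q from p's data cost and the messages p received from neighbors other than q."""
    h = [data_cost[d] + sum(msg[d] for msg in incoming) for d in range(len(data_cost))]
    m = message_fast(h, lam, tau)
    # Normalize so the minimum is 0, a common way to keep message values bounded.
    base = min(m)
    return [v - base for v in m]


if __name__ == "__main__":
    D = [3.0, 0.0, 2.0, 5.0]
    incoming = [[0.0, 1.0, 1.0, 2.0], [1.0, 0.0, 2.0, 2.0]]
    print(outgoing_message(D, incoming, lam=1.0, tau=2))  # [1.0, 0.0, 1.0, 2.0]
```

Both passes are simple add-compare-select recurrences over the label dimension, which is one reason this family of methods suits hardware; whether it matches the dissertation's own processing elements is not stated here.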
The proposed hardware design techniques include a 3-stage pipeline, fully parallel processing elements for message updates, and a boundary message reuse scheme. When operating at 227 MHz, the architecture generates HDTV 720p disparity maps at 30 fps.

In Part II, we generate depth maps from single-view content, and three algorithms are proposed. The first algorithm uses three depth cues based on motion parallax, geometric perspective, and color. Because this depth-cue-based algorithm is computationally intensive, the second algorithm takes a new approach: it applies a prior hypothesis to assign depth to grouped objects without extracting depth cues, which makes it suitable for single 2D images. Finally, the third algorithm exploits human depth perception of color and lighting; it has very low computational complexity and few side effects on visual quality. A corresponding real-time demo system is also presented.

In summary, this dissertation presents an efficient stereo matching hardware architecture that combines tile-based BP with the fast message computation method to generate high-quality depth maps from stereo video. For 2D-to-3D video conversion, three algorithms are proposed, which generate depth from depth cues, prior hypotheses, and human depth perception, respectively. A 2D-to-3D conversion demo system integrated with a 3D vision kit is also implemented. The proposed 2D-to-3D conversion not only produces high-quality depth maps for 2D video but also achieves real-time processing at HDTV resolution.
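Once a depth map exists, the 2D-to-3D pipeline still has to render a second view. The abstract does not describe the renderer, so the following is only a minimal sketch of generic depth-image-based rendering (DIBR), the approach used in video-plus-depth 3DTV systems. Every detail is an assumption: 8-bit depth where 255 is nearest, disparity linear in depth up to a hypothetical `max_disparity`, a virtual right view produced by shifting pixels left, a z-buffer so nearer pixels win, and a simple heuristic that fills disocclusion holes from the farther (background) neighbor. `render_right_view` is a hypothetical name.

```python
def render_right_view(image, depth, max_disparity):
    """Warp a 2D frame into a virtual right view using its depth map (DIBR sketch).

    image, depth: lists of rows of equal size; depth is 0 (far) .. 255 (near) by assumption.
    """
    h, w = len(image), len(image[0])
    out = [[None] * w for _ in range(h)]
    zbuf = [[-1] * w for _ in range(h)]  # depth of the pixel warped to each target; -1 marks a hole
    for y in range(h):
        # Forward warping: nearer pixels shift further left and overwrite farther ones.
        for x in range(w):
            d = depth[y][x]
            xt = x - round(max_disparity * d / 255)
            if 0 <= xt < w and d > zbuf[y][xt]:
                out[y][xt] = image[y][x]
                zbuf[y][xt] = d
        # Hole filling: copy from the farther of the nearest warped neighbors on each side.
        for x in range(w):
            if zbuf[y][x] >= 0:
                continue
            left = next((i for i in range(x - 1, -1, -1) if zbuf[y][i] >= 0), None)
            right = next((i for i in range(x + 1, w) if zbuf[y][i] >= 0), None)
            candidates = [i for i in (left, right) if i is not None]
            if candidates:
                src = min(candidates, key=lambda i: zbuf[y][i])
                out[y][x] = out[y][src]
            else:
                out[y][x] = image[y][x]  # nothing warped into this row; keep the source pixel
    return out


if __name__ == "__main__":
    row = [[10, 20, 30, 40, 50, 60]]
    near_block = [[0, 0, 255, 255, 0, 0]]
    print(render_right_view(row, near_block, max_disparity=1))  # [[10, 30, 40, 50, 50, 60]]
```

In the example, the near block (30, 40) moves one pixel left, and the hole it uncovers is filled from the background pixel 50 rather than from the foreground, which is the usual reason for preferring the farther neighbor.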
