Algorithm and Architecture Analysis for 2D and 3D Video Conversion

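Humanity's pursuit of visual quality never stops. The progression from black-and-white television to color television and on to today's digital television reflects both a challenge to the limits of human vision and the continual refinement of technology. With LCD and plasma televisions now flourishing, how legacy interlaced-scan television signals are restored to progressive scan for output on such displays matters greatly to viewers, in both picture quality and speed. Beyond LCD televisions, the major manufacturers have also developed LCD screens that can present stereoscopic 3D, which suggests that the next generation of display interfaces will be dominated by stereoscopic 3D vision. Under this trend, providing and compressing stereoscopic (Stereo 3D Video) content will become a necessity for next-generation television. Traditional methods of acquiring stereoscopic images all rely on additional equipment to supply the data from which the third dimension, depth, is derived. Yet a vast amount of video has already been shot as 2D content for flat televisions, and such material will certainly keep appearing in large quantities; if it is paired with a stereoscopic display but cannot exploit the display's capability, that is a hidden waste. Considering human vision as a whole, stereoscopic perception of a scene can be built even without two eyes and without the lens of the eye focusing at multiple distances. Human depth mechanisms, including binocular vision, image understanding, and image cognition, all help us "see" the depth of objects, which is also why IMAX theatrical films rely on experts to convert 2D footage to 3D. If these principles can be understood and transplanted into electronic products, less effort is needed at the capture end, and all 2D content can be converted to stereoscopic images directly inside the device. In pursuit of the best possible picture quality, this dissertation addresses these generational transitions by applying signal processing to the reconstruction and restoration of video signals when legacy signals are converted to new-generation formats. For 2D video, this dissertation proposes an adaptive motion-compensated de-interlacing method for converting interlaced video to progressive video, which overcomes the limitations of conventional motion-adaptive de-interlacing and substantially raises the resolution of the reconstructed image. For 3D video, this dissertation not only surveys how 3D video is captured but also proposes a real-time 2D-to-3D video conversion system. The proposed system takes the concepts behind stereoscopic vision in the human brain, including binocular vision, image understanding, and image cognition, realizes them as algorithms, and carries them through to hardware design. A home multimedia platform can embed this system to convert 2D video directly into stereoscopic video, so that stereoscopic content no longer has to be produced starting from the capture end, and the vast archive of existing video gains new value. The system has several parts: three depth reconstruction tools, depth map fusion, and adaptive depth image interpolation. The three depth reconstruction tools turn different human depth cues into algorithms; depth map fusion controls the reconstruction tools, fusing their outputs according to scene and content; the depth image interpolator then performs depth image based rendering, drawing the left-eye and right-eye images for output to a stereoscopic display. These conversion tools all require a very large amount of pixel-based computation, which makes hardware acceleration necessary. The final system is a real-time 2D-to-3D video converter. Vendors of home multimedia platforms can embed this converter in their systems to add the ability to convert all TV series, sports programs, and films to 3D, integrating fully with stereoscopic displays and making the most of them.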
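To make the de-interlacing problem described above concrete, the following is a minimal sketch of the classic three-direction edge-based line average (ELA), a common intra-field baseline. It is not the adaptive motion-compensated method proposed in the dissertation. The function name `ela_deinterlace`, the list-of-rows grayscale layout, and the choice to treat the input as a top field are assumptions made only for illustration.

```python
def ela_deinterlace(field):
    """Rebuild a frame from one field using classic 3-direction ELA.

    `field` is a list of rows (lists of ints). Its rows become the even
    lines of the output frame; every odd line is interpolated.
    Assumption: the input is a top field, so missing lines lie below.
    """
    height, width = len(field), len(field[0])
    frame = []
    for y in range(height):
        upper = field[y]
        frame.append(list(upper))
        # The last field row has no row below it, so it is repeated.
        lower = field[y + 1] if y + 1 < height else upper
        missing = []
        for x in range(width):
            best_diff, best_value = None, None
            # Vertical is tested first so ties fall back to line averaging.
            for d in (0, -1, 1):
                xu, xl = x + d, x - d
                if not (0 <= xu < width and 0 <= xl < width):
                    continue
                diff = abs(upper[xu] - lower[xl])
                if best_diff is None or diff < best_diff:
                    best_diff = diff
                    best_value = (upper[xu] + lower[xl]) // 2
            missing.append(best_value)
        frame.append(missing)
    return frame


if __name__ == "__main__":
    # A diagonal edge that moves two pixels between adjacent field rows.
    top_field = [[0, 0, 0, 100, 100],
                 [0, 100, 100, 100, 100]]
    for row in ela_deinterlace(top_field):
        print(row)
```

On the diagonal edge in the usage example, the interpolated line places the edge halfway between its positions in the two known rows, where plain vertical averaging would produce a blurred two-pixel ramp. Intra-field methods of this kind use no temporal information, which is the gap that motion-adaptive and motion-compensated approaches aim to close.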

Keywords

De-interlacing; TV post-processing; 2D-to-3D video conversion; depth map generation; 3D video

Parallel Abstract

Humans continually pursue more realistic display devices. Video devices have progressed from monochrome television to today's 3D-LCD, and the video signals differ across all of these devices. In this dissertation, video signal conversion for 2D and 3D video is discussed in two parts: de-interlacing and 2D-to-3D conversion. De-interlacing methods recover the data lost in the temporal and spatial domains of a 2D video sequence. 2D-to-3D conversion produces the missing dimension as a depth map of the 2D video and then converts the depth map and the 2D video into 3D video. The transition from interlaced-scan to progressive-scan TV signals has hindered the quality improvement of new display panels. Post-processing such as de-interlacing has become an important indicator of a TV decoder's performance. In Part I, three kinds of de-interlacing methods are described first: intra-field de-interlacing, motion-adaptive de-interlacing, and motion-compensated de-interlacing. Second, for better de-interlaced image quality, we propose an intra-field de-interlacing algorithm named "Extended Intelligent Edge-based Line Average" (EIELA), and its VLSI module implementation is also presented. Third, for near-perfect de-interlaced image quality, a de-interlacing algorithm using adaptive global and local motion estimation/compensation is proposed. It consists of global and local motion estimation/compensation, 4-field motion adaptation, block-based directional edge interpolation, and a GMC/MC/MA block mode decision module. Defects such as jagged edges, blurring, line crawling, and feathering are suppressed more effectively than with traditional methods. Moreover, true motion information is extracted accurately by the 4-field motion estimation and the global motion information. In Part II, we first present a detailed survey of the different kinds of 3D video capturing methods. 
There are three kinds of 3D video capturing methods: active-sensor-based methods, passive-sensor-based methods, and 2D-to-3D conversion. After analyzing previous work, we propose a real-time automatic depth fusion 2D-to-3D conversion system for the home multimedia platform. In Part III, we convert the binocular, monocular, and pictorial depth cues into depth reconstruction algorithms. Five novel algorithms and a hardware architecture are presented. The depth reconstruction algorithms fall into three categories: motion parallax based depth reconstruction, which uses the binocular depth cue; image based depth reconstruction, which uses the monocular depth cue; and consciousness based depth reconstruction, which maps the perspective in the pictorial depth cue to a depth gradient. After depth reconstruction, a priority depth fusion algorithm is proposed to integrate all the depth maps. A multiview depth image based rendering method is then presented to render multiview images for the multiview 3D-LCD. One-dimensional cross search dense disparity estimation is proposed for motion parallax based depth reconstruction. This fast algorithm exploits the characteristics of motion parallax and trinocular cameras. Because motion parallax is induced by camera motion, the 1D cross search tends to find better and smoother results for a true depth map. A symmetric trinocular property for trinocular camera stereo matching is also described. A hardware architecture for 2D full search dense disparity estimation is then designed for real-time operation of the motion parallax based depth reconstruction. Dense disparity estimation must calculate a disparity vector for every depth pixel. To support resolution switching and high specifications, the proposed hardware architecture uses a data assignment unit as a small buffer to achieve an IP-based design. 
The hardware can be switched among three different depth pixel resolutions in real time. Depth from Focus (DfF) and short-term motion assisted color segmentation are proposed for image based depth reconstruction. The DfF method exploits the blurriness that appears when pictures are taken with a large-aperture camera. After the object is extracted from the blurred areas, its depth is set to the focus distance of the picture. The second image based depth reconstruction method is depth map generation by short-term motion assisted color segmentation, which produces a depth map that is smooth in both the spatial and temporal domains. However, both methods still face the problem of moving cameras and need tuning for different types of image sequences, which remain future work. They should be combined with depth from geometric perspective and other depth cues to produce a more accurate depth map. For consciousness based depth reconstruction, we present a robust fundamental detection algorithm based on structural component analysis. It is suitable for images with distinct object edges. The proposed method for vanishing line and vanishing point detection analyzes the image structure directly, without complicated mathematical calculation. It is feasible for a single image sequence without prior temporal information and detects the dominant vanishing lines correctly with high probability and accuracy. The proposed block-based algorithm, which retains a regular block data flow, is much faster, simpler, and more efficient. In the 2D-to-3D conversion procedure, the proposed vanishing line and point detection contributes substantially to overall scene knowledge and makes the conversion easier. After retrieving the depth maps from the different depth cues, we propose a priority depth fusion method to integrate the three depth maps. 
It considers the priority of the depth maps in six aspects: scene adaptability, temporal consistency, perceptibility, correctness, fineness, and cover area. To obtain a comfortable depth map, all six aspects should be weighed when deciding the priority. We also propose a per-pixel texture mapping depth image based rendering (DIBR) algorithm that can be accelerated by the GPU. The proposed algorithm converts points to vertices, and an image plane representing the original frame and depth map is then constructed. Through the GPU pipeline, the left and right images are rendered. Even for a free-viewpoint application, as long as the GPU draws more than 49.6 M triangles per second, the multiview DIBR can still run in real time. The proposed DIBR also achieves a 38-fold speedup over the previous design. With these algorithms in place, an automatic depth fusing 2D-to-3D conversion system is described. The proposed system generates depth maps for most commercial video with hardware acceleration. Using the GPU, the depth map and the original 2D image are converted to stereo images for display on 3D display devices. Large amounts of 2D content, such as DVDs and TV programs, can thus be converted and shown on 3D display devices at the end-user side. In summary, this dissertation presents an intra-field de-interlacing hardware architecture named Extended Intelligent Edge-based Line Average and an adaptive local/global motion-compensated de-interlacing method for the de-interlacing of 2D video. For converting 2D video signals to 3D video signals, an automatic depth fusing 2D-to-3D conversion system is proposed that uses human depth cues to convert 2D video to 3D video. 
Five algorithms using different depth cues are proposed in this 2D-to-3D conversion system: one-dimensional cross search dense disparity estimation, depth map generation with short-term motion assisted color segmentation, block-based vanishing line/point detection, per-pixel multiview depth image based rendering, and priority depth fusion. A hardware architecture for 2D full search dense disparity estimation is also implemented and integrated with the whole 2D-to-3D conversion system. The proposed 2D-to-3D conversion not only produces acceptable depth maps for 2D video but also renders multiview video from the depth information and the 2D video.
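The final rendering step, producing views from a 2D frame plus its depth map, can be illustrated with a minimal CPU sketch of generic depth image based rendering by horizontal pixel shifting. This is not the per-pixel texture mapping GPU algorithm of the dissertation. The names `render_view` and `render_stereo_pair`, the 0 to 255 depth range with 255 as nearest, the linear depth-to-disparity mapping, the shift directions, and the neighbor-copy hole filling are all assumptions chosen for illustration.

```python
def render_view(image, depth, max_disparity, direction):
    """Forward-warp a grayscale image to a virtual view.

    image, depth: lists of rows of ints; depth is 0..255, 255 = nearest
    (assumed convention). direction is +1 or -1 and selects which way
    near pixels shift. Far pixels (depth 0) stay in place.
    """
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = [None] * width
        zbuf = [-1] * width  # nearest depth written to each target pixel
        for x in range(width):
            shift = direction * round(max_disparity * depth[y][x] / 255)
            tx = x + shift
            # Keep the nearest source pixel when several land on one target.
            if 0 <= tx < width and depth[y][x] > zbuf[tx]:
                row[tx] = image[y][x]
                zbuf[tx] = depth[y][x]
        # Fill disocclusion holes from the left, then any leading holes
        # from the right. Real systems use more careful inpainting.
        for x in range(1, width):
            if row[x] is None:
                row[x] = row[x - 1]
        for x in range(width - 2, -1, -1):
            if row[x] is None:
                row[x] = row[x + 1]
        out.append(row)
    return out


def render_stereo_pair(image, depth, max_disparity):
    """Return (left, right) views; near objects shift right in the left view."""
    left = render_view(image, depth, max_disparity, +1)
    right = render_view(image, depth, max_disparity, -1)
    return left, right


if __name__ == "__main__":
    image = [[10, 20, 30, 40, 50]]
    depth = [[0, 0, 255, 0, 0]]  # one near pixel in the middle
    left, right = render_stereo_pair(image, depth, max_disparity=1)
    print(left)
    print(right)
```

The depth buffer per row resolves occlusions when a near pixel and a far pixel map to the same target position, and the two fill passes cover the holes that open up behind shifted foreground pixels. A multiview renderer would call `render_view` once per viewpoint with a different signed disparity scale.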

Parallel Keywords

De-interlacing; TV post-processing; 2D-to-3D conversion; depth map generation; 3D video

Cited By

Chen, Y. C. (2009). 二維影像對三維影像轉換系統 [master's thesis, National Taipei University of Technology]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0006-2407200918315300
