
基於影像分割和多重影像線索之深度估計

Segment-based Depth Estimation of Single Image from Multiple Cues

Advisor: 李明穗

Abstract


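In computer vision and computer graphics, depth estimation is a crucial step: it serves as preprocessing for many applications and supports other image and video research. Although the human eye can judge the relative distance of objects from a single image, the problem remains hard for computers. If an estimated depth map is learned directly from the depth values of a depth camera, it struggles to reflect the visual effect that humans perceive; consequently, when the 3D film industry lacks the original depth information, depth maps for 2D images are still labeled by hand. This thesis therefore proposes a segmentation-based method that uses the depth cues in a single image to improve depth map estimation.

First, the input image goes through scene analysis, which partitions it into segments, estimates absolute distance (vanishing point detection), and classifies the image type, so that later steps can extract cues specific to each of the three image types. The second main step evaluates depth cues for the relative relationships between segments, using the T-junctions of the segments, their shared boundaries, the skeleton of indoor rooms, and labeled objects. The final step orders the segments in depth using the relative depth cues obtained earlier, adjusts the result with the absolute information, and outputs the final depth map.

In the experiments, we compare the output depth maps quantitatively and qualitatively with several recent papers. The depth maps generated by the proposed method correctly and plausibly reflect the depth differences between objects as perceived by human vision. To verify practicality, we also generate 3D stereoscopic images from the original images and depth maps to provide a more varied comparison.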

Parallel Abstract


Depth estimation, also called 3D information reconstruction, from a single monocular image is a pivotal problem in computer vision and computer graphics. It improves existing vision tasks and serves as a preprocessing step for a variety of real-world applications. However, when the predicted depth is learned from ground truth measured by a depth sensor, the resulting depth map does not always correspond to human perception. For this reason, when the 3D movie industry lacks real depth values for its images, it still labels depth maps by hand. This thesis therefore aims to improve perceptual depth estimation and proposes a fully automatic, segment-based depth estimation system.

First, images are partitioned into segments, and vanishing point detection is applied to extract global information. Images are then classified into three scene types: images without absolute information, indoor images with a vanishing point, and outdoor images with a vanishing point. The perceptual depth cues that are measured depend on the image type. Second, relative depth estimation finds depth cues (the relationships between segments) from T-junctions, shared boundaries, spatial layout, and object labeling. Finally, the output depth map is generated from the depth ordering and the absolute information given by the vanishing point.

The experimental results show that the proposed method estimates depth successfully. The resulting depth maps are comparable to those of other work under several quantitative metrics and in qualitative comparisons. To verify the practicality of the method, the estimated depth maps are used to reconstruct 3D images that agree with human perception.
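The depth-ordering step can be illustrated with a small sketch. The abstract does not say how the ordering is computed, so the code below is only an assumption: it treats each relative cue as a pairwise "segment A is in front of segment B" relation and layers the segments with a topological sort. The function names `order_segments` and `paint_depth_map`, the pair format, and the integer depth levels are all hypothetical, and the adjustment from the vanishing point is left out.

```python
from collections import defaultdict, deque


def order_segments(num_segments, in_front_of):
    """Assign a depth level to each segment (0 = nearest).

    in_front_of: iterable of (near, far) pairs of segment ids, an assumed
    encoding of relative cues such as T-junctions or shared boundaries.
    """
    successors = defaultdict(list)
    indegree = [0] * num_segments
    for near, far in in_front_of:
        successors[near].append(far)
        indegree[far] += 1

    level = [0] * num_segments
    queue = deque(i for i in range(num_segments) if indegree[i] == 0)
    visited = 0
    while queue:
        seg = queue.popleft()
        visited += 1
        for nxt in successors[seg]:
            # A segment lies at least one level behind everything occluding it.
            level[nxt] = max(level[nxt], level[seg] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    if visited != num_segments:
        # A cycle means the pairwise cues contradict each other.
        raise ValueError("contradictory depth cues: ordering contains a cycle")
    return level


def paint_depth_map(labels, level):
    """Replace each pixel's segment id with that segment's depth level."""
    return [[level[seg] for seg in row] for row in labels]


if __name__ == "__main__":
    levels = order_segments(3, [(0, 1), (1, 2), (0, 2)])
    labels = [[0, 0, 1],
              [0, 2, 2]]
    print(levels)
    print(paint_depth_map(labels, levels))
```

Layering by the longest chain of occluders keeps segments that no cue relates at the same level; a real system would also have to resolve contradictory cues rather than reject them, which this sketch does not attempt.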

