以具語意之區域為基礎重建單張靜態影像的深度圖

三維立體影像與影音引導人類視覺享受進入了另一個新世代，三維成像與人類真實世界的感官一致，因此預期將會成為新的主流。市面上已有能直接取得三維資訊的攝影機，但造價高昂且普遍性不高。由於希望現有與過往的二維影像與影音也能有立體視覺的效果，二維至三維的轉換技術成為急迫且實用的解決方式。與傳統二維平面影像不同的是，深度資訊是三維立體成像的必要因素。現有的二維至三維轉換的演算法，有一些簡單、低運算量，能應用在即時系統，但經常出現錯誤的結果，或者無法處理單張影像。有一些能產生出合理的結果，缺點是需要花費大量時間在電腦運算上。為了平衡這兩種情況，我們提出了一個有效率地、經由單張影像來重建三維資訊的演算法。在許多用來重建三維資訊的深度特徵中，考慮到廣泛適用性的問題，我們選擇了相對高度特徵(relative height)。輸入一張影像，首先做影像分割將其切割為區塊，這些區塊接著進入到語意(semantic)與表面(surface)的機器學習系統(machine learning system)，為之後的深度估測提供了最主要的資訊。接著，我們找出影像中會被人類最先注意到的主要區塊(salient regions)，並做了地平線的預測，來提升機器學習估測的準確度。此外，人類在影像中常為主要角色，我們添加了以HOG 特徵為基準的人類偵測來補足整個系統。最後使用語意資訊以及相對深度特徵估算出對應的影像深度圖(depthmap)。

關鍵字

二維至三維的轉換；影像切割；場景辨識；相對深度估計

並列摘要

Three-dimensional (3D) image or video has led human vision to a new generation and brought to the next revolution, because the content, provided for 3D display, is much closer to the real world. Since the device which could capture direct 3D content is of high cost and not common at present compared with 2D cameras, also, the tremendous amount of current and past media data in 2D format should be possible to be viewed with a stereoscopic effect. For these reasons, 2D to 3D conversion becomes a practical and urgent solution to meet the requirement for 3D content providers. Different from traditional two-dimensional (2D) content, depth information is necessary for 3D virtual view generation. Many depth cues can be used to reconstruct 3D information from 2D image. To deal with outdoor scene usually with horizon, we choose relative height cue since it is more general. For an input image, we first use an efficient graph-based image segmentation to compute the segmentation regions. These regions are then used to semantic and surface machine learning system, which gives the main information to estimate the depth map. Second, we identify salient regions that are visually more noticeable to provide the semantic information for the subsequent stages. Horizon detection helps us not only to estimate the depth value but also to enhance semantic and surface classification. Moreover, applying human detection to detect human is a big challenge during semantic labeling stage. Depth estimation based on relative height depth cue is accomplished after these processes.

並列關鍵字

2D-to-3D Conversion ； Image Segmentation ； Scene Recognition ； Relative Depth Estimation

參考文獻

[2] D. Scharstein and R. Szeliski, "A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms," International Journal of Computer Vision, pp. 7-42, 2002.

[3] C. L. Zitnick and T. Kanade, "A Cooperative Algorithm for Stereo Matching and Occlusion Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 675-684, 7 2000.

[6] S. Battiato, S. Curti, M. L. Cascia, M. Tortora and E. Scordato, "Depth-Map Generation by Image Classification," in Proc. of SPIE Electronic Imaging, 2004.

[7] S. Battiatoa, A. Caprab, S. Curtib and M. L. Casciac, "3D Stereoscopic Image Pairs by Depth-Map Generation," in Proc. of International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT), 2004.

[8] A. Saxena, M. Sun and A. Y. Ng, "Make3D: Learning 3-D Scene Structure from a Single Still Image," in IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2008.

國際替代計量

以具語意之區域為基礎重建單張靜態影像的深度圖

主題瀏覽