
數位影像與視訊深度估測演算法

Depth Map Inference for Digital Image and Video

Advisor: 郭天穎

Abstract


This dissertation proposes a depth estimation method for single-view digital images and video. 3D display technologies and devices are now commercialized, yet content produced with 3D cameras or camcorders remains scarce, while most existing images and video are stored in 2D; a 2D-to-3D conversion technique is therefore needed to make full use of 3D displays. 2D-to-3D conversion first requires inferring the depth of the objects in the scene; once the depth information is available, the depth-image-based rendering (DIBR) algorithm can synthesize the binocular views. Depth estimation for 2D-to-3D conversion, however, is an ill-posed problem with no single correct answer, and video depth estimation is easily disturbed by factors such as noise, changes in camera motion, textureless regions, and object occlusion. This dissertation proposes a reasonable and effective depth estimation method to address these problems. For digital images, the algorithm uses the camera projection model together with the camera's EXIF parameters to derive physically meaningful depth, and fuses this initial depth with a dark-channel depth to form the final depth map. For video, the algorithm combines three depth cues, namely disparity depth, optical-flow depth, and depth propagated from neighboring frames, and fuses them with a Kalman filter into a reliable depth map; finally, super-pixel segmentation and temporal-spatial smoothing enforce depth consistency, reducing the impact of noise in textureless regions, to obtain the final estimate. Experimental results show that the proposed method produces comfortable and temporally continuous depth without any extra training mechanism or computationally expensive iterative refinement.
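As a rough illustration of the image branch described above, the following minimal Python sketch shows one way a dark-channel depth cue could be computed and blended with an initial depth map. The patch size, haze coefficient beta, blending weight alpha, and function names are illustrative assumptions, not the dissertation's actual parameters or implementation.

    import numpy as np
    from scipy.ndimage import minimum_filter

    def dark_channel_depth(rgb, patch=15, beta=1.0, eps=1e-3):
        """Relative depth from the dark channel prior: hazier pixels read as farther."""
        rgb = np.asarray(rgb, dtype=np.float64)
        dark = minimum_filter(rgb.min(axis=2), size=patch)     # per-channel, then per-patch minimum
        airlight = max(np.percentile(rgb, 99.9), eps)          # rough atmospheric-light estimate
        transmission = np.clip(1.0 - 0.95 * dark / airlight, eps, 1.0)
        depth = -np.log(transmission) / beta                   # d is proportional to -ln(t)/beta under a haze model
        return depth / (depth.max() + eps)                     # normalize to [0, 1]

    def fuse_with_initial_depth(initial_depth, rgb, alpha=0.5):
        """Blend the camera-model initial depth with the dark-channel cue."""
        return alpha * initial_depth + (1.0 - alpha) * dark_channel_depth(rgb)

A simple convex blend is used here only to keep the sketch short; the dissertation's actual fusion rule may differ.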

Parallel Abstract


This dissertation presents a novel depth-inference technique for single-view digital images and video. Nowadays, 3D displays and equipment are commercially available, but content created with 3D cameras and camcorders is still in short supply, whereas existing 2D content is available everywhere. Thus, it is necessary to develop a 2D-to-3D conversion technique to generate such content and to make full use of stereoscopic displays. In 2D-to-3D conversion, the depth map has to be inferred first, followed by the depth-image-based rendering (DIBR) technique to generate the binocular images. However, depth inference is a mathematically ill-posed problem with more than one valid solution. In addition, it is sensitive to many factors in the 2D content, such as noise, camera motion, textureless regions, and occlusion. This dissertation aims at providing an efficient and reasonable depth-inference algorithm to cope with these problems. For digital images, we employ the camera projection model to infer a reasonable initial depth map from the picture's EXIF parameters, and generate the final depth by refining this initial depth with the dark channel model. For digital video, we use a Kalman filter to combine the disparity depth, the optical-flow depth, and the propagated depth into a reliable depth map. It is then refined with super-pixel segmentation, which removes depth noise in textureless regions, and with a temporal-spatial filter, which enhances the temporal coherence of the depth. Experiments show that the inferred depth approximates the ground truth without training or iterative operations, while providing satisfactory visual results.
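For the video branch, the sketch below illustrates how a per-pixel Kalman filter could fuse the disparity, optical-flow, and propagated depth cues frame by frame. The identity state model, the process noise q, and the per-cue measurement variances are illustrative assumptions rather than the dissertation's actual settings.

    import numpy as np

    def kalman_fuse(prev_depth, prev_var, cues, cue_vars, q=1e-3):
        """One Kalman step per frame; the state is the depth at every pixel."""
        # Predict: carry the previous frame's depth over (identity dynamics),
        # letting the uncertainty grow by the process noise q.
        x, p = prev_depth.copy(), prev_var + q
        # Update: treat each cue (disparity, optical-flow, propagated depth) as a
        # noisy measurement of the same state and fold it in sequentially.
        for z, r in zip(cues, cue_vars):
            k = p / (p + r)       # Kalman gain
            x = x + k * (z - x)   # corrected depth
            p = (1.0 - k) * p     # corrected variance
        return x, p

    # Usage sketch: fuse three H x W cue maps for a single frame.
    H, W = 120, 160
    prev_depth = np.full((H, W), 0.5)
    prev_var = np.ones((H, W))
    cues = [np.random.rand(H, W) for _ in range(3)]   # disparity, flow, propagation cues
    depth, var = kalman_fuse(prev_depth, prev_var, cues, cue_vars=[0.05, 0.10, 0.02])

Folding the three cues in as sequential measurements of one scalar state per pixel keeps each update closed-form, which is consistent with the training-free, non-iterative claim above.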

