Relative Depth Estimation of Video Content Captured by a Static Camera

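In this thesis, we propose a depth estimation system for video captured by a fixed camera. By extending existing single-image depth estimation methods, we aim to extract useful temporal information from the video and feed it into a single-image depth estimation algorithm to obtain a more accurate estimate of the background depth. We then assign appropriate depths to the moving objects to produce a depth output for the whole video. In video from a fixed camera, we can observe moving objects being occluded by obstacles in the scene. Over a short time span, however, the available information is insufficient, so we rely on results accumulated over a long period to make correct occlusion boundary decisions. We therefore record the height of each moving object's bottom at every location the object passes through, and use the statistics gathered over time to determine the positions of static foreground objects and their depth ranges. We add this information to a single-image occlusion boundary algorithm to produce boundary decisions and the corresponding relative depth output. We detect moving objects with background subtraction and apply basic morphological processing; depending on the tracking and occlusion decisions, we assign them depth in different ways and obtain the depth output of the video.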
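The long-term statistic described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `accumulate_bottom_rows`, the mask representation (one binary mask per object observation, as nested lists), and the choice to keep the smallest bottom row per pixel are all assumptions made for illustration. Rows are counted from the top of the image, so a smaller bottom row means the object touches the ground higher in the frame.

```python
def accumulate_bottom_rows(masks):
    """Accumulate per-pixel statistics of moving-object bottom rows.

    masks: list of binary masks (nested lists of 0/1), one per observation
    of a single moving object. This layout is a hypothetical choice.
    Returns (count, min_bottom), both with the same shape as a mask:
      count[r][c]      = number of observations covering pixel (r, c)
      min_bottom[r][c] = smallest bottom row among those observations,
                         or None if the pixel was never covered
    """
    height = len(masks[0])
    width = len(masks[0][0])
    count = [[0] * width for _ in range(height)]
    min_bottom = [[None] * width for _ in range(height)]

    for mask in masks:
        covered_rows = [r for r in range(height) if any(mask[r])]
        if not covered_rows:
            continue  # empty mask: nothing to record
        bottom = max(covered_rows)  # lowest image row the object reaches
        for r in covered_rows:
            for c in range(width):
                if mask[r][c]:
                    count[r][c] += 1
                    previous = min_bottom[r][c]
                    if previous is None or bottom < previous:
                        min_bottom[r][c] = bottom
    return count, min_bottom
```

Under these assumptions, pixels that are rarely or never covered while neighbouring pixels are covered often are candidates for static occluders, and the recorded bottom rows give a rough bound on relative depth along the ground. How the thesis turns the statistics into boundary cues is not specified in the abstract.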

Keywords

depth estimation

Parallel Abstract

In this thesis, we propose a method to estimate the depth of a video taken by a static camera. By extending a single-image depth estimation algorithm, we exploit the temporal knowledge in the video. Combining single-image knowledge with temporal knowledge, we obtain a better depth estimate of the background image. We then estimate the proper depth of the moving objects and produce the video depth output. In a video, some boundaries can be detected when moving objects are occluded by static occluding objects. However, short-term temporal knowledge is not enough, and we need long-term temporal knowledge to find them. In our approach, we gather statistics by recording the object's bottom height at the pixels the object has passed. Using this statistical information, we find static occluding objects and their maximum depth. We add these cues to the boundary detection algorithm to obtain the boundary result and the relative depth. We identify moving objects with background subtraction and apply morphological operations as post-processing. Depending on the state of each moving object, we estimate its depth in different ways. Finally, we obtain the depth of the video content.
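The moving-object step (background subtraction followed by morphological post-processing) can be sketched as below. This is an illustrative sketch only: the function names, the fixed grayscale threshold of 30, the 3x3 structuring element, and the use of morphological opening are assumptions, since the abstract does not state which operations or parameters the thesis uses.

```python
def subtract_background(frame, background, threshold=30):
    """Mark pixels whose grayscale value differs from the background
    by more than `threshold` (an assumed, hypothetical value)."""
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def _apply_3x3(mask, reduce_window):
    """Apply `reduce_window` to the 3x3 neighbourhood of every pixel.
    At the image border the window is clipped to the pixels that exist."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            window = [
                mask[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            out[r][c] = int(reduce_window(window))
    return out


def erode(mask):
    # A pixel survives only if every pixel in its window is foreground.
    return _apply_3x3(mask, all)


def dilate(mask):
    # A pixel becomes foreground if any pixel in its window is foreground.
    return _apply_3x3(mask, any)


def open_mask(mask):
    """Morphological opening: erosion followed by dilation.
    Removes isolated noise pixels while roughly preserving larger blobs."""
    return dilate(erode(mask))
```

A typical use would be `open_mask(subtract_background(frame, background))`, where `frame` and `background` are grayscale images stored as nested lists of integers. A real system would more likely use an image-processing library and an adaptive background model.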

Parallel Keywords

depth estimation

