
Deep Learning Based Depth Estimation and Analysis for 360° Stereo Cameras

Advisor: Hsueh-Ming Hang (杭學鳴)

Abstract

360° virtual view synthesis plays an important role in virtual reality applications, and depth is the key information for reconstructing the 3D world. In this study, we combine two 360° panoramic cameras into a stereo system that captures the surrounding scene from two different viewpoints, and we estimate an omnidirectional depth map from the resulting pair of spherical images. We develop a depth estimation pipeline for spherical images based on an existing deep network, PSMNet; to train it for spherical disparity estimation, we construct a 360° stereo dataset with disparity ground truth derived from the SYNTHIA dataset.

We further investigate the limits of depth estimation with a dual panoramic camera. Unlike disparity in perspective images, spherical disparity is defined as the angular difference of an object point as seen from the two cameras, so a point lying on the baseline has zero spherical disparity. From the image resolution we derive the maximum distance at which depth can be estimated, and from the occlusion behavior of dual spherical cameras we derive the minimum distance for reliable depth estimation. Both limits are functions of the baseline length, a property that helps in choosing an appropriate baseline when designing a stereo camera rig.

In our experiments, we estimate depth for both synthetic and real captured images and evaluate performance on synthetic images with ground-truth depth. On the SYNTHIA test set we achieve a 2.18% error rate under the KITTI D1 criterion, lower than the result of PSMNet tested on the KITTI dataset. Finally, we synthesize novel views from our estimated depth maps using the Facebook 3D Photo tool; the good subjective quality of the resulting 3D photos demonstrates the accuracy of our depth estimation.
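As an illustration of the geometric argument summarized above, the following LaTeX sketch reconstructs the spherical disparity definition and the maximum sensing distance, assuming an equirectangular projection whose vertical field of view π spans H pixel rows; the symbols (B for baseline, θ_L and θ_R for ray angles, d for spherical disparity, Δ for per-pixel angular resolution) are our own notation rather than the thesis's.

% Spherical disparity: angular difference of the same scene point P
% seen from the two camera centers C_L and C_R separated by baseline B.
\[
  d = \theta_R - \theta_L
\]
% Law of sines in the triangle (C_L, C_R, P): the interior angles are
% \theta_L at C_L, \pi - \theta_R at C_R, and d at P, so
\[
  \frac{r_L}{\sin(\pi - \theta_R)} = \frac{B}{\sin d}
  \quad\Longrightarrow\quad
  r_L = \frac{B \, \sin\theta_R}{\sin d}.
\]
% A point on the baseline has \theta_L = \theta_R \in \{0, \pi\},
% hence d = 0 and its depth cannot be triangulated.
% One pixel row subtends \Delta = \pi / H, the smallest resolvable
% disparity. Perpendicular to the baseline (\theta_R \approx \pi/2,
% small d, \sin d \approx d), the sensing range is therefore bounded by
\[
  r_{\max} \approx \frac{B}{\Delta} = \frac{B \, H}{\pi},
\]
% which grows linearly with the baseline length B, matching the
% abstract's statement that both distance limits are functions of B.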

Keywords

depth estimation, deep learning, spherical, fisheye, panorama, binocular, stereo, disparity
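To make the distance-limit geometry above concrete, here is a small numerical sketch in Python using NumPy; the function name, parameter names, and the example numbers (0.2 m baseline, 1024-row equirectangular image) are our own illustrative assumptions, not values taken from the thesis.

import numpy as np

def depth_from_spherical_disparity(disparity, theta_r, baseline):
    """Distance from the left camera to a scene point, by the law of
    sines in the triangle formed by the two camera centers and the
    point. All angles in radians; disparity = theta_R - theta_L."""
    return baseline * np.sin(theta_r) / np.sin(disparity)

# The smallest resolvable disparity is roughly one pixel's angular size.
H = 1024                  # equirectangular image height (pixel rows)
delta = np.pi / H         # angular resolution: ~0.0031 rad per row
B = 0.2                   # baseline length in meters (assumed example)

# Perpendicular to the baseline (theta_R ~ pi/2), a one-pixel disparity
# corresponds to the maximum sensing distance r_max ~ B / delta.
r_max = depth_from_spherical_disparity(delta, np.pi / 2, B)
print(f"r_max = {r_max:.1f} m")   # ~65.2 m for this configuration

Doubling the baseline doubles this maximum distance, which is the trade-off the thesis weighs against the minimum reliable distance imposed by occlusion.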

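The experiments report a 2.18% error rate under the KITTI D1 criterion. For reference, here is a minimal sketch of that metric as defined by the KITTI 2015 stereo benchmark, where a pixel counts as an outlier when its disparity error exceeds both 3 px and 5% of the ground truth; the function name and the zero-means-invalid convention are our assumptions.

import numpy as np

def d1_error(d_est, d_gt):
    """KITTI 2015 D1 outlier rate, in percent.

    A pixel is an outlier when its disparity error is > 3 px AND
    > 5% of the ground-truth disparity. Pixels with d_gt == 0 are
    treated as invalid and excluded, following KITTI's convention.
    """
    valid = d_gt > 0
    err = np.abs(d_est - d_gt)
    outliers = (err > 3.0) & (err > 0.05 * d_gt)
    return 100.0 * outliers[valid].mean()

A reported D1 of 2.18 therefore means that roughly 2 in 100 valid pixels miss the ground-truth disparity by more than this tolerance.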
