
Deep Learning-based Human Detection for Fisheye Images

Advisor: Hsueh-Ming Hang (杭學鳴)

Abstract


Spherical panoramic cameras offer a wide angle of view, which makes them well suited to security surveillance systems. However, the fisheye lens introduces strong distortion near the image borders, which makes object detection and segmentation more difficult. To cope with this distortion, we train a deep-learning architecture on a fisheye image dataset. The dataset is built from two parts: the first part consists of photos captured with a Ricoh Theta S spherical camera, in which we manually annotate the locations of the target objects (people); the second part is generated by applying simulated fisheye distortion to an existing dataset. To detect people in fisheye images, we train three models with PVANET: (1) the original PVANET model trained on ordinary (perspective) images, (2) a PVANET model trained only on fisheye images, and (3) a PVANET model trained on a mixture of ordinary and fisheye images. The experimental results show that the mixed-training model performs best. In addition, we add another class to distinguish whether a person has fallen, which is a practical application for healthcare. With a proper training dataset, the proposed model achieves an average accuracy of 90% for human detection in fisheye images.
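The abstract does not specify which distortion model is used to create the simulated fisheye images, so the sketch below assumes a simple equidistant fisheye mapping implemented with OpenCV's `cv2.remap`; the function name `simulate_fisheye`, the `focal` parameter, and the sample file names are illustrative choices, not the thesis code.

```python
import cv2
import numpy as np

def simulate_fisheye(img, focal=300.0):
    """Warp a perspective image with an assumed equidistant fisheye model.

    For each output pixel at radius r_d from the image center, the viewing
    angle is theta = r_d / focal (equidistant model), and the source pixel
    is sampled at the perspective radius r_u = focal * tan(theta).
    """
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0

    # Grid of destination (fisheye) pixel coordinates.
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    dx, dy = xs - cx, ys - cy
    r_d = np.sqrt(dx * dx + dy * dy)

    # Invert the equidistant projection; keep theta below 90 degrees
    # so that tan() stays finite.
    theta = np.clip(r_d / focal, 0.0, np.pi / 2 - 1e-3)
    r_u = focal * np.tan(theta)

    scale = np.where(r_d > 1e-6, r_u / r_d, 1.0)
    map_x = (cx + dx * scale).astype(np.float32)
    map_y = (cy + dy * scale).astype(np.float32)

    # Pixels that fall outside the source image become black borders.
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)

if __name__ == "__main__":
    src = cv2.imread("example.jpg")   # any ordinary (perspective) photo
    cv2.imwrite("example_fisheye.jpg", simulate_fisheye(src))
```

A smaller `focal` value gives stronger barrel distortion. The same mapping also has to be applied to the annotation boxes, as sketched after the next abstract.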

Parallel Abstract


Because spherical panoramic cameras offer the advantage of a wide angle of view, they are well suited to security surveillance applications. However, the strong distortion near the borders of fisheye images makes object detection and segmentation more difficult. To cope with this problem, we build a fisheye image dataset to train deep-learning algorithms. Two types of fisheye images are created. The first set is captured with a Ricoh Theta S camera, and we then label the ground truth using a semi-automatic approach. The second set is created by imposing artificial fisheye distortion on the PASCAL VOC dataset images. To detect human objects in fisheye images, we adopt a well-known CNN architecture, PVANET, and compare three models: (1) the original PVANET trained on a perspective-view image dataset, (2) a PVANET trained on the fisheye image datasets, and (3) a PVANET trained on both perspective-view and fisheye images. The experimental results indicate that the mixed-training model gives the best results. We also add another class to the training set to detect a fallen person, which is a useful case for healthcare scenarios. With a proper training dataset, we can achieve a high success rate of 90%.
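Since the warped PASCAL VOC images are used for detection training, their bounding-box annotations also have to be pushed through the distortion. The thesis does not describe this step, so the following is a minimal sketch under the same equidistant-model assumption as above; `distort_box`, the `focal` value, and the VOC-style (xmin, ymin, xmax, ymax) box format are assumptions for illustration.

```python
import numpy as np

def distort_point(x, y, cx, cy, focal=300.0):
    """Forward equidistant mapping: perspective point -> fisheye point."""
    dx, dy = x - cx, y - cy
    r_u = np.hypot(dx, dy)
    if r_u < 1e-6:
        return x, y
    r_d = focal * np.arctan(r_u / focal)   # assumed equidistant model
    s = r_d / r_u
    return cx + dx * s, cy + dy * s

def distort_box(box, img_w, img_h, focal=300.0, n=20):
    """Map a VOC-style (xmin, ymin, xmax, ymax) box into the fisheye image.

    Because the mapping is non-linear, sample points along the box border,
    distort each one, and take the axis-aligned bounding box of the results.
    """
    cx, cy = img_w / 2.0, img_h / 2.0
    xmin, ymin, xmax, ymax = box
    xs = np.linspace(xmin, xmax, n)
    ys = np.linspace(ymin, ymax, n)
    border = ([(x, ymin) for x in xs] + [(x, ymax) for x in xs] +
              [(xmin, y) for y in ys] + [(xmax, y) for y in ys])
    pts = [distort_point(x, y, cx, cy, focal) for x, y in border]
    us, vs = zip(*pts)
    return min(us), min(vs), max(us), max(vs)

if __name__ == "__main__":
    # A hypothetical person box in a 500x375 VOC image (illustrative numbers).
    print(distort_box((150, 50, 350, 350), img_w=500, img_h=375))
```

The focal length used here must match the one used to warp the image, otherwise the boxes and the distorted image content will not line up.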

