Title

用於監視攝影畫面的物體偵測演算法

Translated Titles

An Object Detection Algorithm for Video Surveillance

DOI

10.6342/NTU201801479

Authors

林柏維

Key Words

電腦視覺 ; 機器學習 ; 影像處理 ; 物體偵測 ; Computer Vision ; Machine Learning ; Image Processing ; Object Detection

PublicationName

Degree thesis, Graduate Institute of Civil Engineering, National Taiwan University

Volume or Term/Year and Month of Publication

2019

Academic Degree Category

Master's

Advisor

陳柏華

Content Language

English

Chinese Abstract

近年來物體偵測相關的演算法主要以準確度為主要的研究方向,著重在能夠更準確的偵測物體的位置,並且增加物體偵測的種類,但是準確度提高的同時往往也需要更強的運算效能,對於硬體設備的要求也相對提高,難以大量應用。另外,需要使用到物體偵測的影像中,有一大部分為固定鏡頭的監視影像,影格與影格之間僅有部分區域產生變化,不需要針對完整的畫面重新進行偵測。本研究之目的為提升針對固定鏡頭之物體偵測效率,並且分析不同準確度要求下,對於偵測效率的影響。偵測效率的提升分為兩個部分,一是減少非必要之感興趣區域 (Region of Interest, RoI) 數量,二是分析不同特徵描述子對於分類器運算時間以及準確度的影響。

對於減少 RoI 數量的部分,本研究首先採用高斯混合模型 (Gaussian Mixture Model, GMM) 進行前後景分離來移除影像中不變的區域,對於分離出來的黑白遮罩圖進行線性模糊移除噪音點,再利用角點偵測演算法 (Features from Accelerated Segment Test, FAST) 偵測不同區塊前景的邊緣,最後由鄰近的特徵點合併成 RoI。相較於傳統的 sliding window 產生出數十萬個 RoI,以及 Selective Search 產生出約 2000 個 RoI,本研究結果能將 RoI 減少為數十個的情況下依然能保有需要偵測的物體視窗,並且產生出來的視窗並沒有大小以及長寬比的限制。

在不同特徵描述子對於分類器運算時間以及準確度影響的部分,本研究採用 AdaBoost 作為分類器,測試了影像梯度、LUV 色彩空間以及不同切角數量之 HOG 特徵組合,並且用三個影像資料集作為測試樣本,在精確度由 0.986 下降為 0.9505 的情況下,運算時間下降為原先的 17%,並且提出不同精確度對應之運算時間的圖表。

在整體的偵測流程上,我們首先提供了不同階段在運算時間所佔的比例。在純粹使用 CPU 運算的情況下,若使用 OpenCL 進行加速可以達到平均 160 FPS,在沒有 OpenCL 的情況下也可以達到 60 FPS 的運算效率。本研究提出之結果可做為往後物體偵測研究中,需要滿足不同精確度以及不同運算時間之參考。

English Abstract

In recent years, object detection algorithms have mainly focused on improving detection accuracy: localizing objects more precisely and increasing the number of object classes that can be detected. However, higher accuracy usually comes at the cost of more complicated models that require more computing power; some of them can only run on high-end hardware, so they are not always practical for real-world deployment. Moreover, a large share of the video that requires object detection comes from cameras mounted at a fixed angle, where most pixels stay unchanged between frames, so there is no need to re-detect the entire image. The purpose of this study is to improve object detection efficiency for fixed-angle cameras and to analyze the trade-off between detection efficiency and accuracy. The detection efficiency is improved in two ways. One is reducing the number of unnecessary Regions of Interest (RoIs) passed to the image classifier. The other is reducing the classification time by analyzing how different image feature descriptors affect classification time and accuracy.

For reducing unnecessary RoIs, this research first applies a Gaussian Mixture Model (GMM) to subtract the background, then applies a linear resize to the foreground mask to suppress noise points. We then use Features from Accelerated Segment Test (FAST) to detect the edges of each foreground region as feature points, and finally merge nearby feature points into RoIs. Compared with the traditional sliding-window method, which can generate hundreds of thousands of RoIs, and Selective Search, which generates about 2,000 RoIs, our method generates fewer than a hundred RoIs while still retaining the windows of the objects to be detected. In addition, the window size and aspect ratio are not constrained in our approach.
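The last step of the pipeline above, turning scattered foreground feature points into a handful of RoIs, can be sketched as a simple proximity grouping. This is a minimal illustration, not the thesis implementation; the function name and the `max_gap` threshold are assumptions chosen for the example.

```python
def merge_points_into_rois(points, max_gap=20):
    """Greedily group 2-D points whose coordinate-wise distance to some
    member of a group is within max_gap pixels, then return one bounding
    box (x, y, w, h) per group."""
    groups = []
    for (px, py) in points:
        placed = False
        for g in groups:
            # join the first existing group containing a close-enough point
            if any(abs(px - qx) <= max_gap and abs(py - qy) <= max_gap
                   for (qx, qy) in g):
                g.append((px, py))
                placed = True
                break
        if not placed:
            groups.append([(px, py)])
    rois = []
    for g in groups:
        xs = [p[0] for p in g]
        ys = [p[1] for p in g]
        rois.append((min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)))
    return rois

# Two clusters of corner points collapse into two RoIs.
corners = [(10, 12), (14, 18), (220, 40), (225, 44), (12, 15)]
print(merge_points_into_rois(corners))  # → [(10, 12, 4, 6), (220, 40, 5, 4)]
```

Because each RoI is the tight bounding box of its point group, the resulting windows inherit whatever size and aspect ratio the foreground blob has, which is why no fixed window shape is imposed.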
In this study, AdaBoost was used as the classifier, with combinations of image gradients, the LUV color space, and HOG as the feature descriptors. We used three image datasets as testing data. The results show that classification time can be reduced to 17% of the original while precision only decreases from 98.6% to 95.05%. We also provide a table relating each classification time to its corresponding precision. For the whole detection process, we report the share of computing time taken by each processing stage. The proposed algorithm runs at an average of 160 FPS using only a CPU with OpenCL acceleration, and at 60 FPS without OpenCL. The results of this study can serve as a reference for object detection research that must satisfy a given precision or computing-time constraint.
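The descriptor-side trade-off studied here hinges on the number of HOG orientation bins: fewer bins yield a shorter feature vector and a cheaper classifier, at some cost in precision. The single-cell histogram below is a didactic sketch of that knob, not the thesis code; the function name and cell layout are assumptions.

```python
import math

def hog_cell_histogram(cell, n_bins=9):
    """Histogram of unsigned gradient orientations for one grayscale cell
    (a list of pixel rows), weighted by gradient magnitude. n_bins is the
    number of orientation bins over [0, 180) degrees."""
    hist = [0.0] * n_bins
    h, w = len(cell), len(cell[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # central differences
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned
            hist[int(ang / (180.0 / n_bins)) % n_bins] += mag
    return hist

# A vertical intensity edge: every gradient points at 0 degrees,
# so all the magnitude lands in bin 0.
cell = [[0, 0, 10, 10]] * 4
print(hog_cell_histogram(cell, n_bins=9))
```

Dropping `n_bins` from 9 to, say, 4 shortens every per-cell histogram and hence every weak learner's input, which is the kind of descriptor reduction behind the reported 17% classification time at 95.05% precision.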

Topic Category College of Engineering > Graduate Institute of Civil Engineering
Engineering > Civil and Architectural Engineering