
Enhanced Image Representation using Object Detection

Advisor: 鄭志宏
Co-advisor: 謝哲光 (Jer-Guang Hsieh)


Abstract


Deep learning networks can already classify images more accurately than humans, demonstrating the power of this technology. However, when we perceive and interact with our environment, we do far more than identify visuals: every object within our field of view is also localized and categorized. These are far more complex tasks, which machines still cannot perform as well as people. Researchers developed Region-based Convolutional Neural Networks (R-CNNs) a few years ago to address object identification, localization, and classification, building on advances in computer vision made possible by CNNs. An R-CNN detects and predicts objects by placing a set of bounding boxes on them and using a classifier to predict class probabilities for each recognized object. One of the most important current advances in object detection is improved detection in images that are partially occluded, blurred, or otherwise distorted. Object detection remains a hard problem because it requires locating objects in an image accurately. R-CNN is based on the assumption that there is only one main object of interest in any given region, and it uses a selective search method to generate region proposals. Semantic segmentation and localization are important modules for determining what an object in a picture is. Researchers commonly use the object localization approach Gradient-Weighted Class Activation Mapping++ (Grad-CAM++), which uses gradients and a convolutional layer to create a localization map of the important regions of an image. This thesis presents GC-MRCNN, a method that combines Grad-CAM++ with the Mask Region-based Convolutional Neural Network (Mask R-CNN) to find and localize objects in images. The main benefit of the proposed method is that it outperforms other methods in the same field and can be used in unsupervised settings.
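The Grad-CAM++ localization map mentioned above can be sketched in a few lines. The following is a minimal NumPy illustration, not the thesis's implementation: it assumes the last convolutional layer's activations and the class-score gradients with respect to them have already been extracted from a network, and it uses the standard Grad-CAM++ closed-form approximation in which the second- and third-order derivatives are replaced by powers of the first-order gradient (valid when the class score is passed through an exponential).

```python
import numpy as np

def grad_campp_map(activations, grads):
    """Compute a Grad-CAM++ localization map.

    activations: (K, H, W) feature maps A_k of the last conv layer
    grads:       (K, H, W) gradients dY/dA_k of the class score Y
    Returns an (H, W) map scaled to [0, 1].
    """
    grads_2 = grads ** 2                          # approximates d2Y/dA^2
    grads_3 = grads ** 3                          # approximates d3Y/dA^3
    sum_a = activations.sum(axis=(1, 2), keepdims=True)
    denom = 2.0 * grads_2 + sum_a * grads_3
    denom = np.where(denom != 0.0, denom, 1e-8)   # avoid division by zero
    alpha = grads_2 / denom                       # pixel-wise weighting coefficients
    # channel weights w_k: alpha-weighted positive gradients, summed over space
    weights = (alpha * np.maximum(grads, 0.0)).sum(axis=(1, 2))
    # ReLU of the weighted combination of feature maps
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    if cam.max() > 0:
        cam /= cam.max()                          # scale to [0, 1] for visualization
    return cam

# toy example: 4 feature maps of size 8x8 with random values
rng = np.random.default_rng(0)
A = rng.random((4, 8, 8))
dY = rng.standard_normal((4, 8, 8))
cam = grad_campp_map(A, dY)
```

In practice the resulting map is upsampled to the input image size and overlaid as a heatmap; here only the weighting step is shown.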
The proposed detector, based on GC-MRCNN, can find and classify objects and their patterns in real time. The proposed method also gives a better visual representation than traditional methods such as Grad-CAM and Grad-CAM++.
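One way to picture the combination of Grad-CAM++ with Mask R-CNN is to let the Grad-CAM++ map score the instance masks produced by the detector. The sketch below is purely illustrative and is not the thesis's actual GC-MRCNN algorithm: the function name, the threshold value, and the toy masks are assumptions. It only shows how a saliency map and instance masks can be fused by averaging map values inside each mask.

```python
import numpy as np

def fuse_cam_with_masks(heatmap, instance_masks, threshold=0.25):
    """Illustrative fusion step: keep only the instances whose region the
    Grad-CAM++ map considers salient.

    heatmap:        (H, W) localization map with values in [0, 1]
    instance_masks: list of (H, W) boolean masks from an instance detector
    Returns (kept_indices, scores), where each score is the mean heatmap
    value inside the corresponding mask.
    """
    scores = [float(heatmap[m].mean()) if m.any() else 0.0
              for m in instance_masks]
    kept = [i for i, s in enumerate(scores) if s >= threshold]
    return kept, scores

# toy example: heatmap hot only in the top-left quadrant
heatmap = np.zeros((8, 8))
heatmap[:4, :4] = 1.0
m1 = np.zeros((8, 8), dtype=bool); m1[:4, :4] = True   # overlaps the hot region
m2 = np.zeros((8, 8), dtype=bool); m2[4:, 4:] = True   # outside the hot region
kept, scores = fuse_cam_with_masks(heatmap, [m1, m2])
# kept == [0], scores == [1.0, 0.0]: only the salient instance survives
```

A real pipeline would obtain the masks from a trained Mask R-CNN and the heatmap from Grad-CAM++ on the same backbone; the averaging-and-thresholding step above stands in for whichever fusion rule the method actually uses.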

Keywords

R-CNN, object detection, Mask R-CNN, GC-MRCNN, Grad-CAM++

