
Graduate Student: 陳桓杰 (Chen, Huan-Chieh)
Thesis Title: 植基於深度學習於辨識物體之卷積神經網路架構 (Objects Detection Based on Deep Learning with A Convolutional Neural Network Architecture)
Advisor: 蔡正發 (Tsai, Cheng-Fa)
Degree: Master
Department: Department of Management Information Systems, College of Management
Graduation Academic Year: 106 (2017–2018)
Language: Chinese
Pages: 67
Chinese Keywords: 卷積神經網路、深度學習、影像辨識
English Keywords: convolutional neural network, deep learning, image recognition
DOI URL: http://doi.org/10.6346/THE.NPUST.MIS.023.2018.F08
    In recent years, deep learning has become a popular research area. It is applied to image object recognition, speech recognition, medical disease identification, and so forth, and can assist human work or even replace manual labor to achieve fully automated production. The convolutional neural network is currently a leading deep learning architecture: many studies have proposed network models that extract features from objects with a suitable model, enabling machines to recognize things as humans do and creating new possibilities.

    In this research, we use the TensorFlow Object Detection API to build real-time object detection models, learn how to construct well-performing models, and compare them through observation. A small dataset downloaded from ImageNet is used for training and testing. Finally, the results of MobileNet, Inception v2, and Inception v3 are compared; on the 10-class dataset, Inception v2 performs best of the three.
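
    The detectors in this thesis are built with the TensorFlow Object Detection API. As a rough illustration only (this is not code from the thesis), the Python sketch below shows the usual way a model exported by that API under TensorFlow 1.x is loaded and run on a single image; the file names frozen_inference_graph.pb and test.jpg are hypothetical placeholders.

    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x
    from PIL import Image

    # Load the frozen graph exported by the Object Detection API
    # (path is a placeholder for an exported MobileNet/Inception model).
    detection_graph = tf.Graph()
    with detection_graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')

    with tf.Session(graph=detection_graph) as sess:
        # Exported detection graphs expose these standard tensor names.
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        scores = detection_graph.get_tensor_by_name('detection_scores:0')
        classes = detection_graph.get_tensor_by_name('detection_classes:0')

        # Input is a batch of uint8 RGB images: shape [1, height, width, 3].
        image = np.expand_dims(np.array(Image.open('test.jpg')), axis=0)
        out_boxes, out_scores, out_classes = sess.run(
            [boxes, scores, classes], feed_dict={image_tensor: image})
        # Boxes are normalized [ymin, xmin, ymax, xmax]; show the top detection.
        print(out_boxes[0][0], out_scores[0][0], out_classes[0][0])

    Comparing MobileNet, Inception v2, and Inception v3 as in the abstract amounts to repeating this inference step with a frozen graph exported from each backbone and measuring the resulting accuracy and speed.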

    Table of Contents
    Abstract (Chinese).......................................I
    Abstract (English).......................................II
    Acknowledgements.........................................III
    Table of Contents........................................IV
    List of Figures..........................................VII
    List of Tables...........................................X
    Chapter 1: Introduction..................................1
        1.1 Research Background..............................1
        1.2 Research Procedure...............................2
        1.3 Research Scope and Limitations...................3
        1.4 Thesis Organization..............................3
    Chapter 2: Literature Review.............................4
        2.1 Artificial Neural Networks (ANNs)................4
        2.2 Convolutional Neural Networks (CNNs).............5
            2.2.1 Convolutional layer........................6
            2.2.2 Rectified Linear Units (ReLU) layer........12
            2.2.3 Pooling layer..............................15
            2.2.4 Fully connected layer......................17
            2.2.5 Loss layer.................................17
        2.3 Notable Models and Methods.......................18
            2.3.1 LeNet-5....................................18
            2.3.2 AlexNet....................................19
            2.3.3 VGG-16 and VGG-19..........................20
            2.3.4 R-CNN, Fast R-CNN, and Faster R-CNN........21
            2.3.5 YOLO.......................................22
            2.3.6 SSD........................................23
            2.3.7 GoogLeNet Inception v1.....................25
            2.3.8 Inception v2...............................27
            2.3.9 Inception v3...............................28
            2.3.10 MobileNets................................32
    Chapter 3: Research Methods..............................35
        3.1 Research Concept.................................35
        3.2 Execution Steps..................................36
            3.2.1 Training Phase.............................37
            3.2.2 Testing Phase..............................40
    Chapter 4: Experimental Results..........................42
        4.1 Experimental Environment.........................42
        4.2 Datasets.........................................43
        4.3 Models and Parameter Settings....................47
        4.4 Experimental Results.............................48
            4.4.1 Evaluation Methods.........................48
            4.4.2 Model Comparison...........................51
            4.4.3 Results of Training Only One Class.........56
            4.4.4 Recognition System Screenshots.............57
    Chapter 5: Conclusions and Future Work...................60
        5.1 Conclusions......................................60
        5.2 Future Work......................................62
    References...............................................64
    About the Author.........................................67
