Architecture Design of Convolutional Neural Networks for Image Recognition

Advisor: Liang-Gee Chen (陳良基)

Abstract


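Computer vision has been studied for many years. Rapid advances in technology have brought us into the era of big data and smart devices, and applications of computer vision have thoroughly changed everyday life. Along the way, 3D cinema, augmented reality (AR), and virtual reality (VR) have opened up many application scenarios in medicine, gaming, and daily life, making life more convenient and offering many new experiences. Google Glass lets its wearer instantly obtain information about surrounding objects, and such scenarios require fast object recognition. Our goal is therefore to enable machines to interpret, in real time, what an image actually means.

Image recognition is an important building block for many virtual environments, because recognizing images lets us understand the structure of the whole virtual scene and then interact with it. Deep learning is currently a popular topic in machine learning and has been applied in many areas; it derives the best solution to a problem by passing information from layer to layer through a neural network. In recent years, convolutional neural networks (CNNs) have provided powerful recognition capability, especially in image recognition and image detection. CNN-based methods have achieved great success in many applications and are widely used in computer vision. However, their massive computational requirements, resource consumption, and memory accesses make them hard to deploy on mobile or embedded systems. The bandwidth needed for the large number of parameters and the amount of memory required for computation are therefore the key concerns of architecture design. To reach a high frame rate, every piece of data must be reused as much as possible.

In this work, we design an architecture that accelerates the convolutional layers and pooling layers of a convolutional neural network for ImageNet large-scale image classification. We first present a timing analysis of CNN model computation, which shows that the convolutional layers are the most time-consuming. We then experiment with stochastic probability and threshold probability methods for data quantization to reduce the memory needed for parameter access, thereby lowering bandwidth and improving resource utilization. Finally, a state-of-the-art CNN, VGG-16, is implemented in hardware with several architectural design choices to meet the high-frame-rate target. Operating at 200 MHz with 15-bit quantized data, the system achieves 13.3 fps with a bandwidth of 1.44 GB/s; compared with previous approaches, it reaches the high-frame-rate target through higher utilization of the computed data.

Overall, we developed a system that uses convolutional neural network computation techniques to perform image recognition in real time, and we propose a hardware architecture that reduces memory usage and bandwidth.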
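The claim that convolutional layers dominate can be sanity-checked with a simple operation count. The sketch below is not the thesis's timing analysis: it counts multiply-accumulate operations (MACs) for the standard VGG-16 configuration at 224x224 input as a rough proxy for time, then derives per-frame budgets from the figures quoted in the abstract. The layer table, the names, and the reading of 1 GB as 10^9 bytes are assumptions made for illustration.

```python
# (in_channels, out_channels, output_height_and_width) for each 3x3 conv layer
# of the standard VGG-16 configuration at 224x224 input (assumed here).
CONV_LAYERS = [
    (3, 64, 224), (64, 64, 224),
    (64, 128, 112), (128, 128, 112),
    (128, 256, 56), (256, 256, 56), (256, 256, 56),
    (256, 512, 28), (512, 512, 28), (512, 512, 28),
    (512, 512, 14), (512, 512, 14), (512, 512, 14),
]
# (inputs, outputs) for the three fully connected layers.
FC_LAYERS = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]


def conv_macs(in_ch, out_ch, size, kernel=3):
    """MACs for one conv layer with 'same' padding and stride 1."""
    return size * size * in_ch * out_ch * kernel * kernel


def fc_macs(n_in, n_out):
    """MACs for one fully connected layer."""
    return n_in * n_out


conv_total = sum(conv_macs(*layer) for layer in CONV_LAYERS)
fc_total = sum(fc_macs(*layer) for layer in FC_LAYERS)
conv_share = conv_total / (conv_total + fc_total)

# Figures quoted in the abstract.
CLOCK_HZ = 200e6
FPS = 13.3
BANDWIDTH_BYTES_PER_S = 1.44e9  # assumption: 1 GB = 10**9 bytes

cycles_per_frame = CLOCK_HZ / FPS
bytes_per_frame = BANDWIDTH_BYTES_PER_S / FPS
# Average conv MACs per cycle if every conv MAC fits in one frame time.
macs_per_cycle = conv_total / cycles_per_frame

print(f"conv MACs: {conv_total:,}  FC MACs: {fc_total:,}")
print(f"conv share of MACs: {conv_share:.1%}")
print(f"cycles per frame: {cycles_per_frame:,.0f}")
print(f"bytes per frame: {bytes_per_frame:,.0f}")
print(f"average conv MACs per cycle: {macs_per_cycle:,.1f}")
```

Under these assumptions the convolutional layers account for more than 99% of the MACs. A MAC count ignores memory stalls and pooling, so it only indicates where the computational load sits; it does not reproduce the measured timing.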

Parallel Abstract


In the past few years, various methods have been proposed to solve the problem of image recognition. During training, the system needs to learn the features of every image correctly. In recent years, convolutional neural networks (CNNs) have emerged to provide powerful discriminative capability, especially in image recognition and object detection. CNN-based methods have achieved great success in numerous applications and are widely used in computer vision. However, their massive computation requirements, resource consumption, and memory accesses make them hard to deploy on mobile or embedded systems. The bandwidth required by the large number of parameters involved in the computation is a major concern for architecture design. To design a high-frame-rate system, it is essential to maximize data utilization. In this work, we design an architecture that accelerates the convolutional layers and max-pooling layers of the network for ImageNet large-scale image classification. We first present a computational timing analysis of CNN models and show that the convolutional layers are the most time-consuming. We then propose experiments that use stochastic probability and threshold probability methods for data quantization in the convolutional layers to reduce bandwidth and improve resource utilization. Finally, a state-of-the-art CNN, VGG-16, is implemented in hardware through several architecture design techniques to meet the high-frame-rate target. The system achieves 13.3 fps with 15-bit data quantization at an operating frequency of 200 MHz, a higher frame rate than previous works. The bandwidth of the system is 1.44 GB/s, reflecting better data utilization than previous approaches.
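The abstract does not define the stochastic probability and threshold probability methods. One plausible reading is that they are two rounding rules for fixed-point quantization: round up with probability equal to the discarded fraction, or round up when the fraction reaches a fixed threshold. The sketch below illustrates that reading only. The function name `quantize`, the 8 fractional bits, and the 0.5 threshold are hypothetical; only the 15-bit width comes from the abstract.

```python
import math
import random


def quantize(x, total_bits=15, frac_bits=8, mode="threshold",
             threshold=0.5, rng=random):
    """Quantize x to signed fixed point with total_bits bits.

    frac_bits, threshold, and the two mode names are illustrative
    assumptions, not parameters taken from the thesis.
    """
    scale = 1 << frac_bits
    scaled = x * scale
    lower = math.floor(scaled)
    frac = scaled - lower  # discarded fraction, in [0, 1)

    if mode == "stochastic":
        # Round up with probability equal to the fraction, so the
        # result is unbiased on average.
        round_up = rng.random() < frac
    elif mode == "threshold":
        # Round up deterministically once the fraction reaches threshold.
        round_up = frac >= threshold
    else:
        raise ValueError(f"unknown mode: {mode}")

    code = lower + int(round_up)
    # Saturate to the signed range representable in total_bits.
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    code = max(lo, min(hi, code))
    return code / scale


# Usage: compare the two rules on one value with a coarse 2-bit fraction.
rng = random.Random(0)
samples = [quantize(0.3, frac_bits=2, mode="stochastic", rng=rng)
           for _ in range(10000)]
print("threshold :", quantize(0.3, frac_bits=2))
print("stochastic mean:", sum(samples) / len(samples))
```

With the threshold rule, the same input always maps to the same code. With the stochastic rule, individual results vary but their average stays close to the input, which is the usual motivation for stochastic rounding at low precision.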
