Deep Learning Based Algorithms for Object Tracking and Dolphin Detection and Identification

Advisor: 丁建均 (Jian-Jiun Ding)

Abstract


This thesis consists of two parts. The first part concerns class-agnostic tracking: two algorithms are proposed that exploit the rich temporal and spatial representations of deep learning to address current problems in this field. The second part applies computer vision techniques to environmental conservation, namely the detection and identification of dolphins.

For tracking, this thesis proposes FasterMDNet and RDisp. FasterMDNet replaces the extremely time-consuming online training in MDNet with an RNN-based model-update strategy, which is trained by repeated online training and back-propagation through time (BPTT). Compared with MDNet, FasterMDNet is roughly ten times faster. RDisp builds a tracking model by adding ConvRNN cells to a pretrained object-detection model. In addition, because BPTT has drawbacks when used to train RDisp, this thesis proposes a two-stage clip training scheme to replace BPTT. RDisp runs at about 25 fps on a GPU and can handle many common appearance variations.

For dolphin detection and identification, the two main algorithms are Faster-RCNN and DenseNet. In dolphin-name classification, the monotonous sea-surface background prevents the trained model from focusing on the fine-grained features of the dolphins themselves. This thesis proposes an ensemble of deep-learning-based and rule-based saliency-detection algorithms to locate the dolphin region and remove the sea-surface pixels, eliminating their influence.
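To make the tracking idea more concrete, below is a minimal PyTorch-style sketch of attaching a convolutional GRU cell to a frozen pretrained backbone so that per-frame features are aggregated over time, in the spirit of RDisp. Everything here is an assumption for illustration: the class names, the ResNet-18 backbone, and the hyper-parameters are not taken from the thesis.

```python
# Illustrative sketch only: a frozen pretrained backbone followed by a ConvGRU
# cell, showing the general idea of adding recurrence to a detection network
# for tracking. Names and hyper-parameters are assumptions, not the thesis code.
import torch
import torch.nn as nn
import torchvision

class ConvGRUCell(nn.Module):
    """A convolutional GRU cell: the gates are 3x3 convolutions over feature maps."""
    def __init__(self, in_ch, hid_ch):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, 3, padding=1)  # update, reset
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, 3, padding=1)       # candidate state

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, 1)
        h_new = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_new

class RecurrentTracker(nn.Module):
    """Pretrained (frozen) features -> ConvGRU -> per-location confidence map."""
    def __init__(self, hid_ch=128):
        super().__init__()
        backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # 512-ch maps
        for p in self.features.parameters():
            p.requires_grad = False          # keep the pretrained part fixed
        self.hid_ch = hid_ch
        self.rnn = ConvGRUCell(512, hid_ch)
        self.head = nn.Conv2d(hid_ch, 1, 1)  # target-confidence score per location

    def forward(self, clip):
        """clip: (T, B, 3, H, W) video frames; returns (T, B, 1, h, w) score maps."""
        h, scores = None, []
        for frame in clip:
            f = self.features(frame)
            if h is None:
                h = f.new_zeros(f.size(0), self.hid_ch, f.size(2), f.size(3))
            h = self.rnn(f, h)
            scores.append(self.head(h))
        return torch.stack(scores)
```

Training such a model on short clips, unrolling the loop in `forward` and back-propagating through it, is the plain-BPTT baseline that the proposed two-stage clip training is meant to improve on.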

Parallel Abstract


The first topic is class-agnostic visual tracking. The proposed algorithms attempt to tackle this problem via the strong representational power of deep-learning techniques for temporal and spatial information. The second topic is dolphin detection and identification. For tracking, FasterMDNet and RDisp are proposed. FasterMDNet replaces computationally costly online training with RNN-based model adaptation, and the adaptation model is trained by repeated online training via back-propagation through time (BPTT). The time cost is reduced by a factor of about ten relative to MDNet, with little sacrifice in accuracy. RDisp combines a pretrained detection model with ConvRNN cells. A two-stage clip training scheme is proposed to replace BPTT during training and avoid some of its defects. RDisp runs at 25 frames per second and performs consistently under multiple circumstances. In dolphin detection and identification, the main models are Faster-RCNN and DenseNet. In the classification of dolphin names, interference from the sea surface distracts the model from the details of the dolphins. An ensemble of deep-learning-based and rule-based saliency-detection algorithms with a soft Gaussian threshold is proposed to create a dolphin mask that removes the sea-surface pixels.
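As a rough illustration of the masking step described above, the sketch below fuses a deep-learning saliency map with a rule-based one and applies a soft, Gaussian-shaped threshold instead of a hard binary cut, so sea-surface pixels are attenuated smoothly. The fusion weight, threshold, and sigma are illustrative assumptions, not values from the thesis.

```python
# Illustrative sketch: ensemble of two saliency maps plus a soft (Gaussian-shaped)
# threshold that suppresses low-saliency (sea-surface) pixels gradually.
# w, thresh, and sigma are assumed values, not the thesis settings.
import numpy as np

def soft_gaussian_mask(sal_dl, sal_rule, w=0.5, thresh=0.5, sigma=0.15):
    """Fuse a deep-learning saliency map and a rule-based one (both HxW in [0, 1]),
    then map values below `thresh` through a Gaussian falloff instead of a hard cut."""
    sal = w * sal_dl + (1 - w) * sal_rule   # simple convex-combination ensemble
    mask = np.ones_like(sal)
    low = sal < thresh
    mask[low] = np.exp(-((thresh - sal[low]) ** 2) / (2 * sigma ** 2))  # smooth rolloff
    return mask

def remove_sea_surface(image, sal_dl, sal_rule):
    """Attenuate background: image is HxWx3 float; returns the masked image."""
    mask = soft_gaussian_mask(sal_dl, sal_rule)
    return image * mask[..., None]          # broadcast the mask over color channels
```

With a mask like this, the classifier sees mostly dolphin pixels, which is the intended effect of removing the sea surface before identification.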

