
Unsupervised Person Re-Identification in Multi-Camera Tracking

Advisor: Shao-Yi Chien (簡韶逸)

Abstract


Multi-camera tracking addresses how to follow multiple targets across a camera network. It is one of the key technologies for building intelligent surveillance systems. Its main challenge comes from the enormous volume of data generated in the Internet of Things era; reducing the computation and transmission bandwidth needed to process such data is an active research topic. In this thesis, we make three contributions to multi-camera tracking.

First, we propose an unsupervised person re-identification algorithm that allows a multi-camera tracking system to operate in unseen environments. Person re-identification focuses on recognizing pedestrians from visual features. Although existing re-identification algorithms already achieve good results, most require labeled data from the target environment to train their models. Moreover, we find that even a model trained on data from multiple environments does not necessarily perform well in an unseen one. We therefore propose a method named C3M that automatically collects training data in unlabeled environments. C3M comprises three data-mining strategies: cross-track mining, cross-camera mining, and cross-domain mining. These strategies exploit contextual information beyond image content (e.g., spatio-temporal and domain information) to infer whether two images belong to the same person. With the collected training data, our model uses a two-stage training procedure to progressively learn discriminative visual features in unlabeled environments. Extensive experiments show that our method not only surpasses state-of-the-art unsupervised methods but is also competitive with supervised algorithms.

Second, we propose a set of measures for evaluating track-based multi-camera tracking algorithms. We formulate multi-camera tracking as a clustering problem to isolate the errors introduced by per-camera pedestrian detection and single-camera tracking, so that our measures can accurately quantify cross-camera tracking performance. We first use small hypothetical examples to show where our measures improve on existing ones, and then confirm with real-world experiments that they accurately reflect algorithm performance.

Finally, we propose a distributed multi-camera tracking system suited to Internet of Things hardware. At its core, the system uses a distributed tracking framework: edge devices exchange visual and spatio-temporal information with one another to accomplish track-based multi-camera tracking. This eliminates the need for an expensive central server while also reducing the transmission bandwidth burden. We implement the system on mobile processors to demonstrate the feasibility of the concept. Detailed software and hardware analyses show that the system performs multi-camera tracking in real time while achieving outstanding tracking performance.
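The three mining strategies described above can be illustrated with a minimal sketch. This is not the thesis's actual algorithm: the `Detection` fields, the `min_travel_time` threshold, and the exact labeling rules are simplified assumptions, intended only to show how context (track identity, camera topology, domain membership, and timing) can label image pairs without appearance features:

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Detection:
    track_id: int    # single-camera track this detection belongs to
    camera_id: int
    domain_id: int   # which site/deployment the camera belongs to
    timestamp: float # seconds

def mine_pairs(detections, min_travel_time=30.0):
    """Label detection pairs using context instead of appearance."""
    positives, negatives = [], []
    for a, b in combinations(detections, 2):
        if a.domain_id != b.domain_id:
            # Cross-domain mining: people never appear in two sites at once,
            # so detections from different domains form negative pairs.
            negatives.append((a, b))
        elif a.track_id == b.track_id:
            # Cross-track mining: detections inside one single-camera track
            # belong to the same person, so they form positive pairs.
            positives.append((a, b))
        elif (a.camera_id != b.camera_id
              and abs(a.timestamp - b.timestamp) < min_travel_time):
            # Cross-camera mining: too little time has passed to travel
            # between the cameras, so the pair must be two different people.
            negatives.append((a, b))
    return positives, negatives
```

Pairs left unlabeled by all three rules would simply not be used for training; the mined positives and negatives then feed the two-stage optimization described above.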

English Abstract


Multi-camera tracking focuses on tracking multiple targets in a camera network. It is a critical underlying technology for building intelligent surveillance systems. The main challenge of this task originates from the sheer quantity of data generated by the Internet of Things; how to extract useful information from such massive data while reducing both computation and transmission costs is an active research topic. In this thesis, we present three contributions to multi-camera tracking.

First, we propose an unsupervised person re-identification method to facilitate multi-camera tracking in unseen environments. Person re-identification addresses the problem of recognizing people across cameras by visual appearance. While existing supervised methods yield promising results on labeled datasets, such a supervised setting is impractical for unlabeled domains. In addition, training a general model on multiple datasets does not guarantee satisfactory performance on unseen domains. To solve this problem, we propose C3M to mine training data from unseen domains. C3M, which comprises Cross-Track Mining, Cross-Camera Mining, and Cross-Domain Mining, takes advantage of context information (e.g., space-time and domain) to discover positive and negative pairs. We progressively learn discriminative features from the mined training data through a two-stage optimization process. Extensive experiments show that our method not only outperforms existing unsupervised methods but is also comparable to state-of-the-art supervised methods.

Second, we present a set of new evaluation measures for benchmarking track-based multi-camera tracking. We propose to isolate errors from detection and single-camera tracking by formulating cross-camera association as a clustering process; the F-measure can then be used to evaluate tracking performance. We demonstrate with toy examples that the proposed measures provide notable advantages over previous ones, and real-world experiments confirm that they accurately measure the performance of track-based multi-camera tracking.

Finally, we introduce a distributed multi-camera tracking system that processes surveillance data efficiently in an Internet of Things infrastructure. At the core of this system, we accomplish track-based multi-camera tracking by exchanging visual features and spatio-temporal information between edge devices. Doing so eliminates the need for expensive centralized servers and saves precious transmission bandwidth. In addition, our framework is general enough to incorporate any person re-identification algorithm. We implement the framework on mobile processors to demonstrate the viability of the concept, and we carry out tracking evaluation as well as efficiency analysis on both software and hardware. Our framework achieves outstanding performance and runs in real time on mobile hardware.
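The clustering formulation above can be sketched with a pairwise F-measure, one standard way to score a predicted clustering against ground truth. This is an illustrative assumption rather than the exact measure defined in the thesis: each element stands for one single-camera track, and each label is a cross-camera identity (predicted or ground-truth):

```python
from itertools import combinations

def pairwise_f_measure(pred_labels, true_labels):
    """Score cross-camera association as a clustering of tracks.

    A pair of tracks counts as a true positive when both the predicted
    and the ground-truth labelings put them in the same identity.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(pred_labels)), 2):
        same_pred = pred_labels[i] == pred_labels[j]
        same_true = true_labels[i] == true_labels[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1  # wrongly merged two different people
        elif same_true:
            fn += 1  # failed to link two tracks of the same person
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f
```

Because the clustered items are whole tracks rather than individual detections, errors made by the detector or the single-camera tracker do not enter the score, which is the isolation property motivated above.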

