用於即時跨相機追蹤系統之具困難樣本校正之非監督式行人重識別

Unsupervised Person Re-Identification with Hard Samples Rectification in a Real-time Multi-Camera Tracking System

Advisor: Shao-Yi Chien (簡韶逸)

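Abstract


Multi-camera tracking is a key technology for smart cities; its goal is to track every person who appears within a network of cameras. Because the multi-camera tracking problem is so difficult, its final step, person matching, has grown into a research topic of its own: person re-identification. Person re-identification aims to use appearance information to recognize each person across different cameras. Although supervised learning methods achieve strong results thanks to the rise of convolutional neural networks, unsupervised domain adaptation remains very challenging because the target domain lacks labeled data. Beyond matching accuracy, being able to implement a workable multi-camera tracking system is also a key part of IoT surveillance applications. Moreover, as IoT devices become widespread, multi-camera systems will need to run on edge devices to reduce network latency and data transmission and to enable more efficient real-time applications.
In this thesis, we propose an unsupervised algorithm with hard samples rectification (HSR) for person re-identification, which addresses the problem that clustering performs poorly because it is easily affected by hard samples. The proposed HSR has two facets. The first is inter-camera hard positive mining, which helps recognize the same person under different cameras. The second distinguishes different people with similar appearance, that is, hard negatives, by examining local homogeneity. With this two-faceted training method, the hard samples can be rectified and the model trained on accurate labels to improve performance. We conduct extensive experiments showing that our method outperforms current state-of-the-art unsupervised methods.
In addition, we propose an efficient multi-camera tracking system architecture and run it on IoT hardware to demonstrate that the system is feasible on edge devices. We exploit the system pipeline and the characteristics of each computing module in the system to reduce the large amount of computing resources the tracking system requires. By allocating computing resources effectively, the proposed architecture achieves good tracking performance and runs in real time on edge devices. We provide comprehensive experiments that illustrate the correlations among the components of the multi-camera system and demonstrate the practicality of the proposed system.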

Parallel Abstract


Multi-Camera Tracking (MCT) is a crucial technology for the envisioned smart city; it aims to track multiple people through a network of cameras. Because MCT is a notoriously difficult problem, the final step of its matching scheme has given rise to a popular research topic of its own: person re-identification (re-ID), which addresses the problem of recognizing people across cameras by their visual appearance. Although person re-ID has improved greatly with the rise of Convolutional Neural Networks (CNNs) and supervised learning methods, unsupervised cross-domain re-ID remains challenging owing to the lack of labelled data in the target domain. In addition to matching accuracy, being able to implement a workable MCT system is also a critical factor for IoT surveillance applications. Moreover, as IoT devices become more widespread, the MCT system will need to be implemented on edge devices to reduce network latency and data transmission and to enable more efficient real-time applications.
In this thesis, we propose an unsupervised learning scheme, Hard Samples Rectification (HSR), for person re-ID. It resolves a weakness of the original clustering-based methods, namely their vulnerability to hard positive and hard negative samples in the dataset. The proposed HSR contains two learning facets: an inter-camera mining technique, which helps recognize the same person under different camera views (hard positives), and a part-based homogeneity technique, which lets the re-ID model distinguish different people with similar appearance (hard negatives) by examining local homogeneity. By jointly rectifying the hard samples with this dual-faceted learning scheme, the re-ID model can learn from more accurately labelled hard cases and improve its performance. Extensive experiments on two large-scale benchmarks demonstrate the superiority of HSR over state-of-the-art methods.
Furthermore, we propose a multi-camera tracking system with an efficient framework and run it on real-world hardware to demonstrate its viability on edge devices. Specifically, we leverage the system pipeline and the characteristics of each operator in the MCT system to eliminate the need for a tremendous amount of computational resources. By effectively allocating computing power, the proposed framework achieves favorable performance and is able to run in real time on mobile hardware. Comprehensive experiments illustrate the correlation between the components of MCT and show the utility of the proposed MCT system.
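The abstract names the two HSR facets but does not specify how they operate, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes that hard positives are found as mutual nearest neighbours across cameras and that hard negatives are detected when any body-part feature lies farther than a threshold from a cluster anchor. All function names, the toy features, and the threshold are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def nearest_cross_camera(i, feats, cams):
    """Index of the closest sample seen by a different camera than sample i."""
    candidates = [j for j in range(len(feats)) if cams[j] != cams[i]]
    if not candidates:
        return None
    return min(candidates, key=lambda j: dist(feats[i], feats[j]))


def merge_hard_positives(labels, feats, cams):
    """Assumed hard-positive rule: merge clusters that are linked by a pair
    of mutual cross-camera nearest neighbours."""
    labels = list(labels)
    for i in range(len(feats)):
        j = nearest_cross_camera(i, feats, cams)
        if j is None or nearest_cross_camera(j, feats, cams) != i:
            continue
        if labels[i] != labels[j]:
            old, new = labels[j], labels[i]
            labels = [new if lab == old else lab for lab in labels]
    return labels


def split_hard_negatives(labels, part_feats, threshold):
    """Assumed hard-negative rule: within each cluster, move samples to a new
    cluster if any part feature is farther than `threshold` from the same
    part of the cluster's first member (used here as the anchor)."""
    labels = list(labels)
    next_label = max(labels) + 1
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        anchor = members[0]
        outliers = [
            i for i in members
            if any(dist(p, q) > threshold
                   for p, q in zip(part_feats[i], part_feats[anchor]))
        ]
        if outliers:
            for i in outliers:
                labels[i] = next_label
            next_label += 1
    return labels


# Toy data: person A is seen by cameras 0 and 1 but was clustered apart.
feats = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
cams = [0, 1, 0, 1]
merged = merge_hard_positives([0, 1, 2, 2], feats, cams)
print(merged)  # [0, 0, 2, 2]

# Toy data: the third sample has a similar upper body but a different lower body.
parts = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.1, 0.0), (1.0, 1.1)],
    [(0.0, 0.1), (4.0, 4.0)],
]
split = split_hard_negatives([0, 0, 0], parts, threshold=1.0)
print(split)  # [0, 0, 1]
```

In this sketch the rectified labels would then serve as pseudo-labels for the next training round of the re-ID model. That loop is omitted because the abstract gives no detail on it.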
