
攝影機網路之目標物追蹤與視覺化顯示

Target Tracking and Monitoring in a Camera Network

Advisor: 洪一平 (Yi-Ping Hung)

Abstract


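Camera networks are widely used in video surveillance systems, for example in airport security, station security, and traffic flow monitoring. Their main advantage is that they can monitor a large area. However, as the number of cameras grows, it becomes very difficult for a user to watch so many views at once. In this dissertation, we examine the main research issues that arise when monitoring a camera network surveillance system from a control room. First, to link events across cameras, we propose an automatic learning algorithm that tracks targets continuously across multiple cameras. For the camera view switches that occur during tracking, we propose an egocentric smooth view transition technique that helps the user keep watching the target while the view changes. For applications that need large-area, high-resolution monitoring, and in contrast to traditional expensive setups, we propose a multi-resolution display design, the e-Fovea system (大小眼觀察家).

For target tracking across multiple cameras, we mainly consider the case in which the cameras' fields of view do not overlap. The difficulty lies in learning the spatio-temporal relationship and the brightness transfer function between each pair of cameras. Existing techniques mostly learn from training data collected in advance, together with manually labeled correspondences, so they apply only to short-term monitoring or to environments that do not change. When the environment changes gradually, for example under lighting changes, these methods cannot adapt, and tracking errors result. In this dissertation, we propose an automatic, adaptive learning algorithm, which makes the method better suited to long-term surveillance.

When a user follows a target through the views of a camera network, traditional surveillance systems switch the main monitoring view directly. During continuous tracking across multiple cameras, however, the view switches constantly. Frequent direct switches place a heavy burden on the user, who struggles to work out where in the environment the target has come from and where it is going. We therefore propose an egocentric smooth view transition technique that generates virtual views during camera switches, helping the user understand how the target moves as the camera changes. Unlike traditional video transition techniques, our method handles cases in which the monitored areas of the cameras differ considerably or do not overlap at all.

Finally, we propose e-Fovea, a multi-resolution display system that offers both large-area and high-resolution monitoring. The system achieves high-resolution display, a high frame rate, and low setup cost at the same time. It is inspired by human vision and shows high-resolution imagery only in the region the user is interested in. We also present a user study that compares the proposed system with existing methods. The results show that our system does improve users' monitoring efficiency.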

Parallel Abstract


Camera networks have been widely used in visual surveillance applications such as airport and railway security and traffic monitoring. The main benefit of a multi-camera system is that it can monitor the activities of targets over a large area. However, for security guards and other users, such a system becomes harder to monitor as the number of cameras increases, especially when events span multiple cameras. In this dissertation, we investigate two major tasks in command center monitoring. One is to track targets in a camera network automatically. The other is to develop display techniques that help users monitor events in a camera network more easily.

First, to track targets across networked cameras, we focus on situations in which the cameras' fields of view do not necessarily overlap. One of the major problems in tracking across non-overlapping cameras is learning the spatio-temporal relationship and the appearance relationship between cameras, where the appearance relationship is usually modeled as a brightness transfer function. Traditional methods learn these relationships from hand-labeled correspondences or through a batch-learning procedure, and they are applicable only when the environment remains unchanged. In many situations, such as under lighting changes, the environment varies considerably and traditional methods fail. In this dissertation, we propose an unsupervised method that learns adaptively and can be applied to long-term monitoring.

Second, when a user monitors tracking activity in the camera network, traditional surveillance systems usually switch the main camera view directly from one camera to another. After many such switches, it becomes difficult for users to follow the trajectory of the target in the environment. In this dissertation, we propose a novel egocentric view transition approach, which synthesizes virtual views while the cameras are being switched and reduces the mental effort users need to understand the events. An important property of our system is that it can be applied even when the fields of view of the transition cameras are far apart or do not overlap at all.

Finally, for large-scale and high-resolution monitoring, we propose e-Fovea, a multi-resolution display with a steerable focus. Large-scale and high-resolution monitoring systems are ideal for many visual surveillance applications. However, existing approaches either have insufficient resolution and low frame rates or have high complexity and cost. We take inspiration from the human visual system and propose a multi-resolution design, e-Fovea, which provides peripheral vision together with a steerable fovea at higher resolution. We further present two user studies, with a total of 36 participants, that compare e-Fovea to two existing multi-resolution visual monitoring designs. The results show that, for visual monitoring tasks, our e-Fovea design with steerable focus is significantly faster than the existing approaches and is preferred by users.
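
The abstract does not say how the brightness transfer function is estimated. One common way to relate the brightness of two cameras is cumulative histogram matching, and the following is a minimal sketch under that assumption. It is not the adaptive, unsupervised method proposed in the dissertation, and the function names (`cumulative_counts`, `brightness_transfer_function`) are hypothetical.

```python
from itertools import accumulate


def cumulative_counts(pixels, levels=256):
    """Return cumulative pixel counts for integer brightness values in [0, levels)."""
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    return list(accumulate(counts))


def brightness_transfer_function(pixels_a, pixels_b, levels=256):
    """Map each brightness level seen by camera A to a level in camera B.

    Assumption: both pixel lists come from the same object observed in the
    two cameras. Level i maps to the smallest level j whose cumulative
    frequency in B reaches the cumulative frequency of i in A.
    """
    cum_a = cumulative_counts(pixels_a, levels)
    cum_b = cumulative_counts(pixels_b, levels)
    total_a, total_b = len(pixels_a), len(pixels_b)
    mapping = []
    j = 0
    for i in range(levels):
        # Compare cum_b[j] / total_b with cum_a[i] / total_a by
        # cross-multiplying, which avoids floating-point rounding.
        # Both cumulative curves are non-decreasing, so j only moves forward.
        while j < levels - 1 and cum_b[j] * total_a < cum_a[i] * total_b:
            j += 1
        mapping.append(j)
    return mapping


# Hypothetical observations: camera B sees the same object 50 levels brighter.
object_in_a = [10, 20, 30, 40]
object_in_b = [value + 50 for value in object_in_a]
btf = brightness_transfer_function(object_in_a, object_in_b)
```

In this toy case `btf[10]` is 60, so a pixel of brightness 10 in camera A would be compared against brightness 60 in camera B. A long-term system would need to re-estimate the mapping as lighting changes, which is the part the dissertation addresses and this sketch does not.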

