
A Study of Bayesian Hierarchical Framework and Its Applications to Video Surveillance

Advisor: 王聖智

Abstract

In this dissertation, we present a Bayesian hierarchical framework (BHF) to deal with 3-D scene modeling and image analysis simultaneously in a unified manner. In practice, a robust video surveillance system must cope with many challenging issues, such as the occlusion effect, appearance ambiguity between foreground and background, the perspective effect, the shadow effect, and lighting variations. In this dissertation, we handle these issues by modeling the 3-D scene in a parametric form and by integrating the scene model and image observations in the inference process. In the proposed hierarchical framework, we systematically integrate pixel-level, region-level, and object-level information in a probabilistic way for the semantic inference of image content and 3-D scene status. Under the BHF framework, the occlusion effect, appearance ambiguity, perspective effect, shadow effect, and lighting variations can all be well handled. Moreover, the occlusion, perspective, and shadow effects may even provide useful clues to support 3-D scene inference.

In this dissertation, the BHF framework is applied to two video surveillance systems: a vacant parking space detection system and a multi-camera surveillance system. In the vacant parking space detection system, the challenges come from dramatic luminance variations, the shadow effect, perspective distortion, and the inter-occlusion among parked vehicles. With the proposed BHF, these issues can be modeled in a systematic way and effectively handled. Specifically, the proposed BHF scheme captures the occlusion pattern, perspective distortion, and shadow effect by building a parametric scene model, while the color fluctuation caused by luminance variation is treated as a color classification problem.
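The flavor of this kind of unified Bayesian optimization can be illustrated with a deliberately tiny sketch: a MAP search over the joint occupancy state of a few adjacent spaces, where a toy occlusion prior lets an occupied space "explain away" ambiguous evidence in the space behind it. The numbers, the penalty form, and the function names below are invented for illustration only; they are not the dissertation's actual models.

```python
from itertools import product

# Toy per-space evidence: log-likelihood of each space's image patch
# under the "vacant" and "occupied" appearance models (made-up numbers).
log_lik = {
    0: {"vacant": -0.2, "occupied": -2.0},
    1: {"vacant": -1.8, "occupied": -0.3},
    2: {"vacant": -0.9, "occupied": -0.8},
}

def occlusion_prior(state):
    """Toy scene prior: a car in space i partially occludes space i+1,
    so ambiguous evidence behind an occupied space is partly explained
    by occlusion; here we simply reward that configuration."""
    score = 0.0
    for i in range(len(state) - 1):
        if state[i] == "occupied" and state[i + 1] == "vacant":
            score += 0.5  # occlusion explains away ambiguous evidence
    return score

def map_inference():
    """Exhaustive MAP search over the joint state of all spaces."""
    best_state, best_score = None, float("-inf")
    for state in product(["vacant", "occupied"], repeat=len(log_lik)):
        score = sum(log_lik[i][s] for i, s in enumerate(state))
        score += occlusion_prior(state)
        if score > best_score:
            best_state, best_score = state, score
    return best_state
```

Note how space 2's evidence alone slightly favors "occupied", but the joint inference with the occlusion prior flips it to "vacant" — the kind of interaction that per-space independent classification cannot capture.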
With the BHF scheme, the detection of vacant parking spaces and the labeling of scene status are formulated as a unified Bayesian optimization problem subject to a shadow generation model, an occlusion generation model, and an object classification model. The system accuracy was evaluated over several outdoor parking lot videos captured from morning to evening. Experimental results showed that the proposed framework can systematically detect vacant parking spaces, efficiently label ground and car regions, precisely locate shadowed regions, and effectively handle luminance variations.

In the multi-camera application, the goal is to locate, label, and correspond multiple targets across camera views while suppressing ghost targets. In practice, the challenges of this kind of system come from the unknown target number, the inter-occlusion among targets, and the ghost effect caused by geometric ambiguity. Instead of directly corresponding objects among different camera views, the proposed framework adopts a two-stage fusion-inference strategy. In the fusion stage, we formulate a posterior distribution that indicates the likelihood of having moving targets at certain ground locations. In the inference stage, the scene model is fed into the proposed BHF, where target labeling, target correspondence, and ghost removal are treated as a unified optimization problem subject to 3-D scene priors, target priors, and image observations. Moreover, the target priors are iteratively refined through an expectation-maximization (EM) process to further improve system performance. The system accuracy was evaluated on both synthesized and real videos.
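The fusion stage and the ghost-suppression idea can be sketched in miniature. Assume each camera's foreground mask has already been warped onto a common ground-plane grid (the homography warp and the actual posterior formulation are omitted); combining the per-camera evidence multiplicatively suppresses cells that only one view claims are occupied — the classic ghost along a single view's ray. All values below are made up for illustration.

```python
# Per-cell foreground probability on a shared 2x2 ground grid,
# one grid per camera (illustrative values; warping omitted).
cam_a = [[0.90, 0.10],
         [0.10, 0.90]]
cam_b = [[0.90, 0.80],   # cell (0, 1) is a "ghost" seen only by camera B
         [0.10, 0.85]]

# Independence assumption: fuse by multiplying per-camera evidence.
fused = [[a * b for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(cam_a, cam_b)]

# Keep only cells supported by both views.
detections = {(r, c)
              for r in range(len(fused))
              for c in range(len(fused[0]))
              if fused[r][c] > 0.5}
```

Cell (0, 1) looks strongly occupied to camera B alone, but camera A sees it empty, so the fused evidence drops below threshold and the ghost is removed; cells (0, 0) and (1, 1), supported by both views, survive.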
Experimental results showed that the proposed system can automatically determine the target number, efficiently label and correspond moving targets, precisely locate their 3-D positions, and effectively suppress ghost targets. Through these two applications, we verified that the proposed BHF scheme can be applied to a variety of video surveillance tasks. The BHF framework provides the flexibility to integrate pixel-level, region-level, and object-level information into a unified inference process. With information integrated from these multiple levels, the system can handle more complicated inference tasks with improved accuracy and robustness.
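The expectation-maximization refinement mentioned in the abstract can likewise be sketched in a minimal 1-D form: alternating soft assignment of ground-plane evidence to targets (E-step) with re-estimation of target positions (M-step). Fixed-variance Gaussians, two targets, and the made-up observations are simplifying assumptions for illustration, not the dissertation's actual target model.

```python
import math

# Toy 1-D ground-plane evidence: positions where foreground was fused.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]

def em_two_targets(data, mu=(0.0, 6.0), sigma=1.0, iters=20):
    """Minimal EM for two fixed-variance Gaussian 'targets':
    the E-step computes soft responsibilities, the M-step
    re-estimates each target's position as a weighted mean."""
    mu = list(mu)
    for _ in range(iters):
        # E-step: responsibility of each target for each observation.
        resp = []
        for x in data:
            w = [math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) for m in mu]
            z = sum(w)
            resp.append([wi / z for wi in w])
        # M-step: responsibility-weighted position update.
        for k in range(2):
            num = sum(r[k] * x for r, x in zip(resp, data))
            den = sum(r[k] for r in resp)
            mu[k] = num / den
    return mu

mu = em_two_targets(data)
```

Starting from rough initial positions, the iterations pull the two target estimates toward the two evidence clusters (near 1.0 and 5.1), mirroring how iteratively refined target priors sharpen the inference.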

