Object segmentation plays an important role in many advanced applications; for example, human-computer interaction, vehicle surveillance, and video compression all rely on this technique. This thesis exploits temporal information, spatial information, and region tracking to extract important objects from video. First, based on the idea that different cues can compensate for one another, we adopt color, edge detection, pixel motion vectors, and a kernel-based module to extract objects from the frames. Because these cues offset one another's weaknesses, good segmentation results are obtained even when the video is captured by a shaking camera. Next, pixels with similar colors are merged into regions according to their color differences, and the resulting region adjacency graph is analyzed so that regions with similar properties can be merged, achieving object segmentation. For region merging, we adopt a Bayesian rule to search for the best merging combination. Finally, the kernel-based analysis combines Monte Carlo methods with a probabilistic viewpoint to track object changes across consecutive frames; its advantage is that it can accurately locate objects and provide effective object information to compensate for the shortcomings of the techniques above, thereby increasing segmentation accuracy. The kernel-based analysis and its results are discussed in detail in Chapters 2 and 3. Experimental results show that the proposed method achieves very good object segmentation performance.
Video object segmentation plays an important role in many advanced applications, such as human-computer interaction, video surveillance, and content-based video coding. In this thesis, we propose a semantic video object segmentation system that combines spatiotemporal video segmentation and region tracking to extract important semantic objects from video. First, the system uses multiple cues to segment video frames into regions; the cues include color, edges, motion, and kernel-based models. Since these features are complementary to each other, the desired regions can be segmented well even when the frames are captured by a non-stationary camera. Then, from the spatial information of each segmented region, we construct a region adjacency graph (RAG) that records the relative relations between regions. Based on the RAG, we propose a Bayesian classifier that groups regions by checking their spatial and temporal similarities, so that related regions are merged and associated to form a meaningful object. Because a kernel-based analysis is included in the designed classifier, the desired semantic objects can be extracted reliably from video sequences; the kernel-based analysis provides rich information for segmenting semantic objects even when they are stationary in the background and cannot be identified using cues such as motion. Experimental results demonstrate the effectiveness of the proposed method for object segmentation.
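The RAG-based merging step described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the per-region feature (a single mean-color value), the distance measure, and the merge threshold are hypothetical stand-ins for the Bayesian spatial/temporal similarity test, and only the graph construction and greedy merging mechanics are shown.

```python
def build_rag(labels):
    """Build adjacency sets from a 2-D grid of region labels.

    Two regions are adjacent if any of their pixels are 4-connected
    neighbors (only right/down offsets are scanned; adjacency is
    recorded symmetrically).
    """
    h, w = len(labels), len(labels[0])
    adj = {}
    for y in range(h):
        for x in range(w):
            a = labels[y][x]
            adj.setdefault(a, set())
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < h and nx < w and labels[ny][nx] != a:
                    b = labels[ny][nx]
                    adj.setdefault(b, set())
                    adj[a].add(b)
                    adj[b].add(a)
    return adj

def merge_similar(adj, feature, threshold=0.1):
    """Greedily merge adjacent regions whose feature distance is small.

    Uses union-find so that chains of pairwise merges collapse into one
    group; returns a mapping from each region to its group representative.
    The scalar feature and threshold are illustrative placeholders.
    """
    parent = {r: r for r in adj}
    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r
    for a in adj:
        for b in adj[a]:
            if abs(feature[a] - feature[b]) < threshold:
                parent[find(a)] = find(b)
    return {r: find(r) for r in adj}
```

For example, with labels `[[1, 1, 2], [1, 3, 2]]` and mean-color features `{1: 0.0, 2: 0.9, 3: 0.05}`, regions 1 and 3 are adjacent and similar, so they merge into one group, while region 2 stays separate.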