Automated video summarization is an interesting research topic; however, automatically detecting salient objects across video frames while balancing efficiency and accuracy is not an easy task. In this thesis, we replace the traditional manual object-labeling process with automatic co-salient object detection on video content, enabling a video summarization application that distinguishes the characteristic objects of different scenes. To balance efficiency and accuracy, we adopt a patch-based approach: a pre-trained set of sparse salient patches serves as the preprocessing patch database, and in the first stage of cross-frame analysis, the similarity between database patches, measured by Kullback-Leibler divergence, is used to roughly locate co-saliency areas. Because the pre-trained patch database may respond only incompletely to the co-salient objects in a given video, we further propose a database-update mechanism: within the same scene, a new database is trained by sparse coding from the saliency areas of the video and is used in a second-stage search over the video segment to recover a more complete co-salient object area. Experimental results show that this mechanism performs well both for automatically retrieving and ranking segments of a video of interest that share similar properties, and for automatically matching and labeling the co-occurring characteristic objects across segments as segment summaries.
Automatic video annotation is a critical step in content-based video retrieval and browsing. Automatically detecting the focus of interest, such as co-occurring objects in video frames, can relieve the tedious manual labeling process. However, detecting co-occurring objects that are visually salient in video sequences is a challenging task. In this paper, in order to detect co-salient video objects efficiently, we first use a preattentive scheme to locate the co-salient regions in video frames and then measure the similarity between salient regions based on KL-divergence. In addition, to update the preattentive patch set for co-salient objects, sparse coding is used for dictionary learning and for further discrimination among co-salient objects. Finally, a set of primary co-salient objects can be found across all video frames using our proposed filtering scheme. As a result, a video sequence can be automatically parsed based on the detection of co-occurring video objects. Our experimental results show that the proposed co-salient video object model achieves a high precision of about 85% and demonstrates its robustness and feasibility on video sequences.
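The first stage above compares patches by KL-divergence. The abstract does not specify the patch representation, so the following is a minimal sketch assuming grayscale patches summarized as intensity histograms (an illustrative choice, not necessarily the thesis's feature); `patch_similarity` is a hypothetical helper name.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """Kullback-Leibler divergence D(p || q) between two discrete
    distributions, e.g. normalized patch histograms. A small eps
    avoids log(0) for empty histogram bins."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def patch_similarity(patch_a, patch_b, bins=16):
    """Symmetrized-KL similarity between two grayscale patches
    (assumed 8-bit), mapped to (0, 1] where 1 means identical
    histograms."""
    ha, _ = np.histogram(patch_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(patch_b, bins=bins, range=(0, 256))
    d = 0.5 * (kl_divergence(ha, hb) + kl_divergence(hb, ha))
    return float(np.exp(-d))
```

KL-divergence is asymmetric, so the sketch averages both directions before converting the distance to a similarity score.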
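The second stage updates the patch dictionary via sparse coding. The abstract does not name the solver, so as a stand-in this sketch uses plain matching pursuit, a greedy method for expressing a signal as a sparse combination of dictionary atoms; the function name and parameters are illustrative.

```python
import numpy as np

def matching_pursuit(x, D, n_nonzero=3):
    """Greedy sparse coding: approximate signal x as a sparse
    combination of the columns (atoms) of dictionary D, which are
    assumed to have unit L2 norm. Returns the coefficient vector
    and the final residual, so that D @ coef + residual == x."""
    r = np.asarray(x, dtype=float).copy()
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        corr = D.T @ r                      # correlation with each atom
        k = int(np.argmax(np.abs(corr)))    # best-matching atom
        coef[k] += corr[k]                  # accumulate its weight
        r -= corr[k] * D[:, k]              # remove its contribution
    return coef, r
```

In the thesis's setting, the sparse codes of saliency-area patches would serve to retrain the database within a scene; here the sketch only shows the coding step itself.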