
視覺注視預測與影像顯著區偵測之高等演算模型

Advanced Computational Models for Human Fixation Prediction and Visual Saliency Detection

Advisor: 林巍聳

Abstract


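Human fixation prediction and salient region detection can be applied to object detection, object tracking, image compression, image segmentation, and similar tasks. Research on fixation prediction focuses on where human eyes look: subjects are asked to view a set of images while an eye tracker records their fixation points and scan paths, and regularities are derived from these recordings. The experiments show that fixation locations have a random character and include both salient regions and scattered isolated points. Research on salient region detection focuses on detecting the salient objects or regions in an image: researchers typically ask subjects to mark the objects that attract their attention, and the results show that salient regions are mostly compact regions with distinct boundaries. However, the computational models of these two kinds of visual attention reported in the literature have only limited prediction and detection ability, for three main reasons: the relationships between pixels are too complex, representative features are hard to define, and effective ways of combining features are lacking. Based on human visual behavior and the basic properties of images, this thesis develops advanced computational models for fixation prediction and for salient region detection. For fixation prediction, we develop a model named Probability-based visual Attention using machine Learning (PAL). Starting from four basic assumptions about human fixation behavior, it uses Bayesian probability to derive three properties that influence the distribution of fixations: feature prior, feature distribution, and position prior. We compute center-surround contrast of intensity, color, and orientation at different scales as features, and use a support vector machine to compute the feature prior. We use information theory and similarity computation to compute the feature distribution within an image. We compute the position prior from statistics of the fixation distribution over all images. Finally, these three properties are fused to achieve good fixation prediction. For salient region detection, we develop a model named Block-based Saliency detection using Mid-level features (BSM). It first uses a superpixel method to segment the image into small blocks, then extracts similarity, compactness, and image-boundary relevance as features from the relationships among these blocks. Through support vector machine learning and smoothing of the results, it achieves good salient region detection with very few features. Experimental results show that the two models have very good fixation prediction ability and saliency detection accuracy.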
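The fusion of a feature prior, a feature distribution, and a position prior can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the function names are hypothetical, the position prior is a smoothed fixation histogram, the feature distribution is approximated by the self-information of a quantized feature value, and the three maps are fused by a pixelwise product. The feature prior, which the thesis obtains from a support vector machine, is simply passed in as a precomputed map.

```python
import math


def position_prior(fixation_lists, height, width, smoothing=1.0):
    """Estimate a position prior from fixations pooled over many images.

    fixation_lists holds one list of (row, col) fixations per image.
    The additive smoothing term is an assumption, not taken from the thesis.
    """
    counts = [[smoothing] * width for _ in range(height)]
    for fixations in fixation_lists:
        for row, col in fixations:
            counts[row][col] += 1.0
    total = sum(sum(r) for r in counts)
    return [[c / total for c in r] for r in counts]


def self_information(feature_map, bins=4):
    """Score each pixel by -log2 p(v), where p is the in-image histogram
    of the quantized feature value v. Rare values score high."""
    flat = [v for r in feature_map for v in r]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on constant maps

    def quantize(v):
        return min(int((v - lo) / span * bins), bins - 1)

    hist = [0] * bins
    for v in flat:
        hist[quantize(v)] += 1
    n = len(flat)
    return [[-math.log2(hist[quantize(v)] / n) for v in r] for r in feature_map]


def normalize(score_map):
    """Rescale a map linearly to the range [0, 1]."""
    flat = [v for r in score_map for v in r]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in r] for r in score_map]


def fuse(feature_prior, feature_distribution, pos_prior):
    """Fuse the three maps by pixelwise product (an assumed fusion rule)."""
    product = [
        [a * b * c for a, b, c in zip(ra, rb, rc)]
        for ra, rb, rc in zip(feature_prior, feature_distribution, pos_prior)
    ]
    return normalize(product)
```

In this sketch a pixel scores high only when all three maps agree, which is one simple way to read "fusing" the properties; a weighted sum or a learned combination would be equally compatible with the abstract.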

Parallel Abstract


Human fixation prediction and salient region detection are useful techniques with many applications, such as object detection, object tracking, image compression, and image segmentation. Human fixation prediction studies where humans look. In the experiments, subjects perform a free-viewing task while eye-tracking equipment records their fixations and saccades, and rules and patterns are then obtained by analyzing the recorded data. The results show that human eye movements have a random tendency: fixations fall not only in salient regions but also on some isolated points. Salient region detection, on the other hand, studies how to detect the dominant objects or regions in images. In the experiments, subjects are usually asked to mark the objects in images that attract their attention. The results demonstrate that salient regions are usually compact and have clear boundaries. However, existing computational models of these two types of visual attention have limited prediction and detection ability, for three main reasons. First, the relationships between pixels are complex. Second, representative features are hard to define. Third, effective methods for combining features are lacking. In this thesis, advanced computational models of human fixation prediction and salient region detection are developed based on human visual behavior and basic image properties. For human fixation prediction, we develop a model called Probability-based visual Attention using machine Learning (PAL), which is based on four basic assumptions about human fixation behavior and uses Bayesian principles to derive three properties: feature prior, feature distribution, and position prior. Multi-scale center-surround contrasts of intensity, color, and orientation are used as features, and a support vector machine is used to compute the feature prior. Information theory and similarity computation are used to compute the feature distribution in an image. Statistics of the fixation distribution over many images are used to represent the position prior. Finally, the saliency maps are produced by fusing these three properties. For salient region detection, we develop a model called Block-based Saliency detection using Mid-level features (BSM), which uses a superpixel method to segment images into small blocks as a preprocessing step and extracts three representative features, namely uniqueness, compactness, and boundary information, from the relationships among the blocks. These few features are then combined using a support vector machine, and the results are smoothed by a refinement method to obtain the saliency maps. Experiments show that the two models predict human fixations and detect salient regions very well.
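The block-level features named in the abstract (uniqueness, compactness, and boundary information) can be illustrated with a small Python sketch. The abstract does not define these features, so the formulas below are assumptions chosen for illustration and are not the thesis's definitions: uniqueness is the mean color distance to the other blocks, compactness is measured as the similarity-weighted spatial spread of like-colored blocks (smaller means more compact), and boundary information is the centroid's distance to the nearest image border. The `Block` type and all function names are hypothetical, and the blocks are assumed to come from a prior superpixel segmentation.

```python
import math
from dataclasses import dataclass


@dataclass
class Block:
    color: tuple   # mean color of the block, e.g. (L, a, b)
    center: tuple  # centroid (x, y), normalized to [0, 1]


def color_distance(a, b):
    """Euclidean distance between the mean colors of two blocks."""
    return math.dist(a.color, b.color)


def uniqueness(blocks, i):
    """Mean color distance from block i to every other block."""
    others = [color_distance(blocks[i], b)
              for j, b in enumerate(blocks) if j != i]
    return sum(others) / len(others)


def spatial_spread(blocks, i, sigma=0.2):
    """Similarity-weighted spatial variance of blocks colored like block i.

    A small value means the color of block i is spatially compact.
    sigma is an assumed bandwidth for the color-similarity weights.
    """
    weights = [math.exp(-color_distance(blocks[i], b) ** 2 / (2 * sigma ** 2))
               for b in blocks]
    total = sum(weights)
    mean_x = sum(w * b.center[0] for w, b in zip(weights, blocks)) / total
    mean_y = sum(w * b.center[1] for w, b in zip(weights, blocks)) / total
    return sum(
        w * ((b.center[0] - mean_x) ** 2 + (b.center[1] - mean_y) ** 2)
        for w, b in zip(weights, blocks)
    ) / total


def border_distance(block):
    """Distance from the block centroid to the nearest image border."""
    x, y = block.center
    return min(x, 1 - x, y, 1 - y)


def block_features(blocks):
    """Return one (uniqueness, spread, border distance) tuple per block."""
    return [
        (uniqueness(blocks, i), spatial_spread(blocks, i), border_distance(b))
        for i, b in enumerate(blocks)
    ]
```

In a pipeline like the one the abstract describes, feature tuples of this kind would be passed to a trained classifier such as a support vector machine, and the per-block scores would then be smoothed into a saliency map; those two steps are omitted here.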

