
運用機器學習結合影像時空特徵之視覺注意力模型研究

Learning-Based Fusion of Spatiotemporal Visual Attention Cues for Video

Advisor: 陳宏銘

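Abstract


Visual attention is an important characteristic of the human visual system, and it can help image processing and compression techniques perform better. In this thesis, we propose a computational model that extracts low-level and high-level features from video and fuses the two levels of features by machine learning to predict the distribution of human visual attention in video. The low-level features (color, orientation, and motion) are adopted on the basis of studies of human visual cells, whereas the human face is adopted as a high-level feature on the basis of studies of human communication patterns. Experimental results confirm that the overall performance of this integrated two-level model is more stable than that of models using features of a single level only. In previous attention prediction models, regions predicted to be salient can deviate from the actual human fixation locations. To address this, the proposed model learns the relationship between the features and the visually attended regions and uses this relationship to reduce such potential mismatches. In addition, to improve the learning performance of the model, we select representative training samples according to the distribution of human eye fixations. Experimental results confirm that the proposed model can effectively predict the distribution of human visual attention.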

Parallel Abstract


Visual attention is an important characteristic of the human visual system and is useful for image processing and compression. This thesis proposes a computational scheme that adopts both low-level and high-level features to predict visual attention from video signals. The low-level and high-level features are fused by machine learning. The adoption of low-level features (color, orientation, and motion) is based on the study of visual cells, whereas the adoption of the human face as a high-level feature is based on the study of media communications. We show that such a scheme is more robust than schemes that use only low-level or only high-level features. Unlike conventional techniques, our scheme learns the relationship between features and visual attention to avoid perceptual mismatch between the estimated saliency and the actual human fixation. We also show that selecting representative training samples according to the fixation distribution improves the efficacy of regression training. Experimental results demonstrate the advantages of the proposed scheme.
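The abstract does not name the regressor, the feature extractors, or the rule used to pick training samples, so the following is only a minimal sketch of the general idea under stated assumptions: each location is described by four feature values (color, orientation, motion, face), training samples are taken from the most- and least-fixated locations, and the fusion weights are fit by ordinary linear least squares. All function names, the 20% selection fractions, and the synthetic data are hypothetical and are not taken from the thesis.

```python
import random

def select_samples(fixation, top_frac=0.2, bottom_frac=0.2):
    """Pick indices of the least- and most-fixated locations (assumed selection rule)."""
    order = sorted(range(len(fixation)), key=lambda i: fixation[i])
    n_low = max(1, int(len(order) * bottom_frac))
    n_high = max(1, int(len(order) * top_frac))
    return order[:n_low] + order[-n_high:]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_fusion_weights(features, fixation, indices):
    """Least-squares weights, with the bias last, mapping features to fixation density."""
    rows = [list(features[i]) + [1.0] for i in indices]  # append a bias term
    targets = [fixation[i] for i in indices]
    d = len(rows[0])
    # Normal equations: (A^T A) w = A^T y
    ata = [[sum(r[j] * r[k] for r in rows) for k in range(d)] for j in range(d)]
    aty = [sum(r[j] * t for r, t in zip(rows, targets)) for j in range(d)]
    return solve(ata, aty)

def predict(weights, feature_vec):
    """Fused saliency for one location: weighted feature sum plus bias."""
    return sum(w * f for w, f in zip(weights, feature_vec)) + weights[-1]

# Synthetic demonstration data (not from the thesis): 200 locations,
# each with [color, orientation, motion, face] values in [0, 1].
random.seed(0)
features = [[random.random() for _ in range(4)] for _ in range(200)]
fixation = [0.1 * f[0] + 0.2 * f[1] + 0.3 * f[2] + 0.4 * f[3] for f in features]

chosen = select_samples(fixation)
weights = fit_fusion_weights(features, fixation, chosen)
print([round(w, 3) for w in weights])
```

A linear fit is used here only because it is the simplest regressor to show end to end. Any learned mapping from feature values to fixation density would fit the same outline, and the sample selection step could be swapped for whatever criterion the full thesis defines.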

