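Visual attention is an important characteristic of the human visual system, and it can help image processing and compression techniques perform better. In this thesis, we propose a computational model that extracts low-level and high-level features from images and combines the two levels of features by machine learning in order to predict how human visual attention is distributed over an image. The low-level features (color, orientation, and motion) are adopted on the basis of studies of human visual cells, whereas the human face is adopted as a high-level feature on the basis of studies of human communication patterns. Experimental results confirm that the overall performance of this model, which integrates both levels of features, is more stable than that of models using a single level of features. In previous attention-prediction models, the regions predicted to be salient can deviate from the locations humans actually fixate. To address this, our model learns the relationship between the features and the visually attended regions and uses that relationship to reduce such potential errors. In addition, to improve the model's learning efficiency, we select representative training samples according to the distribution of human fixations. Experimental results confirm that the proposed model effectively predicts the distribution of human visual attention.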
Visual attention is an important characteristic of the human visual system and is useful for image processing and compression. This paper proposes a computational scheme that adopts both low-level and high-level features to predict visual attention from a video signal. The low-level and high-level features are fused by machine learning. The choice of low-level features (color, orientation, and motion) is based on studies of visual cells, whereas the choice of the human face as a high-level feature is based on studies of media communications. We show that such a scheme is more robust than schemes that use only low-level or only high-level features. Unlike conventional techniques, our scheme learns the relationship between features and visual attention in order to reduce the perceptual mismatch between the estimated saliency and actual human fixations. We also show that selecting representative training samples according to the fixation distribution improves the efficacy of regression training. Experimental results demonstrate the advantages of the proposed scheme.
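The abstract names the ingredients (per-region color, orientation, motion, and face features, a learned regression fusion, and training samples chosen by fixation distribution) but not the regressor or the selection rule. The following is a minimal sketch under stated assumptions, not the thesis's method: it assumes each image block has four feature values normalized to [0, 1], a fixation-density target in [0, 1], a linear ridge-regression fusion, and a hypothetical selection rule that stratifies samples into equal-width fixation-density bins and caps each bin. The names `select_representative`, `fit_fusion`, `predict`, `num_bins`, `per_bin`, and `ridge` are all hypothetical. The synthetic data at the end only checks that the fit recovers known weights. It is not an experimental result.

```python
import random
from typing import List, Sequence

# ASSUMPTION: hypothetical per-block feature order, each value in [0, 1].
FEATURES = ("color", "orientation", "motion", "face")


def select_representative(xs: Sequence[Sequence[float]], ys: Sequence[float],
                          num_bins: int = 5, per_bin: int = 50):
    """Hypothetical selection rule: bin samples by fixation density (assumed
    in [0, 1]) and keep at most `per_bin` per bin, so that rarely fixated and
    heavily fixated blocks are both represented in the training set."""
    bins: List[List[int]] = [[] for _ in range(num_bins)]
    for i, y in enumerate(ys):
        b = min(int(y * num_bins), num_bins - 1)
        if len(bins[b]) < per_bin:
            bins[b].append(i)
    chosen = [i for b in bins for i in b]
    return [list(xs[i]) for i in chosen], [ys[i] for i in chosen]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = s / m[r][r]
    return w


def fit_fusion(xs, ys, ridge: float = 1e-6) -> List[float]:
    """ASSUMPTION: linear fusion learned by ridge-regularized least squares,
    (X^T X + ridge * I) w = X^T y; the bias weight is stored last."""
    rows = [list(x) + [1.0] for x in xs]  # append a bias column
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    for i in range(d):
        xtx[i][i] += ridge
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(xtx, xty)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Fused saliency of one block: weighted sum of its features plus bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


# Synthetic sanity check only: targets are generated from known weights.
rng = random.Random(0)
true_w = [0.2, 0.1, 0.3, 0.4]
xs = [[rng.random() for _ in FEATURES] for _ in range(2000)]
ys = [sum(t * v for t, v in zip(true_w, x)) for x in xs]
train_x, train_y = select_representative(xs, ys, num_bins=5, per_bin=40)
w = fit_fusion(train_x, train_y)
```

Capping each density bin keeps the many weakly fixated blocks from dominating the least-squares fit. This is one plausible reading of "selecting representative training samples according to the fixation distribution", and a nonlinear regressor could replace the linear fusion without changing the selection step.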