Visual Saliency Prediction and Viewing Bias in 360° Videos

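360° video has been widely applied in many areas, such as immersive content, virtual tours, and surveillance systems. Compared with planar video, 360° video contains far more information, which makes predicting salient regions in such information-dense imagery more difficult. In this work, we propose a visual saliency prediction model that directly predicts salient regions in videos in equirectangular projection. Unlike previous methods, which adopted recurrent neural network architectures for visual saliency prediction, we use 3D convolution in the encoder and generalize SphereNet kernels to build the decoder. We further analyze the statistics of the viewing bias present in different 360° video datasets and in different types of 360° video, which gives us insight into the design of a fusion mechanism that adaptively fuses the predicted saliency map with the viewing bias. The proposed model achieves the best results on each dataset (for example, Salient360!, PVS, and Sport360).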

Keywords

Visual saliency prediction; deep learning; 360° video

Parallel Abstract

360° video has been applied in many areas, such as immersive content, virtual tours, and surveillance systems. Compared with field-of-view prediction on planar videos, the sheer amount of information contained in the omnidirectional view over the entire sphere poses an additional challenge for predicting highly salient regions in 360° videos. In this work, we propose a visual saliency prediction model that directly takes 360° videos in the equirectangular format as input. Unlike previous works, which often adopted a recurrent neural network (RNN) architecture for the saliency detection task, we use 3D convolution in a spatial-temporal encoder and generalize SphereNet kernels to construct a spatial-temporal decoder. We further study the statistical properties of the viewing biases present in 360° datasets across various video types, which gives us insight into the design of a fusion mechanism that combines the predicted saliency map with the viewing bias in an adaptive manner. The proposed model yields state-of-the-art performance, as evidenced by empirical results on well-known 360° visual saliency datasets such as Salient360!, PVS, and Sport360.

Parallel Keywords

Visual saliency detection; deep learning; panorama videos
