OmniScribe: An Immersive Audio Description Authoring System for 360° Videos

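Blind or visually impaired people typically understand a video by listening to an audio description, which a sighted describer produces by perceiving, selecting, and then describing the key visual elements of the video. 360° video is an emerging medium in which the surrounding imagery gives viewers an immersive experience. However, the omnidirectional nature of 360° video makes it difficult for describers to take in the overall visual content and to analyze spatial details, and this information is often an important factor in building a sense of immersion for visually impaired audiences. Through discussion and ideation with a professional describer, we identified several key challenges in describing 360° videos and, from them, iteratively designed OmniScribe, a professional system dedicated to helping describers author immersive audio descriptions for 360° videos. OmniScribe uses AI-generated content-awareness overlays to help describers grasp the content and details of 360° videos more precisely. In addition, OmniScribe lets describers author "spatial audio descriptions" and mark "immersive labels" for visually impaired audiences, who can then enjoy immersive audio descriptions through the mobile app we developed. In a study with 11 novice and professional describers, we demonstrate the value of OmniScribe in improving the audio description authoring workflow; in a further study with 8 blind participants, we find preliminary evidence that audio descriptions produced with OmniScribe give visually impaired viewers a greater sense of immersion in 360° videos than standard audio descriptions. Finally, we discuss design directions for making 360° videos more widely usable.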

Keywords

360° video; audio description; multimedia; blind; visual impairment; sonification; computer vision

Parallel Abstract

Blind or visually impaired (BVI) people typically access videos via audio descriptions (ADs) crafted by sighted describers who comprehend, select, and describe crucial visual content in the videos. 360° video is an emerging storytelling medium that enables immersive experiences people may not otherwise encounter in everyday life. However, the omnidirectional nature of 360° videos makes it challenging for describers to perceive the holistic visual content and interpret the spatial information that is essential for creating immersive ADs for blind people. Through a formative study with a professional describer, we identified key challenges in describing 360° videos and iteratively designed OmniScribe, a system that supports the authoring of immersive ADs for 360° videos. OmniScribe uses AI-generated content-awareness overlays to help describers better grasp 360° video content. Furthermore, OmniScribe enables describers to author spatial ADs and immersive labels so that blind users can consume the videos immersively with our mobile prototype. In a study with 11 professional and novice describers, we demonstrate the value of OmniScribe in the authoring workflow, and a study with 8 blind participants reveals the promise of immersive AD over standard AD for 360° videos. Finally, we discuss the implications for promoting 360° video accessibility.