
A Study on Content-based Attention Ranking and In-game Stats Tagging for Sports Videos

(Chinese title: 應用於運動影片以其內容為基礎的關注評價及賽程時況標記之研究)

Advisors: 黃仲陵, 黃正能

Abstract


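With the large-scale digitization of video, the volume of digital data grows day by day. Today's video and image compression standards let digital data be stored with very compact encodings, so describing and storing data is no longer the problem. The most pressing problem now is how to help users efficiently find, in the shortest possible time, the digital content that best meets their needs. Classifying and tagging all digital data by hand would consume a great deal of resources, so automatic analysis of digital content has become an important topic. We try to build a bridge across the semantic gap between low-level features in the computer world and semantic information in the human world. Such a bridge can be applied to real-time online text broadcasting and to various kinds of post-production analysis and processing, effectively replacing time-consuming, highly repetitive manual work such as manual classification and manual highlight selection. Our research aims to build a system that automatically performs in-game stats tagging and semantic analysis on an input video, and that ranks sports program segments by attention (attention ranking) according to viewers' perception characteristics. In recent years, thanks to the outstanding performance of many Taiwanese athletes playing abroad, sports programs have become among the most popular video programs. In addition, sports programs contain highly repetitive and highly correlated shots, which makes them easier to analyze and process, so we chose sports programs as the scope of our research.

We propose an Attention Rank (AR) value to represent how likely a video frame is to attract viewers. AR is derived by combining a Visual Attention Model (VAM) and a Contextual Attention Model (CAM), with the weights adjusted by a Camera Motion Attention Model. We use an object-based approach in which each object appearing in a frame contributes, on average, to that frame's attractiveness to viewers. VAM is further divided into spatial-domain, temporal-domain, and facial features. For textual (contextual) semantic features, we construct CAM to model the user's level of interest in the game. The statistics used by CAM are key game statistics taken from the scoreboard (Superimposed Caption Box, SCB). We propose a method to infer how strongly these statistics attract viewers: the relationship between a statistic and viewer interest can be directly proportional, inversely proportional, or specific to particular situations, and multiple context mapping matrices are then used to obtain the AR. We also take user feedback into account to infer the attention index of each feature for the user, which improves the accuracy of video retrieval.

In sports programs, much important information is embedded in a corner of the frame through a real-time scoreboard (Superimposed Caption Box, SCB) so that viewers can follow the current state of the game; the scoreboard data is the most concise yet most important information. The information in the SCB is therefore an indispensable element in our analysis of sports program content. Different types of sports programs display different content, and the presentation, position, and design also vary between TV broadcasters. Although more and more researchers are paying attention to scoreboard information, almost all existing extraction methods require the positions and meanings of the digits to be specified manually, that is, they require a template. This merely turns the task into a disguised optical character recognition (OCR) problem and contributes little to extracting high-level information. Moreover, the template must be rebuilt for each new input video, which is very time-consuming and inflexible. We therefore propose a general algorithm that applies to various sports programs and is not restricted to any particular scoreboard layout. A scoreboard presents characters and symbols. Detecting these characters and symbols and locating their positions and sizes is not difficult; the difficulty lies in assigning them high-level semantics. For example, text localization and text recognition may tell us that a digital object is "6", but not what it means. We observed the following regularities in how scoreboards display character data:

(1) a digital object can be assigned to only one annotative object;
(2) digital objects and annotative objects belonging to the same class are arranged in a specific relative layout, such as horizontal or vertical;
(3) not every digital object is accompanied by an annotative object.

Based on these three regularities, we apply Relaxation Labeling, a well-known algorithm for spatial labeling problems, to solve this problem.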
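The abstract does not give the compatibility coefficients, so the Python sketch below only illustrates how relaxation labeling (the classic Rosenfeld-Hummel-Zucker update) could assign digital objects to annotative objects under the three regularities. Everything concrete here is a hypothetical assumption: the box centres, the `direction` test with a 5-pixel tolerance, the distance-based initial probabilities, the extra "no annotation" label that covers regularity (3), and the rule that two assignments support each other when both digit-annotation pairs use the same layout direction (a reading of regularity (2)).

```python
import numpy as np

def direction(d, a, tol=5.0):
    """'h' if digit centre d and annotation centre a share a row, 'v' if a column, else None."""
    if abs(d[1] - a[1]) <= tol:
        return "h"
    if abs(d[0] - a[0]) <= tol:
        return "v"
    return None

def initial_probs(digits, notes, none_score=0.2, scale=100.0):
    """p[i, k]: initial belief that digit i belongs to annotation k.
    The last column is the hypothetical 'no annotation' label (regularity 3)."""
    m = len(notes)
    p = np.zeros((len(digits), m + 1))
    for i, d in enumerate(digits):
        for k, a in enumerate(notes):
            if direction(d, a) is not None:  # only row/column-aligned pairs are candidates
                p[i, k] = np.exp(-np.hypot(d[0] - a[0], d[1] - a[1]) / scale)
        p[i, m] = none_score
    return p / p.sum(axis=1, keepdims=True)

def compatibility(digits, notes):
    """r[i, j, k, l] in [-1, 1]: agreement of label k on digit i with label l on digit j.
    Two real labels agree when both digit-annotation pairs share the same layout direction."""
    n, m = len(digits), len(notes)
    r = np.zeros((n, n, m + 1, m + 1))  # the 'no annotation' label stays neutral (0)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for k in range(m):
                for l in range(m):
                    di = direction(digits[i], notes[k])
                    dj = direction(digits[j], notes[l])
                    r[i, j, k, l] = 0.5 if di is not None and di == dj else -0.5
    return r

def relax(p, r, iters=50):
    """Rosenfeld-Hummel-Zucker update: p_i(k) <- p_i(k) * (1 + q_i(k)), then renormalise."""
    n = p.shape[0]
    for _ in range(iters):
        q = np.einsum("ijkl,jl->ik", r, p) / max(n - 1, 1)  # support, |q| <= 0.5
        p = p * (1.0 + q)
        p = p / p.sum(axis=1, keepdims=True)
    return p

# Hypothetical toy layout: annotation A at (0, 0), annotation B at (100, 50).
notes = [(0, 0), (100, 50)]
digits = [(40, 0), (100, 0)]  # digit 1 is in A's row and also in B's column
p0 = initial_probs(digits, notes)
p = relax(p0, compatibility(digits, notes))
print(p0.argmax(axis=1), p.argmax(axis=1))  # [0 1] [0 0]
```

In this toy layout digit 1 starts out closer to annotation B (vertically), but support from digit 0, which reads horizontally from A, pulls it to A. Keeping one probability vector per digit and taking its argmax enforces regularity (1).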

Parallel Abstract (English)


The demand for multimedia applications is growing even beyond the capabilities of best-effort transmission networks. The trend is therefore toward content-oriented multimedia servers that can handle high volumes of content while meeting high-performance requirements and diverse user preferences. Researchers have been trying to integrate context and content for multimedia mining and management, which is crucial for multimedia communication. Attention analysis of multimedia data is challenging because different models must be constructed for different attention characteristics. Effectively measuring user attention to videos is an important task in many multimedia applications, including multimedia information retrieval, user-content interaction, and multimedia search. This thesis analyzes what excites viewers about the video content they watch and proposes a content-driven attention ranking strategy that lets client users iteratively browse a video according to their preferences. The proposed attention rank (AR) algorithm, an extension of the Google PageRank algorithm that ranks web pages by importance, can effectively measure the user interest (UI) level of each video frame. The degree of attention is derived by integrating an object-based visual attention model (VAM) with a contextual attention model (CAM); this integration takes advantage of human perceptual characteristics and reliably identifies user-attentive video content. This thesis presents a method to integrate content and context for sports video understanding. On one hand, visual information is the most intuitive feature of the human perception system, and modeling visual attention leads to a better understanding of the video content. The visual features considered include spatial, temporal, and facial feature maps.
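As an illustration only, the Python sketch below computes a per-frame attention score by combining an object-based VAM (each object's spatial, temporal, and facial saliency, averaged over the objects in the frame) with a CAM built from three kinds of stat-to-interest mapping: proportional, inversely proportional, and situation-specific. The feature weights, the baseball-style stat names, the use of simple mapping functions in place of context mapping matrices, and the way a camera-motion score shifts weight toward the visual term are all hypothetical assumptions, not the thesis's formulation.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class ObjectSaliency:
    spatial: float   # spatial-domain saliency in [0, 1]
    temporal: float  # motion saliency in [0, 1]
    face: float      # facial-feature saliency in [0, 1]

def vam(objects, w=(0.4, 0.4, 0.2)):
    """Object-based visual attention: mean of each object's weighted saliency (weights hypothetical)."""
    if not objects:
        return 0.0
    return mean(w[0] * o.spatial + w[1] * o.temporal + w[2] * o.face for o in objects)

def proportional(x, x_max):
    """Interest grows with the stat."""
    return min(x / x_max, 1.0)

def inverse(x, x_max):
    """Interest shrinks as the stat grows."""
    return 1.0 - min(x / x_max, 1.0)

def specific(x, hot_values):
    """Interest only in particular situations."""
    return 1.0 if x in hot_values else 0.0

def cam(stats, rules):
    """Contextual attention: mean interest over the scoreboard stats that have a mapping rule."""
    scores = [rule(stats[name]) for name, rule in rules.items() if name in stats]
    return mean(scores) if scores else 0.0

def frame_attention(objects, stats, rules, camera_motion):
    """camera_motion in [0, 1] shifts weight toward the visual term (hypothetical scheme)."""
    alpha = 0.5 + 0.5 * camera_motion
    return alpha * vam(objects) + (1.0 - alpha) * cam(stats, rules)

# Hypothetical mapping rules for a baseball broadcast.
rules = {
    "outs": lambda x: specific(x, {2}),          # two outs: a tense situation
    "score_diff": lambda x: inverse(abs(x), 10), # close game: more interest
    "inning": lambda x: proportional(x, 9),      # late innings: more interest
}
objs = [ObjectSaliency(0.8, 0.6, 0.9), ObjectSaliency(0.2, 0.1, 0.0)]
stats = {"outs": 2, "score_diff": 1, "inning": 9}
ar = frame_attention(objs, stats, rules, camera_motion=0.5)
print(round(ar, 4))  # 0.5642
```

With these numbers, the VAM term is 0.43 and the CAM term is about 0.97. A camera-motion score of 0.5 gives the visual term a weight of 0.75, so the frame's score is about 0.564.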
On the other hand, the game stats in a sports video are what most subscribers are interested in. The captions embedded in sports video programs represent the key information of the video content. Taking advantage of implicit prior knowledge about sports videos, we propose an automatic context extraction and interpretation system that tags the in-game stats to provide subscribers with the ongoing game situation. User feedback is used in a re-ranking procedure to further improve retrieval accuracy. A higher AR represents stronger user interest. The AR is affected by two factors: intra-AR and inter-AR. In a frame-based analysis, the intra-AR of each frame is based on its visual and contextual attention characteristics: if a frame contains many high-attention objects and has a high-interest contextual description, it is likely to have a high AR. In an event-based analysis, the inter-AR of each frame is affected by the relevant key-frames located in the same event.
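The abstract states that AR extends PageRank and that a frame's inter-AR depends on key-frames in the same event. One way to read this, sketched here in Python as an assumption rather than the thesis's exact formulation, is personalised PageRank: the normalised intra-AR serves as the teleport vector, attention flows only between key-frames of the same event, and the damping factor `d = 0.85` is borrowed from the usual PageRank default. The uniform link weights and the self-loop for a key-frame that is alone in its event are also hypothetical choices.

```python
import numpy as np

def attention_rank(intra, events, d=0.85, tol=1e-12, max_iter=1000):
    """Personalised-PageRank sketch: intra-AR is the teleport vector, and attention
    flows only between key-frames of the same event (the inter-AR part)."""
    intra = np.asarray(intra, dtype=float)
    events = np.asarray(events)
    n = len(intra)
    v = intra / intra.sum()  # normalised intra-AR
    W = np.zeros((n, n))     # column-stochastic link matrix
    for j in range(n):
        peers = (events == events[j]) & (np.arange(n) != j)
        if peers.any():
            W[peers, j] = 1.0 / peers.sum()  # spread frame j's attention over its event
        else:
            W[j, j] = 1.0                    # isolated key-frame keeps its own attention
    r = v.copy()
    for _ in range(max_iter):  # power iteration; converges because d < 1
        r_next = d * W @ r + (1.0 - d) * v
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Hypothetical intra-AR values; frames 0-2 belong to event 0, frame 3 to event 1.
ar = attention_rank([0.9, 0.2, 0.1, 0.5], events=[0, 0, 0, 1])
print(ar.round(4))
```

Here frames 1 and 2 start with small shares of intra-AR (about 0.12 and 0.06) but rise to about 0.22 because they share an event with the high-attention frame 0, while the isolated frame 3 keeps its share of about 0.29.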

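Cited by


黃雅鳳 (2006). Construction of a Bayesian-network-based competence indicator test and production of remedial teaching animations: a case study of the "fractions and decimals" indicators in sixth-grade mathematics [Master's thesis, Asia University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0118-0807200916282748
張見銘 (2007). Research and development of an adaptive learning system for sixth-grade elementary mathematics (algebra) that integrates the strengths of Bayesian networks from different experts [Master's thesis, Asia University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0118-0807200916282513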
