Semantics-Based Content Analysis and Organization of Movies and Sports Videos

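Pushing content analysis to the semantic level has become a rapidly developing research topic in multimedia in recent years. The results of such analysis better match users' needs and make content management and applications more efficient. Unlike conventional content-based retrieval, semantic analysis of digital content combines pattern recognition and machine learning techniques with specific production rules and domain knowledge to bridge the gap between low-level features and high-level semantics. Building on machine learning and pattern recognition, many systems have combined the results of different classifiers, different features, or different media modalities to perform semantic analysis. In this dissertation, we propose a general framework for this kind of research, in which we introduce mid-level information between audiovisual features and semantic concepts to assist the analysis. We develop three systems that detect semantic concepts in movies, baseball videos, and general sports videos. In action movies, we use audio information to detect semantic concepts such as gunplay and car chases, adopting statistical methods to describe the concepts and to map between different semantic levels. In baseball games, we combine rule-based and model-based methods on visual and speech information to detect semantic concepts. In total, thirteen concepts, such as single, double, home run, and strikeout, can be detected, which allows us to develop many practical applications. In general sports videos, we propose using the ball trajectory to assist content analysis. Some new types of semantic concepts, such as the pitch types thrown by the pitcher in a baseball game, can thereby be described and detected. All three lines of work are based on the proposed general framework and thus confirm its practicality for semantic concept detection.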

Keywords

semantic analysis; video analysis and organization; event and concept detection; video retrieval

Parallel Abstract

Conducting content analysis at the semantic level is an emerging trend in multimedia research. This kind of analysis matches users' needs and facilitates content management and utilization in a more effective and reasonable way. Unlike conventional content-based retrieval or indexing, work on semantic analysis integrates techniques of statistical pattern recognition and machine learning with specific production rules or domain knowledge to bridge the semantic gap between low-level features and high-level semantics. On the basis of machine learning and pattern recognition technologies, systems have been developed that combine analytical results from different classifiers, different features, or different modalities. In this dissertation, we propose a general framework that introduces the idea of a mid-level representation between audiovisual features and semantic concepts. Two types of techniques, i.e., statistical pattern recognition and rule-based decision, are combined to help narrow the semantic gap. We develop three systems that respectively conduct semantic concept detection in action movies, in broadcast baseball games, and in sports videos. In action movies, we detect semantic concepts, such as gunplay and car-chasing scenes, by analyzing aural information. Statistical approaches are exploited to model concepts and to facilitate mapping between different semantic granularities. In baseball games, visual and speech information are combined, and a hybrid method that includes rule-based and statistical techniques is designed for semantic concept detection. Thirteen semantic concepts, such as single, double, home run, and strikeout, are explicitly detected, and several realistic applications can therefore be built. In general sports videos, we extract the ball trajectory as a new type of metadata for describing content characteristics. Some novel semantic concepts, such as pitch types in baseball games, can therefore be modeled and detected. These studies are instances of the proposed general framework and demonstrate the realization of automatic semantic concept detection.

Parallel Keywords

semantic analysis; video analysis and organization; event and concept detection; video indexing

