Automated Video Analysis, Editing, and Overview

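Chinese Abstract

With the development of multimedia and communication technologies, we are entering a new era enveloped in vast amounts of multimedia information. This volume of information already exceeds what we can handle, which motivates us to pursue mechanisms that identify important information while removing useless information. In this thesis, we study automated video analysis and editing techniques that help us browse videos effectively. The thesis is divided into five chapters. In Chapter 1, we discuss the emerging issues of the digital media era and our expectations. In Chapter 2, we first review our previous scene detection and classification techniques for tennis and baseball videos, and then extend them by considering additional information sources. We use audio and text to capture the cheering of the audience on the tennis court and the game status during a baseball game. These new techniques allow us to define effective rules that evaluate the importance of each moment and produce meaningful summaries. In Chapter 3, we propose a scheme for detecting slow-motion replays in the compressed domain. Because slow-motion replays appear in almost every kind of sport, they are highly effective summary content. Since slow-motion replay segments exhibit distinctive motion-vector patterns and large variation in inter-frame differences, we use motion-vector statistics and the relations among discrete cosine transform (DCT) blocks to locate slow-motion replay segments quickly. In Chapter 4, we explore the feasibility of video analysis techniques outside the sports domain. We propose a framework for detecting the major faces in a video, comprising face tracking, segmentation, and clustering, and for using the major faces to analyze the video in depth. This framework enables character-customized summarization and overview. We also propose a method based on visual perceptual differences to filter out unwanted TV commercial segments. Chapter 5 concludes the thesis and discusses possible future directions.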

Keywords

multimedia; video; sports program content analysis; generic program content analysis; face searching and matching

Parallel Abstract

With the rapid growth of multimedia and communication technology, we have entered a new era in which we are surrounded by a great amount of multimedia data. This amount exceeds our capacity to process it, so we need mechanisms that find the important content and discard the less important content. In this thesis, we study the analysis and summarization of videos. The thesis is composed of five chapters. In Chapter 1, we discuss the emerging issues in the digital media world and introduce our goals. In Chapter 2, we review our previous work on tennis and baseball scene detection and classification, and extend it by considering additional information sources. We use audio and text to meet high-level demands: capturing the audience cheering on the tennis court and the game status of a baseball game. These capabilities enable us to define rules that evaluate the importance of each moment and generate meaningful summaries. In Chapter 3, we present a scheme to detect slow-motion replay segments in the compressed domain. Slow-motion replays exhibit particular motion-vector patterns and large variation in the frame difference. We use the statistics of motion vectors and the relations among DCT blocks to determine the occurrence of slow-motion replay segments rapidly. Because slow-motion replays appear in almost all kinds of sports, they are highly useful for summarization. In Chapter 4, we explore the possibilities of video analysis outside the sports domain. We present a framework that detects the major faces in a video through tracking, segmentation, and clustering, and uses the major faces to analyze the video in depth. This framework enables character-customized summarization. We also propose a method that filters out undesired commercial segments based on visually perceptual differences. In Chapter 5, we conclude the thesis and discuss possible future directions.

Parallel Keywords

multimedia; video; content analysis; sports video analysis; generic video analysis; face searching and matching
