A Study of Script-Based Video Summarization Methods

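Automatic video summarization has attracted wide discussion in recent years. Related research falls into two categories: video summarization based on static images, and dynamic video summarization. With the recent rapid growth of computing power and media storage capacity, dynamic video summaries can now be generated quickly. However, there has been no prior research on script-based video summarization, nor any experiments on summarizing multiple videos. This thesis explores multi-video summaries generated according to user needs. We use linguistic information and scene-change information to divide the videos into segments. An information retrieval system then finds the segments relevant to the user's needs. The results of sub-shot clustering are used to measure how visually novel a segment is. Combining these two scores lets us select segments that are both informationally relevant and visually vivid. To achieve visual smoothness, we also consider re-ordering the segments. We analyze the visual content of each segment and, based on the director's rhythm and several heuristics, propose an algorithm that achieves visual smoothness. Experiments show that, among the four algorithms we propose, this method achieves visual smoothness most effectively without losing relevant information.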

Keywords

Video summarization

Parallel Abstract

Automatic video summarization methods have attracted research attention for a long time. Previous work can be classified into two categories: keyframe-based video summarization and dynamic video summarization. The recent rapid growth of computing power and storage capacity has made it possible to generate dynamic video summaries much faster. However, there is no previous work on generating video summaries according to specific user information needs, nor on experiments in a multi-video environment. In this thesis we explore the problem of script-based video summarization, in which the information needs are contained in a user script. We first use linguistic information and shot boundary detection results to divide videos into segments, which are the building blocks of the summary. An information retrieval system then retrieves relevant segments, using the user script as queries and the captions of the segments as documents. After sub-shot clustering, a visual importance score is computed for each segment from the clustering results of its constituent sub-shots. The relevance score and the visual importance score are combined to select segments that are both informative and vivid. To achieve better coherence, segment re-ordering is applied. We analyze the audio and video content to find the editing rhythm and editing heuristics, and then develop an algorithm for visual coherence. Experiments show that this algorithm achieves better coherence than the other, text-based algorithms, without loss of informativeness.
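
The selection step described above, in which a relevance score and a visual importance score are combined to pick segments, can be illustrated with a minimal sketch. The abstract does not state how the two scores are combined, so the weighted sum, the `alpha` parameter, the greedy selection under a duration budget, and all names below (`Segment`, `combined_score`, `select_segments`) are assumptions made for illustration, not the thesis's actual method.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    segment_id: str
    duration: float   # length of the segment in seconds
    relevance: float  # retrieval score against the user script (assumed in [0, 1])
    visual: float     # visual importance from sub-shot clustering (assumed in [0, 1])


def combined_score(seg, alpha=0.5):
    # Assumed combination rule: a weighted sum of the two scores.
    return alpha * seg.relevance + (1.0 - alpha) * seg.visual


def select_segments(segments, max_duration, alpha=0.5):
    # Greedy selection (an assumption): take the highest-scoring segments
    # first, skipping any segment that would exceed the duration budget.
    ranked = sorted(segments, key=lambda s: combined_score(s, alpha), reverse=True)
    chosen = []
    used = 0.0
    for seg in ranked:
        if used + seg.duration <= max_duration:
            chosen.append(seg)
            used += seg.duration
    return chosen
```

Under these assumptions, a call such as `select_segments(segments, max_duration=60.0, alpha=0.7)` would favor relevance over visual importance; the selected segments would still need the re-ordering step for visual coherence, which this sketch does not cover.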

Parallel Keywords

Video; Summarization

