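Video discs normally let viewers navigate only by chapter; they offer no way to search the content of a video to find a particular passage. We therefore built a subtitle retrieval system for video, using the NTU Literary Lecture Series as our material.

The NTU Literary Lecture Series is a set of video discs produced by National Taiwan University Press. It records lectures given at National Taiwan University by modern literary writers such as Pai Hsien-yung (白先勇), Wai-lim Yip (葉維廉), Chia-ying Yeh (葉嘉瑩), and Gao Xingjian (高行健). The lectures mainly cover the writers' experiences of literary creation and their ideas about literature and aesthetics. Most discs in the series come with a speech manual, and the system was designed so that a user who finds an interesting passage in the manual can quickly locate the corresponding segment of the video. Because the videos consist entirely of lectures, searching the subtitles amounts to searching the video content.

We first use esrXP to extract the frames that contain subtitles and recognize them with the OCR function of Microsoft Office Document Imaging. The recognition results are fed back into esrXP to produce subtitle files, which give us the subtitle text and timing. We then use the longest common subsequence to compute the similarity between each subtitle and the sentences of the speech manual. This establishes which sentence each subtitle corresponds to and, in turn, identifies the speaker of each subtitle.

Next, we built a web system based on the HTML5 video tag, so that users need only an HTML5-capable browser to watch the videos. While searching subtitles and watching videos, users also see the speech manual sentence that corresponds to the current subtitle, which gives them more information. In addition, we introduce multi-dimensional post-classification navigation to help users filter the search results further.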
When we watch videos on discs such as DVDs or VCDs, we can navigate only by chapter; we cannot search the content of the video. We therefore built a retrieval system for video subtitles as a step toward searching video content. The NTU Literary Lecture Series, published by National Taiwan University Press, consists of videos of lectures given at National Taiwan University by modern literary writers. The series comes on DVDs with accompanying speech manuals. Readers can skim a speech manual to survey a video quickly, but when they find an interesting paragraph, they cannot easily jump to the corresponding part of the video. To solve this problem, we create subtitle files for the videos using esrXP, which captures images of the subtitles, and Microsoft Office Document Imaging, which performs OCR on those images to obtain the subtitle text. We also match the subtitles to the sentences in the speech manual, using longest common subsequence similarity, to give users more information. The videos are then served through a website. Because the pages use the HTML5 video tag, users can watch the videos without any plug-in in an HTML5-capable browser such as Google Chrome or Mozilla Firefox. While a video plays, the speech manual sentence corresponding to the current subtitle is displayed below the player, which helps users when selecting subtitles. We also provide post-classification so that users can filter the retrieval results.
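
The abstract describes matching each subtitle to a speech manual sentence by longest common subsequence (LCS) similarity. The following is a minimal sketch of that idea, not the system's actual implementation. The normalization by subtitle length, the 0.6 threshold, the function names, and the sample sentences are all assumptions made for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    # Row-by-row dynamic programming; only the previous row is kept.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(subtitle: str, sentence: str) -> float:
    """LCS length divided by subtitle length (an assumed normalization)."""
    if not subtitle:
        return 0.0
    return lcs_length(subtitle, sentence) / len(subtitle)


def best_match(subtitle, sentences, threshold=0.6):
    """Return (index, score) of the most similar sentence.

    The index is None when no sentence reaches the (hypothetical) threshold.
    """
    best_index, best_score = None, 0.0
    for i, sentence in enumerate(sentences):
        score = similarity(subtitle, sentence)
        if score > best_score:
            best_index, best_score = i, score
    if best_score < threshold:
        return None, best_score
    return best_index, best_score


# Invented sample data: (speaker, sentence) pairs from a speech manual.
manual = [
    ("Speaker A", "我今天要講的是崑曲的美學"),
    ("Host", "謝謝大家今天來參加講座"),
]
# A subtitle with one simulated OCR error (學 misread as 字).
subtitle = "崑曲的美字"

index, score = best_match(subtitle, [sentence for _, sentence in manual])
if index is not None:
    speaker, sentence = manual[index]
    print(speaker, sentence, round(score, 2))
```

Because a subsequence does not have to be contiguous, a single misrecognized character lowers the score only slightly, which makes this kind of measure tolerant of OCR errors. A subtitle is usually shorter than the manual sentence it belongs to, so dividing by the subtitle length keeps a short subtitle from being penalized for the unmatched remainder of a long sentence.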