應用語法搜尋於電影採礦之設計

The Designing of a Syntax-based Retrieval System for Mining Movies

Advisor: 郭經華

Abstract


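The main purpose of this system is to provide movie retrieval with syntax queries. English teachers can use it to prepare teaching materials that show students grammar commonly used in daily life. To support syntax queries, the movie subtitles must first be preprocessed: they are part-of-speech (POS) tagged and lemmatized, and the resulting POS and lemma information is stored in Extensible Markup Language (XML) format for regular-expression matching. To return a complete movie clip containing each syntax match, the system also performs scene detection with a simple image-similarity method.

When regular expressions are used as the query language, matching takes considerable time. We therefore build an index over the movie subtitles to reduce the number of sentences the regular expression must be matched against. For index construction we use k-gram indexing, which mainly comprises k-gram splitting, useful indices, and the presuf-free set. For scene detection, we use the similarity of two consecutive frames to decide whether a scene change has occurred.

During implementation, we compared the number of index entries and the search time for three cases: no index, the index after k-gram splitting, and the presuf-free index. Analysis of the experimental data confirmed that the presuf-free index greatly reduces the number of index entries. As the number of index entries falls, the time spent on regular-expression matching falls with it. K-gram construction is therefore very important in this movie retrieval system, and it is this thesis's main contribution to index construction for searching large amounts of data.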
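The index-then-match idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the fixed gram length `k`, the use of raw subtitle text instead of the POS-tagged XML, and the rule that the shorter gram is kept when pruning to a presuf-free set are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_kgram_index(sentences, k):
    """Map each character k-gram to the set of sentence ids containing it."""
    index = defaultdict(set)
    for sid, text in enumerate(sentences):
        lowered = text.lower()
        for i in range(len(lowered) - k + 1):
            index[lowered[i:i + k]].add(sid)
    return index


def presuf_free(grams):
    """Drop every gram that has another gram of the set as a proper
    prefix or suffix (assumption: the shorter gram is the one kept)."""
    grams = set(grams)
    return {
        g for g in grams
        if not any(h != g and (g.startswith(h) or g.endswith(h)) for h in grams)
    }


def search(sentences, index, k, literal, pattern):
    """Run `pattern` only on sentences whose k-grams cover `literal`,
    a substring that every match of the pattern is assumed to contain."""
    literal = literal.lower()
    grams = [literal[i:i + k] for i in range(len(literal) - k + 1)]
    if grams:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Literal shorter than k: the index cannot help, so scan everything.
        candidates = set(range(len(sentences)))
    regex = re.compile(pattern, re.IGNORECASE)
    return [sid for sid in sorted(candidates) if regex.search(sentences[sid])]


subtitles = [
    "I have been waiting for you",
    "She has gone home",
    "They have seen it",
]
index = build_kgram_index(subtitles, 3)
hits = search(subtitles, index, 3, "have", r"\bhave\s+\w+(en|ed)\b")
pruned = presuf_free({"th", "the", "he", "her"})
```

Here the regular expression runs on only the two sentences that contain every 3-gram of "have", and the pruning step keeps "th" and "he" while discarding the longer grams that extend them.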

Parallel Abstract


This thesis describes how to build a movie retrieval system that can search for English grammar patterns. English teachers can use the system to design teaching materials that give students examples of grammar used in daily life. To make the grammar in movies searchable, the movie subtitles are preprocessed before any user query: they are POS-tagged and lemmatized, and the resulting POS and lemma information is saved in XML format. To return a movie clip along with each syntax match, the system also detects scene changes using image similarity. When regular expressions are used as the query language, pattern matching is time-consuming, so we build an index over the movie subtitles to reduce the search time. For index construction we use k-gram indexing, which comprises k-gram splitting, useful indices, and the presuf-free set. For scene detection, we use the similarity of two consecutive frames to detect a scene change. To test the actual system, we compare the search time and the number of syntax results returned using the full, complete, and presuf-free indices. After examining and analyzing the results, we conclude that constructing the k-gram index reduces both the number of index entries and the search time. This thesis shows that constructing the k-gram index before users search is a concrete contribution to the area of large database systems.
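The consecutive-frame comparison used for scene detection can be sketched as follows. The abstract only says that a simple image-similarity method is used, so the histogram-intersection measure, the 16-bin grayscale histogram, the 0.5 threshold, and all function names below are assumptions chosen for illustration, not the method of the thesis. Frames are represented as flat, non-empty lists of gray levels from 0 to 255.

```python
def gray_histogram(frame, bins=16):
    """Normalized histogram of a frame given as a flat list of 0-255 gray levels."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def similarity(frame_a, frame_b, bins=16):
    """Histogram intersection in [0, 1]; 1 means identical histograms."""
    hist_a = gray_histogram(frame_a, bins)
    hist_b = gray_histogram(frame_b, bins)
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def detect_scene_changes(frames, threshold=0.5):
    """Return each index i where frame i differs enough from frame i - 1
    to be treated as the start of a new scene."""
    return [
        i for i in range(1, len(frames))
        if similarity(frames[i - 1], frames[i]) < threshold
    ]


dark, dark_shifted, bright = [10] * 64, [12] * 64, [240] * 64
cuts = detect_scene_changes([dark, dark_shifted, bright, bright])
```

In this toy sequence the only cut is reported at index 2, where the frame jumps from dark to bright. With the cut positions known, a clip around a matched subtitle could be extended to the nearest scene boundaries on either side.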
