

Improving the Syntax-base Retrieval System Using Collocation Indexing

Advisor: Chin-Hwa Kuo (郭經華)


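This thesis designs a syntax query system and combines it with movies, producing a movie retrieval system that can be searched by syntax. This combination can help English teachers break away from the impression that grammar instruction is rigid: through the syntax query system they can find matching movie scenes and dialogue, which increases students' interest in learning.

To support syntax queries, the movie subtitles must first be preprocessed, for example by part-of-speech tagging and lemmatization. The tagged and lemmatized dialogue is stored in Extensible Markup Language (XML) format so that the search engine can match it using regular expressions.

Because regular expressions serve as the query language, matching a query against the sentences takes considerable time. To address this, we build index tables over the movie subtitles to reduce the number of sentences that the regular expressions must be matched against. We use two indexing methods. The first is a word-character (k-gram) index, which involves word-character segmentation, effective indexes, and prefix-free and suffix-free sets. The second is a collocation index, which involves collocation construction and collocation filtering.

In the implementation, we compare the two index tables to determine whether adding the collocation index effectively improves search performance. The experimental results show that the collocation index does reduce the number of sentences that the search engine must match with regular expressions. With fewer index candidates, fewer sentences need to be matched, and the search time drops accordingly. The collocation index concept proposed in this thesis therefore improves system performance, and it is the main contribution of this thesis to index construction and system speedup.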
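The preprocessing and matching steps described above can be sketched as follows. This is a minimal illustration rather than the thesis's actual implementation: the XML element names (`movie`, `sentence`, `w`), the `word/lemma/POS` flattening, the Penn Treebank style tags, and the hand-tagged sample lines are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hand-tagged subtitle lines as (surface form, lemma, POS tag) triples.
# Assumption: a real pipeline would obtain these from a POS tagger and a lemmatizer.
SUBTITLES = [
    [("I", "i", "PRP"), ("have", "have", "VBP"), ("seen", "see", "VBN"), ("it", "it", "PRP")],
    [("She", "she", "PRP"), ("sees", "see", "VBZ"), ("him", "he", "PRP")],
    [("They", "they", "PRP"), ("had", "have", "VBD"), ("gone", "go", "VBN")],
]

def to_xml(subtitles):
    """Store each token with its lemma and POS tag as XML attributes."""
    root = ET.Element("movie")
    for i, tokens in enumerate(subtitles):
        sent = ET.SubElement(root, "sentence", id=str(i))
        for word, lemma, pos in tokens:
            w = ET.SubElement(sent, "w", lemma=lemma, pos=pos)
            w.text = word
    return root

def flatten(sentence):
    """Render a sentence element as 'word/lemma/POS' tokens separated by spaces."""
    return " ".join(
        f"{w.text}/{w.get('lemma')}/{w.get('pos')}" for w in sentence.iter("w")
    )

# Syntax query: any form of "have" immediately followed by a past participle.
PERFECT = re.compile(r"\S+/have/VB\w* \S+/\S+/VBN")

def search(root, pattern):
    """Return the ids of all sentences whose flattened form matches the pattern."""
    return [
        int(s.get("id")) for s in root.iter("sentence") if pattern.search(flatten(s))
    ]

root = to_xml(SUBTITLES)
matches = search(root, PERFECT)
```

In this sketch every sentence is scanned by the regular expression, which is the cost that the index tables are meant to reduce.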


The purpose of this thesis is to design a syntax search system and to apply it to movie retrieval. Concepts from linguistics, in particular collocation, are applied to increase the speed of the syntax search. First, the words in the database are labeled with their parts of speech. From the results of this step, we construct a k-gram index and a collocation index. We use several common English syntax rules and sentence structures as test queries. After running the queries, the k-gram index and the collocation index are compared. We find that for some of the test sentences, searching through the collocation index yields a far smaller sample space than the k-gram index alone. That is, the collocation index finds the correct results from fewer candidate sentences, thus reducing the time spent in query matching.
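The comparison between the two indexes can be illustrated with a small sketch. It is not the thesis's method: here a collocation is simply assumed to be any adjacent word pair, with no statistical construction or filtering, the k-gram index uses character trigrams, and the sample sentences and function names are hypothetical.

```python
import re
from collections import defaultdict

SENTENCES = [
    "take care of yourself",
    "i will take the car and care for it",
    "she takes great care",
    "careful drivers take breaks",
    "we care a lot",
]

def kgrams(word, k=3):
    """Character k-grams of a word; words shorter than k are kept whole."""
    if len(word) < k:
        return {word}
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def build_kgram_index(sentences, k=3):
    """Map each character k-gram to the ids of the sentences containing it."""
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for word in sentence.split():
            for gram in kgrams(word, k):
                index[gram].add(sid)
    return index

def build_collocation_index(sentences):
    """Map each adjacent word pair (assumed collocation) to sentence ids."""
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        words = sentence.split()
        for left, right in zip(words, words[1:]):
            index[(left, right)].add(sid)
    return index

def kgram_candidates(index, words, k=3):
    """Sentences that contain every k-gram of every literal query word."""
    grams = set().union(*(kgrams(w, k) for w in words))
    postings = [index.get(g, set()) for g in grams]
    return set.intersection(*postings) if postings else set()

def collocation_candidates(index, left, right):
    """Sentences in which the two words occur next to each other."""
    return set(index.get((left, right), set()))

kgram_index = build_kgram_index(SENTENCES)
colloc_index = build_collocation_index(SENTENCES)

by_kgram = kgram_candidates(kgram_index, ["take", "care"])
by_colloc = collocation_candidates(colloc_index, "take", "care")

# The regular expression is run only on the surviving candidates.
pattern = re.compile(r"\btake care\b")
hits = [sid for sid in sorted(by_colloc) if pattern.search(SENTENCES[sid])]
```

On this toy data the k-gram index leaves four candidate sentences for the query "take care", because the trigrams of both words also occur in "takes", "car", and "careful", while the collocation index leaves one. The regular expression finds the same match either way but is run against fewer sentences.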


Keywords: POS tagging; Lemmatizing; Collocation; k-gram; Indexing


