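In research areas such as information retrieval, pattern matching, and interactive query, representing object features is one of the important tasks. This paper therefore proposes a method for extracting and matching object features from the corpus of a specialist illustrated guide. We first use the feature descriptions of each bird species in a Taiwan wild bird illustrated guide (台灣野鳥圖鑑) as the training corpus and use them to build a key-feature frame knowledge representation for this specialist domain. We then determine the features of each object and encode them as lexical vectors and fuzzified vectors according to fuzzy set concepts. In the experiments, we compare vector similarity using test data with different sentence structures. The experimental corpus falls into two categories: specialist text from professional books, and oral descriptions (from both experts and the general public). The results show that, for oral-description queries by general users, the Top-10 precision ratio reaches 72.5% and the inclusion ratio reaches 73%, indicating that the proposed method is suitable for building and querying a knowledge base of specialist corpora. In future work, combining it with methods now under development for representing and matching the structure of oral descriptions should improve the precision of oral queries.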
The representation of object features is an important task in pattern recognition, information retrieval, interactive query, and related fields. This paper addresses how to use computational linguistics and fuzzy set techniques to automatically establish a knowledge base for semi-structured domain expertise. We use the descriptive sentences in an illustrated wild bird guide as the training corpus and establish the key-feature frame of the domain expertise. We then extract the features of the objects and encode lexical and fuzzified vectors according to fuzzy set concepts. In the experiment, we calculate the similarity between the training and testing corpora; the testing corpus includes descriptive sentences from the expert corpus and oral descriptions. The preliminary results show that the Top-10 precision ratio and inclusion ratio reach 72.5% and 73%, respectively. These results suggest that the proposed approach is suitable for the representation and querying of domain expertise. Future work will focus on improving the precision ratio by integrating more sophisticated approaches for the representation and matching of oral descriptions.
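To make the encoding and matching steps concrete, the following is a minimal sketch, not the implementation used in this work. It assumes a tiny hypothetical key-feature frame (`VOCAB`), an invented fuzzy relatedness table (`RELATED`), placeholder entries `bird_a` to `bird_c`, and cosine similarity as the matching measure; the abstract does not specify the similarity measure or membership functions, so all of these are illustrative assumptions.

```python
import math

# Hypothetical key-feature frame: each dimension is a (slot, value) pair.
VOCAB = [("head", "black"), ("head", "gray"), ("head", "white"),
         ("belly", "white"), ("belly", "yellow"),
         ("bill", "long"), ("bill", "short")]

# Assumed fuzzy relatedness between values (illustrative, not from the paper).
RELATED = {("black", "gray"): 0.5, ("gray", "white"): 0.5}

def membership(a, b):
    """Degree to which value a matches value b under the assumed table."""
    if a == b:
        return 1.0
    return RELATED.get((a, b), RELATED.get((b, a), 0.0))

def lexical_vector(features):
    """1.0 where the exact (slot, value) pair was mentioned, else 0.0."""
    return [1.0 if features.get(slot) == value else 0.0
            for slot, value in VOCAB]

def fuzzy_vector(features):
    """Spread each mentioned value over related values of the same slot."""
    return [membership(features[slot], value) if slot in features else 0.0
            for slot, value in VOCAB]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query, knowledge_base, k=10):
    """Rank knowledge-base entries by similarity of fuzzified vectors."""
    q = fuzzy_vector(query)
    scored = [(cosine(q, fuzzy_vector(f)), name)
              for name, f in knowledge_base.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]

# Placeholder entries standing in for feature frames built from the guide.
KB = {
    "bird_a": {"head": "black", "belly": "white", "bill": "short"},
    "bird_b": {"head": "white", "belly": "yellow", "bill": "long"},
    "bird_c": {"head": "gray", "belly": "white", "bill": "long"},
}

# A partial oral description: only two slots are mentioned.
query = {"head": "gray", "belly": "white"}
ranking = top_k(query, KB, k=2)
```

In this toy setting the fuzzified vector lets a query for a gray head partially match entries described as black or white, which is the kind of tolerance an oral description from a non-expert would need.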