  • Degree thesis

利用基因演算法輔助生物文件分類-以菇菌及毒蕈資料為例

Using Genetic Algorithms to Assist the Classification of Biological Documents: An Example of Mushroom and Toadstool Data

Advisors: 呂威甫, 朱彥煒

Abstract


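In nature, the attributes of living organisms are quite complex, which makes biology-related documents difficult to classify. This study therefore uses a genetic algorithm to assist in classifying biological documents. We first compile known biological lexicons and the term sets provided by domain experts to build a general specialized lexicon for the field. A general specialized lexicon alone, however, is not enough for automatic classification. To enlarge the specialized lexicon and thereby improve classification accuracy, we use a genetic algorithm to learn additional technical terms that help classify biological documents. In designing the fitness measure, we consider how often a term appears in the specialized lexicon and in a general lexicon, as well as the length of the term. The test data are documents on the classification of mushrooms and toadstools from the Council of Agriculture. After cross-validation, classification accuracy using only the general specialized lexicon is 56%, whereas the proposed method raises accuracy to 78%. We believe the proposed classification framework can also be applied to documents of other kinds.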

Parallel Abstract


In nature, the properties of living organisms are complex, which makes biology-related documents difficult to classify. This research therefore uses genetic algorithms to aid the classification of biological documents. We first compile known general databases of biology-related terms and add terminology provided by experts in the specific field to create a general specialized term database. A general specialized term database alone, however, is insufficient for automatic classification. We therefore employ genetic algorithms to identify and add more specific terms that aid the classification of biological documents, resulting in a detailed specialized term database. For the fitness function, we consider a term's frequency of appearance in the detailed specialized term database versus a common term database, together with the length of the term. The test data are the Council of Agriculture's documents on edible and poisonous mushrooms. Under cross-validation, classification accuracy is 56% when only the general specialized term database is used, whereas our approach reaches 78%. We believe that other types of documents can be classified with the procedure presented here.
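
The abstract names the ingredients of the fitness measure (frequency in the specialized database, frequency in the common database, and term length) but does not give the formula. The following is a minimal Python sketch under stated assumptions, not the thesis's actual implementation: the log-ratio scoring, the `length_weight` parameter, the bit-mask chromosome, truncation selection, and all function names are hypothetical choices made only for illustration.

```python
import math
import random

def term_fitness(term, domain_freq, general_freq, length_weight=0.1):
    """Score one candidate term (assumed formula, not from the thesis).

    The score rises when the term is frequent in domain text, rare in
    general text, and longer in characters.
    """
    d = domain_freq.get(term, 0)
    g = general_freq.get(term, 0)
    ratio = math.log((d + 1) / (g + 1))  # add-one smoothing avoids division by zero
    return ratio + length_weight * len(term)

def subset_fitness(mask, terms, domain_freq, general_freq):
    """Fitness of a chromosome: summed score of the terms whose bit is 1."""
    return sum(term_fitness(t, domain_freq, general_freq)
               for t, bit in zip(terms, mask) if bit)

def evolve(terms, domain_freq, general_freq, pop_size=20, generations=40,
           mutation_rate=0.05, seed=0):
    """Tiny genetic algorithm over bit masks; needs at least two terms."""
    rng = random.Random(seed)
    n = len(terms)

    def score(mask):
        return subset_fitness(mask, terms, domain_freq, general_freq)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]      # truncation selection, keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)            # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit with probability mutation_rate
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children
    best = max(population, key=score)
    return [t for t, bit in zip(terms, best) if bit]

# Made-up frequency counts, for illustration only.
domain_freq = {"amanita": 30, "mycelium": 12, "the": 50, "and": 40}
general_freq = {"amanita": 0, "mycelium": 1, "the": 900, "and": 700}
terms = list(domain_freq)
selected = evolve(terms, domain_freq, general_freq)
```

Here `selected` holds the candidate terms that the search proposes adding to the specialized database. Because the summed score treats terms independently, a plain threshold on `term_fitness` would find the same optimum; a genetic search becomes more useful once the fitness depends on how terms interact, for example through classification accuracy on held-out documents.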


Cited By


吳育陞 (2010). 植基於適應性特徵之影像查詢系統與影像分類研究 [Master's thesis, National Taichung University of Science and Technology]. Airiti Library. https://doi.org/10.6826/NUTC.2010.00017

Further Reading