  • Degree thesis

應用預訓練語言模型於植物特徵描述的文本向量與分類研究

Research on Text Vectorization and Classification for Plant Traits Description Using Pre-trained Language Models

Advisor: 楊智凱
The full text will be available for download on 2025/07/31.

Abstract


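This study examines the feasibility of applying language models to plant trait descriptions in data retrieval systems, with particular attention to the need to modernize data storage and retrieval in the plant science domain. As data volumes grow, paper-based records are gradually being replaced by digital data storage and retrieval systems, which improve retrieval efficiency and make data easier to update and preserve. The study analyzes why traditional text retrieval and plant trait retrieval methods fail to scale, and uses language models to preprocess data so that botany-related research data can be processed more efficiently, which in turn supports botanical research and deeper exploration of biodiversity. The research method consists of collecting data from Flora of Taiwan (2nd edition) and Flora of China, reorganizing and augmenting the data, and then training and evaluating a BERT (Bidirectional Encoder Representations from Transformers) model. SBERT (Sentence-BERT), a model further trained from BERT, projects text into a vector space, and cosine similarity is used to measure the angle between vectors and thereby gauge how similar the texts are. Finally, the language models are integrated into a retrieval system for plant trait data. The results not only improve the efficiency and breadth of plant trait retrieval but also offer an alternative to image recognition for plant identification.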

Parallel Abstract


This study examines the application of language models in plant traits retrieval systems, addressing the modern needs for data storage and retrieval within the plant science domain. As data volumes grow, paper-based record-keeping is gradually being replaced by electronic data storage and retrieval systems, which improve retrieval efficiency and make data easier to update and preserve. The study also analyzes why traditional textual retrieval methods fail to scale in the context of plant traits retrieval. Preprocessing data with language models makes plant data processing more efficient, which in turn supports deeper exploration in botanical research and biodiversity. The research methodology consists of collecting data from Flora of Taiwan (2nd edition) and Flora of China, reorganizing and augmenting the data, and then training and evaluating a BERT (Bidirectional Encoder Representations from Transformers) model. The SBERT (Sentence-BERT) model, which is fine-tuned from BERT, projects text into a vector space, and the cosine similarity between vectors is calculated to measure the similarity between texts. Finally, by integrating language models into the plant traits data retrieval system, this study not only improves the efficiency and breadth of retrieval but also provides an alternative to image recognition for plant identification.
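The similarity step described in the abstract can be illustrated with a minimal sketch. It assumes each trait description has already been mapped to a fixed-length vector (in the study this is the role of SBERT); the three-dimensional vectors, the species labels, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical stand-ins chosen for illustration and are not taken from the thesis.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scores = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for sentence embeddings of
# plant trait descriptions; real embeddings have hundreds of dimensions.
doc_vecs = {
    "species_a": [0.9, 0.1, 0.0],
    "species_b": [0.1, 0.8, 0.3],
    "species_c": [0.7, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]  # hypothetical embedding of a user's query

ranking = rank_by_similarity(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

A score near 1 means the two vectors point in nearly the same direction, so the corresponding descriptions are treated as similar. A retrieval system built this way would return the top-ranked entries for a query.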

Parallel Keywords

Language models; Plant traits retrieval; Flora
