
基於主動式學習之古漢語斷句系統發展與應用研究

Development and Application of An Ancient Chinese Sentence Segmentation System Based on Active Learning

Advisor: 陳志銘
The full text will be available for download on 2024/07/21.

Abstract

This study develops an "Ancient Chinese Sentence Segmentation System Based on Active Learning" to support digital humanities research. The system combines active learning with machine learning algorithms and relies on human-computer cooperation to reduce the training corpus needed to build an automatic ancient Chinese sentence segmentation model, helping humanities scholars segment previously uninterpreted documents more efficiently.

To identify the most suitable algorithm and feature template for the system, the first experiment compares segmentation models built with different algorithms and feature templates under two text selection methods: sequential selection and active learning. The results show that conditional random fields (CRF) combined with a three-character (trigram) feature template learn effectively under active learning and are therefore suitable for developing an "active learning sentence segmentation mode". In the second experiment, scholars with humanities expertise were invited to use the system to segment ancient Chinese texts. The segmentation models built from each scholar's own annotations were compared and analyzed, and semi-structured interviews were conducted to understand in depth the scholars' experience with system-assisted segmentation and their suggestions.

The results of the second experiment show that the system learns effectively from the scholars' segmentation annotations and that the model's predictive ability keeps improving through human-computer cooperation. The analysis also found that the model's segmentation prediction ability is significantly negatively correlated with both the scholars' annotation type ratio and their adjacent character type ratio. Finally, the interviews indicate that the scholars evaluated the system's workflow and interface positively, and most interviewees considered the segmentation prediction function an effective aid for ancient Chinese sentence segmentation. Future work could add a named entity model or feature templates based on other ancient Chinese rules to further improve segmentation prediction. The system could also be applied to humanities education and developed into a digital humanities education platform for training ancient Chinese sentence segmentation.
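The abstract does not give the feature templates, label scheme, or query strategy in detail, so the following is only a minimal sketch of how the two ideas it names could fit together: sentence segmentation framed as per-character labeling with features drawn from a three-character window, and active learning that picks the passages a model is least sure about. Everything here is an assumption made for illustration: the label names `E`/`O`, the feature names, the punctuation set, and the least-confidence strategy (a common uncertainty-sampling heuristic, not necessarily the one used in the thesis). The per-character break probabilities are assumed to come from some already trained sequence model such as a CRF; no model is trained in this sketch.

```python
from typing import Dict, List, Tuple

# Assumed set of break marks; the thesis may use a different inventory.
PUNCT = set(",。、;:?!")


def to_labeled_chars(punctuated: str) -> Tuple[List[str], List[str]]:
    """Strip punctuation and label each character.

    'E' marks a character followed by a break, 'O' marks any other
    character. The label names are hypothetical.
    """
    chars: List[str] = []
    labels: List[str] = []
    for ch in punctuated:
        if ch in PUNCT:
            if labels:  # ignore punctuation before the first character
                labels[-1] = "E"
        else:
            chars.append(ch)
            labels.append("O")
    return chars, labels


def trigram_features(chars: List[str], i: int) -> Dict[str, str]:
    """Features for position i from a window of up to three characters."""

    def at(j: int) -> str:
        return chars[j] if 0 <= j < len(chars) else "<PAD>"

    prev_c, cur_c, next_c = at(i - 1), at(i), at(i + 1)
    return {
        "uni[0]": cur_c,
        "bi[-1,0]": prev_c + cur_c,
        "bi[0,+1]": cur_c + next_c,
        "tri[-1,+1]": prev_c + cur_c + next_c,
    }


def least_confidence(break_probs: List[float]) -> float:
    """Uncertainty of one passage.

    break_probs holds, per character, the model's probability that a
    break follows. The score is 1 minus the mean confidence in the more
    likely label, so it ranges from 0 (certain) to 0.5 (coin flip).
    """
    if not break_probs:
        return 0.0
    confidences = [max(p, 1.0 - p) for p in break_probs]
    return 1.0 - sum(confidences) / len(confidences)


def select_for_annotation(pool: Dict[str, List[float]], k: int) -> List[str]:
    """Return the ids of the k most uncertain passages in the pool."""
    ranked = sorted(pool, key=lambda pid: least_confidence(pool[pid]), reverse=True)
    return ranked[:k]
```

In a loop of this kind, the selected passages would be handed to a scholar for annotation, added to the training set, and the model retrained before the next selection round. Sequential selection, the baseline mentioned in the abstract, would instead take the next k passages in document order.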

