  • Thesis

應用變形器模型於雙向文本改寫-以文言白話對照版史書為例

Using the Transformer Model for Bidirectional Text Rewriting - The Case of History Books With Aligned Classical and Modern Texts

Advisor: 魏世杰
Co-advisor: 周清江 (Chi-Chang Jou)

Abstract


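Over the course of history, the standards and styles of written Chinese have changed substantially several times, leaving modern readers with a comparatively weak grasp of Classical Chinese. To narrow the comprehension gap between classical and vernacular Chinese and help people understand classical texts, this study takes bidirectional rewriting between classical and vernacular Chinese as its topic. It processes a parallel corpus with natural language processing techniques and uses a Transformer deep learning model to generate the corresponding sentences, helping readers recognize the rewriting patterns between the two styles. Finally, the generated sentences are evaluated with the text generation metrics BLEU and ROUGE. The experimental results suggest that the approach is promising for rewriting between classical and vernacular Chinese, as an aid to learning classical texts and interpreting historical documents.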

Parallel Abstract


The standards and styles of written Chinese have undergone several drastic changes over the course of history, making classical texts hard for modern readers to comprehend. To reduce the comprehension gap between classical and modern texts and help people understand classical texts, this study addresses bidirectional rewriting between classical and modern Chinese texts. Natural language processing techniques were used to process the parallel corpora and build a Transformer-based deep learning model that generates the corresponding sentences, so that readers can learn the rewriting rules between the two genres of text. Finally, the generated sentences were evaluated with the text generation metrics BLEU and ROUGE. Based on the experimental results, our approach shows promise for rewriting between classical and modern texts, as an aid to learning classical texts and understanding historical documents.
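To make the evaluation step concrete, the following is a minimal sketch of how BLEU and ROUGE-L can be computed for one generated sentence against one reference. It is an illustration under stated assumptions, not the thesis's evaluation code: scoring is done on single characters (the abstract does not say which tokenization was used), BLEU is unsmoothed with a single reference, and the function names and sample sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with one reference, on characters, no smoothing.

    Assumption: character tokens; the thesis may tokenize differently.
    """
    cand, ref = list(candidate), list(reference)
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: an n-gram is credited at most as often as it
        # occurs in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, one empty order zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty for candidates no longer than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


def rouge_l(candidate, reference):
    """ROUGE-L F1 from the longest common subsequence (LCS) of characters."""
    cand, ref = list(candidate), list(reference)
    if not cand or not ref:
        return 0.0
    # table[i][j] holds the LCS length of cand[:i] and ref[:j].
    table = [[0] * (len(ref) + 1) for _ in range(len(cand) + 1)]
    for i, c in enumerate(cand, 1):
        for j, r in enumerate(ref, 1):
            if c == r:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    lcs = table[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence pair for illustration only, not from the thesis corpus.
reference = "學習並且時常溫習"
generated = "學習並且時常複習"
print("BLEU:", round(bleu(generated, reference), 4))
print("ROUGE-L:", round(rouge_l(generated, reference), 4))
```

Established toolkits add smoothing, multiple references, and corpus-level aggregation, so their scores will generally differ from this sketch. Bidirectional evaluation would run the same scoring once per direction, with the roles of the classical and modern sentences swapped.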

