
基於深度學習判定日文句型之研究

Research on Determining Japanese Sentence Patterns Based on Deep Learning

Advisor: 魏世杰

Abstract


For beginners, Japanese sentence patterns are complex and easily confused. For example, Japanese particles include 「は」, 「が」, 「に」, and 「で」, and the same particle can often be classified into sentence patterns with different uses depending on its semantics. Traditional sentence-pattern determination relies on part-of-speech information, but parts of speech are themselves hard to determine. The recently popular deep-learning BERT language model has part-of-speech tagging ability, so this study uses the BERT model to determine the Japanese sentence pattern directly from an input sentence while marking the keywords critical to the decision, skipping the part-of-speech step. To test BERT's ability to mark sentence patterns, this study examines the following four classes of basic patterns: (1) determining a basic pattern and marking consecutive keywords, taking the plain-form verb as an example; (2) determining a basic pattern and marking non-consecutive keywords, taking the 「たり」 pattern as an example; (3) determining cases where the surface form is identical but the patterns differ, taking the 「のに」 pattern as an example; (4) determining and marking patterns about the existence of people or things at a specific place, taking the 「ある/いる」 pattern as an example. The experimental results, measured by the critical success index (CSI), show that after fine-tuning, the plain-form verb model attains a training CSI of 98.3% and a validation CSI of 98.6%; the 「たり」 model, 98.4% and 98.5%; the 「のに」 model, 96.3% and 93.2%; the 「ある/いる」 model, 98.4% and 78.6%. These results show that deep learning can correctly determine nearly 80% or more of the tested sentence patterns, partly solving the problem that some patterns are hard to determine with regular expressions. The results can serve as a reference for building future sentence-pattern analysis systems, helping learners become more familiar with the use of Japanese sentence patterns and strengthening their self-learning ability. This study shows that the deep-learning BERT language model has great potential for sentence-pattern determination and marking, and is worth further exploration toward the goal of assisting beginning learners of Japanese.
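The thesis code is not shown here, but the marking task it describes can be framed as token classification: each token is labeled as part of a pattern keyword or not, and a BERT model is fine-tuned to predict those labels. A minimal sketch of the labeling target for the non-consecutive 「たり」 pattern follows; the example sentence, the naive word-level tokenization, and the span positions are invented for illustration (a Japanese BERT would use subword tokenization):

```python
# Illustrative sketch (not the thesis code): building BIO labels for the
# "たり" pattern, whose keywords are non-consecutive in the sentence.
# Tokens inside a keyword span get B (begin) or I (inside); all others get O.

def bio_labels(tokens, keyword_spans):
    """Assign B/I labels over keyword spans (end-exclusive), O elsewhere."""
    labels = ["O"] * len(tokens)
    for start, end in keyword_spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels

# "On days off, I do things like reading books and listening to music."
tokens = ["休み", "の", "日", "は", "本", "を", "読ん", "だり",
          "音楽", "を", "聞い", "たり", "し", "ます"]
# Two discontinuous keyword spans: 読ん+だり and 聞い+たり.
spans = [(6, 8), (10, 12)]

print(bio_labels(tokens, spans))
# → ['O', 'O', 'O', 'O', 'O', 'O', 'B', 'I', 'O', 'O', 'B', 'I', 'O', 'O']
```

A fine-tuned token-classification head then predicts one such label per token, which both identifies the pattern and marks its keywords in a single pass.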

Parallel Abstract


For beginners, Japanese sentence patterns are complex and confusing. For example, Japanese particles include "は" (ha), "が" (ga), "に" (ni), and "で" (de). The same particle can often be classified into different sentence patterns with different semantics. The traditional detection of sentence patterns requires the assistance of parts of speech, but it is in itself a difficult task to determine the parts of speech. The recently popular deep learning BERT language model has the ability to tag parts of speech, so we want to use the BERT model to directly determine the Japanese sentence pattern. Given an input sentence, the BERT model will detect a sentence pattern and mark the critical keywords of the pattern, thus bypassing the step of tagging parts of speech. In order to test the ability of BERT to detect sentence patterns, this study examines the following four types of patterns: (1) basic sentence patterns with consecutive keywords, taking the plain present tense verb as an example; (2) basic sentence patterns with non-consecutive keywords, taking the "たり" (tari) pattern as an example; (3) different basic sentence patterns with the same surface form, taking the "のに" (noni) pattern as an example; (4) sentence patterns related to the existence of people or things in a place, taking the "ある/いる" (aru/iru) pattern as an example. The experimental results show that, in terms of the critical success index (CSI), after fine-tuning, the verb model has a CSI of 98.3% in training and 98.6% in validation; the "たり" (tari) model has a CSI of 98.4% in training and 98.5% in validation; the "のに" (noni) model has a CSI of 96.3% in training and 93.2% in validation; the "ある/いる" (aru/iru) model has a CSI of 98.4% in training and 78.6% in validation. It is verified that through deep learning, nearly 80% or more of the tested sentence patterns can be correctly determined. It can also solve the problem that some sentence patterns are hard to detect by regular expressions.
This experience can be used for building an analysis system of Japanese sentence patterns, thereby enriching the self-learning ability of Japanese language learners. The study reveals that the BERT language model has great potential for sentence pattern detection and tagging, and it is worthy of further exploration to help more beginners learn Japanese.
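The critical success index used above is a standard metric, CSI = TP / (TP + FN + FP), i.e. hits divided by hits plus misses plus false alarms; note that, unlike accuracy, it ignores true negatives. A minimal sketch under one plausible token-level reading (any non-"O" keyword tag counts as positive; this reading is an assumption, not taken from the thesis):

```python
# Illustrative sketch: token-level critical success index (CSI).
# CSI = TP / (TP + FN + FP); true negatives (correct "O" tags) are ignored,
# so a model cannot score well just by predicting "O" everywhere.

def critical_success_index(y_true, y_pred):
    """CSI over aligned tag sequences; any non-'O' tag counts as positive.
    Assumes at least one positive appears in y_true or y_pred."""
    tp = sum(t != "O" and p != "O" for t, p in zip(y_true, y_pred))
    fn = sum(t != "O" and p == "O" for t, p in zip(y_true, y_pred))
    fp = sum(t == "O" and p != "O" for t, p in zip(y_true, y_pred))
    return tp / (tp + fn + fp)

# One keyword token missed (index 4): 3 hits, 1 miss, 0 false alarms.
gold = ["O", "B", "I", "O", "B", "I"]
pred = ["O", "B", "I", "O", "O", "I"]
print(critical_success_index(gold, pred))
# → 0.75
```

Under this reading, the "ある/いる" model's validation CSI of 78.6% means that roughly one in five keyword tokens was missed or falsely marked, which is consistent with the abstract's "nearly 80%" claim.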

