
結合知識圖譜及對比學習以強化RAG技術之法律問答系統

Enhancing Legal Question-Answering Systems with Knowledge Graphs and Contrastive Learning for Robust Retrieval-Augmented Generation

Advisor: 張志勇
Co-advisor: 郭經華 (Chin-Hwa Kuo)

Abstract


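Laws, regulations, and contractual clauses are worded rigorously, so the general public cannot easily grasp their meaning and implications from the text alone. Lawyers, meanwhile, are busy, and providing legal consultation on top of their regular work is a heavy burden. Large language models have developed rapidly, and their semantic understanding and answer generation have reached a reasonable standard, so building a question-answering chatbot for laws and regulations is both necessary and feasible. Many challenges remain, however. Non-professionals phrase their questions in words that differ from statutory wording, which makes effective retrieval difficult. In addition, highly similar legal keywords raise both the difficulty and the error rate of queries.
This thesis optimizes a legal question-answering model by combining knowledge graphs with contrastive learning to strengthen RAG. The approach has four stages. In the first stage, we build a basic statute knowledge graph to represent laws and regulations, and use an SVM model for preliminary filtering to improve efficiency. In the second stage, we build an application-scenario knowledge graph from a legal QA dataset to represent everyday language. In the third stage, we train RoBERTa and a GCN, combined with the knowledge graphs, using contrastive learning to address the semantic and syntactic similarity of statutory text. In the fourth stage, we augment the training dataset with Self Instruction and train a large language model to strengthen its ability to memorize, understand, and apply the statutes.
This study is the first to provide a complete large language model application framework for the Taiwanese legal domain, and it builds a knowledge graph and dataset designed specifically for Taiwanese law. Integrating RAG improves the accuracy and efficiency of legal question answering; experimental results show that performance improved by 23.74% after RAG was incorporated.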

Parallel Abstract


The wording of laws, regulations, and clauses is rigorous, which makes it difficult for the general public to understand their meaning and implications. Lawyers, meanwhile, already carry heavy workloads, and providing additional legal consultation adds to that burden. With the rapid development of large language models, their ability to understand semantics and generate responses has reached a useful level of maturity, so developing a QA chatbot for laws and regulations is both necessary and feasible. Numerous challenges remain, however. Non-professionals ask questions in vocabulary that differs from the terminology of legal texts, which makes it difficult to retrieve relevant information. In addition, the high similarity between legal keywords increases the difficulty and error rate of queries.
This thesis optimizes a legal QA model by combining knowledge graphs with contrastive learning to strengthen RAG. The approach is divided into four stages. In the first stage, we construct a basic legal knowledge graph to represent laws and regulations and use an SVM model for preliminary filtering to improve efficiency. In the second stage, we build an application-scenario knowledge graph from a legal QA dataset to represent everyday language. In the third stage, we train RoBERTa and a GCN, combined with the knowledge graphs, using contrastive learning to address the semantic and syntactic similarity of legal language. In the fourth stage, we augment the training dataset through self-instruction and train a large language model to improve its memory, understanding, and application of legal regulations.
This research is the first to provide a complete large language model application framework for Taiwan's legal field, and it establishes a knowledge graph and dataset designed specifically for Taiwan's legal system. Integrating RAG improves the accuracy and efficiency of legal QA. Experimental results show that the model's performance increased by 23.74% after incorporating RAG.
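
The abstract does not state which contrastive objective the third stage uses. Purely as an illustration, the following is a minimal sketch of an InfoNCE-style loss, a common choice for contrastive learning, assuming a question embedding, one matching statute embedding (the positive), and one or more confusable statute embeddings (the negatives). The function names, the toy vectors, and the temperature value are hypothetical and are not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(query, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-softmax score of the positive pair.

    Assumption: embeddings are plain lists of floats; in practice they
    would come from an encoder such as RoBERTa or a GCN.
    """
    # Scaled similarities, with the positive pair in position 0.
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_denominator - logits[0]


# Toy 2-D embeddings (hypothetical values for illustration).
question = [1.0, 0.0]
matching_statute = [0.9, 0.1]
confusable_statute = [0.1, 0.9]

loss_aligned = info_nce_loss(question, matching_statute, [confusable_statute])
loss_swapped = info_nce_loss(question, confusable_statute, [matching_statute])
```

In this sketch the loss is small when the question sits closer to its matching statute than to the confusable one, and large when the roles are swapped, which is the pressure that pulls near-identical legal terms apart in embedding space.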

Keywords

RAG; Knowledge Graph; Contrastive Learning

