Large language models (LLMs) continue to advance rapidly, and techniques such as Retrieval-Augmented Generation (RAG) are now widely deployed to compensate for model shortcomings and to meet the needs of different usage scenarios. However, how an LLM handles retrieved documents remains a black box: its decision process is closed and opaque, which limits explainability and traceability. This paper proposes the RAG Adaptability Metric, which uses prompt engineering to make the generation model output supporting documents alongside its answer, so that the model explicitly indicates what it believes the answer is based on.
The study also found that when the retrieved documents are insufficient to support a response, the generation model falls back on knowledge acquired during training, i.e., its memorized parameters, to generate the content. In some scenarios such parameter-based responses are acceptable, but they carry a risk of generative hallucination. This study therefore derives the RAG Adaptability Metric from the content relevance among the supporting documents, the query, the retrieved documents, and the response, giving the RAG generation process explainability and traceability. The results show that the metric applies across multiple retrieval methods and different generation models, performs well at identifying potentially risky generated responses, and helps determine whether the LLM answered based on the retrieved documents, refused to answer, or produced an answer on its own, providing a reference for tuning or training models.